Commit 5e611eb

Resolve multiple concurrency issues
## Race condition in pathing jar manifest creation

A race condition exists when setting up the classpath during container launch. When a container is launched using samza-yarn, run-class.sh creates a pathing jar file, which holds the classpath for the container launch. However, the temporary files used to create the pathing jar (such as manifest.txt) and the pathing jar itself are not placed in a location unique to the container. As a result, multiple containers write to the same pathing jar and temporary file locations, which causes a race condition.

This race condition may show up in several ways. For example, when YARN removes jars from a finished container, other containers are left pointing to a classpath that no longer exists. It can also occur when multiple run-class.sh scripts attempt to write manifest.txt or the pathing jar at the same time.

Note that enabling host affinity makes this problem worse. The pathing.jar is written to the usercache, so when the container that created the pathing.jar finishes and is removed, any new container launched on that host will point to jar files that no longer exist. With host affinity enabled, the container is not moved to a new host, so it keeps failing.

## Container logging directory fallback is not unique for each container

The fallback log directory is the same for all containers running on the same host. It should be unique per container.

## Container tmp dir is not unique per-container

The JAVA_TEMP_DIR directory is the same for all containers running on the same host. It is not clear that sharing one directory among all containers is safe, so it should also be unique per container.
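The first issue can be reproduced in miniature with the following bash sketch. It assumes, as described above, that the pre-fix workspace location resolves to the same directory for every container on a host. The temporary directory, the container ids, and the `write_manifest` helper are hypothetical stand-ins rather than real samza-yarn paths or run-class.sh functions. Two simulated containers write `manifest.txt` concurrently, first into one shared workspace and then into per-container workspaces.

```shell
#!/usr/bin/env bash
# Sketch only: two simulated containers on one host write manifest.txt at once.
# demo_root, the container ids, and write_manifest are hypothetical stand-ins,
# not real samza-yarn paths or run-class.sh functions.
set -euo pipefail

demo_root=$(mktemp -d)
trap 'rm -rf "$demo_root"' EXIT

# Write a manifest with the same shape run-class.sh produces.
#   $1 = classpath workspace directory, $2 = simulated container id
write_manifest() {
  local workspace=$1 container_id=$2
  mkdir -p "$workspace"
  printf "Class-Path: \n %s \n" "/jars/$container_id.jar" > "$workspace/manifest.txt"
}

# Pre-fix layout: both containers target one shared workspace, so they race on
# the same manifest.txt and only one container's classpath survives.
for id in container_01 container_02; do
  write_manifest "$demo_root/__package/classpath_workspace" "$id" &
done
wait

# Post-fix layout: each container writes under its own home_dir.
for id in container_01 container_02; do
  write_manifest "$demo_root/$id/classpath_workspace" "$id" &
done
wait

echo "shared manifest: $(sed -n 2p "$demo_root/__package/classpath_workspace/manifest.txt")"
for id in container_01 container_02; do
  echo "$id manifest: $(sed -n 2p "$demo_root/$id/classpath_workspace/manifest.txt")"
done
```

The shared manifest ends up holding only one of the two classpaths, and which one is nondeterministic. In the real script the same collision also applies to pathing.jar, which is built from that manifest. With per-container workspaces, each manifest holds only its own container's classpath.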
1 parent 2ff7e41 commit 5e611eb

1 file changed

Lines changed: 18 additions & 19 deletions

samza-shell/src/main/bash/run-class.sh

@@ -28,12 +28,14 @@ cd $base_dir
 base_dir=`pwd`
 cd $home_dir
 
-# Note: When using samza-yarn, base_dir and pwd here looks something like:
-# /<hadoop path>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/__package
-
 echo "Current time: $(date '+%Y-%m-%d %H:%M:%S')"
 
+# Note: When using samza-yarn, home_dir looks like:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027
 echo home_dir=$home_dir
+
+# Note: When using samza-yarn, base_dir looks like:
+# /<hadoop path>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/__package
 echo "framework base (location of this script). base_dir=$base_dir"
 
 if [ ! -d "$base_dir/lib" ]; then
@@ -82,25 +84,20 @@ fi
 # this is helpful is when using container images which might have predefined permissions for certain
 # directories.
 
-# FIXME(SAMZA-2804): CLASSPATH_WORKSPACE_DIR is shared among all containers running on the host when using samza-yarn.
-# Using the same path for all containers running on the host for manifest.txt and pathing.jar is a race condition.
-# e.g. "/<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/__package/classpath_workspace/pathing.jar"
-CLASSPATH_WORKSPACE_DIR=$base_dir/classpath_workspace
+# Note: When on samza-yarn, CLASSPATH_WORKSPACE_DIR looks like:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/classpath_workspace
+CLASSPATH_WORKSPACE_DIR=$home_dir/classpath_workspace
 mkdir -p $CLASSPATH_WORKSPACE_DIR
 
-# FIXME(SAMZA-2804): This is a race condition when using samza-yarn.
 # file containing the classpath string; used to avoid passing long classpaths directly to the jar command
 PATHING_MANIFEST_FILE=$CLASSPATH_WORKSPACE_DIR/manifest.txt
 
-# FIXME(SAMZA-2804): This is a race condition when using samza-yarn.
 # jar file to include on the classpath for running the main class
 PATHING_JAR_FILE=$CLASSPATH_WORKSPACE_DIR/pathing.jar
 
-# FIXME(SAMZA-2804): This is a race condition when using samza-yarn.
 # Newlines and spaces are intended to ensure proper parsing of manifest in pathing jar
 printf "Class-Path: \n $CLASSPATH \n" > $PATHING_MANIFEST_FILE
 
-# FIXME(SAMZA-2804): This is a race condition when using samza-yarn.
 # Creates a new archive and adds custom manifest information to pathing.jar
 eval "$JAR -cvmf $PATHING_MANIFEST_FILE $PATHING_JAR_FILE"
 
@@ -110,17 +107,19 @@ else
 JAVA="$JAVA_HOME/bin/java"
 fi
 
-# FIXME(SAMZA-2804): This log directory is shared among all containers running on the host when using samza-yarn.
 if [ -z "$SAMZA_LOG_DIR" ]; then
-SAMZA_LOG_DIR="$base_dir"
+# When on samza-yarn, SAMZA_LOG_DIR will point to the symlink located at:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/logs
+#
+# When the symlink is resolved, this path will point to:
+# /<hadoop dir>/userlogs/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027
+SAMZA_LOG_DIR="$home_dir/logs"
 fi
 
-# FIXME(SAMZA-2804): This directory is shared among all containers running on the host when using samza-yarn. We should
-# likely be using a per-container tmp directory instead.
-#
-# add usercache directory
-mkdir -p $base_dir/tmp
-JAVA_TEMP_DIR=$base_dir/tmp
+# When on samza-yarn, JAVA_TEMP_DIR will point to a path similar to:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/tmp
+mkdir -p $home_dir/tmp
+JAVA_TEMP_DIR=$home_dir/tmp
 
 # Check whether the JVM supports GC Log rotation, and enable it if so.
 function check_and_enable_gc_log_rotation {
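Taken together, the three changes derive every per-container location from `home_dir` instead of `base_dir`. The following bash sketch mirrors how the patched run-class.sh assigns those paths. `derive_container_paths` and the `/hadoop/appcache/...` paths are hypothetical. Unlike the script, which runs `mkdir -p`, the helper only prints paths and creates nothing on disk.

```shell
#!/usr/bin/env bash
# Sketch only: derive the per-container locations the patched run-class.sh uses.
# derive_container_paths and the /hadoop/appcache/... paths are hypothetical.
set -euo pipefail

# Print, one per line: classpath workspace, log dir, and Java temp dir for a
# container whose working directory (home_dir) is $1. Creates nothing on disk.
derive_container_paths() {
  local home_dir=$1
  local classpath_workspace_dir="$home_dir/classpath_workspace"
  # Like `if [ -z "$SAMZA_LOG_DIR" ]`, the fallback applies only when
  # SAMZA_LOG_DIR is unset or empty.
  local samza_log_dir="${SAMZA_LOG_DIR:-$home_dir/logs}"
  local java_temp_dir="$home_dir/tmp"
  printf '%s\n' "$classpath_workspace_dir" "$samza_log_dir" "$java_temp_dir"
}

# Demo: two containers of the same application on one host, no log dir override.
unset SAMZA_LOG_DIR
a=$(derive_container_paths /hadoop/appcache/application_1/container_01)
b=$(derive_container_paths /hadoop/appcache/application_1/container_02)
# comm -12 prints only lines present in both (sorted) inputs.
overlap=$(comm -12 <(sort <<<"$a") <(sort <<<"$b"))

printf '%s\n' "$a"
echo "paths shared by both containers: ${overlap:-none}"
# Output:
# /hadoop/appcache/application_1/container_01/classpath_workspace
# /hadoop/appcache/application_1/container_01/logs
# /hadoop/appcache/application_1/container_01/tmp
# paths shared by both containers: none
```

The fallback is skipped whenever SAMZA_LOG_DIR is already set. An exported value is therefore used as-is, and whether it is per-container depends on whatever sets it.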
