Commit 942e0d1

Resolve multiple concurrency issues
## Race condition in pathing jar manifest creation

A race condition exists when setting up the classpath during container launch.

During container launch using samza-yarn, run-class.sh creates a pathing jar file (which holds the classpath for the container launch). However, the temporary files created along the way, as well as the pathing jar itself, are not placed in a location unique to the container. As a result, multiple containers write to the same pathing jar and temporary file locations, which is a race condition.

This race condition may show up in several ways, such as when YARN removes jars from a finished container (other containers will then point to a classpath which no longer exists) or when multiple run-class.sh scripts attempt to write manifest.txt or the pathing jar at the same time.

Note that enabling host affinity makes this problem worse. The pathing.jar is written to the usercache, so when the container which created the pathing.jar finishes and is removed, any new container which launches on that host will point to jar files which no longer exist. With host affinity enabled, the container will not move to a new host and will just keep failing.

## Container logging directory fallback is not unique for each container

The fallback log directory is the same for all containers running on the same host. It should be unique per container.

## Container tmp dir is not unique per-container

The JAVA_TEMP_DIR directory is the same for all containers. We should make sure that it's safe to use the same directory for all containers.
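The collision described above can be reproduced in miniature. The following is a minimal sketch, not the actual YARN layout: it assumes that each container's `__package` entry is a symlink into one localized package directory shared per host (an assumption, not something the commit states), and the helpers `make_container` and `resolve_workspace` are hypothetical names introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: simulates two containers on one host. The directory layout is
# an assumption for illustration, not the exact YARN structure.
set -eu

host_root=$(mktemp -d)

# Localized copy of the job package, assumed to be shared by all containers.
mkdir -p "$host_root/filecache/package"

# Hypothetical helper: create a container directory whose __package entry is
# a symlink to the shared package, then print the container's home_dir.
make_container() {
  container_home="$host_root/appcache/$1"
  mkdir -p "$container_home"
  ln -s "$host_root/filecache/package" "$container_home/__package"
  echo "$container_home"
}

# Hypothetical helper: create classpath_workspace under the given directory
# and print its physical path (symlinks resolved).
resolve_workspace() {
  mkdir -p "$1/classpath_workspace"
  (cd -P "$1/classpath_workspace" && pwd -P)
}

home_a=$(make_container container_01)
home_b=$(make_container container_02)

# Old behaviour: workspace under base_dir, i.e. $home_dir/__package.
old_a=$(resolve_workspace "$home_a/__package")
old_b=$(resolve_workspace "$home_b/__package")

# New behaviour: workspace directly under home_dir.
new_a=$(resolve_workspace "$home_a")
new_b=$(resolve_workspace "$home_b")

[ "$old_a" = "$old_b" ] && echo "old layout: both containers share $old_a"
[ "$new_a" != "$new_b" ] && echo "new layout: workspaces are distinct"
```

Under these assumptions, both containers resolve the old workspace to the same physical directory, so their manifest.txt and pathing.jar writes would overlap, while the workspaces under `home_dir` stay separate.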
1 parent 2ff7e41 commit 942e0d1

2 files changed

Lines changed: 39 additions & 24 deletions

samza-shell/src/main/bash/run-class.sh

Lines changed: 18 additions & 19 deletions
@@ -28,12 +28,14 @@ cd $base_dir
 base_dir=`pwd`
 cd $home_dir

-# Note: When using samza-yarn, base_dir and pwd here looks something like:
-# /<hadoop path>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/__package
-
 echo "Current time: $(date '+%Y-%m-%d %H:%M:%S')"

+# Note: When using samza-yarn, home_dir looks like:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027
 echo home_dir=$home_dir
+
+# Note: When using samza-yarn, base_dir looks like:
+# /<hadoop path>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/__package
 echo "framework base (location of this script). base_dir=$base_dir"

 if [ ! -d "$base_dir/lib" ]; then
@@ -82,25 +84,20 @@ fi
 # this is helpful is when using container images which might have predefined permissions for certain
 # directories.

-# FIXME(SAMZA-2804): CLASSPATH_WORKSPACE_DIR is shared among all containers running on the host when using samza-yarn.
-# Using the same path for all containers running on the host for manifest.txt and pathing.jar is a race condition.
-# e.g. "/<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/__package/classpath_workspace/pathing.jar"
-CLASSPATH_WORKSPACE_DIR=$base_dir/classpath_workspace
+# Note: When on samza-yarn, CLASSPATH_WORKSPACE_DIR looks like:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/classpath_workspace
+CLASSPATH_WORKSPACE_DIR=$home_dir/classpath_workspace
 mkdir -p $CLASSPATH_WORKSPACE_DIR

-# FIXME(SAMZA-2804): This is a race condition when using samza-yarn.
 # file containing the classpath string; used to avoid passing long classpaths directly to the jar command
 PATHING_MANIFEST_FILE=$CLASSPATH_WORKSPACE_DIR/manifest.txt

-# FIXME(SAMZA-2804): This is a race condition when using samza-yarn.
 # jar file to include on the classpath for running the main class
 PATHING_JAR_FILE=$CLASSPATH_WORKSPACE_DIR/pathing.jar

-# FIXME(SAMZA-2804): This is a race condition when using samza-yarn.
 # Newlines and spaces are intended to ensure proper parsing of manifest in pathing jar
 printf "Class-Path: \n $CLASSPATH \n" > $PATHING_MANIFEST_FILE

-# FIXME(SAMZA-2804): This is a race condition when using samza-yarn.
 # Creates a new archive and adds custom manifest information to pathing.jar
 eval "$JAR -cvmf $PATHING_MANIFEST_FILE $PATHING_JAR_FILE"

@@ -110,17 +107,19 @@ else
 JAVA="$JAVA_HOME/bin/java"
 fi

-# FIXME(SAMZA-2804): This log directory is shared among all containers running on the host when using samza-yarn.
 if [ -z "$SAMZA_LOG_DIR" ]; then
-SAMZA_LOG_DIR="$base_dir"
+# When on samza-yarn, SAMZA_LOG_DIR will point to the symlink located at:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/logs
+#
+# When the symlink is resolved, this path will point to:
+# /<hadoop dir>/userlogs/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027
+SAMZA_LOG_DIR="$home_dir/logs"
 fi

-# FIXME(SAMZA-2804): This directory is shared among all containers running on the host when using samza-yarn. We should
-# likely be using a per-container tmp directory instead.
-#
-# add usercache directory
-mkdir -p $base_dir/tmp
-JAVA_TEMP_DIR=$base_dir/tmp
+# When on samza-yarn, JAVA_TEMP_DIR will point to a path similar to:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/tmp
+mkdir -p $home_dir/tmp
+JAVA_TEMP_DIR=$home_dir/tmp

 # Check whether the JVM supports GC Log rotation, and enable it if so.
 function check_and_enable_gc_log_rotation {
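
The fallback pattern in the hunk above can be exercised on its own. This is a rough sketch under stated assumptions: `container_dirs` is a hypothetical helper that does not exist in run-class.sh, the container names are made up, and `/var/log/samza` is only an example override value.

```shell
# Sketch only: container_dirs is a hypothetical helper, not part of run-class.sh.
# Given a container's home_dir, it applies the same fallback as the hunk above
# and prints the resulting log and temp directories.
container_dirs() {
  home_dir=$1
  log_dir=${SAMZA_LOG_DIR:-}
  # Fall back to a per-container location only when no log dir was supplied.
  if [ -z "$log_dir" ]; then
    log_dir="$home_dir/logs"
  fi
  # The temp directory always lives under the container's own home_dir.
  mkdir -p "$home_dir/tmp"
  echo "log_dir=$log_dir tmp_dir=$home_dir/tmp"
}

root=$(mktemp -d)

# Without SAMZA_LOG_DIR, each container gets its own log and temp paths.
unset SAMZA_LOG_DIR
out_a=$(container_dirs "$root/container_01")
out_b=$(container_dirs "$root/container_02")

# With SAMZA_LOG_DIR set (example value), the supplied path wins.
SAMZA_LOG_DIR=/var/log/samza
out_c=$(container_dirs "$root/container_03")

echo "$out_a"
echo "$out_b"
echo "$out_c"
```

An explicitly supplied SAMZA_LOG_DIR still takes precedence, so only the fallback and the temp directory become per-container.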

samza-shell/src/main/bash/run-framework-class.sh

File mode changed: 100644 → 100755
Lines changed: 21 additions & 5 deletions
@@ -28,7 +28,12 @@ cd $base_dir
 base_dir=`pwd`
 cd $home_dir

+# Note: When using samza-yarn, home_dir looks like:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027
 echo home_dir=$home_dir
+
+# Note: When using samza-yarn, base_dir looks like:
+# /<hadoop path>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/__package
 echo "framework base (location of this script). base_dir=$base_dir"

 if [ ! -d "$base_dir/lib" ]; then
@@ -107,10 +112,15 @@ fi
 # permissions for the classpath-related files when they are in their own directory. An example of where
 # this is helpful is when using container images which might have predefined permissions for certain
 # directories.
-CLASSPATH_WORKSPACE_DIR=$base_dir/classpath_workspace
+
+# Note: When on samza-yarn, CLASSPATH_WORKSPACE_DIR looks like:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/classpath_workspace
+CLASSPATH_WORKSPACE_DIR=$home_dir/classpath_workspace
 mkdir -p $CLASSPATH_WORKSPACE_DIR
+
 # file containing the classpath string; used to avoid passing long classpaths directly to the jar command
 PATHING_MANIFEST_FILE=$CLASSPATH_WORKSPACE_DIR/manifest.txt
+
 # jar file to include on the classpath for running the main class
 PATHING_JAR_FILE=$CLASSPATH_WORKSPACE_DIR/pathing.jar

@@ -126,12 +136,18 @@ else
 fi

 if [ -z "$SAMZA_LOG_DIR" ]; then
-SAMZA_LOG_DIR="$base_dir"
+# When on samza-yarn, SAMZA_LOG_DIR will point to the symlink located at:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/logs
+#
+# When the symlink is resolved, this path will point to:
+# /<hadoop dir>/userlogs/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027
+SAMZA_LOG_DIR="$home_dir"
 fi

-# add usercache directory
-mkdir -p $base_dir/tmp
-JAVA_TEMP_DIR=$base_dir/tmp
+# When on samza-yarn, JAVA_TEMP_DIR will point to a path similar to:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/tmp
+mkdir -p $home_dir/tmp
+JAVA_TEMP_DIR=$home_dir/tmp

 # Check whether the JVM supports GC Log rotation, and enable it if so.
 function check_and_enable_gc_log_rotation {
