fix(executor): switch process working dir via native chdir #475

Open
my-vegetable-has-exploded wants to merge 2 commits into ray-project:master from my-vegetable-has-exploded:ch-dir

Conversation

@my-vegetable-has-exploded

Contributor

Motivation

The RayDP executor process's CWD stays at the container's default directory (e.g., / or /opt) instead of the executor's workingDir. As a result, Spark distributed files (--files) and archives (--archives) are extracted to the wrong location, and executor code cannot find them at the expected paths.

Approach

  • JNA native chdir call: Add a JNA dependency to pom.xml and define a LibC interface that binds chdir() from libc. After setUserDir() (which only changes the user.dir system property), call the new switchProcessWorkingDirBestEffort() to actually switch the process-level CWD
  • Align SparkEnv.driverTmpDir: Change driverTmpDir from workingDir/_tmp to workingDir itself, so the root directory for Spark distributed files matches the executor's workingDir, and files/archives are extracted to the correct path
  • Fault tolerance: chdir failure only logs a warning and does not interrupt executor startup; reads cwd before and after the switch for diagnostic logging
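The first and third points can be illustrated with a minimal sketch. It is not the code from this PR: the names `LibC` binding shape, `CwdSwitch`, and `switchBestEffort` are hypothetical, `println` stands in for Spark's logging, and it assumes JNA 5.x (where `Native.load` is available) on a platform whose C library can be loaded as "c".

```scala
import java.io.File
import scala.util.control.NonFatal
import com.sun.jna.{Library, Native}

// Hypothetical JNA binding; the PR's actual LibC interface is not shown here.
trait LibC extends Library {
  // int chdir(const char *path): returns 0 on success, -1 on failure.
  def chdir(path: String): Int
}

object CwdSwitch {
  // Loaded lazily so a missing JNA jar or libc only surfaces on first use,
  // inside the try block below.
  private lazy val libc: LibC = Native.load("c", classOf[LibC])

  /** Tries to chdir into targetDir. Never throws; returns true on success. */
  def switchBestEffort(targetDir: File): Boolean = {
    try {
      val rc = libc.chdir(targetDir.getAbsolutePath)
      if (rc == 0) {
        true
      } else {
        println(s"chdir to $targetDir failed: rc=$rc, errno=${Native.getLastError()}")
        false
      }
    } catch {
      // UnsatisfiedLinkError and NoClassDefFoundError are LinkageErrors,
      // which NonFatal deliberately does not match, so handle them first.
      case e: LinkageError =>
        println(s"native chdir unavailable: $e")
        false
      case NonFatal(e) =>
        println(s"chdir to $targetDir failed: $e")
        false
    }
  }
}
```

Calling `CwdSwitch.switchBestEffort(workingDir)` right after the `user.dir` property is updated would keep startup going whether or not the native call succeeds, which matches the best-effort behaviour described above.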

wangyi added 2 commits May 17, 2026 03:27
Add JNA dependency to invoke native chdir syscall for switching the
executor process working directory. Align SparkEnv.driverTmpDir with
workingDir to ensure distributed files and archives are extracted to
the correct root directory.

Signed-off-by: wangyi <epsilonwang@didiglobal.com>

Copilot AI left a comment

Pull request overview

This PR updates RayDP executor startup so the executor process attempts to switch its native working directory to the executor workingDir, aligning Spark distributed file/archive placement with executor-local expectations.

Changes:

  • Adds JNA and a native libc chdir call during executor startup.
  • Adds diagnostic logging around process CWD switching.
  • Changes SparkEnv.driverTmpDir to point at workingDir instead of workingDir/_tmp.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| core/raydp-main/src/main/scala/org/apache/spark/executor/RayDPExecutor.scala | Adds best-effort native CWD switching and aligns Spark distributed file root with executor working directory. |
| core/raydp-main/pom.xml | Adds the JNA dependency required for native libc access. |

Comment on lines +201 to +202
logWarning(s"Failed to switch executor process cwd from ${beforeCwd} to ${targetDir}, " +
s"chdir returned rc=${rc}, errno=${Native.getLastError}")
assert(workerTmpDir.exists() && workerTmpDir.isDirectory)
SparkEnv.get.driverTmpDir = Some(workerTmpDir.getAbsolutePath)
// Keep Spark's distributed file/archive root aligned with executor workingDir.
SparkEnv.get.driverTmpDir = Some(workingDir.getAbsolutePath)

Collaborator

Would this change alone solve the problem?


private def getProcessWorkingDir: String = {
try {
val procCwd = Paths.get("/proc/self/cwd")

Collaborator

This is Linux-specific.
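One possible way to keep the diagnostic read from failing off Linux is sketched below. The rest of the PR's `getProcessWorkingDir` is not shown in this thread, so this is an assumption-laden illustration rather than the PR's code: `ProcessCwd`, `read`, and `describe` are hypothetical names, and the fallback simply reports that the OS-level CWD is unknown instead of guessing.

```scala
import java.nio.file.{Files, Paths}
import scala.util.control.NonFatal

object ProcessCwd {
  // On Linux this is a symlink to the process's current working directory.
  private val ProcSelfCwd = Paths.get("/proc/self/cwd")

  /** OS-level CWD when /proc is available, otherwise None. Never throws. */
  def read(): Option[String] =
    try {
      if (Files.isSymbolicLink(ProcSelfCwd)) Some(ProcSelfCwd.toRealPath().toString)
      else None
    } catch {
      case NonFatal(_) => None
    }

  /** Log-friendly description; user.dir is shown only as a labelled hint. */
  def describe(): String =
    read().getOrElse(s"unknown (user.dir=${System.getProperty("user.dir")})")
}
```

Returning an `Option` makes the platform gap explicit to callers: on systems without `/proc`, the before/after log lines would read "unknown" rather than reporting the `user.dir` property as if it were the real process CWD.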

@pang-wu

pang-wu commented May 23, 2026

Collaborator

Can we add some tests to verify that the fix works?

3 participants