Skip to content

WindowsProcessLauncher causes ERROR_ACCESS_DENIED (errno 5) due to race on concurrent spawns #397

Description

@jomof

Before every Windows process spawn, WindowsProcessLauncher clears the inherit flag on the JVM's three standard handles and never restores them (WindowsProcessLauncher.java#L14-L29).

JVMs since Java 8 already do this, and they do it under a lock.

The WindowsProcessLauncher version layers on top of that but runs without the lock, races the JVM for the same handles, and intermittently fails with ERROR_ACCESS_DENIED (errno 5) due to the resulting race for global access to those handles.

What is 'greedy-grandchild' anyway?

The code in WindowsProcessLauncher dates to 2013 and, unfortunately, the commit message doesn't describe the rationale for it. But, it calls SetHandleInformation(HANDLE_FLAG_INHERIT, 0) on STD_INPUT, STD_OUTPUT, and STD_ERROR so that child processes cannot leak the parent's handles, so it is surely trying to address the "greedy grandchild" problem. That's when a spawned process leaks inherited stdio handles to its own spawned child processes, so the pipe never reaches EOF. Prior to JDK-7147084, without WindowsProcessLauncher, this would have caused hangs.

Since Java 8, ProcessImpl.create drops and restores the standard handles' inherit flags around CreateProcess, and the method is static synchronized, so the JVM serializes the change on ProcessImpl.class (so no racing for handles).

Why do I care?

It's been causing flakes in my project's CI for several years and I just now figured out the cause. Our flake rate for tests that spawn processes that, themselves, spawn other processes has hovered around 1%, but only on Windows. The error message indicates ERROR_ACCESS_DENIED and the callstacks lead back here, to WindowsProcessLauncher . Looking at our telemetry, this is almost certainly being hit by our users, though unfortunately we don't segment by the exception message text so I can't know the relative frequency that way. But since this represents the vast majority of test flakes, I think it's probably also a majority of actual user failures in similar scenarios.

Proposed fix

Assuming Gradle needs to support Javas 8 and before, the fix would be along the lines of:

if (isJava9OrAbove()) {
    return type.cast(new WrapperProcessLauncher(new DefaultProcessLauncher()));
} else {
    return type.cast(new WrapperProcessLauncher(new WindowsProcessLauncher(new DefaultProcessLauncher())));
}

I'm using Java 9 here because it's not clear to me that all Java 8 versions had both JDK-7147084 and JDK-6921885. I suspect they do as I even found evidence these fixes were ported back to Java 7 but it's hard to know for sure.

I'd be happy to send a PR for review, but I thought it might be worth discussing in this bug first.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions