Before every Windows process spawn, WindowsProcessLauncher clears the inherit flag on the JVM's three standard handles and never restores them (WindowsProcessLauncher.java#L14-L29).
JVMs since Java 8 already do this, and they do it under a lock.
The WindowsProcessLauncher version layers on top of that but runs without the lock, races the JVM for the same handles, and intermittently fails with ERROR_ACCESS_DENIED (errno 5) due to the resulting race for global access to those handles.
What is 'greedy-grandchild' anyway?
The code in WindowsProcessLauncher dates to 2013 and, unfortunately, the commit message doesn't describe the rationale for it. But, it calls SetHandleInformation(HANDLE_FLAG_INHERIT, 0) on STD_INPUT, STD_OUTPUT, and STD_ERROR so that child processes cannot leak the parent's handles, so it is surely trying to address the "greedy grandchild" problem. That's when a spawned process leaks inherited stdio handles to its own spawned child processes, so the pipe never reaches EOF. Prior to JDK-7147084, without WindowsProcessLauncher, this would have caused hangs.
Since Java 8, ProcessImpl.create drops and restores the standard handles' inherit flags around CreateProcess, and the method is static synchronized, so the JVM serializes the change on ProcessImpl.class (so no racing for handles).
Why do I care?
It's been causing flakes in my project's CI for several years and I just now figured out the cause. Our flake rate for tests that spawn processes that, themselves, spawn other processes has hovered around 1%, but only on Windows. The error message indicates ERROR_ACCESS_DENIED and the callstacks lead back here, to WindowsProcessLauncher . Looking at our telemetry, this is almost certainly being hit by our users, though unfortunately we don't segment by the exception message text so I can't know the relative frequency that way. But since this represents the vast majority of test flakes, I think it's probably also a majority of actual user failures in similar scenarios.
Proposed fix
Assuming Gradle needs to support Javas 8 and before, the fix would be along the lines of:
if (isJava9OrAbove()) {
return type.cast(new WrapperProcessLauncher(new DefaultProcessLauncher()));
} else {
return type.cast(new WrapperProcessLauncher(new WindowsProcessLauncher(new DefaultProcessLauncher())));
}
I'm using Java 9 here because it's not clear to me that all Java 8 versions had both JDK-7147084 and JDK-6921885. I suspect they do as I even found evidence these fixes were ported back to Java 7 but it's hard to know for sure.
I'd be happy to send a PR for review, but I thought it might be worth discussing in this bug first.
Before every Windows process spawn,
WindowsProcessLauncherclears the inherit flag on the JVM's three standard handles and never restores them (WindowsProcessLauncher.java#L14-L29).JVMs since Java 8 already do this, and they do it under a lock.
create()is synchronized)The
WindowsProcessLauncherversion layers on top of that but runs without the lock, races the JVM for the same handles, and intermittently fails withERROR_ACCESS_DENIED(errno 5) due to the resulting race for global access to those handles.What is 'greedy-grandchild' anyway?
The code in
WindowsProcessLauncherdates to 2013 and, unfortunately, the commit message doesn't describe the rationale for it. But, it callsSetHandleInformation(HANDLE_FLAG_INHERIT, 0)on STD_INPUT, STD_OUTPUT, and STD_ERROR so that child processes cannot leak the parent's handles, so it is surely trying to address the "greedy grandchild" problem. That's when a spawned process leaks inherited stdio handles to its own spawned child processes, so the pipe never reaches EOF. Prior to JDK-7147084, withoutWindowsProcessLauncher, this would have caused hangs.Since Java 8, ProcessImpl.create drops and restores the standard handles' inherit flags around CreateProcess, and the method is
static synchronized, so the JVM serializes the change on ProcessImpl.class (so no racing for handles).Why do I care?
It's been causing flakes in my project's CI for several years and I just now figured out the cause. Our flake rate for tests that spawn processes that, themselves, spawn other processes has hovered around 1%, but only on Windows. The error message indicates
ERROR_ACCESS_DENIEDand the callstacks lead back here, toWindowsProcessLauncher. Looking at our telemetry, this is almost certainly being hit by our users, though unfortunately we don't segment by the exception message text so I can't know the relative frequency that way. But since this represents the vast majority of test flakes, I think it's probably also a majority of actual user failures in similar scenarios.Proposed fix
Assuming Gradle needs to support Javas 8 and before, the fix would be along the lines of:
I'm using Java 9 here because it's not clear to me that all Java 8 versions had both JDK-7147084 and JDK-6921885. I suspect they do as I even found evidence these fixes were ported back to Java 7 but it's hard to know for sure.
I'd be happy to send a PR for review, but I thought it might be worth discussing in this bug first.