Bug report
Bug description:
If the last reference to a ProcessPoolExecutor is dropped without calling shutdown() while queued work remains, and a worker exits upon reaching its max_tasks_per_child limit, the remaining futures never complete and interpreter exit hangs joining the executor manager thread.
from concurrent.futures import ProcessPoolExecutor

def work(i):
    return i * i

def make_futures():
    exe = ProcessPoolExecutor(1, max_tasks_per_child=1)
    return [exe.submit(work, i) for i in range(4)]

if __name__ == "__main__":
    futs = make_futures()
    print([f.result(timeout=30) for f in futs])
Result: TimeoutError after 30 seconds, then the process hangs at exit.
The executor manager thread replaces a worker that exited at its task limit through a weakref to the executor:
if executor := self.executor_reference():
    if process_exited:
        with self.shutdown_lock:
            executor._replace_dead_worker()
Once the executor object has been collected, the weakref returns None and no replacement can ever be spawned. The manager thread then waits forever. Pending work items remain (their calls are queued but no worker exists), so the manager never reaches its empty-pending exit condition. As a result, the futures are never resolved or marked broken, and _python_exit() blocks joining the manager thread at interpreter exit.
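The weakref behaviour described above can be shown in isolation, without any worker processes. This is a minimal sketch, not the CPython implementation: `FakeExecutor`, `replace_dead_worker`, and `on_worker_exit` are hypothetical stand-ins for the executor and the manager thread's replacement step.

```python
import gc
import weakref


class FakeExecutor:
    """Hypothetical stand-in for the executor; not the real class."""

    def replace_dead_worker(self):
        return "spawned replacement"


def on_worker_exit(executor_reference):
    """Mimic the manager's pattern: replace a worker only via a live weakref."""
    if executor := executor_reference():
        return executor.replace_dead_worker()
    return None  # executor already collected: nothing to spawn through


exe = FakeExecutor()
ref = weakref.ref(exe)

while_alive = on_worker_exit(ref)  # the weakref still resolves

del exe       # drop the last strong reference
gc.collect()  # not needed on CPython's refcounting, harmless elsewhere

after_drop = on_worker_exit(ref)   # the weakref now returns None
print(while_alive, after_drop)
```

In the sketch, the second call has no object to act on, which corresponds to the point where the real manager thread can no longer replace the exited worker.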
Dropping an executor without shutdown() while holding its futures is otherwise supported: with max_tasks_per_child unset, the same program completes normally because the workers stay alive and drain the queue. Only the worker-exits-at-limit case strands work.
Suggested direction: when the manager observes a process exit while the executor weakref is dead, and no live workers remain while work items are pending, fail the pending work items with BrokenProcessPool instead of waiting forever.
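The suggested check can be sketched as a toy model. This is an illustration under assumptions, not a patch: `fail_stranded_work`, `FakeExecutor`, the `live_workers` count, and the error message are hypothetical, and the real manager thread tracks work items and processes in its own structures. Only `Future`, `BrokenProcessPool`, and `weakref.ref` are real APIs here.

```python
import weakref
from concurrent.futures import Future
from concurrent.futures.process import BrokenProcessPool


class FakeExecutor:
    """Hypothetical stand-in for the executor; only its liveness matters."""


def fail_stranded_work(executor_reference, live_workers, pending_futures):
    """Fail pending futures when nothing is left to run them.

    Returns True if the futures were failed, False otherwise.
    """
    if executor_reference() is not None:
        return False  # executor alive: a replacement worker can be spawned
    if live_workers > 0 or not pending_futures:
        return False  # remaining workers can drain the queue, or it is empty
    for future in pending_futures:
        if not future.done():
            future.set_exception(
                BrokenProcessPool("no workers remain and the executor is gone")
            )
    return True


executor = FakeExecutor()
executor_reference = weakref.ref(executor)
pending = [Future() for _ in range(3)]

# While the executor is alive, the pending futures are left untouched.
untouched = fail_stranded_work(executor_reference, 0, pending)

del executor  # last strong reference dropped
stranded = fail_stranded_work(executor_reference, 0, pending)
```

With this shape, a caller blocked in `result()` would see BrokenProcessPool promptly instead of a timeout, and the manager would have no pending items left to wait on.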
Found independently by Claude Fable 5 while I had it test and update the gh-115634 fix (GH-140900).
This reproduces identically before and after that fix -- the replacement path cannot help because there is no executor left to spawn through. The relevant code shape is unchanged since max_tasks_per_child was added in 3.11.
See also #83386 for an older exit-hang of a different mechanism.
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux
Linked PRs