ProcessPoolExecutor: pending futures never complete if executor is gc'd while a max_tasks_per_child= worker exits #152967

Description

@gpshead

Bug report

Bug description:

If the last reference to a ProcessPoolExecutor is dropped without calling shutdown() while queued work remains, and a worker exits upon reaching its max_tasks_per_child limit, the remaining futures never complete and interpreter exit hangs joining the executor manager thread.

```python
from concurrent.futures import ProcessPoolExecutor

def work(i):
    return i * i

def make_futures():
    exe = ProcessPoolExecutor(1, max_tasks_per_child=1)
    # exe is never shut down; its last strong reference goes away when this returns
    return [exe.submit(work, i) for i in range(4)]

if __name__ == "__main__":
    futs = make_futures()
    print([f.result(timeout=30) for f in futs])
```

Result: TimeoutError after 30 seconds, then the process hangs at exit.

The executor manager thread replaces a worker that exited at its task limit through a weakref to the executor:

```python
if executor := self.executor_reference():
    if process_exited:
        with self.shutdown_lock:
            executor._replace_dead_worker()
```

Once the executor object has been collected, the weakref returns None and no replacement can ever be spawned. The manager thread then waits forever: pending work items remain (their calls are queued but no worker exists), so it never reaches the empty-pending exit condition, the futures are never resolved and never marked broken, and _python_exit() blocks joining the manager thread at interpreter exit.
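The weakref behaviour at the heart of this can be shown without starting any processes. The sketch below is a simplified model, not CPython's code: `FakeExecutor` and `on_worker_exit` are hypothetical names standing in for the executor and the manager-thread check quoted above. Once the last strong reference is dropped, the weakref call returns `None`, so the replacement branch is skipped.

```python
import gc
import weakref


class FakeExecutor:
    """Hypothetical stand-in for ProcessPoolExecutor, for illustration only."""

    def _replace_dead_worker(self):
        return "replacement spawned"


def on_worker_exit(executor_reference):
    # Same shape as the manager-thread check: only a live executor
    # can spawn a replacement worker.
    if executor := executor_reference():
        return executor._replace_dead_worker()
    return None  # executor collected: nothing can spawn a replacement


exe = FakeExecutor()
ref = weakref.ref(exe)
print(on_worker_exit(ref))  # replacement spawned

del exe       # drop the last strong reference, as returning from make_futures() does
gc.collect()  # not required under CPython refcounting, but harmless
print(on_worker_exit(ref))  # None
```

In the real executor nothing else is waiting to run the queued calls, so the `None` branch leaves them stranded.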

Dropping an executor without shutdown() while holding its futures is otherwise supported: with max_tasks_per_child unset, the same program completes normally because the workers stay alive and drain the queue. Only the worker-exits-at-limit case strands work.

Suggested direction: when the manager observes a process exit while the executor weakref is dead, and no live workers remain while work items are pending, fail the pending work items with BrokenProcessPool instead of waiting forever.
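A minimal sketch of the suggested check follows, assuming the manager can see the executor weakref, its mapping of live worker processes, and a dict of pending work items that each carry a `future`. The helper `fail_stranded_work` and the `WorkItem` record are hypothetical, not CPython internals; `BrokenProcessPool` and `InvalidStateError` are the real `concurrent.futures` exceptions. Cancelled futures are skipped because `set_exception()` raises `InvalidStateError` on them.

```python
import weakref
from concurrent.futures import Future, InvalidStateError
from concurrent.futures.process import BrokenProcessPool
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """Hypothetical stand-in for the executor's internal work-item record."""
    future: Future = field(default_factory=Future)


def fail_stranded_work(executor_reference, live_processes, pending_work_items):
    """Hypothetical helper: resolve queued work that can never run.

    Returns True if pending futures were failed, False if the manager
    should keep waiting as it does today.
    """
    if executor_reference() is not None:
        return False  # executor alive: it can still spawn a replacement worker
    if live_processes or not pending_work_items:
        return False  # a worker can still drain the queue, or nothing is queued
    exc = BrokenProcessPool(
        "executor was garbage collected after its last worker exited; "
        "pending work can no longer run"
    )
    for work_item in pending_work_items.values():
        try:
            work_item.future.set_exception(exc)
        except InvalidStateError:
            pass  # future already cancelled or finished: leave it alone
    pending_work_items.clear()
    return True


# Usage: simulate a collected executor with no live workers and queued work.
class Dummy:
    pass


d = Dummy()
dead_ref = weakref.ref(d)
del d
pending = {i: WorkItem() for i in range(3)}
futures = [w.future for w in pending.values()]
print(fail_stranded_work(dead_ref, live_processes={}, pending_work_items=pending))  # True
print(all(isinstance(f.exception(), BrokenProcessPool) for f in futures))  # True
```

Failing the futures rather than silently dropping them means callers blocked in `result()` get a `BrokenProcessPool` error, and the manager thread's pending set becomes empty so it can exit.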

Found independently by Claude Fable 5 while I had it test and update the gh-115634 fix (GH-140900).
This reproduces identically before and after that fix; the replacement path cannot help because there is no executor left to spawn through. The relevant code shape is unchanged since max_tasks_per_child was added in 3.11.

See also #83386 for an older exit-hang of a different mechanism.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Labels: stdlib, topic-multiprocessing, type-bug