gh-119592: gh-152967: Fix ProcessPoolExecutor stranding submitted work when a max_tasks_per_child worker exits by gpshead · Pull Request #152978 · python/cpython

gpshead · 2026-07-03T19:11:39Z

Worker replacement went through the executor object: the manager thread read executor attributes that shutdown(wait=False) clears concurrently, and could not replace workers at all once the executor was garbage collected. A worker exiting at its max_tasks_per_child limit in those states left the remaining submitted work permanently unexecuted and hung interpreter exit; the racing case could crash the manager thread.

Replace workers from the executor manager thread using its own state plus configuration read through the live executor weakref, which shutdown() never clears:

- After shutdown(wait=False) with the executor still referenced, a replacement is spawned and the remaining work is executed as documented.
- Once the executor has been garbage collected (gh-152967), or a replacement worker cannot be started and no workers remain, the remaining futures now fail with BrokenProcessPool instead of never resolving.
- A new _force_shutting_down flag stops both spawn paths from starting workers that would escape terminate_workers()/kill_workers().
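
The decision described above can be sketched as a small toy model. This is not the CPython implementation: ToyExecutor, ToyManager, on_worker_exit and the returned status strings are hypothetical names invented for illustration, and the branch order is only one reading of the description. It shows a manager that keeps its own worker and pending-work state, reads configuration through a weakref to the executor, and fails the remaining futures with BrokenProcessPool when no replacement is possible.

```python
import gc
import weakref
from concurrent.futures import Future
from concurrent.futures.process import BrokenProcessPool


class ToyExecutor:
    """Hypothetical stand-in that only carries configuration."""

    def __init__(self, max_workers):
        self.max_workers = max_workers


class ToyManager:
    """Hypothetical manager that owns worker and pending-work state itself."""

    def __init__(self, executor, spawn):
        self.executor_ref = weakref.ref(executor)  # weak: does not keep it alive
        self.spawn = spawn                # callable that starts one worker
        self.live_workers = 0
        self.pending = []                 # futures that have not run yet
        self.force_shutting_down = False  # set by a forced shutdown path

    def _fail_pending(self, reason):
        # Resolve every stranded future instead of leaving it pending forever.
        for fut in self.pending:
            fut.set_exception(BrokenProcessPool(reason))
        self.pending.clear()

    def on_worker_exit(self):
        """Handle one worker that exited at its task limit."""
        self.live_workers -= 1
        if not self.pending:
            return "idle"
        executor = self.executor_ref()
        if executor is None:
            # Configuration is gone with the executor: fail the remaining work.
            self._fail_pending("executor was garbage collected")
            return "failed"
        if self.force_shutting_down:
            # Assumption: the forced shutdown path deals with pending work.
            return "no-spawn"
        if self.live_workers < executor.max_workers:
            try:
                self.spawn()
                self.live_workers += 1
                return "spawned"
            except OSError:
                if self.live_workers == 0:
                    self._fail_pending("could not start a replacement worker")
                    return "failed"
        return "waiting"


executor = ToyExecutor(max_workers=1)
manager = ToyManager(executor, spawn=lambda: None)
manager.live_workers = 1
fut = Future()
manager.pending.append(fut)

first = manager.on_worker_exit()   # executor still referenced

del executor                       # drop the last strong reference
gc.collect()
second = manager.on_worker_exit()  # executor gone: pending work is failed
```

In this sketch the first call starts a replacement because the executor is still alive, while the second call, made after the executor is collected, sets BrokenProcessPool on the pending future rather than leaving it unresolved.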

Drafted and investigated entirely by Claude Fable 5 based on the issues. I'm putting this up as a draft to iterate on review, to see what shape this should take and how feasible backporting it further as a bugfix could be. Edit: looks to be in good shape; undrafting.

Issue: ProcessPoolExecutor executor manager thread crashes if task fails in some circumstances #119592

…n a max_tasks_per_child worker exits

Worker replacement went through the executor object: the manager thread read executor attributes that shutdown(wait=False) clears concurrently, and could not replace workers at all once the executor was garbage collected. A worker exiting at its max_tasks_per_child limit in those states left the remaining submitted work permanently unexecuted and hung interpreter exit; the racing case could crash the manager thread.

Replace workers from the executor manager thread using its own state plus configuration read through the live executor weakref, which shutdown() never clears:

- After shutdown(wait=False) with the executor still referenced, a replacement is spawned and the remaining work is executed as documented.
- Once the executor has been garbage collected (pythongh-152967), or a replacement worker cannot be started and no workers remain, the remaining futures now fail with BrokenProcessPool instead of never resolving.
- A new _force_shutting_down flag stops both spawn paths from starting workers that would escape terminate_workers()/kill_workers().

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

gpshead · 2026-07-03T19:19:55Z

        self._join_executor_internals(broken=True)

-    def terminate_broken(self, cause):
+    def terminate_broken(self, cause, bpe_message=None):


While this might look like a public API change in the diff, it is on the _ExecutorManagerThread class, which is for internal use only. Fine to backport.

gpshead self-assigned this Jul 3, 2026

gpshead added the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label Jul 3, 2026

bedevere-app Bot mentioned this pull request Jul 3, 2026

ProcessPoolExecutor executor manager thread crashes if task fails in some circumstances #119592

Open

gpshead mentioned this pull request Jul 3, 2026

ProcessPoolExecutor: pending futures never complete if executor is gc'd while a max_tasks_per_child= worker exits #152967

Open


gpshead marked this pull request as ready for review July 4, 2026 05:04

bedevere-app Bot added the awaiting core review label Jul 4, 2026

gpshead added the needs backport to 3.14 bugs and security fixes label Jul 4, 2026

gpshead wants to merge 1 commit into python:main from gpshead:fix-gh-119592-worker-replacement
