Refactor: fold DistChipProcess/DistSubWorker into WorkerThread PROCESS mode#575
Conversation
…S mode
WorkerThread gains Mode::THREAD | PROCESS. In PROCESS mode the parent
thread encodes (callable, config, args_blob) into a unified shm mailbox,
signals TASK_READY, spin-polls TASK_DONE, and resets IDLE — absorbing
the logic that lived in the standalone DistChipProcess and DistSubWorker
IWorker subclasses (now deleted).
The unified mailbox layout matches the former chip layout byte-for-byte
(state/error/callable/config/blob at fixed offsets); sub children read
callable as a uint64 encoding the callable_id and ignore config + blob.
- WorkerThread: dispatch_thread (unchanged) / dispatch_process (new);
mailbox read/write helpers with aarch64/x86/fallback memory ordering;
shutdown_child writes SHUTDOWN to mailbox.
- DistWorkerManager: stores WorkerEntry {worker, mode, mailbox} instead
of bare IWorker*. New add_next_level_process / add_sub_process /
shutdown_children. start() creates WorkerThreads with the stored mode.
- DistWorker: add_process_worker(type, mailbox) convenience.
- Nanobind: DistChipProcess / DistSubWorker bindings deleted; replaced
by add_next_level_process(mailbox_ptr) / add_sub_process(mailbox_ptr)
on DistWorker. DIST_MAILBOX_SIZE exported (replaces the two old size
constants).
- Python worker.py: one set of mailbox offsets (_OFF_STATE, _OFF_ERROR,
_OFF_CALLABLE, ...) replaces _CHIP_OFF_* and _OFF_CALLABLE_ID.
_sub_worker_loop reads callable as uint64 from offset 8.
_start_level3 calls dw.add_next_level_process / add_sub_process
instead of wrapping in proxy classes. close() writes SHUTDOWN to
mailboxes directly.
- Deleted: dist_chip_process.{h,cpp}, dist_sub_worker.{h,cpp}.
CMakeLists updated to drop them from both builds.
Follow-up: consolidate PROCESS-mode polling into a single multiplexed
thread (currently 1:1 thread-per-child, same as before).
There was a problem hiding this comment.
Code Review
This pull request refactors the distributed worker system by unifying the shared-memory mailbox layout for both chip and sub-workers. It removes the explicit DistChipProcess and DistSubWorker C++ classes and their Python bindings. Instead, the WorkerThread in DistWorkerManager now handles both "THREAD mode" (direct IWorker calls) and "PROCESS mode" (IPC via the unified mailbox to pre-forked Python child processes). Python-side code (worker.py) is updated to reflect these changes, using the unified mailbox size and new add_process_worker methods. Review feedback indicates that the WorkerThread::loop in dist_worker_manager.cpp needs exception handling to prevent crashes, and a spin-poll loop in the same file could cause deadlocks if child processes become unresponsive, requiring a shutdown_ flag check. Additionally, if child process initialization fails in python/simpler/worker.py, the child should signal completion with an error code to prevent parent deadlocks.
Summary
Absorbs the parent-side PROCESS-mode dispatch protocol (write mailbox → TASK_READY → poll TASK_DONE → IDLE) into
WorkerThread, eliminating the standaloneDistChipProcessandDistSubWorkerIWorker subclasses. Net −208 lines.Second half of the old PR-D (PR-D-1 landed Strict-4 ready queues in #572; this is PR-D-2). Follow-up: consolidate PROCESS-mode polling into a single multiplexed thread (currently 1:1 thread-per-child, same as before the refactor).
What changed
WorkerThreadgainsMode::THREAD | PROCESS. Newdispatch_process()encodes(callable, config, args_blob)into a unified shm mailbox layout (byte-compatible with the old chip layout), signalsTASK_READY, spin-pollsTASK_DONE.shutdown_child()writesSHUTDOWN. Memory-ordering helpers (aarch64ldar/stlr, x86 compiler fence, fallback__atomic) moved in from the deleted classes.DistWorkerManagerstoresWorkerEntry {worker, mode, mailbox}instead of bareIWorker *. Newadd_next_level_process(mailbox)/add_sub_process(mailbox)/shutdown_children().DistWorkergainsadd_process_worker(type, mailbox).DistChipProcess/DistSubWorkerbindings deleted. Replaced byadd_next_level_process(mailbox_ptr)/add_sub_process(mailbox_ptr)onDistWorker.DIST_MAILBOX_SIZEexported (replaces oldDIST_CHIP_MAILBOX_SIZE+DIST_SUB_MAILBOX_SIZE).worker.py— unified mailbox offsets (_OFF_STATE/_OFF_CALLABLE/ …);_sub_worker_loopreadscallableasuint64at offset 8;_start_level3callsdw.add_next_level_process(addr)/dw.add_sub_process(addr);close()writesSHUTDOWNdirectly.dist_chip_process.{h,cpp},dist_sub_worker.{h,cpp}; CMakeLists updated.Test plan
pip install --no-build-isolation -e .builds cleanly.ctest --test-dir tests/ut/cpp/build -LE requires_hardware— 7 targets pass.pytest tests/ut/py/test_dist_worker/— 21 pass.