
Refactor: fold DistChipProcess/DistSubWorker into WorkerThread PROCESS mode#575

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:refactor/workerthread-process-mode
Apr 16, 2026


@ChaoWao ChaoWao commented Apr 16, 2026

Summary

Absorbs the parent-side PROCESS-mode dispatch protocol (write mailbox → TASK_READY → poll TASK_DONE → IDLE) into WorkerThread, eliminating the standalone DistChipProcess and DistSubWorker IWorker subclasses. Net −208 lines.
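The handshake named above (write mailbox → TASK_READY → poll TASK_DONE → IDLE) can be illustrated with a small Python sketch. This is not the C++ code from the PR: the numeric state values, the mailbox size, and the use of a thread as a stand-in for the forked child are all assumptions made for illustration. Only the state names and the callable field at offset 8 come from the description. A real shared-memory implementation would also need the acquire/release ordering the PR mentions, which CPython's interpreter lock hides here.

```python
import struct
import threading
import time

# Hypothetical encodings: the PR names these states but not their values.
IDLE, TASK_READY, TASK_DONE, SHUTDOWN = 0, 1, 2, 3
OFF_STATE = 0      # int32 state word (assumed to sit at offset 0)
OFF_CALLABLE = 8   # uint64 callable field (offset 8 per the PR description)
MAILBOX_SIZE = 64  # illustrative only, not the real DIST_MAILBOX_SIZE


def read_state(mailbox):
    return struct.unpack_from("<i", mailbox, OFF_STATE)[0]


def write_state(mailbox, state):
    struct.pack_into("<i", mailbox, OFF_STATE, state)


def child_loop(mailbox, handled):
    """Stand-in for the child process: serve tasks until SHUTDOWN is seen."""
    while True:
        state = read_state(mailbox)
        if state == SHUTDOWN:
            return
        if state == TASK_READY:
            callable_id = struct.unpack_from("<Q", mailbox, OFF_CALLABLE)[0]
            handled.append(callable_id)  # "run" the task
            write_state(mailbox, TASK_DONE)
        time.sleep(0.0005)


def dispatch_process(mailbox, callable_id):
    """Parent side: write payload, signal TASK_READY, poll TASK_DONE, reset IDLE."""
    struct.pack_into("<Q", mailbox, OFF_CALLABLE, callable_id)
    write_state(mailbox, TASK_READY)
    while read_state(mailbox) != TASK_DONE:
        time.sleep(0.0005)
    write_state(mailbox, IDLE)


def run_demo():
    mailbox = bytearray(MAILBOX_SIZE)
    handled = []
    child = threading.Thread(target=child_loop, args=(mailbox, handled))
    child.start()
    for callable_id in (7, 11):
        dispatch_process(mailbox, callable_id)
    write_state(mailbox, SHUTDOWN)  # mirrors shutdown_child()
    child.join(timeout=5)
    return handled, read_state(mailbox)
```

Calling `run_demo()` dispatches two tasks in sequence and then shuts the stand-in child down.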

Second half of the old PR-D (PR-D-1 landed Strict-4 ready queues in #572; this is PR-D-2). Follow-up: consolidate PROCESS-mode polling into a single multiplexed thread (currently 1:1 thread-per-child, same as before the refactor).

What changed

  • WorkerThread gains Mode::THREAD | PROCESS. New dispatch_process() encodes (callable, config, args_blob) into a unified shm mailbox layout (byte-compatible with the old chip layout), signals TASK_READY, spin-polls TASK_DONE. shutdown_child() writes SHUTDOWN. Memory-ordering helpers (aarch64 ldar/stlr, x86 compiler fence, fallback __atomic) moved in from the deleted classes.
  • DistWorkerManager stores WorkerEntry {worker, mode, mailbox} instead of bare IWorker *. New add_next_level_process(mailbox) / add_sub_process(mailbox) / shutdown_children().
  • DistWorker gains add_process_worker(type, mailbox).
  • Nanobind: DistChipProcess / DistSubWorker bindings deleted. Replaced by add_next_level_process(mailbox_ptr) / add_sub_process(mailbox_ptr) on DistWorker. DIST_MAILBOX_SIZE exported (replaces old DIST_CHIP_MAILBOX_SIZE + DIST_SUB_MAILBOX_SIZE).
  • Python worker.py — unified mailbox offsets (_OFF_STATE / _OFF_CALLABLE / …); _sub_worker_loop reads callable as uint64 at offset 8; _start_level3 calls dw.add_next_level_process(addr) / dw.add_sub_process(addr); close() writes SHUTDOWN directly.
  • Deleted dist_chip_process.{h,cpp}, dist_sub_worker.{h,cpp}; CMakeLists updated.
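As a rough model of the bookkeeping change in the list above, the Python sketch below stores one entry per worker with its mode and mailbox, and walks the PROCESS-mode entries on shutdown. It only mirrors the names from the description (WorkerEntry, add_next_level_process, add_sub_process, shutdown_children); the class `ManagerSketch`, the THREAD-mode `add_next_level` method, the SHUTDOWN value, and the state offset are hypothetical.

```python
import struct
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional

SHUTDOWN = 3   # hypothetical encoding of the SHUTDOWN state
OFF_STATE = 0  # assumed offset of the int32 state word


class Mode(Enum):
    THREAD = "thread"
    PROCESS = "process"


@dataclass
class WorkerEntry:
    worker: Optional[Callable[[], None]]  # in-process worker (THREAD mode)
    mode: Mode
    mailbox: Optional[bytearray]          # shared mailbox (PROCESS mode)


class ManagerSketch:
    """Hypothetical stand-in for DistWorkerManager's entry storage."""

    def __init__(self):
        self.next_level: List[WorkerEntry] = []
        self.sub: List[WorkerEntry] = []

    def add_next_level(self, worker):
        # Hypothetical THREAD-mode registration, included for contrast.
        self.next_level.append(WorkerEntry(worker, Mode.THREAD, None))

    def add_next_level_process(self, mailbox):
        self.next_level.append(WorkerEntry(None, Mode.PROCESS, mailbox))

    def add_sub_process(self, mailbox):
        self.sub.append(WorkerEntry(None, Mode.PROCESS, mailbox))

    def shutdown_children(self):
        """Write SHUTDOWN into every PROCESS-mode mailbox; return the count."""
        count = 0
        for entry in self.next_level + self.sub:
            if entry.mode is Mode.PROCESS:
                struct.pack_into("<i", entry.mailbox, OFF_STATE, SHUTDOWN)
                count += 1
        return count
```

Keeping the mode next to each entry is what lets a single thread class pick between a direct call and the mailbox protocol when it starts.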

Test plan

  • pip install --no-build-isolation -e . builds cleanly.
  • ctest --test-dir tests/ut/cpp/build -LE requires_hardware — 7 targets pass.
  • pytest tests/ut/py/test_dist_worker/ — 21 pass.
  • CI sim pipeline — run by reviewers / CI.
  • Hardware L3 scene tests — run by reviewers / CI.

…S mode

WorkerThread gains Mode::THREAD | PROCESS. In PROCESS mode the parent
thread encodes (callable, config, args_blob) into a unified shm mailbox,
signals TASK_READY, spin-polls TASK_DONE, and resets IDLE — absorbing
the logic that lived in the standalone DistChipProcess and DistSubWorker
IWorker subclasses (now deleted).

The unified mailbox layout matches the former chip layout byte-for-byte
(state/error/callable/config/blob at fixed offsets); sub children read
callable as a uint64 encoding the callable_id and ignore config + blob.
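To make the shared layout concrete, here is a minimal Python sketch of one writer and two readers over the same bytes. The field order (state, error, callable, config, blob) and the uint64 callable at offset 8 follow the text above. The int32 widths for state and error, the config and blob offsets, the config size, and the length prefix on the blob are assumptions chosen for illustration and may not match the real layout.

```python
import struct

OFF_STATE = 0      # int32 (width assumed)
OFF_ERROR = 4      # int32 (width assumed)
OFF_CALLABLE = 8   # uint64, offset stated in the PR
OFF_CONFIG = 16    # hypothetical offset
CONFIG_SIZE = 16   # hypothetical fixed config size
OFF_BLOB = OFF_CONFIG + CONFIG_SIZE  # hypothetical: blob follows config
MAILBOX_SIZE = 256  # illustrative, not the real DIST_MAILBOX_SIZE


def encode_task(mailbox, callable_value, config, args_blob):
    """Parent side: write callable, config and a length-prefixed args blob."""
    if len(config) > CONFIG_SIZE:
        raise ValueError("config does not fit its slot")
    if OFF_BLOB + 8 + len(args_blob) > len(mailbox):
        raise ValueError("args blob does not fit the mailbox")
    struct.pack_into("<Q", mailbox, OFF_CALLABLE, callable_value)
    padded = config.ljust(CONFIG_SIZE, b"\0")
    mailbox[OFF_CONFIG:OFF_CONFIG + CONFIG_SIZE] = padded
    # The length prefix is an assumption of this sketch.
    struct.pack_into("<Q", mailbox, OFF_BLOB, len(args_blob))
    start = OFF_BLOB + 8
    mailbox[start:start + len(args_blob)] = args_blob


def decode_for_chip(mailbox):
    """Chip child: reads callable, config and blob."""
    callable_value = struct.unpack_from("<Q", mailbox, OFF_CALLABLE)[0]
    config = bytes(mailbox[OFF_CONFIG:OFF_CONFIG + CONFIG_SIZE])
    blob_len = struct.unpack_from("<Q", mailbox, OFF_BLOB)[0]
    start = OFF_BLOB + 8
    return callable_value, config, bytes(mailbox[start:start + blob_len])


def decode_for_sub(mailbox):
    """Sub child: reads only the uint64 callable id; config and blob are ignored."""
    return struct.unpack_from("<Q", mailbox, OFF_CALLABLE)[0]
```

Because both kinds of child read the callable from the same offset, the parent needs only one encoder regardless of which child sits behind the mailbox.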

- WorkerThread: dispatch_thread (unchanged) / dispatch_process (new);
  mailbox read/write helpers with aarch64/x86/fallback memory ordering;
  shutdown_child writes SHUTDOWN to mailbox.

- DistWorkerManager: stores WorkerEntry {worker, mode, mailbox} instead
  of bare IWorker*. New add_next_level_process / add_sub_process /
  shutdown_children. start() creates WorkerThreads with the stored mode.

- DistWorker: add_process_worker(type, mailbox) convenience.

- Nanobind: DistChipProcess / DistSubWorker bindings deleted; replaced
  by add_next_level_process(mailbox_ptr) / add_sub_process(mailbox_ptr)
  on DistWorker. DIST_MAILBOX_SIZE exported (replaces the two old size
  constants).

- Python worker.py: one set of mailbox offsets (_OFF_STATE, _OFF_ERROR,
  _OFF_CALLABLE, ...) replaces _CHIP_OFF_* and _OFF_CALLABLE_ID.
  _sub_worker_loop reads callable as uint64 from offset 8.
  _start_level3 calls dw.add_next_level_process / add_sub_process
  instead of wrapping in proxy classes. close() writes SHUTDOWN to
  mailboxes directly.

- Deleted: dist_chip_process.{h,cpp}, dist_sub_worker.{h,cpp}.
  CMakeLists updated to drop them from both builds.

Follow-up: consolidate PROCESS-mode polling into a single multiplexed
thread (currently 1:1 thread-per-child, same as before).
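The follow-up is not part of this PR, so the following is only one possible shape for it: a single scan over all child mailboxes that collects finished tasks, which one thread could call in a loop instead of dedicating a thread to each child. The function `poll_once`, the state values, and the state offset are hypothetical.

```python
import struct

# Hypothetical encodings; the PR names the states but not their values.
IDLE, TASK_READY, TASK_DONE = 0, 1, 2
OFF_STATE = 0  # assumed offset of the int32 state word


def read_state(mailbox):
    return struct.unpack_from("<i", mailbox, OFF_STATE)[0]


def write_state(mailbox, state):
    struct.pack_into("<i", mailbox, OFF_STATE, state)


def poll_once(mailboxes):
    """Scan every child mailbox once; reset finished ones to IDLE.

    Returns the indices of children whose task completed during this scan.
    """
    finished = []
    for index, mailbox in enumerate(mailboxes):
        if read_state(mailbox) == TASK_DONE:
            write_state(mailbox, IDLE)
            finished.append(index)
    return finished
```

A multiplexing thread would call `poll_once` repeatedly and hand each returned index back to whatever scheduler is waiting on that child.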

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the distributed worker system by unifying the shared-memory mailbox layout for both chip and sub-workers. It removes the explicit DistChipProcess and DistSubWorker C++ classes and their Python bindings. Instead, the WorkerThread in DistWorkerManager now handles both "THREAD mode" (direct IWorker calls) and "PROCESS mode" (IPC via the unified mailbox to pre-forked Python child processes). Python-side code (worker.py) is updated to reflect these changes, using the unified mailbox size and new add_process_worker methods. Review feedback indicates that the WorkerThread::loop in dist_worker_manager.cpp needs exception handling to prevent crashes, and a spin-poll loop in the same file could cause deadlocks if child processes become unresponsive, requiring a shutdown_ flag check. Additionally, if child process initialization fails in python/simpler/worker.py, the child should signal completion with an error code to prevent parent deadlocks.
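Two of the review points can be sketched in Python: a poll loop that gives up when a shutdown flag is set, and a child that reports an initialization failure by writing an error code and then signalling completion. This is an illustration of the suggestions, not the fix that was merged. The function names, the state values, the error field at offset 4, and the use of `threading.Event` in place of the C++ `shutdown_` flag are assumptions.

```python
import struct
import threading
import time

# Hypothetical encodings and offsets.
IDLE, TASK_READY, TASK_DONE = 0, 1, 2
OFF_STATE = 0  # int32 state word (assumed)
OFF_ERROR = 4  # int32 error code (assumed)


def wait_done(mailbox, shutdown_flag, poll_interval=0.0005):
    """Poll for TASK_DONE, but stop as soon as shutdown is requested.

    Returns the child's error code, or None if shutdown won the race.
    """
    while not shutdown_flag.is_set():
        if struct.unpack_from("<i", mailbox, OFF_STATE)[0] == TASK_DONE:
            return struct.unpack_from("<i", mailbox, OFF_ERROR)[0]
        time.sleep(poll_interval)
    return None


def signal_init_failure(mailbox, error_code):
    """Child side: publish the error first, then mark the task as done."""
    struct.pack_into("<i", mailbox, OFF_ERROR, error_code)
    struct.pack_into("<i", mailbox, OFF_STATE, TASK_DONE)
```

Writing the error before the state matters: a parent that observes TASK_DONE should never read a stale error field.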

Comment thread src/common/distributed/dist_worker_manager.cpp
Comment thread src/common/distributed/dist_worker_manager.cpp
Comment thread python/simpler/worker.py
@ChaoWao ChaoWao merged commit 92155aa into hw-native-sys:main Apr 16, 2026
29 of 30 checks passed
@ChaoWao ChaoWao deleted the refactor/workerthread-process-mode branch April 16, 2026 02:11