
Refactor: partition DistRing into 4 per-scope rings, expose user scope API #570

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:refactor/per-scope-rings-and-user-scope
Apr 15, 2026
@ChaoWao ChaoWao commented Apr 15, 2026

Summary

Strict-1 alignment with L2 (matches PTO2_MAX_RING_DEPTH = 4): a long-running outer-scope task no longer holds the FIFO head against inner-scope churn. Each scope-layer heap is its own mmap(MAP_SHARED) with its own mu/cv/last_alive; ring selection is driven by scope depth min(scope_depth, DIST_MAX_RING_DEPTH - 1). DistOrchestrator::scope_begin / scope_end are now user-facing, plus a with orch.scope(): context manager on the Python side. Resolves Q10 / audit correction 2026-04-15 in .claude/plans/HIERARCHICAL_RUNTIME_REFACTOR.md.
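The ring-selection rule above can be sketched in a few lines of Python. This is an illustration only: `ring_index_for_depth` is a hypothetical helper, and the constant is hard-coded to the value stated in this summary rather than read from the real bindings.

```python
# Illustrative only: mirrors the rule min(scope_depth, DIST_MAX_RING_DEPTH - 1).
DIST_MAX_RING_DEPTH = 4  # value stated in this PR; hard-coded here as an assumption


def ring_index_for_depth(scope_depth: int) -> int:
    """Map a 0-based scope depth to a ring index, clamping deep scopes to the last ring."""
    if scope_depth < 0:
        raise ValueError("scope_depth must be non-negative")
    return min(scope_depth, DIST_MAX_RING_DEPTH - 1)


if __name__ == "__main__":
    for depth in range(6):
        print(depth, "->", ring_index_for_depth(depth))
```

Under this rule, any scope nested deeper than the number of rings shares the last ring instead of failing.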

What changed

  • DistRing: array of 4 HeapRing structs (per-ring heap_base/top/tail, released[], slot_heap_end[], mu/cv). Shared: one std::deque<unique_ptr<DistTaskSlotState>> + monotonic next_task_id_. alloc(bytes, scope_depth) picks the ring; release(slot) reads slot.ring_idx / slot.ring_slot_idx to advance the right ring. reset_to_empty rewinds every ring. Locking in alloc is split into two phases, ring.mu first and slots_mu_ second, so the two mutexes are never nested and per-ring back-pressure cannot block other rings or slot-state readers.
  • DistTaskSlotState — gains ring_idx / ring_slot_idx, stamped by DistRing::alloc before return; reset() preserves them.
  • DistScope — adds current_depth() (0-based, L2-style) used for ring selection.
  • DistOrchestrator: scope_begin / scope_end made public; alloc() and reserve_outputs_and_slot() now pass scope_->current_depth() so allocs inside a nested scope land in the nested ring.
  • Nanobind — binds scope_begin / scope_end (no underscore); keeps _scope_begin / _scope_end aliases for the Worker.run facade. Exposes DIST_MAX_RING_DEPTH / DIST_MAX_SCOPE_DEPTH.
  • Python Orchestrator — adds scope_begin / scope_end plus with orch.scope(): context manager.
  • Worker(heap_ring_size=N) — is now per-ring. Total VA reservation = 4 × N (default 4 GiB across 4 × 1 GiB). Physical pages stay lazy under MAP_ANONYMOUS.
  • Docs — rewrites orchestrator.md §5 (per-scope ring) and §6 (Scope: user-facing API + ring table + non-blocking scope_end rationale); updates task-flow.md §7 memory partitioning to describe 4 separate MAP_SHARED mmaps; moves "per-scope HeapRing / user nested scope" from "In flight" to "Landed" in roadmap.md.
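The head-of-line effect that motivates the split can be reproduced with a toy model. The sketch below is not the DistRing implementation: `ToyRing` and `churn` are hypothetical names, sizes are unitless, and a full ring returns `None` where the real allocator would apply back-pressure.

```python
from collections import deque


class ToyRing:
    """FIFO heap ring: space is reclaimed only from the oldest allocation onward."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.live = deque()  # entries are [size, released] in allocation order

    def alloc(self, size: int):
        """Return a handle, or None when the ring is full (a real ring would wait)."""
        if self.used + size > self.capacity:
            return None
        entry = [size, False]
        self.live.append(entry)
        self.used += size
        return entry

    def release(self, entry) -> None:
        """Mark an entry released, then advance the tail past released head entries."""
        entry[1] = True
        while self.live and self.live[0][1]:
            self.used -= self.live.popleft()[0]


def churn(outer_ring: ToyRing, inner_ring: ToyRing, rounds: int) -> int:
    """Hold one outer allocation, then count how many inner alloc/release rounds succeed."""
    held = outer_ring.alloc(1)  # long-running outer-scope task, never released here
    assert held is not None
    completed = 0
    for _ in range(rounds):
        entry = inner_ring.alloc(1)
        if entry is None:
            break
        inner_ring.release(entry)
        completed += 1
    return completed


# One shared ring: the held outer entry pins the FIFO head, so inner churn stalls.
shared = ToyRing(capacity=4)
single_ring_rounds = churn(shared, shared, rounds=10)

# Separate rings per scope depth: inner allocations are reclaimed immediately.
per_scope_rounds = churn(ToyRing(capacity=4), ToyRing(capacity=4), rounds=10)

print(single_ring_rounds, per_scope_rounds)
```

In the shared case the inner entries are released but cannot be reclaimed behind the held outer entry, so the ring fills after a few rounds. With one ring per depth, all ten rounds complete.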

Design decisions (recorded in plan Design Decision Log)

  1. scope_end is non-blocking — matches L2; drain() is the synchronous wait.
  2. Python API: both raw scope_begin / scope_end and with orch.scope(): context manager.
  3. heap_ring_size is per-ring (total VA = 4 × N).
  4. alloc(shape, dtype) inherits caller's scope depth.
  5. reset_to_empty rewinds every ring.
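Decisions 2 and 4 can be illustrated with a stand-in object. `StubOrchestrator` below is hypothetical and only tracks depth; it is not the real Python Orchestrator, and the assumption that depth starts at 0 outside any user scope is made here purely for the example.

```python
from contextlib import contextmanager


class StubOrchestrator:
    """Stand-in for the Python Orchestrator; it only tracks scope depth."""

    def __init__(self) -> None:
        self.depth = 0          # assumed starting depth, for illustration only
        self.alloc_depths = []  # depth recorded at each alloc call

    def scope_begin(self) -> None:
        self.depth += 1

    def scope_end(self) -> None:
        # Returns immediately; it does not wait for tasks in the scope to finish.
        if self.depth == 0:
            raise RuntimeError("scope_end without matching scope_begin")
        self.depth -= 1

    def alloc(self, shape, dtype) -> None:
        # The allocation inherits whatever depth the caller is currently in.
        self.alloc_depths.append(self.depth)

    @contextmanager
    def scope(self):
        self.scope_begin()
        try:
            yield self
        finally:
            self.scope_end()  # runs even if the body raises


orch = StubOrchestrator()
orch.alloc((4,), "float32")
with orch.scope():
    orch.alloc((4,), "float32")
    with orch.scope():
        orch.alloc((4,), "float32")

print(orch.alloc_depths, orch.depth)
```

The context manager form guarantees a matching scope_end on exceptions, which the raw pair leaves to the caller.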

Test plan

  • pip install --no-build-isolation -e . builds cleanly in a fresh worktree venv.
  • ctest --test-dir tests/ut/cpp/build -LE requires_hardware — 7 targets pass (including new DistRing scope tests: ring-idx mapping, distinct mmaps per ring, routing by scope_depth, independent reclamation while another ring is full, inner-ring churn while outer holds, reset rewinds all rings, slot metadata).
  • pytest tests/ut/py/test_dist_worker/ — 21 pass (including 3 new scope tests: binding exposed, 3-deep nesting completes, basic with-scope run).
  • CI sim pipeline (ci.py -p a2a3sim) — run by reviewers / CI.
  • Hardware L3 scene tests (pytest tests/st/a2a3/tensormap_and_ringbuffer/test_l3_*.py --platform a2a3) — run by reviewers / CI.

…e API

Strict-1 alignment with L2 (PTO2_MAX_RING_DEPTH = 4). A long-running
outer-scope task no longer holds the FIFO head against inner-scope
churn: each scope-layer ring has its own mmap(MAP_SHARED) heap,
mutex/cv, and last_alive frontier, so inner scopes reclaim
independently of outer ones.
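As a rough analogue of "one mapping per ring", the sketch below uses Python's `mmap` module on a Unix system. The real heaps are created from C++; `map_ring_heaps` and `NUM_RINGS` are hypothetical names, and the 1 MiB size is chosen only to keep the demo small (the PR's default is 1 GiB per ring).

```python
import mmap

NUM_RINGS = 4  # number of rings described in this PR


def map_ring_heaps(heap_ring_size: int):
    """Create one anonymous shared mapping per ring (Unix only).

    Passing fileno=-1 asks Python for anonymous memory; MAP_SHARED makes the
    pages visible to child processes forked after the mapping is created.
    """
    return [
        mmap.mmap(-1, heap_ring_size, flags=mmap.MAP_SHARED)
        for _ in range(NUM_RINGS)
    ]


heaps = map_ring_heaps(1 << 20)              # 1 MiB per ring for the demo
total_reserved = sum(len(h) for h in heaps)  # 4 x heap_ring_size of address space

heaps[2][0:5] = b"inner"                     # write into one ring's heap
untouched = bytes(heaps[0][0:5])             # another ring's mapping stays zero-filled

for h in heaps:
    h.close()

print(total_reserved, untouched)
```

Because each ring is its own mapping, the total reservation scales with the ring count while each heap keeps independent bounds.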

- DistRing: array of 4 HeapRing structs (per-ring heap_base/top/tail,
  released[]/slot_heap_end[], mu/cv). Shared: std::deque of slot states
  + monotonic next_task_id_. alloc(bytes, scope_depth) picks ring via
  min(scope_depth, DIST_MAX_RING_DEPTH-1); release(slot) reads
  slot.ring_idx/ring_slot_idx to advance the correct ring.
  reset_to_empty rewinds every ring. Locking: alloc splits into ring.mu
  (phase 1) and slots_mu_ (phase 2) — never nested — so per-ring
  back-pressure waits cannot block other rings or slot-state readers.

- DistTaskSlotState gains ring_idx / ring_slot_idx, stamped by
  DistRing::alloc before return; reset() deliberately preserves them.

- DistScope adds current_depth() (L2-style 0-based) used for ring
  selection.

- DistOrchestrator: scope_begin / scope_end made public. alloc(shape,
  dtype) and reserve_outputs_and_slot now pass scope_->current_depth()
  so allocs inside a nested scope land in the nested ring.

- Nanobind: bind scope_begin / scope_end (no underscore). Keep
  _scope_begin / _scope_end aliases for the Python Worker.run facade.
  Expose DIST_MAX_RING_DEPTH and DIST_MAX_SCOPE_DEPTH.

- Python Orchestrator: add scope_begin / scope_end plus
  `with orch.scope():` context manager.

- Worker(heap_ring_size=N) is now PER-RING; total VA reservation is
  4 × N (default 4 GiB across 4 × 1 GiB). Physical pages stay lazy
  under MAP_ANONYMOUS.

- Tests: 7 new DistRing cpp tests (ring-idx mapping, distinct mmaps
  per ring, routing by scope_depth, independent reclamation while
  another ring is full, inner-ring churn while outer holds, reset
  rewinds all rings, slot metadata). 3 new Python scope tests
  (binding exposed, 3-deep nesting completes, basic with-scope run).

- Docs: rewrite orchestrator.md §5 (per-scope ring) and §6 (Scope —
  user-facing API + ring table, non-blocking scope_end rationale);
  task-flow.md §7 memory partitioning now describes 4 separate
  MAP_SHARED mmaps; roadmap.md moves "per-scope HeapRing / user
  nested scope" from "In flight" to "Landed".
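The two-phase locking described in the DistRing item can be sketched with Python threading primitives. This is a simplified, hypothetical model (`ToyTwoPhaseRing` is not the real class): it tracks free bytes per ring rather than FIFO heap pointers, and exists only to show that the ring lock is dropped before the shared slot lock is taken.

```python
import threading


class ToyTwoPhaseRing:
    """Hypothetical model: one condition per ring, one lock for shared slot state."""

    def __init__(self, num_rings: int, capacity: int) -> None:
        self.ring_cv = [threading.Condition() for _ in range(num_rings)]
        self.free = [capacity] * num_rings
        self.slots_mu = threading.Lock()
        self.slots = []        # shared slot-state list
        self.next_task_id = 0  # monotonic across all rings

    def alloc(self, nbytes: int, ring_idx: int) -> dict:
        # Phase 1: reserve heap space while holding only this ring's lock.
        with self.ring_cv[ring_idx]:
            while self.free[ring_idx] < nbytes:
                self.ring_cv[ring_idx].wait()  # back-pressure blocks this ring only
            self.free[ring_idx] -= nbytes
        # Phase 2: the ring lock is already released when the slot lock is taken.
        with self.slots_mu:
            slot = {"task_id": self.next_task_id, "ring_idx": ring_idx, "nbytes": nbytes}
            self.next_task_id += 1
            self.slots.append(slot)
        return slot

    def release(self, slot: dict) -> None:
        # Return the bytes to the ring recorded in the slot and wake any waiter.
        cv = self.ring_cv[slot["ring_idx"]]
        with cv:
            self.free[slot["ring_idx"]] += slot["nbytes"]
            cv.notify_all()


ring = ToyTwoPhaseRing(num_rings=4, capacity=8)
first = ring.alloc(8, ring_idx=0)                 # ring 0 is now full
waiter = threading.Thread(target=ring.alloc, args=(8, 0))
waiter.start()                                    # cannot finish until ring 0 has space
other = ring.alloc(8, ring_idx=1)                 # ring 1 proceeds regardless
ring.release(first)                               # frees ring 0 and wakes the waiter
waiter.join(timeout=5)

print(other["task_id"], len(ring.slots), waiter.is_alive())
```

Since no thread ever holds a ring lock and the slot lock at the same time, a thread waiting for space in one ring never delays allocations in another ring or readers of the slot list.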

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements a per-scope heap allocation mechanism by refactoring the single DistRing into four independent heap rings. Tasks are now assigned to a specific ring based on their nesting depth, allowing inner-scope tasks to reclaim memory independently of outer-scope tasks. The changes include exposing scope_begin, scope_end, and a Python with orch.scope() context manager to users, along with updated documentation and comprehensive tests for the new hierarchical reclamation logic. I have no feedback to provide.

@ChaoWao ChaoWao merged commit 58424d0 into hw-native-sys:main Apr 15, 2026
15 checks passed
@ChaoWao ChaoWao deleted the refactor/per-scope-rings-and-user-scope branch April 16, 2026 03:26