Refactor: partition DistRing into 4 per-scope rings, expose user scope API #570
Merged
ChaoWao merged 1 commit into hw-native-sys:main on Apr 15, 2026
…e API

Strict-1 alignment with L2 (PTO2_MAX_RING_DEPTH = 4). A long-running outer-scope task no longer holds the FIFO head against inner-scope churn: each scope-layer ring has its own mmap(MAP_SHARED) heap, mutex/cv, and last_alive frontier, so inner scopes reclaim independently of outer ones.

- DistRing: array of 4 HeapRing structs (per-ring heap_base/top/tail, released[]/slot_heap_end[], mu/cv). Shared: std::deque of slot states + monotonic next_task_id_. alloc(bytes, scope_depth) picks the ring via min(scope_depth, DIST_MAX_RING_DEPTH-1); release(slot) reads slot.ring_idx/ring_slot_idx to advance the correct ring. reset_to_empty rewinds every ring. Locking: alloc splits into ring.mu (phase 1) and slots_mu_ (phase 2), never nested, so per-ring back-pressure waits cannot block other rings or slot-state readers.
- DistTaskSlotState gains ring_idx / ring_slot_idx, stamped by DistRing::alloc before return; reset() deliberately preserves them.
- DistScope adds current_depth() (L2-style 0-based) used for ring selection.
- DistOrchestrator: scope_begin / scope_end made public. alloc(shape, dtype) and reserve_outputs_and_slot now pass scope_->current_depth() so allocs inside a nested scope land in the nested ring.
- Nanobind: bind scope_begin / scope_end (no underscore). Keep _scope_begin / _scope_end aliases for the Python Worker.run facade. Expose DIST_MAX_RING_DEPTH and DIST_MAX_SCOPE_DEPTH.
- Python Orchestrator: add scope_begin / scope_end plus `with orch.scope():` context manager.
- Worker(heap_ring_size=N) is now PER-RING; total VA reservation is 4 × N (default 4 GiB across 4 × 1 GiB). Physical pages stay lazy under MAP_ANONYMOUS.
- Tests: 7 new DistRing cpp tests (ring-idx mapping, distinct mmaps per ring, routing by scope_depth, independent reclamation while another ring is full, inner-ring churn while outer holds, reset rewinds all rings, slot metadata). 3 new Python scope tests (binding exposed, 3-deep nesting completes, basic with-scope run).
- Docs: rewrite orchestrator.md §5 (per-scope ring) and §6 (Scope: user-facing API + ring table, non-blocking scope_end rationale); task-flow.md §7 memory partitioning now describes 4 separate MAP_SHARED mmaps; roadmap.md moves "per-scope HeapRing / user nested scope" from "In flight" to "Landed".
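The ring-selection rule and the "inner scopes reclaim independently" claim can be illustrated with a toy model. This is only a sketch: `ToyRing` and its fields are hypothetical and do not mirror the real `HeapRing` layout. The only details taken from the commit message are the clamp `min(scope_depth, DIST_MAX_RING_DEPTH - 1)` and the ring count of 4.

```python
from collections import deque

DIST_MAX_RING_DEPTH = 4  # ring count stated in the commit message


def ring_index(scope_depth: int) -> int:
    """Clamp a 0-based scope depth to a valid ring index."""
    return min(scope_depth, DIST_MAX_RING_DEPTH - 1)


class ToyRing:
    """Hypothetical FIFO ring: space is reclaimed only from the head."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.live = deque()  # entries are [size, released], in allocation order

    def alloc(self, size: int):
        if self.used + size > self.capacity:
            return None  # a real ring would wait here (back-pressure)
        entry = [size, False]
        self.live.append(entry)
        self.used += size
        return entry

    def release(self, entry) -> None:
        entry[1] = True
        # Advance the head past every contiguous released entry.
        while self.live and self.live[0][1]:
            self.used -= self.live.popleft()[0]


rings = [ToyRing(capacity=100) for _ in range(DIST_MAX_RING_DEPTH)]

# A long-running outer-scope task holds space in ring 0 and is never released.
outer = rings[ring_index(0)].alloc(60)

# Inner-scope churn: each 80-byte allocation would not fit behind `outer`
# in a single shared ring, but ring 1 reclaims on its own.
churn_ok = True
for _ in range(10):
    inner = rings[ring_index(1)].alloc(80)
    churn_ok = churn_ok and inner is not None
    if inner is not None:
        rings[ring_index(1)].release(inner)
```

With a single shared ring of the same capacity, the first 80-byte request would have to wait for `outer` to be released; with one ring per depth, the inner loop never stalls.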
Code Review
This pull request implements a per-scope heap allocation mechanism by refactoring the single DistRing into four independent heap rings. Tasks are now assigned to a specific ring based on their nesting depth, allowing inner-scope tasks to reclaim memory independently of outer-scope tasks. The changes include exposing scope_begin, scope_end, and a Python with orch.scope() context manager to users, along with updated documentation and comprehensive tests for the new hierarchical reclamation logic. I have no feedback to provide.
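A `with orch.scope():` context manager of the kind described here is typically a thin wrapper that pairs `scope_begin` with `scope_end`. The sketch below shows one way that could look; `FakeOrchestrator`, `open_scopes`, and `events` are hypothetical stand-ins for illustration, and only the method names `scope_begin`, `scope_end`, and `scope` come from the PR.

```python
from contextlib import contextmanager


class FakeOrchestrator:
    """Hypothetical stand-in that only records scope transitions."""

    def __init__(self) -> None:
        self.open_scopes = 0  # number of scopes currently open (assumption)
        self.events = []

    def scope_begin(self) -> None:
        self.open_scopes += 1
        self.events.append(("begin", self.open_scopes))

    def scope_end(self) -> None:
        self.events.append(("end", self.open_scopes))
        self.open_scopes -= 1

    @contextmanager
    def scope(self):
        self.scope_begin()
        try:
            yield self
        finally:
            # Runs even if the body raises, so begin/end stay balanced.
            self.scope_end()


orch = FakeOrchestrator()
with orch.scope():
    with orch.scope():
        with orch.scope():
            pass  # tasks submitted here would be tagged with the innermost scope
```

The `try`/`finally` is the main reason to prefer the context manager over manual `scope_begin` / `scope_end` calls: an exception inside the body cannot leave a scope open.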
Summary
Strict-1 alignment with L2 (matches `PTO2_MAX_RING_DEPTH = 4`): a long-running outer-scope task no longer holds the FIFO head against inner-scope churn. Each scope-layer heap is its own `mmap(MAP_SHARED)` with its own `mu` / `cv` / `last_alive`; ring selection is driven by scope depth: `min(scope_depth, DIST_MAX_RING_DEPTH - 1)`. `DistOrchestrator::scope_begin` / `scope_end` are now user-facing, plus a `with orch.scope():` context manager on the Python side. Resolves Q10 / audit correction 2026-04-15 in `.claude/plans/HIERARCHICAL_RUNTIME_REFACTOR.md`.

What changed
- `DistRing`: array of 4 `HeapRing` structs (per-ring `heap_base` / `top` / `tail`, `released[]`, `slot_heap_end[]`, `mu` / `cv`). Shared: one `std::deque<unique_ptr<DistTaskSlotState>>` + monotonic `next_task_id_`. `alloc(bytes, scope_depth)` picks the ring; `release(slot)` reads `slot.ring_idx` / `slot.ring_slot_idx` to advance the right ring. `reset_to_empty` rewinds every ring. Locking is split into two phases so `ring.mu` and `slots_mu_` are never nested in the same direction twice; per-ring back-pressure cannot block other rings or slot-state readers.
- `DistTaskSlotState`: gains `ring_idx` / `ring_slot_idx`, stamped by `DistRing::alloc` before return; `reset()` preserves them.
- `DistScope`: adds `current_depth()` (0-based, L2-style) used for ring selection.
- `DistOrchestrator`: `scope_begin` / `scope_end` made public; `alloc()` and `reserve_outputs_and_slot()` now pass `scope_->current_depth()` so allocs inside a nested scope land in the nested ring.
- Nanobind: binds `scope_begin` / `scope_end` (no underscore); keeps `_scope_begin` / `_scope_end` aliases for the `Worker.run` facade. Exposes `DIST_MAX_RING_DEPTH` / `DIST_MAX_SCOPE_DEPTH`.
- Python `Orchestrator`: adds `scope_begin` / `scope_end` plus `with orch.scope():` context manager.
- `Worker(heap_ring_size=N)`: is now per-ring. Total VA reservation = `4 × N` (default 4 GiB across 4 × 1 GiB). Physical pages stay lazy under `MAP_ANONYMOUS`.
- Docs: rewrites `orchestrator.md` §5 (per-scope ring) and §6 (Scope: user-facing API + ring table + non-blocking `scope_end` rationale); updates `task-flow.md` §7 memory partitioning to describe 4 separate `MAP_SHARED` mmaps; moves "per-scope HeapRing / user nested scope" from "In flight" to "Landed" in `roadmap.md`.

Design decisions (recorded in plan Design Decision Log)
- `scope_end` is non-blocking: matches L2; `drain()` is the synchronous wait.
- `scope_begin` / `scope_end` and `with orch.scope():` context manager.
- `heap_ring_size` is per-ring (total VA = 4 × N).
- `alloc(shape, dtype)` inherits caller's scope depth.
- `reset_to_empty` rewinds every ring.

Test plan
- `pip install --no-build-isolation -e .` builds cleanly in a fresh worktree venv.
- `ctest --test-dir tests/ut/cpp/build -LE requires_hardware`: 7 targets pass (including new `DistRing` scope tests: ring-idx mapping, distinct mmaps per ring, routing by scope_depth, independent reclamation while another ring is full, inner-ring churn while outer holds, reset rewinds all rings, slot metadata).
- `pytest tests/ut/py/test_dist_worker/`: 21 pass (including 3 new scope tests: binding exposed, 3-deep nesting completes, basic with-scope run).
- `ci.py -p a2a3sim`: run by reviewers / CI.
- `pytest tests/st/a2a3/tensormap_and_ringbuffer/test_l3_*.py --platform a2a3`: run by reviewers / CI.
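The per-ring sizing rule (`heap_ring_size` applies to each ring, so total VA = 4 × N) and the "distinct mmaps per ring" property can be sketched with Python's standard `mmap` module. This is an illustration under stated assumptions, not the project's allocator: `total_va_reservation` and `RING_BYTES` are hypothetical names, and the mappings are kept tiny so the sketch stays cheap to run. `mmap.mmap(-1, n)` creates an anonymous mapping, which on Unix defaults to `MAP_SHARED`.

```python
import mmap

GIB = 1 << 30
DIST_MAX_RING_DEPTH = 4  # ring count stated in the PR


def total_va_reservation(heap_ring_size: int) -> int:
    """Total virtual address space reserved when each ring gets heap_ring_size bytes."""
    return DIST_MAX_RING_DEPTH * heap_ring_size


# Default from the PR description: 4 rings x 1 GiB each.
default_total_gib = total_va_reservation(1 * GIB) // GIB

# One anonymous mapping per ring (64 KiB each here, purely for illustration).
RING_BYTES = 64 * 1024
heaps = [mmap.mmap(-1, RING_BYTES) for _ in range(DIST_MAX_RING_DEPTH)]

# Writing into ring 1 leaves the other rings' zero-filled pages untouched.
heaps[1][:5] = b"inner"
written = bytes(heaps[1][:5])
untouched = [bytes(h[:5]) for i, h in enumerate(heaps) if i != 1]

for h in heaps:
    h.close()
```

Because anonymous pages are committed lazily by the operating system, a large reservation mostly costs address space until the pages are actually written.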