
Add: parallel for iteration isolation in tensormap and orchestrator #551

Draft
zhusy54 wants to merge 1 commit into hw-native-sys:main from zhusy54:parallel-for

Conversation


@zhusy54 zhusy54 commented Apr 14, 2026

Summary

  • Add PTO2_PARALLEL_FOR / PTO2_PARALLEL_SCOPE macros and RAII guards that bracket each loop iteration with a scope-level dependency filter
  • Record iter_start_local_ids per ring in PTO2TensorMap so that tensor-map lookups skip entries produced in prior iterations on the same ring, preventing false cross-iteration dependencies when independent loop iterations submit tasks concurrently
  • Wire new parallel_for_begin/end and parallel_scope_begin/end ops into PTO2RuntimeOps vtable
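The RAII bracketing described above can be sketched roughly as follows. This is a minimal illustration, not the actual PR code: `FakeRuntime`, `ParallelScopeGuard`, and `MY_PARALLEL_FOR` are hypothetical stand-ins for the real PTO2RuntimeOps vtable calls, guards, and the PTO2_PARALLEL_FOR macro.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime's parallel_scope_begin/end ops.
struct FakeRuntime {
    int depth = 0;    // current scope nesting
    int begins = 0;   // total scope entries
    int ends = 0;     // total scope exits
    void parallel_scope_begin() { ++depth; ++begins; }
    void parallel_scope_end()   { --depth; ++ends; }
};

// RAII guard: brackets one loop iteration with scope begin/end, so the
// end call fires even if the iteration body returns early or throws.
class ParallelScopeGuard {
    FakeRuntime& rt_;
public:
    explicit ParallelScopeGuard(FakeRuntime& rt) : rt_(rt) {
        rt_.parallel_scope_begin();
    }
    ~ParallelScopeGuard() { rt_.parallel_scope_end(); }
    ParallelScopeGuard(const ParallelScopeGuard&) = delete;
    ParallelScopeGuard& operator=(const ParallelScopeGuard&) = delete;
};

// Macro sketch (C++17): each iteration constructs a guard whose lifetime
// spans exactly that iteration's body.
#define MY_PARALLEL_FOR(rt, i, n) \
    for (int i = 0; i < (n); ++i) \
        if (ParallelScopeGuard _guard{rt}; true)
```

Usage: `MY_PARALLEL_FOR(rt, i, 4) { /* submit tasks for iteration i */ }` enters and leaves one scope per iteration, leaving the nesting depth balanced afterwards.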

Changes

  • pto_orchestration_api.h: new parallel_for_begin/end and parallel_scope_begin/end ops, inline wrappers, RAII guards, PTO2_PARALLEL_FOR / PTO2_PARALLEL_SCOPE macros (a2a3 + a5)
  • pto_orchestrator.h/.cpp: implement pto2_parallel_for/scope_begin/end using existing scope stack + iter_start filter bookkeeping
  • pto_tensormap.h/.cpp: add iter_start_local_ids[ring] array, initialise to -1, filter stale entries during lookup
  • pto_ring_buffer.h: expose next_local_id() for snapshot at scope entry
  • pto_runtime2.h/.cpp: wire new ops into PTO2RuntimeOps vtable
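The per-ring `iter_start_local_ids` filter can be sketched as below, under the same hedging: `MiniTensorMap`, `Entry`, and `kNumRings` are illustrative inventions, not the PTO2TensorMap implementation. The snapshot taken at scope entry (via `next_local_id()`) marks the first local ID of the current iteration; lookups skip any entry on the same ring with a smaller ID.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kNumRings = 2;  // illustrative ring count

// Illustrative tensor-map entry: which ring produced it, its per-ring
// local task ID, and the tensor it describes.
struct Entry { int ring; int64_t local_id; int tensor_key; };

struct MiniTensorMap {
    std::array<int64_t, kNumRings> iter_start_local_ids;
    std::vector<Entry> entries;

    // -1 means no iteration filter is active on that ring.
    void init() { iter_start_local_ids.fill(-1); }

    // At scope entry, snapshot the ring's next local ID; entries with a
    // smaller ID were produced by prior iterations and become stale.
    void scope_begin(int ring, int64_t next_local_id) {
        iter_start_local_ids[ring] = next_local_id;
    }

    // Lookup the latest entry for a key, skipping stale cross-iteration
    // entries so independent iterations see no false dependencies.
    const Entry* lookup(int key) const {
        const Entry* found = nullptr;
        for (const Entry& e : entries) {
            if (e.tensor_key != key) continue;
            int64_t start = iter_start_local_ids[e.ring];
            if (start >= 0 && e.local_id < start) continue;  // stale
            found = &e;
        }
        return found;
    }
};
```

With the filter unset (-1) every entry is visible; after `scope_begin` only entries produced at or after the snapshot are returned, which is the cross-iteration isolation the summary describes.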

Testing

  • All tests pass
  • Code review completed

🤖 Generated with Claude Code


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces parallel-for iteration isolation for the PTO2 runtime. It adds lifecycle hooks to the runtime operations, implements RAII guards and macros for parallel scopes, and updates the PTO2TensorMap lookup logic to filter out tensor entries from previous iterations using per-ring local task IDs. These changes are applied consistently across the a2a3 and a5 runtime paths. I have no feedback to provide.

@zhusy54 zhusy54 marked this pull request as draft April 14, 2026 10:49
@zhusy54 zhusy54 marked this pull request as ready for review April 15, 2026 09:15
@zhusy54 zhusy54 marked this pull request as draft April 15, 2026 09:16
…a2a3/a5)

Introduces PTO2_PARALLEL_FOR macro and supporting orchestrator APIs
(pto2_parallel_for_begin/end, pto2_parallel_scope_begin/end) to isolate
tensormap lookups per loop iteration. Initializes iter_start_local_ids
to -1 in PTO2TensorMap::init. Updates alternating_matmul_add,
batch_paged_attention, benchmark_bgemm, and paged_attention_unroll
scene tests to use PTO2_PARALLEL_FOR.
