Add: parallel for iteration isolation in tensormap and orchestrator #551
Draft
zhusy54 wants to merge 1 commit into hw-native-sys:main from
Conversation
Code Review
This pull request introduces parallel-for iteration isolation for the PTO2 runtime. It adds lifecycle hooks to the runtime operations, implements RAII guards and macros for parallel scopes, and updates the PTO2TensorMap lookup logic to filter out tensor entries from previous iterations using per-ring local task IDs. These changes are applied consistently across the a2a3 and a5 runtime paths. I have no feedback to provide.
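The per-ring filtering described above can be illustrated with a small sketch. The struct and method names below are hypothetical stand-ins for the real PTO2TensorMap internals, which this page does not show; only iter_start_local_ids, its -1 initial value, and the next_local_id() snapshot idea come from the PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kNumRings = 4;  // ring count is an assumption for illustration

struct Entry {
  int ring;          // ring the producing task was submitted on
  int64_t local_id;  // per-ring task id assigned at submission
};

struct TensorMapSketch {
  // Snapshot of each ring's next local id, taken at parallel-scope entry;
  // -1 (the value the PR initialises in PTO2TensorMap::init) means no filter.
  std::array<int64_t, kNumRings> iter_start_local_ids;

  TensorMapSketch() { iter_start_local_ids.fill(-1); }

  // On iteration entry, remember where each ring's id counter stood
  // (the PR exposes next_local_id() on the ring buffer for this snapshot).
  void scope_begin(const std::array<int64_t, kNumRings>& next_ids) {
    iter_start_local_ids = next_ids;
  }

  // Lookup ignores entries written before the current iteration's snapshot,
  // so a producer from a previous iteration on the same ring cannot create
  // a false cross-iteration dependency.
  bool visible(const Entry& e) const {
    const int64_t start = iter_start_local_ids[e.ring];
    return start < 0 || e.local_id >= start;
  }
};
```

The -1 sentinel keeps lookups unfiltered outside a parallel scope, so serial code paths behave exactly as before.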
…a2a3/a5) Introduces PTO2_PARALLEL_FOR macro and supporting orchestrator APIs (pto2_parallel_for_begin/end, pto2_parallel_scope_begin/end) to isolate tensormap lookups per loop iteration. Initializes iter_start_local_ids to -1 in PTO2TensorMap::init. Updates alternating_matmul_add, batch_paged_attention, benchmark_bgemm, and paged_attention_unroll scene tests to use PTO2_PARALLEL_FOR.
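A hedged sketch of what a PTO2_PARALLEL_FOR-style macro might expand to: a plain for loop whose body runs under a per-iteration RAII guard. The macro body and counters below are assumptions for illustration; only the RAII-guard approach and the pto2_parallel_scope_begin/end hook names come from the PR.

```cpp
#include <cassert>

static int scopes_opened = 0;
static int scopes_closed = 0;

// RAII guard: opens the iteration scope on construction and closes it on
// destruction, so every exit path of the loop body ends the scope.
struct ScopeGuardSketch {
  ScopeGuardSketch()  { ++scopes_opened; }   // stand-in for pto2_parallel_scope_begin
  ~ScopeGuardSketch() { ++scopes_closed; }   // stand-in for pto2_parallel_scope_end
};

// Illustrative macro (C++17): the if-with-initializer keeps the guard alive
// exactly for the duration of the attached loop body.
#define PTO2_PARALLEL_FOR_SKETCH(var, lo, hi)   \
  for (int var = (lo); var < (hi); ++var)       \
    if (ScopeGuardSketch scope_guard_; true)

int run_loop() {
  int iterations = 0;
  PTO2_PARALLEL_FOR_SKETCH(i, 0, 3) {
    ++iterations;  // per-iteration task submissions would go here
  }
  return iterations;
}
```

Because the guard is scoped to a single iteration, each pass through the loop gets a fresh dependency-filter snapshot without any explicit cleanup call in the scene tests.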
Summary
- PTO2_PARALLEL_FOR/PTO2_PARALLEL_SCOPE macros and RAII guards that bracket each loop iteration with a scope-level dependency filter
- iter_start_local_ids per ring in PTO2TensorMap, so that tensor-map lookups skip entries produced in prior iterations on the same ring, preventing false cross-iteration dependencies when independent loop iterations submit tasks concurrently
- parallel_for_begin/end and parallel_scope_begin/end ops in the PTO2RuntimeOps vtable

Changes
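The changes below also wire the four new ops into the runtime vtable; here is a minimal sketch of that shape, with the struct layout and signatures assumed (only the op names and the PTO2RuntimeOps vtable idea come from the PR):

```cpp
#include <cassert>

// Hypothetical vtable slice; the real PTO2RuntimeOps in pto_runtime2.h
// contains many more ops and may use different signatures.
struct RuntimeOpsSketch {
  void (*parallel_for_begin)();
  void (*parallel_for_end)();
  void (*parallel_scope_begin)();
  void (*parallel_scope_end)();
};

static int hook_calls = 0;
static void for_begin()   { ++hook_calls; }
static void for_end()     { ++hook_calls; }
static void scope_begin() { ++hook_calls; }
static void scope_end()   { ++hook_calls; }

// Each runtime path (a2a3, a5) fills the slots with its own implementations,
// so the inline wrappers and RAII guards can dispatch without knowing which
// path is active.
static const RuntimeOpsSketch ops = {for_begin, for_end, scope_begin, scope_end};

int dispatch_all() {
  ops.parallel_for_begin();
  ops.parallel_scope_begin();
  ops.parallel_scope_end();
  ops.parallel_for_end();
  return hook_calls;
}
```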
- pto_orchestration_api.h: new parallel_for_begin/end and parallel_scope_begin/end ops, inline wrappers, RAII guards, PTO2_PARALLEL_FOR/PTO2_PARALLEL_SCOPE macros (a2a3 + a5)
- pto_orchestrator.h/.cpp: implement pto2_parallel_for/scope_begin/end using the existing scope stack + iter_start filter bookkeeping
- pto_tensormap.h/.cpp: add iter_start_local_ids[ring] array, initialise to -1, filter stale entries during lookup
- pto_ring_buffer.h: expose next_local_id() for a snapshot at scope entry
- pto_runtime2.h/.cpp: wire the new ops into the PTO2RuntimeOps vtable

Testing
🤖 Generated with Claude Code