Fix: unify tensor dump control under profiling flags #567

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoZheng109:fix/dump-tensor on Apr 15, 2026
Conversation

@ChaoZheng109
Collaborator

  • add enable_profiling_flag to the AICPU/AICore handshake and initialize the dump bit in onboard and sim device runners
  • replace PTO2_DUMP_TENSOR guards with PTO2_PROFILING and remove the old per-runtime dump macro definitions
  • add an AICore pipe barrier before completion when dumping tensors to preserve write visibility for dumps
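The three changes above can be pictured with a small host-compilable sketch. Only `enable_profiling_flag`, `PTO2_PROFILING`, and the idea of a dump bit come from this PR; the bit positions, the reduced `Handshake` layout, the `task_done` field, and every function name below are assumptions made for illustration, and `pipe_barrier_stub` is a counter standing in for the real AICore barrier.

```cpp
#include <cassert>
#include <cstdint>

#ifndef PTO2_PROFILING
#define PTO2_PROFILING 1  // assumed to be enabled for this sketch
#endif

// Hypothetical bit assignments; the PR only states that a dump bit exists.
constexpr uint32_t kProfilingBitTiming = 1u << 0;
constexpr uint32_t kProfilingBitDumpTensor = 1u << 1;

// Reduced stand-in for the AICPU/AICore handshake (the real struct has more fields).
struct Handshake {
    uint32_t enable_profiling_flag = 0;  // bitmask written by the device runner
    uint32_t task_done = 0;              // hypothetical completion marker
};

// Host-side stub for the AICore pipe barrier; it only counts invocations.
static int g_barrier_calls = 0;
static void pipe_barrier_stub() { ++g_barrier_calls; }

// What an onboard or sim device runner might do when it sets up the handshake.
static void init_handshake(Handshake& hs, bool dump_tensor) {
    hs.enable_profiling_flag = 0;
    if (dump_tensor) {
        hs.enable_profiling_flag |= kProfilingBitDumpTensor;
    }
}

static bool dump_enabled(const Handshake& hs) {
    return (hs.enable_profiling_flag & kProfilingBitDumpTensor) != 0;
}

// AICore-side completion: drain outstanding writes before signalling,
// but only when the dump bit is set and profiling is compiled in.
static void complete_task(Handshake& hs) {
    assert(hs.task_done == 0 && "task completed twice");
#if PTO2_PROFILING
    if (dump_enabled(hs)) {
        pipe_barrier_stub();
    }
#endif
    hs.task_done = 1;
}
```

In this sketch the compile-time guard (`PTO2_PROFILING`) decides whether the check exists at all, while the runtime bit decides whether the barrier cost is paid for a given run.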


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request unifies profiling and tensor dump controls by introducing a bitmask flag (enable_profiling_flag) in the Handshake structure shared between the host, AICPU, and AICore. It replaces the PTO2_DUMP_TENSOR macro with PTO2_PROFILING and implements pipe_barrier calls in AICore execution loops to ensure memory visibility when tensor dumping is enabled. Review feedback highlights the need to extend pipe_barrier usage to general profiling to prevent race conditions on weak memory model architectures and suggests that the AICPU should utilize the new handshake flag for better configuration consistency.
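The visibility concern raised in the review can be illustrated with a standard C++ analogy. This is not AICore code: `pipe_barrier` is a hardware intrinsic, and the sketch below swaps in `std::atomic` release/acquire ordering to show the same pattern of making data writes visible before a completion flag is observed. The `Slot` type and both function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical record shared between a producer (think AICore) and a consumer (think AICPU).
struct Slot {
    uint64_t payload = 0;           // plain data, e.g. a timestamp or dumped tensor bytes
    std::atomic<uint32_t> done{0};  // completion flag polled by the consumer
};

// Producer: write the data first, then publish the flag with release ordering
// so the payload write cannot become visible after the flag.
static void publish(Slot& s, uint64_t value) {
    assert(s.done.load(std::memory_order_relaxed) == 0);  // slot must be unused
    s.payload = value;
    s.done.store(1, std::memory_order_release);
}

// Consumer: read the flag with acquire ordering; if it is set, the payload
// written before the release store is guaranteed to be visible.
static bool try_consume(Slot& s, uint64_t& out) {
    if (s.done.load(std::memory_order_acquire) == 0) {
        return false;
    }
    out = s.payload;
    return true;
}
```

Without the ordering on the flag, a consumer on a weakly ordered machine could see `done == 1` and still read a stale `payload`, which is the kind of race the review asks the barrier to rule out for profiling data as well as tensor dumps.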

Comment thread src/a2a3/runtime/aicpu_build_graph/aicore/aicore_executor.cpp
Comment thread src/a5/runtime/host_build_graph/aicore/aicore_executor.cpp
Comment thread src/a2a3/runtime/aicpu_build_graph/aicpu/aicpu_executor.cpp
@ChaoZheng109 ChaoZheng109 marked this pull request as ready for review April 15, 2026 08:31
@ChaoZheng109 ChaoZheng109 force-pushed the fix/dump-tensor branch 3 times, most recently from c57e775 to 2422bb4 on April 15, 2026 10:31
@ChaoWao ChaoWao merged commit 0745dee into hw-native-sys:main Apr 15, 2026
15 checks passed