Refactor: rename spmd_paged_attention_tpush and optimize pipe barriers #629

Merged: ChaoWao merged 1 commit into hw-native-sys:main from chenshengxin2026:refactor/spmd-paged-attention-pipe-barriers on Apr 22, 2026

@chenshengxin2026
Contributor

Summary

  • Rename the test directory from spmd_paged_attention_tpush/ to spmd_paged_attention/ and update the references in benchmark_rounds.sh
  • Replace coarse pipe_barrier(PIPE_ALL/PIPE_MTE3) calls with fine-grained set_flag/wait_flag synchronization in the AIC TPUSH sequences
  • Reorder TPOP after TLOAD in aic_pv_step to overlap DMA with the consume step
  • Remove redundant pipe_barrier(PIPE_V) calls in the AIV softmax and online-update steps, where data dependencies already serialize the operations
  • Precompute the is_last_partial boolean once instead of recomputing it at every step
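
To illustrate why fine-grained flags can beat a coarse barrier, here is a minimal timing sketch in Python. It is not the kernel code: the pipe labels (`load`, `compute`, `store`), the op names, and the costs are all made up for illustration, and the only assumption carried over from the PR is that a coarse barrier makes later work wait for everything issued before it, while a flag makes one op wait for one specific earlier op.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    pipe: str   # hypothetical pipe label, not a real hardware pipe name
    cost: int   # made-up duration in arbitrary time units
    waits_on: list = field(default_factory=list)  # ops this op must wait for


def makespan(ops):
    """Finish time of the last op.

    Ops on the same pipe run in issue order; waits_on adds cross-pipe
    dependencies, which is the role a wait on a flag plays in this toy model.
    """
    finish, pipe_free = {}, {}
    for op in ops:  # ops are listed in issue order, dependencies first
        start = max([pipe_free.get(op.pipe, 0)] + [finish[d] for d in op.waits_on])
        finish[op.name] = start + op.cost
        pipe_free[op.pipe] = finish[op.name]
    return max(finish.values())


def two_iterations(coarse):
    # Coarse barrier between iterations: load1 waits until iteration 0 has
    # fully drained (store0 is its last op). Fine-grained: load1 only waits
    # for its own pipe to be free.
    barrier = ["store0"] if coarse else []
    return [
        Op("load0", "load", 4),
        Op("compute0", "compute", 3, ["load0"]),
        Op("store0", "store", 2, ["compute0"]),
        Op("load1", "load", 4, barrier),
        Op("compute1", "compute", 3, ["load1"]),
        Op("store1", "store", 2, ["compute1"]),
    ]


coarse_time = makespan(two_iterations(coarse=True))
fine_time = makespan(two_iterations(coarse=False))
print(coarse_time, fine_time)  # prints: 18 13
```

In this toy model the second load starts as soon as the load pipe is free instead of waiting for the first store to finish, which is the kind of overlap the barrier changes are aiming for. Actual gains depend on the real instruction latencies, which this sketch does not model.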

Testing

  • Simulation tests pass
  • Hardware tests pass


@gemini-code-assist (bot) left a comment

Code Review

This pull request optimizes the paged attention kernel by replacing coarse-grained pipe barriers with fine-grained flags and refactoring block processing logic. The reviewer noted a high-severity issue where using PIPE_FIX instead of PIPE_MTE3 after TPUSH operations in aic_qk_step and aic_pv_step could result in race conditions, as the consumer might read data before the DMA write to global memory is fully complete.
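
The hazard described in the review can be sketched with a toy timeline in Python. This is only an illustration of the general pattern (signalling a consumer before a write has landed), not a model of the actual kernel: the stage names and costs are hypothetical, and whether the concern applies to this PR depends on which pipe really performs the write after TPUSH, which the thread does not settle.

```python
def consumer_sees_complete_data(signal_after, early_cost=3, write_cost=5):
    """Toy timeline for a producer with two back-to-back stages.

    early_stage: work that finishes before the data is written out.
    write_stage: the write that makes the data visible to the consumer.
    The consumer starts reading as soon as the producer signals, and the
    producer signals when the stage named by signal_after completes.
    All names and costs here are illustrative assumptions.
    """
    early_done = early_cost
    write_done = early_done + write_cost
    done_at = {"early_stage": early_done, "write_stage": write_done}
    read_start = done_at[signal_after]
    # The read is safe only if it starts after the write has completed.
    return read_start >= write_done


print(consumer_sees_complete_data("write_stage"))  # True: signal follows the write
print(consumer_sees_complete_data("early_stage"))  # False: read can start too early
```

The point of the sketch is that a flag must be set by the stage that actually completes the write; a flag tied to an earlier stage leaves a window in which the consumer may read stale or partial data.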

@chenshengxin2026 force-pushed the refactor/spmd-paged-attention-pipe-barriers branch from 9eb1736 to 8db35eb on April 21, 2026 12:15
@chenshengxin2026 force-pushed the refactor/spmd-paged-attention-pipe-barriers branch from 8db35eb to 875c958 on April 22, 2026 01:48
@ChaoWao merged commit 8aee302 into hw-native-sys:main on Apr 22, 2026
14 checks passed
2 participants