[Performance] Runtime performance optimization tracking #545

@zhusy54

Overview

This issue tracks ongoing runtime performance optimization work for the tensormap_and_ringbuffer runtime on the a2a3 (Ascend 910B/C) platform. Each subtask below represents an independent optimization point.

Platform

All / Unknown

Runtime Variant

tensormap_and_ringbuffer

Git Commit ID

6644bc7

CANN Version

8.5.0.alpha001

Host Platform

Linux (aarch64)


Optimization Tasks

Subtask 1: Parallel for dependence optimization

Optimize parallel_for loops by analyzing data dependences to enable more aggressive parallelism.

Currently, parallel_for constructs may be overly conservative in their dependence assumptions, preventing loop iterations from running in parallel when they could safely do so. By introducing dependence analysis, we can identify loops with no loop-carried dependences and schedule them with full parallelism.
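To make the idea concrete, here is a minimal sketch of a conservative loop-carried dependence check. It is not the runtime's implementation: `Access`, `Iteration`, `conflicts`, and `is_fully_parallel` are hypothetical names, and the sketch assumes each iteration's reads and writes can be summarized as half-open element ranges on named buffers.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Access:
    """Half-open element range [start, stop) in a named buffer (hypothetical model)."""
    buffer: str
    start: int
    stop: int

    def overlaps(self, other: "Access") -> bool:
        return (
            self.buffer == other.buffer
            and self.start < other.stop
            and other.start < self.stop
        )


@dataclass
class Iteration:
    """Read and write footprints of one parallel_for iteration (hypothetical model)."""
    reads: list = field(default_factory=list)
    writes: list = field(default_factory=list)


def conflicts(a: Iteration, b: Iteration) -> bool:
    """True if the two iterations have a RAW, WAR, or WAW hazard between them."""
    for w in a.writes:
        if any(w.overlaps(x) for x in b.reads + b.writes):
            return True
    for w in b.writes:
        if any(w.overlaps(x) for x in a.reads):
            return True
    return False


def is_fully_parallel(iterations) -> bool:
    """True if no pair of iterations conflicts, i.e. no loop-carried dependence."""
    return not any(conflicts(a, b) for a, b in combinations(iterations, 2))


# Each iteration reads and writes its own disjoint tile: safe to run in parallel.
independent = [
    Iteration(reads=[Access("in", i * 4, i * 4 + 4)],
              writes=[Access("out", i * 4, i * 4 + 4)])
    for i in range(4)
]

# Iteration i reads acc[i] and writes acc[i + 1]: a loop-carried RAW chain.
chained = [
    Iteration(reads=[Access("acc", i, i + 1)],
              writes=[Access("acc", i + 1, i + 2)])
    for i in range(4)
]

print(is_fully_parallel(independent))
print(is_fully_parallel(chained))
```

The pairwise check is quadratic in the number of iterations, which is acceptable for a sketch. A real analysis would more likely reason symbolically about index expressions rather than enumerate iterations.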

Status: Open


Subtask 2: Dual slot scheduling for mix subgraph tasks

Support dual slot scheduling for mix subgraphs (subgraphs that contain both AIC and AIV tasks).

Currently, mix subgraph tasks are scheduled conservatively with a single slot, serializing AIC and AIV work even when they could be dispatched concurrently into two hardware slots. Enabling dual slot scheduling for mix subgraphs would allow AIC and AIV kernels to overlap in execution, reducing end-to-end latency.
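The expected gain can be illustrated with a toy latency model. This is only a sketch under stated assumptions, not the scheduler itself: `MixTask`, `single_slot_latency`, and `dual_slot_latency` are hypothetical names, the AIC and AIV parts of a mix task are assumed to have no dependence on each other, and the durations are invented for illustration rather than measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MixTask:
    """One mix subgraph task with an AIC part and an AIV part (hypothetical model)."""
    aic_us: int  # assumed AIC kernel duration in microseconds
    aiv_us: int  # assumed AIV kernel duration in microseconds


def single_slot_latency(tasks) -> int:
    """Single slot: the AIC and AIV parts of every task run back to back."""
    return sum(t.aic_us + t.aiv_us for t in tasks)


def dual_slot_latency(tasks) -> int:
    """Dual slot: the AIC and AIV parts of a task overlap; tasks still run in order."""
    return sum(max(t.aic_us, t.aiv_us) for t in tasks)


# Invented durations, for illustration only.
tasks = [MixTask(aic_us=30, aiv_us=20), MixTask(aic_us=10, aiv_us=25)]

print(single_slot_latency(tasks))  # 85
print(dual_slot_latency(tasks))    # 55
```

In this model the benefit is largest when the AIC and AIV durations are balanced and disappears when one side dominates, so the actual gain depends on the kernel mix.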

Status: ✅ Done

Related PRs:


Reproduction

python examples/scripts/run_example.py \
    -k tests/st/a2a3/tensormap_and_ringbuffer/paged_attention/kernels \
    -g tests/st/a2a3/tensormap_and_ringbuffer/paged_attention/golden.py \
    -p a2a3 -d 5 -n 10

Expected Performance

Each subtask is expected to reduce end-to-end latency. Specific numbers are to be determined after profiling each optimization.

Actual Performance

The current numbers are the baseline before any optimization. This issue does not report a regression; the subtasks are proactive optimization opportunities.

Labels

performance (Performance regression or optimization)