Skip to content

Refactor: introduce tiered profiling levels for a2a3 tensormap_and_ringbuffer swimlane export#500

Open
indigo1973 wants to merge 1 commit intohw-native-sys:mainfrom
indigo1973:prof_0409
Open

Refactor: introduce tiered profiling levels for a2a3 tensormap_and_ringbuffer swimlane export#500
indigo1973 wants to merge 1 commit intohw-native-sys:mainfrom
indigo1973:prof_0409

Conversation

@indigo1973
Copy link
Copy Markdown
Contributor

@indigo1973 indigo1973 commented Apr 9, 2026

Replace the boolean enable_profiling flag with an integer perf_level

throughout the profiling pipeline (CLI → Python bindings → C++ runtime →

AICPU executor → PerformanceCollector → JSON export).

Profiling levels:

0 = off

1 = AICore task start/end timing only (JSON version 0)

2 = + dispatch timestamps, finish timestamps, fanout edges (JSON version 1)

3 = + AICPU scheduler/orchestrator phase buffers (JSON version 2)

Key changes:

ChipCallConfig: bool enable_profiling → int perf_level

CLI --enable-profiling: store_true → optional int (bare flag defaults to 3)

nanobind property: backward-compatible bool→3 coercion for legacy callers

AICPU executor: split into task_recording_enabled (>0) vs

phase_recording_enabled (>=3) to skip phase overhead at lower levels

PerformanceCollector: skip phase buffer allocation when perf_level < 3;

version selection based on perf_level and presence of phase data

swimlane_converter.py: accept version 0, tolerate missing fanout field

Fix scene_test.py: val and cond truncated int to bool; use ternary

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a more granular performance profiling system by replacing the boolean 'enable_profiling' flag with an integer 'perf_level' across the codebase. This change allows for multiple profiling modes (0=off, 1=AICore-only, 2=task+fanout, 3=full) and includes updates to the runtime, device runner, and performance collector to handle these levels. Additionally, it updates the swimlane JSON export logic to support versioning based on the profiling level and improves robustness in the swimlane converter by handling optional fanout data. I have no feedback to provide as there are no review comments to evaluate.

@indigo1973 indigo1973 removed this from pto project Apr 10, 2026
@indigo1973 indigo1973 force-pushed the prof_0409 branch 2 times, most recently from 0b39cb0 to 5b10ea4 Compare April 15, 2026 07:02
…ngbuffer swimlane export

Replace the boolean enable_profiling flag with an integer perf_level
throughout the profiling pipeline (CLI → Python bindings → C++ runtime →
AICPU executor → PerformanceCollector → JSON export).

Profiling levels:
  0 = off
  1 = AICore task start/end timing only (JSON version 0)
  2 = + dispatch timestamps, finish timestamps, fanout edges (JSON version 1)
  3 = + AICPU scheduler/orchestrator phase buffers (JSON version 2)

Key changes:
- ChipCallConfig: bool enable_profiling → int perf_level
- CLI --enable-profiling: store_true → optional int (bare flag defaults to 3)
- nanobind property: backward-compatible bool→3 coercion for legacy callers
- AICPU executor: split into task_recording_enabled (>0) vs
  phase_recording_enabled (>=3) to skip phase overhead at lower levels
- PerformanceCollector: skip phase buffer allocation when perf_level < 3;
  version selection based on perf_level and presence of phase data
- swimlane_converter.py: accept version 0, tolerate missing fanout field
- Fix scene_test.py: `val and cond` truncated int to bool; use ternary
@indigo1973
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request replaces the boolean profiling flag with a multi-level integer perf_level (0-3) to provide granular control over performance data collection, ranging from AICore-only to full profiling with phase data. The changes span Python CLI tools, C++ runtime structures, and shared memory allocation logic in the PerformanceCollector. Review feedback highlights opportunities to simplify redundant logical checks in the performance-critical scheduler path where perf_level is validated alongside derived boolean flags.

Comment on lines +857 to 859
if (task_recording_enabled && runtime->perf_level >= 2) {
core_exec_state.pending_dispatch_timestamp = get_sys_cnt_aicpu();
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The condition task_recording_enabled && runtime->perf_level >= 2 is redundant. Since task_recording_enabled is true when runtime->perf_level > 0, this check is equivalent to just runtime->perf_level >= 2. Simplifying this avoids redundant checks on a performance-critical path.

            if (runtime->perf_level >= 2) {
                core_exec_state.pending_dispatch_timestamp = get_sys_cnt_aicpu();
            }
References
  1. On performance-critical paths, avoid redundant validation checks for variables that are guaranteed to be valid by the logic at their point of entry.

Comment on lines +866 to 868
if (task_recording_enabled && runtime->perf_level >= 2) {
core_exec_state.running_dispatch_timestamp = get_sys_cnt_aicpu();
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the previous check, the condition task_recording_enabled && runtime->perf_level >= 2 is redundant and can be simplified to runtime->perf_level >= 2 to improve performance and readability on this hot path.

            if (runtime->perf_level >= 2) {
                core_exec_state.running_dispatch_timestamp = get_sys_cnt_aicpu();
            }
References
  1. Avoid redundant data validation checks on performance-critical paths if the data has already been validated or is logically guaranteed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant