Feature: AICPU scheduler phase profiling and orchestrator summary by ChaoWao · Pull Request #150 · hw-native-sys/simpler

ChaoWao · 2026-03-01T13:02:59Z

Summary

- Add phase profiling data structures (AicpuPhaseRecord, AicpuOrchSummary, AicpuPhaseHeader), appended after the DoubleBuffer array in shared memory
- Implement an AICPU-side recording API with cached pointers for hot-path efficiency
- Instrument 4 scheduler phases (COMPLETE, DISPATCH, SCAN, EARLY_READY) and the orchestrator cumulative summary in aicpu_executor
- Add host-side collection (collect_phase_data) and version 2 JSON export with phase_us (microseconds)
- Perfetto visualization: pid=3 shows scheduler phase bars, pid=4 shows the orchestrator with phase sub-events
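
To make the shared-memory layout above concrete, here is a minimal sketch of a phase region placed after an existing base region. The struct names are shortened stand-ins for AicpuPhaseRecord, AicpuOrchSummary and AicpuPhaseHeader. Only `start_time`, `magic`, `buffer_counts`, `orch_summary` and `num_sched_threads` appear in this PR; every other field, all constants, and the helper names are assumptions for illustration, not the definitions in perf_profiling.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical limits; the real values live in perf_profiling.h.
constexpr uint32_t kPhaseMagic = 0x50484153u;   // assumed marker value
constexpr int kMaxPhaseThreads = 4;             // assumed
constexpr int kMaxRecordsPerThread = 1024;      // assumed

struct PhaseRecord {            // stand-in for AicpuPhaseRecord
    uint64_t start_time;        // cycles
    uint64_t end_time;          // cycles (assumed field)
    uint32_t phase_id;          // assumed field
    uint32_t task_count;        // assumed field
};

struct OrchSummary {            // stand-in for AicpuOrchSummary
    uint64_t start_time;
    uint64_t end_time;          // assumed field
    uint32_t magic;
    uint32_t reserved;          // assumed padding field
};

struct PhaseHeader {            // stand-in for AicpuPhaseHeader
    uint32_t magic;
    uint32_t num_sched_threads;
    uint32_t buffer_counts[kMaxPhaseThreads];
    OrchSummary orch_summary;
};

// Bytes needed for the phase region: one header plus one fixed-size
// record buffer per scheduler thread.
constexpr size_t phase_region_size() {
    return sizeof(PhaseHeader) +
           sizeof(PhaseRecord) * kMaxPhaseThreads * kMaxRecordsPerThread;
}

// The header sits immediately after the base region (the DoubleBuffer array).
inline PhaseHeader* phase_header_at(void* base, size_t base_region_size) {
    return reinterpret_cast<PhaseHeader*>(
        static_cast<unsigned char*>(base) + base_region_size);
}

// Record buffers follow the header, one contiguous slice per thread.
inline PhaseRecord* phase_records_at(void* base, size_t base_region_size, int thread_idx) {
    assert(thread_idx >= 0 && thread_idx < kMaxPhaseThreads);
    auto* first = reinterpret_cast<PhaseRecord*>(
        static_cast<unsigned char*>(base) + base_region_size + sizeof(PhaseHeader));
    return first + static_cast<size_t>(thread_idx) * kMaxRecordsPerThread;
}
```

The sketch assumes the base region size is a multiple of 8 so that the header and records stay naturally aligned.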

Testing

Simulation tests pass (10/10)
Hardware tests pass (if applicable)

gemini-code-assist · 2026-03-01T13:03:19Z

Summary of Changes

This pull request significantly enhances the performance profiling capabilities for the AICPU scheduler and orchestrator. It introduces new shared memory data structures and APIs to capture fine-grained timing information for critical scheduler phases and the orchestrator's internal operations. The collected data is then integrated into the existing host-side performance collection and export mechanism, enabling detailed visualization in Perfetto. This allows developers to gain deeper insights into the performance bottlenecks and behavior of the AICPU scheduling process.

Highlights

- AICPU Phase Profiling Data Structures: Added new data structures (AicpuPhaseRecord, AicpuOrchSummary, AicpuPhaseHeader) to shared memory for detailed AICPU scheduler phase and orchestrator profiling.
- AICPU-side Recording APIs and Instrumentation: Implemented AICPU-side APIs to record scheduler phases (COMPLETE, DISPATCH, SCAN, EARLY_READY) and the orchestrator cumulative summary, with cached pointers for hot-path efficiency.
- Host-side Data Collection and JSON Export: Introduced host-side logic to collect the new phase profiling data and updated the swimlane JSON export to version 2, including phase_us (microseconds) for visualization.
- Perfetto Visualization Support: Enhanced the swimlane_converter.py script to visualize AICPU scheduler phases (pid=3) and orchestrator activity (pid=4 with phase sub-events) in Perfetto.
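
The "cached pointers for hot-path efficiency" idea in the highlights can be sketched roughly as below: the shared-memory addresses are resolved once at init time, so recording a phase is a bounds check and a store. The function names mirror perf_aicpu_init_phase_profiling and perf_aicpu_record_phase but the signatures, struct fields and limits are assumptions, and any memory barriers or cache flushes real hardware might need are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint32_t kMaxThreads = 4;     // assumed limit
constexpr uint32_t kMaxRecords = 256;   // assumed per-thread capacity

struct PhaseRecord { uint64_t start_time; uint64_t end_time; uint32_t phase_id; uint32_t task_count; };
struct PhaseHeader { uint32_t num_sched_threads; uint32_t buffer_counts[kMaxThreads]; };

// Resolved once during initialization, then reused on every record call.
static PhaseHeader* s_phase_header = nullptr;
static PhaseRecord* s_phase_records = nullptr;

void init_phase_profiling(PhaseHeader* header, PhaseRecord* records, uint32_t num_threads) {
    assert(header != nullptr && records != nullptr && num_threads <= kMaxThreads);
    s_phase_header = header;
    s_phase_records = records;
    header->num_sched_threads = num_threads;
    for (uint32_t i = 0; i < kMaxThreads; ++i) {
        header->buffer_counts[i] = 0;
    }
}

// Appends one record to the calling thread's buffer.
// Returns false (and drops the record) if profiling is not initialized,
// the thread index is out of range, or the buffer is full.
bool record_phase(uint32_t thread_idx, uint32_t phase_id,
                  uint64_t start, uint64_t end, uint32_t tasks) {
    if (s_phase_header == nullptr || thread_idx >= s_phase_header->num_sched_threads) {
        return false;
    }
    const uint32_t n = s_phase_header->buffer_counts[thread_idx];
    if (n >= kMaxRecords) {
        return false;
    }
    s_phase_records[static_cast<size_t>(thread_idx) * kMaxRecords + n] =
        PhaseRecord{start, end, phase_id, tasks};
    s_phase_header->buffer_counts[thread_idx] = n + 1;
    return true;
}
```

Because each scheduler thread writes only to its own slice and its own counter, this sketch needs no locking between threads.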

Changelog

src/platform/a2a3/host/device_runner.cpp
- Invoked perf_collector_.collect_phase_data() when profiling is enabled.
src/platform/a2a3sim/host/device_runner.cpp
- Invoked perf_collector_.collect_phase_data() when profiling is enabled in the simulator.
src/platform/include/aicpu/performance_collector_aicpu.h
- Added perf_aicpu_init_phase_profiling function to initialize AICPU phase profiling.
- Introduced perf_aicpu_record_phase function to record individual scheduler phases.
- Added perf_aicpu_write_orch_summary function to write orchestrator cumulative summary data.
src/platform/include/common/perf_profiling.h
- Updated memory layout documentation to include the optional phase profiling region.
- Defined AicpuPhaseId enum for scheduler phase identification.
- Introduced AicpuPhaseRecord struct for single scheduler phase records.
- Added AicpuOrchSummary struct for orchestrator cumulative profiling data.
- Defined AicpuPhaseHeader struct for phase profiling metadata.
- Provided helper functions (calc_perf_data_size_with_phases, get_phase_header, get_phase_records) for accessing phase profiling data in shared memory.
src/platform/include/host/performance_collector.h
- Declared collect_phase_data method to gather AICPU phase profiling data.
- Added collected_phase_records_, collected_orch_summary_, and has_phase_data_ members to store collected phase profiling data.
src/platform/src/aicpu/performance_collector_aicpu.cpp
- Implemented perf_aicpu_init_phase_profiling to set up phase header and clear record buffers.
- Implemented perf_aicpu_record_phase to append phase records to thread-specific buffers.
- Implemented perf_aicpu_write_orch_summary to store orchestrator summary in shared memory.
- Introduced static cached pointers (s_phase_header, s_phase_records) for efficient hot-path access.
src/platform/src/host/performance_collector.cpp
- Updated initialize to calculate total shared memory size including the new phase profiling region.
- Implemented collect_phase_data to read and validate phase records and orchestrator summary from shared memory.
- Modified export_swimlane_json to include phase record timestamps in base time calculation and export version 2 JSON with aicpu_scheduler_phases and aicpu_orchestrator data.
- Cleared collected phase data in the finalize method.
src/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
- Called perf_aicpu_init_phase_profiling during one-time initialization for scheduler threads.
- Added local counters (phase_complete_count, phase_dispatch_count, phase_scan_count, phase_early_ready_count) for tasks processed in each phase.
- Instrumented scheduler loop phases (COMPLETE, DISPATCH, SCAN, EARLY_READY) with perf_aicpu_record_phase calls.
- Wrote orchestrator cumulative summary to shared memory using perf_aicpu_write_orch_summary.
tools/swimlane_converter.py
- Updated read_perf_data to support version 2 of the JSON format.
- Modified generate_chrome_trace_json to accept scheduler_phases and orchestrator_data.
- Added logic to generate Perfetto trace events for AICPU scheduler phases (pid=3) with distinct colors.
- Added logic to generate Perfetto trace events for AICPU orchestrator summary (pid=4) including sub-events for different internal phases.
- Updated main function to extract and pass version 2 profiling data to the trace generation.
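
As a rough illustration of what the export and conversion steps in this changelog involve, the sketch below converts a cycle timestamp to microseconds relative to a base time and formats one Chrome Trace Event "complete" event (`"ph":"X"`, with `ts` and `dur` in microseconds), which Perfetto can load. The counter frequency parameter, the function names, and the exact keys emitted are assumptions; the actual converter in this PR is the Python script tools/swimlane_converter.py, and only pid=3 for scheduler phases comes from the PR text.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Converts an absolute cycle count to microseconds since base_cycles.
// freq_mhz is the counter frequency in MHz (cycles per microsecond);
// the real frequency is platform-specific and assumed here.
double cycles_to_us(uint64_t cycles, uint64_t base_cycles, double freq_mhz) {
    assert(cycles >= base_cycles && freq_mhz > 0.0);
    return static_cast<double>(cycles - base_cycles) / freq_mhz;
}

// Formats one "complete" trace event for a scheduler phase bar.
std::string phase_trace_event(const char* phase_name, int pid, int tid,
                              double ts_us, double dur_us) {
    char buf[256];
    std::snprintf(buf, sizeof(buf),
                  "{\"name\":\"%s\",\"ph\":\"X\",\"pid\":%d,\"tid\":%d,"
                  "\"ts\":%.3f,\"dur\":%.3f}",
                  phase_name, pid, tid, ts_us, dur_us);
    return std::string(buf);
}
```

For example, `phase_trace_event("DISPATCH", 3, 0, cycles_to_us(1500, 500, 50.0), 1.5)` describes a DISPATCH bar on scheduler thread 0 that starts 20 microseconds after the base time.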

Activity

Simulation tests have passed (10/10).
Hardware tests are marked as pending or not applicable.

gemini-code-assist

Code Review

This pull request introduces detailed phase profiling for the AICPU scheduler and a cumulative summary for the orchestrator. A critical security vulnerability was identified where the host-side collection logic trusts num_sched_threads from shared memory for loop bounds and array indexing without validation, potentially leading to an out-of-bounds read. It is essential to implement bounds checking for all data retrieved from shared memory. Additionally, there are a few suggestions to enhance code conciseness and idiomatic C++ usage.
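
A minimal sketch of the kind of bounds checking this review asks for is shown below. It is not the fix that landed in the PR: the struct layouts, limits, magic value and function name are assumptions. The idea is to read each value from shared memory once, validate or clamp it against compile-time limits, and only then use it as a loop bound or index.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kMaxThreads = 4;          // assumed limit
constexpr uint32_t kMaxRecords = 256;        // assumed per-thread capacity
constexpr uint32_t kMagic = 0x50484153u;     // assumed marker value

struct PhaseRecord { uint64_t start_time; uint64_t end_time; };
struct PhaseHeader {
    uint32_t magic;
    uint32_t num_sched_threads;
    uint32_t buffer_counts[kMaxThreads];
};

// Copies per-thread records out of shared memory.
// Returns false and leaves `out` empty if the header looks implausible.
bool collect_phase_records(const PhaseHeader& shared_header,
                           const PhaseRecord* records,
                           std::vector<std::vector<PhaseRecord>>& out) {
    out.clear();
    // Snapshot the untrusted value once so the check and the use agree.
    const uint32_t num_threads = shared_header.num_sched_threads;
    if (shared_header.magic != kMagic || num_threads > kMaxThreads) {
        return false;
    }
    out.resize(num_threads);
    for (uint32_t t = 0; t < num_threads; ++t) {
        // Clamp the record count to the buffer capacity before reading.
        const uint32_t count = std::min(shared_header.buffer_counts[t], kMaxRecords);
        const PhaseRecord* first = records + static_cast<size_t>(t) * kMaxRecords;
        out[t].assign(first, first + count);
    }
    return true;
}
```

Rejecting an oversized thread count outright while clamping record counts is a design choice of this sketch; a collector could equally reject both or log and clamp both.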

…nd dependency arrows
- Add scheduler phase profiling: record COMPLETE/DISPATCH/SCAN/EARLY_READY phases per loop iteration with per-thread buffers in shared memory
- Add per-task orchestrator phase recording (sync/alloc/params/lookup/heap/insert/fanin/finalize/scope_end) using AicpuPhaseRecord with dedicated buffer slot, exported as aicpu_orchestrator_phases JSON array
- Write cumulative AicpuOrchSummary to shared memory for backward compat
- Host-side collection reads both scheduler and orchestrator phase records, exports version 2 JSON with scheduler phases, orchestrator summary, and per-task orchestrator phases
- Swimlane converter renders scheduler phases as color-coded bars on pid=3, per-task orchestrator phases on pid=4 with per-phase colors
- Add AICPU View (pid=2) fanout dependency arrows mirroring AICore View
- Add Scheduler DISPATCH to AICore/AICPU task execution flow arrows
- Set process sort order: Orchestrator, Scheduler, AICPU View, AICore View

ChaoWao · 2026-03-02T01:43:16Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces detailed phase profiling for the AICPU scheduler and a summary for the orchestrator. The changes include new data structures for profiling, instrumentation in the AICPU executor and orchestrator, and updates to the host-side collector and visualization script to support the new data.

My review focuses on C++ best practices and code maintainability. I've suggested replacing a memset call with modern C++ value initialization, which is safer and also resolves a compiler warning that was being suppressed. I've also recommended a small refactoring to reduce code duplication when calculating the base timestamp for profiling data.

Overall, the changes are well-structured and add valuable profiling capabilities.

gemini-code-assist · 2026-03-02T01:45:41Z

+        s_phase_header->buffer_counts[i] = 0;
+    }
+
+    memset(&s_phase_header->orch_summary, 0, sizeof(AicpuOrchSummary));


Using memset to zero-out a C++ struct can be problematic if the struct is not a POD (Plain Old Data) type and can trigger compiler warnings like -Wclass-memaccess. A safer and more idiomatic C++ way to zero-initialize the struct is to use value initialization. This change will also allow you to remove the -Wno-error=class-memaccess flag from src/platform/a2a3sim/aicpu/CMakeLists.txt.

Suggested change

-    memset(&s_phase_header->orch_summary, 0, sizeof(AicpuOrchSummary));
+    s_phase_header->orch_summary = {};
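
The effect of the suggested `= {}` assignment can be checked with a small stand-alone sketch. The struct below is a hypothetical stand-in for AicpuOrchSummary; its fields other than `start_time` and `magic` are assumptions. For an aggregate with no default member initializers, assigning from empty braces value-initializes a temporary, so every member ends up zero.

```cpp
#include <cstdint>

struct OrchSummary {        // hypothetical stand-in for AicpuOrchSummary
    uint64_t start_time;
    uint64_t end_time;      // assumed field
    uint32_t magic;
    uint32_t task_count;    // assumed field
};

// Resets every member to zero without memset.
void reset_summary(OrchSummary& s) {
    s = {};   // equivalent to s = OrchSummary{};
}
```

Two caveats apply to this sketch. If the struct had default member initializers, `= {}` would apply those values rather than zero. And unlike memset, this form is not guaranteed to zero padding bytes, which only matters if the struct is later compared or hashed bytewise.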

gemini-code-assist · 2026-03-02T01:45:41Z

+    if (has_phase_data_) {
+        for (const auto& thread_records : collected_phase_records_) {
+            for (const auto& pr : thread_records) {
+                if (pr.start_time > 0 && pr.start_time < base_time_cycles) {
+                    base_time_cycles = pr.start_time;
+                }
+            }
+        }
+        for (const auto& pr : collected_orch_phase_records_) {
+            if (pr.start_time > 0 && pr.start_time < base_time_cycles) {
+                base_time_cycles = pr.start_time;
+            }
+        }
+        if (collected_orch_summary_.magic == AICPU_PHASE_MAGIC &&
+            collected_orch_summary_.start_time > 0 &&
+            collected_orch_summary_.start_time < base_time_cycles) {
+            base_time_cycles = collected_orch_summary_.start_time;
+        }
+    }


The logic to find the minimum base_time_cycles is repeated across several loops. To improve readability and maintainability, you could extract this logic into a small helper function or a lambda.

Suggested change

-    if (has_phase_data_) {
-        for (const auto& thread_records : collected_phase_records_) {
-            for (const auto& pr : thread_records) {
-                if (pr.start_time > 0 && pr.start_time < base_time_cycles) {
-                    base_time_cycles = pr.start_time;
-                }
-            }
-        }
-        for (const auto& pr : collected_orch_phase_records_) {
-            if (pr.start_time > 0 && pr.start_time < base_time_cycles) {
-                base_time_cycles = pr.start_time;
-            }
-        }
-        if (collected_orch_summary_.magic == AICPU_PHASE_MAGIC &&
-            collected_orch_summary_.start_time > 0 &&
-            collected_orch_summary_.start_time < base_time_cycles) {
-            base_time_cycles = collected_orch_summary_.start_time;
-        }
-    }
+    if (has_phase_data_) {
+        auto update_base_time = [&](uint64_t new_time) {
+            if (new_time > 0 && new_time < base_time_cycles) {
+                base_time_cycles = new_time;
+            }
+        };
+        for (const auto& thread_records : collected_phase_records_) {
+            for (const auto& pr : thread_records) {
+                update_base_time(pr.start_time);
+            }
+        }
+        for (const auto& pr : collected_orch_phase_records_) {
+            update_base_time(pr.start_time);
+        }
+        if (collected_orch_summary_.magic == AICPU_PHASE_MAGIC) {
+            update_base_time(collected_orch_summary_.start_time);
+        }
+    }
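
For readers who want to try the refactoring in isolation, here is a self-contained variant of the same lambda-based minimum search. The record struct, the function name and its parameters are hypothetical; the only behavior carried over from the review is that a start time of 0 is treated as "unset" and skipped.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

struct PhaseRecord { uint64_t start_time; };   // simplified stand-in

// Returns the smallest non-zero start time across all records,
// or initial_base if no record is earlier.
uint64_t earliest_start(const std::vector<std::vector<PhaseRecord>>& per_thread,
                        const std::vector<PhaseRecord>& orch_records,
                        uint64_t initial_base = std::numeric_limits<uint64_t>::max()) {
    uint64_t base = initial_base;
    // One place for the "non-zero and earlier" rule.
    auto update_base_time = [&base](uint64_t t) {
        if (t > 0 && t < base) {
            base = t;
        }
    };
    for (const auto& thread_records : per_thread) {
        for (const auto& pr : thread_records) {
            update_base_time(pr.start_time);
        }
    }
    for (const auto& pr : orch_records) {
        update_base_time(pr.start_time);
    }
    return base;
}
```

Passing the base time computed from other profiling data as `initial_base` keeps the earlier of the two, which matches how the reviewed code folds phase records into an existing minimum.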

gemini-code-assist Bot reviewed Mar 1, 2026

Comment thread src/platform/src/host/performance_collector.cpp

Comment thread src/platform/src/aicpu/performance_collector_aicpu.cpp Outdated

Comment thread src/platform/src/host/performance_collector.cpp Outdated

ChaoWao force-pushed the profiling-refactor-and-phase branch 3 times, most recently from 8be37ca to 9cb82e0 Compare March 1, 2026 13:13

ChaoWao force-pushed the profiling-refactor-and-phase branch from 9cb82e0 to a7b6126 Compare March 2, 2026 01:41

gemini-code-assist Bot reviewed Mar 2, 2026

ChaoWao merged commit 7df48e9 into main Mar 2, 2026
3 checks passed

ChaoWao deleted the profiling-refactor-and-phase branch March 2, 2026 01:56
