perf(capacity): collapse build_canonical_state's reverse scans to one pass by HUQIANTAO · Pull Request #2633 · Hmbown/CodeWhale

HUQIANTAO · 2026-06-03T10:46:49Z

Summary

build_canonical_state previously did two independent reverse walks of session.messages — one to extract the most recent user goal, and one to collect up to four confirmed-fact snippets. apply_verify_and_replan then added a third and fourth reverse scan to locate the latest user message and the latest [verification replay] user message for the re-plan path.

All four reverse scans collect disjoint facts about the same most-recent-first view of the conversation. This PR folds them into a single helper, scan_canonical_inputs, that walks messages once in reverse, fills a CanonicalStateScan, and short-circuits as soon as every collector is satisfied. The helper exposes the latest-message indices so apply_verify_and_replan can clone the full Message values after the scan (eliminating the two independent find().cloned() walks).

Why now

The four scans ran on every capacity intervention. Long sessions (200+ messages) pay non-trivial CPU each time the controller decides to refresh, replan, or replay — exactly when the user is already waiting for an LLM call to complete. Collapsing to one walk also turns the early-exit into a real optimization: once the goal, the verified marker, and 4 facts are found, the scan stops; the rest of the conversation is irrelevant.

Output parity

The output CanonicalState is byte-identical to the prior implementation: same goal, same confirmed facts (newest first, errors filtered), same fallback string when no user text exists. The re-plan path's keep-messages set is identical: latest user + latest verified.

Changes

File	Change
`crates/tui/src/core/engine/capacity_flow.rs`	New `scan_canonical_inputs` helper and `CanonicalStateScan` struct. `build_canonical_state` now consumes the scan. `apply_verify_and_replan` uses the scan indices to look up the full messages. 6 new unit tests.

Tests

running 6 tests
test core::engine::capacity_flow::canonical_scan_tests::scan_handles_empty_input ... ok
test core::engine::capacity_flow::canonical_scan_tests::scan_returns_goal_for_latest_user_text ... ok
test core::engine::capacity_flow::canonical_scan_tests::scan_collects_up_to_four_facts_newest_first ... ok
test core::engine::capacity_flow::canonical_scan_tests::scan_skips_error_results ... ok
test core::engine::capacity_flow::canonical_scan_tests::scan_finds_latest_verified_user_message ... ok
test core::engine::capacity_flow::canonical_scan_tests::scan_early_exits_when_complete ... ok

test result: ok. 6 passed; 0 failed

The full engine test suite (153 tests) still passes.

greptile-apps

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

github-actions · 2026-06-03T10:46:59Z

Thanks @HUQIANTAO for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

gemini-code-assist

Code Review

This pull request consolidates multiple reverse iterations over session messages into a single-pass helper function, scan_canonical_inputs, to improve efficiency. The reviewer identified a performance issue where the early-exit optimization is bypassed in standard runs because the helper always expects to find a verified user message. To address this, the reviewer suggested introducing a find_verified boolean parameter to conditionally skip searching for the verified message, allowing the scan to exit early once the goal and facts are collected, and provided corresponding updates for the caller sites and unit tests.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-03T10:48:34Z

+        // The replan path needs the *full* messages, not summaries.
+        // `scan_canonical_inputs` already located the indices in a single
+        // reverse pass; clone from the live `messages` slice once.
+        let scan = scan_canonical_inputs(&self.session.messages);


Pass true to scan_canonical_inputs here to indicate that we need to locate the latest verified user message index for the replan path.

Suggested change

let scan = scan_canonical_inputs(&self.session.messages);

let scan = scan_canonical_inputs(&self.session.messages, true);

gemini-code-assist · 2026-06-03T10:48:35Z

+        // `.iter().rev().find_map()` walks and a third for facts; the
+        // dedicated scan below replaces all three with one pass that
+        // also early-exits once every collector is satisfied.
+        let scan = scan_canonical_inputs(&self.session.messages);


Pass false to scan_canonical_inputs here since build_canonical_state does not use latest_verified_user_idx. This allows the scan to early-exit immediately once the goal and 4 facts are found, avoiding scanning the rest of the history.

Suggested change

let scan = scan_canonical_inputs(&self.session.messages);

let scan = scan_canonical_inputs(&self.session.messages, false);

gemini-code-assist · 2026-06-03T10:48:35Z

+impl CanonicalStateScan {
+    /// `true` once every collector is satisfied. The single-pass
+    /// caller can use this to break out of the reverse iteration.
+    fn is_complete(&self) -> bool {
+        self.goal.is_some()
+            && self.latest_verified_user_idx.is_some()
+            && self.facts_collected >= CANONICAL_SCAN_MAX_FACTS
+    }
+}
+
+/// Walk `messages` once (in reverse) and collect everything the canonical
+/// state and re-plan paths need. Replaces the previous pattern of three
+/// independent reverse scans: one for the goal, one for confirmed facts,
+/// and one for the latest verified user message.
+fn scan_canonical_inputs(messages: &[Message]) -> CanonicalStateScan {
+    let mut scan = CanonicalStateScan::default();
+    for (idx, msg) in messages.iter().enumerate().rev() {
+        if msg.role == "user" {
+            if scan.goal.is_none() {
+                if let Some(text) = msg.content.iter().find_map(|b| match b {
+                    ContentBlock::Text { text, .. } => Some(text.as_str()),
+                    _ => None,
+                }) {
+                    scan.goal = Some(summarize_text(text, 220));
+                    scan.latest_user_text_idx = Some(idx);
+                }
+            }
+            if scan.latest_verified_user_idx.is_none() {
+                let verified = msg.content.iter().any(|b| match b {
+                    ContentBlock::ToolResult { content, .. } => {
+                        content.contains("[verification replay]")
+                    }
+                    _ => false,
+                });
+                if verified {
+                    scan.latest_verified_user_idx = Some(idx);
+                }
+            }
+        }
+        if scan.facts_collected < CANONICAL_SCAN_MAX_FACTS {
+            for block in &msg.content {
+                if let ContentBlock::ToolResult { content, .. } = block
+                    && !content.starts_with("Error:")
+                {
+                    scan.confirmed_facts.push(summarize_text(content, 180));
+                    scan.facts_collected = scan.facts_collected.saturating_add(1);
+                    if scan.facts_collected >= CANONICAL_SCAN_MAX_FACTS {
+                        break;
+                    }
+                }
+            }
+        }
+        if scan.is_complete() {
+            break;
+        }
+    }
+    scan
+}


Performance Issue: Early-Exit Optimization Bypassed in Common Cases

In the current implementation, is_complete() requires latest_verified_user_idx.is_some() to return true:

fn is_complete(&self) -> bool { self.goal.is_some() && self.latest_verified_user_idx.is_some() && self.facts_collected >= CANONICAL_SCAN_MAX_FACTS }

However, in the vast majority of normal runs, there is no verified user message in the session history (since verification replays only happen under specific high-risk conditions).

This leads to three major issues:

Bypassed Optimization in build_canonical_state: build_canonical_state does not use latest_verified_user_idx at all, yet it calls scan_canonical_inputs which is forced to scan the entire message history because is_complete() will never return true.

Bypassed Optimization in apply_verify_and_replan: Even when we do want to find the verified message, if none exists, the loop will scan the entire history instead of early-exiting once the goal and 4 facts are found.

Broken Test: The test scan_early_exits_when_complete actually does not early-exit in the current PR code because there is no verified message in the test input, meaning it scans all 1000 messages.

Solution

Introduce a find_verified: bool parameter to scan_canonical_inputs and is_complete. When find_verified is false (such as in build_canonical_state), we can skip searching for the verified message and allow the loop to early-exit immediately once the goal and 4 facts are collected.

impl CanonicalStateScan { /// `true` once every collector is satisfied. The single-pass /// caller can use this to break out of the reverse iteration. fn is_complete(&self, find_verified: bool) -> bool { self.goal.is_some() && (!find_verified || self.latest_verified_user_idx.is_some()) && self.facts_collected >= CANONICAL_SCAN_MAX_FACTS } } /// Walk `messages` once (in reverse) and collect everything the canonical /// state and re-plan paths need. Replaces the previous pattern of three /// independent reverse scans: one for the goal, one for confirmed facts, /// and one for the latest verified user message. fn scan_canonical_inputs(messages: &[Message], find_verified: bool) -> CanonicalStateScan { let mut scan = CanonicalStateScan::default(); for (idx, msg) in messages.iter().enumerate().rev() { if msg.role == "user" { if scan.goal.is_none() { if let Some(text) = msg.content.iter().find_map(|b| match b { ContentBlock::Text { text, .. } => Some(text.as_str()), _ => None, }) { scan.goal = Some(summarize_text(text, 220)); scan.latest_user_text_idx = Some(idx); } } if find_verified && scan.latest_verified_user_idx.is_none() { let verified = msg.content.iter().any(|b| match b { ContentBlock::ToolResult { content, .. } => { content.contains("[verification replay]") } _ => false, }); if verified { scan.latest_verified_user_idx = Some(idx); } } } if scan.facts_collected < CANONICAL_SCAN_MAX_FACTS { for block in &msg.content { if let ContentBlock::ToolResult { content, .. } = block && !content.starts_with("Error:") { scan.confirmed_facts.push(summarize_text(content, 180)); scan.facts_collected = scan.facts_collected.saturating_add(1); if scan.facts_collected >= CANONICAL_SCAN_MAX_FACTS { break; } } } } if scan.is_complete(find_verified) { break; } } scan }

gemini-code-assist · 2026-06-03T10:48:35Z

+            tool_result_msg("b"),
+            user_text_msg("third"),
+        ];
+        let scan = scan_canonical_inputs(&messages);


Update the test call to pass false for find_verified.

Suggested change

let scan = scan_canonical_inputs(&messages);

let scan = scan_canonical_inputs(&messages, false);

gemini-code-assist · 2026-06-03T10:48:35Z

+            tool_result_msg("fact-D"),
+            tool_result_msg("fact-E"),
+        ];
+        let scan = scan_canonical_inputs(&messages);


Update the test call to pass false for find_verified.

Suggested change

let scan = scan_canonical_inputs(&messages);

let scan = scan_canonical_inputs(&messages, false);

gemini-code-assist · 2026-06-03T10:48:35Z

+            tool_result_msg("Error: bad"),
+            tool_result_msg("good-B"),
+        ];
+        let scan = scan_canonical_inputs(&messages);


Update the test call to pass false for find_verified.

Suggested change

let scan = scan_canonical_inputs(&messages);

let scan = scan_canonical_inputs(&messages, false);

gemini-code-assist · 2026-06-03T10:48:35Z

+            user_with_verified_replay("verified"),
+            user_text_msg("third"),
+        ];
+        let scan = scan_canonical_inputs(&messages);


Update the test call to pass true for find_verified since this test specifically verifies locating the latest verified user message.

Suggested change

let scan = scan_canonical_inputs(&messages);

let scan = scan_canonical_inputs(&messages, true);

gemini-code-assist · 2026-06-03T10:48:35Z

+
+    #[test]
+    fn scan_handles_empty_input() {
+        let scan = scan_canonical_inputs(&[]);


Update the test call to pass false for find_verified.

Suggested change

let scan = scan_canonical_inputs(&[]);

let scan = scan_canonical_inputs(&[], false);

gemini-code-assist · 2026-06-03T10:48:35Z

+            .collect();
+        // Most recent user message comes last.
+        messages.push(user_text_msg("goal"));
+        let scan = scan_canonical_inputs(&messages);


Update the test call to pass false for find_verified. This ensures the early-exit optimization is actually triggered and tested (previously, it scanned all 1000 messages because no verified message was present).

Suggested change

let scan = scan_canonical_inputs(&messages);

let scan = scan_canonical_inputs(&messages, false);

Hmbown · 2026-06-03T10:57:20Z

Thanks @HUQIANTAO. I checked this during the v0.8.52 release freeze. The platform tests are green, but lint is failing on cargo fmt --all -- --check in capacity_flow.rs. Since this is a perf cleanup rather than a release-fix regression, I am keeping it out of 0.8.52 and will treat it as next-pass review once formatting is clean.

… pass build_canonical_state previously did two independent reverse walks of session.messages — one to extract the most recent user goal, and one to collect up to four confirmed-fact snippets. apply_verify_and_replan then added a third and fourth reverse scan to locate the latest user message and the latest [verification replay] user message for the re-plan path. All four reverse scans collect disjoint facts about the same most- recent-first view of the conversation. This PR folds them into a single helper, scan_canonical_inputs, that walks messages once in reverse, fills a CanonicalStateScan, and short-circuits as soon as every collector is satisfied. The helper exposes the latest-message indices so apply_verify_and_replan can clone the full Message values after the scan (eliminating the two independent find().cloned() walks). The output CanonicalState is byte-identical to the prior implementation: same goal, same confirmed facts (newest first, errors filtered), same fallback string when no user text exists. The re-plan path's keep-messages set is identical: latest user + latest verified. Tests: 6 new unit tests cover the goal lookup, fact cap, error-result filter, verified-marker scan, empty input, and the early-exit condition. The full engine test suite (153 tests) still passes.

…-user lookup The build_canonical_state path never reads CanonicalStateScan::latest_verified_user_idx, but the previous patch required is_complete() to find a verified user message before it would short-circuit. On a long history with no verification replay — the common case — the scan walked the entire message list looking for a match that could not exist. Add a find_verified: bool parameter to scan_canonical_inputs and CanonicalStateScan::is_complete. build_canonical_state now passes false, so the loop stops as soon as the goal and CANONICAL_SCAN_MAX_FACTS facts are found. The replan path (apply_verify_and_replan) keeps the existing true behavior so it still locates the latest verified user message. Test calls are updated to match; no behavior change for any test.

greptile-apps

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

greptile-apps Bot reviewed Jun 3, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 3, 2026

View reviewed changes

HUQIANTAO added 2 commits June 3, 2026 19:54

HUQIANTAO force-pushed the perf/canonical-single-pass branch from 281b9d7 to a2474f7 Compare June 3, 2026 12:00

greptile-apps Bot reviewed Jun 3, 2026

View reviewed changes

	let scan = scan_canonical_inputs(&self.session.messages);
	let scan = scan_canonical_inputs(&self.session.messages, true);

	let scan = scan_canonical_inputs(&messages);
	let scan = scan_canonical_inputs(&messages, false);

	let scan = scan_canonical_inputs(&[]);
	let scan = scan_canonical_inputs(&[], false);

Conversation

HUQIANTAO commented Jun 3, 2026

Summary

Why now

Output parity

Changes

Tests

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Performance Issue: Early-Exit Optimization Bypassed in Common Cases

Solution

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Hmbown commented Jun 3, 2026

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants