perf(capacity): collapse build_canonical_state's reverse scans to one pass#2633
perf(capacity): collapse build_canonical_state's reverse scans to one pass#2633HUQIANTAO wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
Thanks @HUQIANTAO for taking the time to contribute. This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in Please read |
There was a problem hiding this comment.
Code Review
This pull request consolidates multiple reverse iterations over session messages into a single-pass helper function, scan_canonical_inputs, to improve efficiency. The reviewer identified a performance issue where the early-exit optimization is bypassed in standard runs because the helper always expects to find a verified user message. To address this, the reviewer suggested introducing a find_verified boolean parameter to conditionally skip searching for the verified message, allowing the scan to exit early once the goal and facts are collected, and provided corresponding updates for the caller sites and unit tests.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| // The replan path needs the *full* messages, not summaries. | ||
| // `scan_canonical_inputs` already located the indices in a single | ||
| // reverse pass; clone from the live `messages` slice once. | ||
| let scan = scan_canonical_inputs(&self.session.messages); |
There was a problem hiding this comment.
| // `.iter().rev().find_map()` walks and a third for facts; the | ||
| // dedicated scan below replaces all three with one pass that | ||
| // also early-exits once every collector is satisfied. | ||
| let scan = scan_canonical_inputs(&self.session.messages); |
There was a problem hiding this comment.
Pass false to scan_canonical_inputs here since build_canonical_state does not use latest_verified_user_idx. This allows the scan to early-exit immediately once the goal and 4 facts are found, avoiding scanning the rest of the history.
| let scan = scan_canonical_inputs(&self.session.messages); | |
| let scan = scan_canonical_inputs(&self.session.messages, false); |
| impl CanonicalStateScan { | ||
| /// `true` once every collector is satisfied. The single-pass | ||
| /// caller can use this to break out of the reverse iteration. | ||
| fn is_complete(&self) -> bool { | ||
| self.goal.is_some() | ||
| && self.latest_verified_user_idx.is_some() | ||
| && self.facts_collected >= CANONICAL_SCAN_MAX_FACTS | ||
| } | ||
| } | ||
|
|
||
| /// Walk `messages` once (in reverse) and collect everything the canonical | ||
| /// state and re-plan paths need. Replaces the previous pattern of three | ||
| /// independent reverse scans: one for the goal, one for confirmed facts, | ||
| /// and one for the latest verified user message. | ||
| fn scan_canonical_inputs(messages: &[Message]) -> CanonicalStateScan { | ||
| let mut scan = CanonicalStateScan::default(); | ||
| for (idx, msg) in messages.iter().enumerate().rev() { | ||
| if msg.role == "user" { | ||
| if scan.goal.is_none() { | ||
| if let Some(text) = msg.content.iter().find_map(|b| match b { | ||
| ContentBlock::Text { text, .. } => Some(text.as_str()), | ||
| _ => None, | ||
| }) { | ||
| scan.goal = Some(summarize_text(text, 220)); | ||
| scan.latest_user_text_idx = Some(idx); | ||
| } | ||
| } | ||
| if scan.latest_verified_user_idx.is_none() { | ||
| let verified = msg.content.iter().any(|b| match b { | ||
| ContentBlock::ToolResult { content, .. } => { | ||
| content.contains("[verification replay]") | ||
| } | ||
| _ => false, | ||
| }); | ||
| if verified { | ||
| scan.latest_verified_user_idx = Some(idx); | ||
| } | ||
| } | ||
| } | ||
| if scan.facts_collected < CANONICAL_SCAN_MAX_FACTS { | ||
| for block in &msg.content { | ||
| if let ContentBlock::ToolResult { content, .. } = block | ||
| && !content.starts_with("Error:") | ||
| { | ||
| scan.confirmed_facts.push(summarize_text(content, 180)); | ||
| scan.facts_collected = scan.facts_collected.saturating_add(1); | ||
| if scan.facts_collected >= CANONICAL_SCAN_MAX_FACTS { | ||
| break; | ||
| } | ||
| } | ||
| } | ||
| } | ||
| if scan.is_complete() { | ||
| break; | ||
| } | ||
| } | ||
| scan | ||
| } |
There was a problem hiding this comment.
Performance Issue: Early-Exit Optimization Bypassed in Common Cases
In the current implementation, is_complete() requires latest_verified_user_idx.is_some() to return true:
fn is_complete(&self) -> bool {
self.goal.is_some()
&& self.latest_verified_user_idx.is_some()
&& self.facts_collected >= CANONICAL_SCAN_MAX_FACTS
}However, in the vast majority of normal runs, there is no verified user message in the session history (since verification replays only happen under specific high-risk conditions).
This leads to three major issues:
- Bypassed Optimization in
build_canonical_state:build_canonical_statedoes not uselatest_verified_user_idxat all, yet it callsscan_canonical_inputswhich is forced to scan the entire message history becauseis_complete()will never returntrue. - Bypassed Optimization in
apply_verify_and_replan: Even when we do want to find the verified message, if none exists, the loop will scan the entire history instead of early-exiting once the goal and 4 facts are found. - Broken Test: The test
scan_early_exits_when_completeactually does not early-exit in the current PR code because there is no verified message in the test input, meaning it scans all 1000 messages.
Solution
Introduce a find_verified: bool parameter to scan_canonical_inputs and is_complete. When find_verified is false (such as in build_canonical_state), we can skip searching for the verified message and allow the loop to early-exit immediately once the goal and 4 facts are collected.
impl CanonicalStateScan {
/// `true` once every collector is satisfied. The single-pass
/// caller can use this to break out of the reverse iteration.
fn is_complete(&self, find_verified: bool) -> bool {
self.goal.is_some()
&& (!find_verified || self.latest_verified_user_idx.is_some())
&& self.facts_collected >= CANONICAL_SCAN_MAX_FACTS
}
}
/// Walk `messages` once (in reverse) and collect everything the canonical
/// state and re-plan paths need. Replaces the previous pattern of three
/// independent reverse scans: one for the goal, one for confirmed facts,
/// and one for the latest verified user message.
fn scan_canonical_inputs(messages: &[Message], find_verified: bool) -> CanonicalStateScan {
let mut scan = CanonicalStateScan::default();
for (idx, msg) in messages.iter().enumerate().rev() {
if msg.role == "user" {
if scan.goal.is_none() {
if let Some(text) = msg.content.iter().find_map(|b| match b {
ContentBlock::Text { text, .. } => Some(text.as_str()),
_ => None,
}) {
scan.goal = Some(summarize_text(text, 220));
scan.latest_user_text_idx = Some(idx);
}
}
if find_verified && scan.latest_verified_user_idx.is_none() {
let verified = msg.content.iter().any(|b| match b {
ContentBlock::ToolResult { content, .. } => {
content.contains("[verification replay]")
}
_ => false,
});
if verified {
scan.latest_verified_user_idx = Some(idx);
}
}
}
if scan.facts_collected < CANONICAL_SCAN_MAX_FACTS {
for block in &msg.content {
if let ContentBlock::ToolResult { content, .. } = block
&& !content.starts_with("Error:")
{
scan.confirmed_facts.push(summarize_text(content, 180));
scan.facts_collected = scan.facts_collected.saturating_add(1);
if scan.facts_collected >= CANONICAL_SCAN_MAX_FACTS {
break;
}
}
}
}
if scan.is_complete(find_verified) {
break;
}
}
scan
}| tool_result_msg("b"), | ||
| user_text_msg("third"), | ||
| ]; | ||
| let scan = scan_canonical_inputs(&messages); |
| tool_result_msg("fact-D"), | ||
| tool_result_msg("fact-E"), | ||
| ]; | ||
| let scan = scan_canonical_inputs(&messages); |
| tool_result_msg("Error: bad"), | ||
| tool_result_msg("good-B"), | ||
| ]; | ||
| let scan = scan_canonical_inputs(&messages); |
| user_with_verified_replay("verified"), | ||
| user_text_msg("third"), | ||
| ]; | ||
| let scan = scan_canonical_inputs(&messages); |
|
|
||
| #[test] | ||
| fn scan_handles_empty_input() { | ||
| let scan = scan_canonical_inputs(&[]); |
| .collect(); | ||
| // Most recent user message comes last. | ||
| messages.push(user_text_msg("goal")); | ||
| let scan = scan_canonical_inputs(&messages); |
There was a problem hiding this comment.
Update the test call to pass false for find_verified. This ensures the early-exit optimization is actually triggered and tested (previously, it scanned all 1000 messages because no verified message was present).
| let scan = scan_canonical_inputs(&messages); | |
| let scan = scan_canonical_inputs(&messages, false); |
|
Thanks @HUQIANTAO. I checked this during the v0.8.52 release freeze. The platform tests are green, but lint is failing on |
… pass build_canonical_state previously did two independent reverse walks of session.messages — one to extract the most recent user goal, and one to collect up to four confirmed-fact snippets. apply_verify_and_replan then added a third and fourth reverse scan to locate the latest user message and the latest [verification replay] user message for the re-plan path. All four reverse scans collect disjoint facts about the same most- recent-first view of the conversation. This PR folds them into a single helper, scan_canonical_inputs, that walks messages once in reverse, fills a CanonicalStateScan, and short-circuits as soon as every collector is satisfied. The helper exposes the latest-message indices so apply_verify_and_replan can clone the full Message values after the scan (eliminating the two independent find().cloned() walks). The output CanonicalState is byte-identical to the prior implementation: same goal, same confirmed facts (newest first, errors filtered), same fallback string when no user text exists. The re-plan path's keep-messages set is identical: latest user + latest verified. Tests: 6 new unit tests cover the goal lookup, fact cap, error-result filter, verified-marker scan, empty input, and the early-exit condition. The full engine test suite (153 tests) still passes.
…-user lookup The build_canonical_state path never reads CanonicalStateScan::latest_verified_user_idx, but the previous patch required is_complete() to find a verified user message before it would short-circuit. On a long history with no verification replay — the common case — the scan walked the entire message list looking for a match that could not exist. Add a find_verified: bool parameter to scan_canonical_inputs and CanonicalStateScan::is_complete. build_canonical_state now passes false, so the loop stops as soon as the goal and CANONICAL_SCAN_MAX_FACTS facts are found. The replan path (apply_verify_and_replan) keeps the existing true behavior so it still locates the latest verified user message. Test calls are updated to match; no behavior change for any test.
281b9d7 to
a2474f7
Compare
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
Summary
build_canonical_statepreviously did two independent reverse walks ofsession.messages— one to extract the most recent user goal, and one to collect up to four confirmed-fact snippets.apply_verify_and_replanthen added a third and fourth reverse scan to locate the latest user message and the latest[verification replay]user message for the re-plan path.All four reverse scans collect disjoint facts about the same most-recent-first view of the conversation. This PR folds them into a single helper,
scan_canonical_inputs, that walksmessagesonce in reverse, fills aCanonicalStateScan, and short-circuits as soon as every collector is satisfied. The helper exposes the latest-message indices soapply_verify_and_replancan clone the fullMessagevalues after the scan (eliminating the two independentfind().cloned()walks).Why now
The four scans ran on every capacity intervention. Long sessions (200+ messages) pay non-trivial CPU each time the controller decides to refresh, replan, or replay — exactly when the user is already waiting for an LLM call to complete. Collapsing to one walk also turns the early-exit into a real optimization: once the goal, the verified marker, and 4 facts are found, the scan stops; the rest of the conversation is irrelevant.
Output parity
The output
CanonicalStateis byte-identical to the prior implementation: same goal, same confirmed facts (newest first, errors filtered), same fallback string when no user text exists. The re-plan path's keep-messages set is identical: latest user + latest verified.Changes
crates/tui/src/core/engine/capacity_flow.rsscan_canonical_inputshelper andCanonicalStateScanstruct.build_canonical_statenow consumes the scan.apply_verify_and_replanuses the scan indices to look up the full messages. 6 new unit tests.Tests
The full engine test suite (153 tests) still passes.