perf(capacity): collapse build_canonical_state's reverse scans to one pass #2633

Open
HUQIANTAO wants to merge 2 commits into Hmbown:main from HUQIANTAO:perf/canonical-single-pass


@HUQIANTAO
Contributor

Summary

build_canonical_state previously did two independent reverse walks of session.messages — one to extract the most recent user goal, and one to collect up to four confirmed-fact snippets. apply_verify_and_replan then added a third and fourth reverse scan to locate the latest user message and the latest [verification replay] user message for the re-plan path.

All four reverse scans collect disjoint facts about the same most-recent-first view of the conversation. This PR folds them into a single helper, scan_canonical_inputs, that walks messages once in reverse, fills a CanonicalStateScan, and short-circuits as soon as every collector is satisfied. The helper exposes the latest-message indices so apply_verify_and_replan can clone the full Message values after the scan (eliminating the two independent find().cloned() walks).
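The index-based lookup described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Message` is a simplified stand-in, `ScanIndices` is a hypothetical subset of `CanonicalStateScan`, and `keep_messages` is an invented helper name. The oldest-first ordering and the de-duplication when both indices point at the same message are assumptions about the re-plan path.

```rust
/// Simplified stand-in for the crate's `Message` (assumption: the real type
/// carries structured content blocks rather than a single string).
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: String,
    text: String,
}

/// Hypothetical subset of `CanonicalStateScan`: only the two indices that
/// the re-plan path is described as needing.
#[derive(Default)]
struct ScanIndices {
    latest_user_text_idx: Option<usize>,
    latest_verified_user_idx: Option<usize>,
}

/// Clone the messages to keep, oldest first, after the scan has finished.
/// Ordering and de-duplication are assumptions made for this sketch.
fn keep_messages(messages: &[Message], scan: &ScanIndices) -> Vec<Message> {
    let mut indices: Vec<usize> = Vec::new();
    // `Option<usize>` is iterable, so `extend` adds zero or one index each.
    indices.extend(scan.latest_user_text_idx);
    indices.extend(scan.latest_verified_user_idx);
    indices.sort_unstable();
    indices.dedup();
    indices
        .into_iter()
        // `get` avoids a panic if an index is stale or out of range.
        .filter_map(|idx| messages.get(idx).cloned())
        .collect()
}
```

Each kept message is cloned exactly once, and no second walk over the history is needed because the scan already recorded where the messages live.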

Why now

The four scans ran on every capacity intervention. Long sessions (200+ messages) pay a non-trivial CPU cost each time the controller decides to refresh, replan, or replay, which is exactly when the user is already waiting for an LLM call to complete. Collapsing to one walk also makes the early exit worthwhile: once the goal, the verified marker, and 4 facts are found, the scan stops, because the rest of the conversation is irrelevant.

Output parity

The output CanonicalState is byte-identical to the prior implementation: same goal, same confirmed facts (newest first, errors filtered), same fallback string when no user text exists. The re-plan path's keep-messages set is identical: latest user + latest verified.
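One way to check a parity claim like this is to keep a multi-pass reference implementation next to the single-pass one and compare their outputs on the same history. The sketch below does that on a simplified model. `Block`, `Msg`, `multi_pass`, and `single_pass` are hypothetical names, summarization is omitted, and the within-message block order is an assumption modelled on the single-pass helper in this PR.

```rust
#[derive(Clone, Debug)]
enum Block {
    Text(String),
    ToolResult(String),
}

#[derive(Clone, Debug)]
struct Msg {
    role: &'static str,
    content: Vec<Block>,
}

const MAX_FACTS: usize = 4;

/// Reference behaviour: two independent reverse walks (goal, then facts).
fn multi_pass(messages: &[Msg]) -> (Option<String>, Vec<String>) {
    let goal = messages
        .iter()
        .rev()
        .filter(|m| m.role == "user")
        .find_map(|m| {
            m.content.iter().find_map(|b| match b {
                Block::Text(t) => Some(t.clone()),
                _ => None,
            })
        });
    let facts = messages
        .iter()
        .rev()
        .flat_map(|m| m.content.iter())
        .filter_map(|b| match b {
            Block::ToolResult(c) if !c.starts_with("Error:") => Some(c.clone()),
            _ => None,
        })
        .take(MAX_FACTS)
        .collect();
    (goal, facts)
}

/// Candidate behaviour: one reverse walk that stops once both are satisfied.
fn single_pass(messages: &[Msg]) -> (Option<String>, Vec<String>) {
    let mut goal = None;
    let mut facts = Vec::new();
    for m in messages.iter().rev() {
        if goal.is_none() && m.role == "user" {
            goal = m.content.iter().find_map(|b| match b {
                Block::Text(t) => Some(t.clone()),
                _ => None,
            });
        }
        for b in &m.content {
            if facts.len() >= MAX_FACTS {
                break;
            }
            if let Block::ToolResult(c) = b {
                if !c.starts_with("Error:") {
                    facts.push(c.clone());
                }
            }
        }
        if goal.is_some() && facts.len() >= MAX_FACTS {
            break;
        }
    }
    (goal, facts)
}
```

Asserting `multi_pass(&history) == single_pass(&history)` over a handful of representative histories (empty, no user text, more than four facts, error results mixed in) gives a cheap regression guard for the refactor.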

Changes

| File | Change |
| --- | --- |
| crates/tui/src/core/engine/capacity_flow.rs | New scan_canonical_inputs helper and CanonicalStateScan struct. build_canonical_state now consumes the scan. apply_verify_and_replan uses the scan indices to look up the full messages. 6 new unit tests. |

Tests

running 6 tests
test core::engine::capacity_flow::canonical_scan_tests::scan_handles_empty_input ... ok
test core::engine::capacity_flow::canonical_scan_tests::scan_returns_goal_for_latest_user_text ... ok
test core::engine::capacity_flow::canonical_scan_tests::scan_collects_up_to_four_facts_newest_first ... ok
test core::engine::capacity_flow::canonical_scan_tests::scan_skips_error_results ... ok
test core::engine::capacity_flow::canonical_scan_tests::scan_finds_latest_verified_user_message ... ok
test core::engine::capacity_flow::canonical_scan_tests::scan_early_exits_when_complete ... ok

test result: ok. 6 passed; 0 failed

The full engine test suite (153 tests) still passes.

@github-actions

github-actions Bot commented Jun 3, 2026

Thanks @HUQIANTAO for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request consolidates multiple reverse iterations over session messages into a single-pass helper function, scan_canonical_inputs, to improve efficiency. The reviewer identified a performance issue where the early-exit optimization is bypassed in standard runs because the helper always expects to find a verified user message. To address this, the reviewer suggested introducing a find_verified boolean parameter to conditionally skip searching for the verified message, allowing the scan to exit early once the goal and facts are collected, and provided corresponding updates for the caller sites and unit tests.

// The replan path needs the *full* messages, not summaries.
// `scan_canonical_inputs` already located the indices in a single
// reverse pass; clone from the live `messages` slice once.
let scan = scan_canonical_inputs(&self.session.messages);

high

Pass true to scan_canonical_inputs here to indicate that we need to locate the latest verified user message index for the replan path.

Suggested change
- let scan = scan_canonical_inputs(&self.session.messages);
+ let scan = scan_canonical_inputs(&self.session.messages, true);

// `.iter().rev().find_map()` walks and a third for facts; the
// dedicated scan below replaces all three with one pass that
// also early-exits once every collector is satisfied.
let scan = scan_canonical_inputs(&self.session.messages);

high

Pass false to scan_canonical_inputs here since build_canonical_state does not use latest_verified_user_idx. This allows the scan to early-exit immediately once the goal and 4 facts are found, avoiding scanning the rest of the history.

Suggested change
- let scan = scan_canonical_inputs(&self.session.messages);
+ let scan = scan_canonical_inputs(&self.session.messages, false);

Comment on lines +971 to +1028
impl CanonicalStateScan {
    /// `true` once every collector is satisfied. The single-pass
    /// caller can use this to break out of the reverse iteration.
    fn is_complete(&self) -> bool {
        self.goal.is_some()
            && self.latest_verified_user_idx.is_some()
            && self.facts_collected >= CANONICAL_SCAN_MAX_FACTS
    }
}

/// Walk `messages` once (in reverse) and collect everything the canonical
/// state and re-plan paths need. Replaces the previous pattern of three
/// independent reverse scans: one for the goal, one for confirmed facts,
/// and one for the latest verified user message.
fn scan_canonical_inputs(messages: &[Message]) -> CanonicalStateScan {
    let mut scan = CanonicalStateScan::default();
    for (idx, msg) in messages.iter().enumerate().rev() {
        if msg.role == "user" {
            if scan.goal.is_none() {
                if let Some(text) = msg.content.iter().find_map(|b| match b {
                    ContentBlock::Text { text, .. } => Some(text.as_str()),
                    _ => None,
                }) {
                    scan.goal = Some(summarize_text(text, 220));
                    scan.latest_user_text_idx = Some(idx);
                }
            }
            if scan.latest_verified_user_idx.is_none() {
                let verified = msg.content.iter().any(|b| match b {
                    ContentBlock::ToolResult { content, .. } => {
                        content.contains("[verification replay]")
                    }
                    _ => false,
                });
                if verified {
                    scan.latest_verified_user_idx = Some(idx);
                }
            }
        }
        if scan.facts_collected < CANONICAL_SCAN_MAX_FACTS {
            for block in &msg.content {
                if let ContentBlock::ToolResult { content, .. } = block
                    && !content.starts_with("Error:")
                {
                    scan.confirmed_facts.push(summarize_text(content, 180));
                    scan.facts_collected = scan.facts_collected.saturating_add(1);
                    if scan.facts_collected >= CANONICAL_SCAN_MAX_FACTS {
                        break;
                    }
                }
            }
        }
        if scan.is_complete() {
            break;
        }
    }
    scan
}
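A side note on the fact-collection branch: `if let ... && ...` is a let chain, which to my knowledge is only stable from Rust 1.88 on the 2024 edition. If the crate ever needs to build on an older toolchain or edition, the same filter can be written with a match guard. The sketch below is an illustration under assumptions: `ContentBlock` is a simplified stand-in, `collect_facts` is a hypothetical helper name, and the `summarize_text` call is left out because its definition is not shown here.

```rust
/// Simplified stand-in for the crate's `ContentBlock` (assumption).
enum ContentBlock {
    Text { text: String },
    ToolResult { content: String },
}

const CANONICAL_SCAN_MAX_FACTS: usize = 4;

/// Append non-error tool results from `blocks` to `facts`, never letting
/// `facts` grow past the cap. Uses a match guard instead of a let chain.
fn collect_facts(blocks: &[ContentBlock], facts: &mut Vec<String>) {
    let remaining = CANONICAL_SCAN_MAX_FACTS.saturating_sub(facts.len());
    facts.extend(
        blocks
            .iter()
            .filter_map(|block| match block {
                ContentBlock::ToolResult { content } if !content.starts_with("Error:") => {
                    Some(content.clone())
                }
                _ => None,
            })
            .take(remaining),
    );
}
```

Using `facts.len()` as the counter also removes the need for a separate `facts_collected` field in this simplified form, at the cost of tying the cap to the vector itself.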

high

Performance Issue: Early-Exit Optimization Bypassed in Common Cases

In the current implementation, is_complete() requires latest_verified_user_idx.is_some() to return true:

    fn is_complete(&self) -> bool {
        self.goal.is_some()
            && self.latest_verified_user_idx.is_some()
            && self.facts_collected >= CANONICAL_SCAN_MAX_FACTS
    }

However, in the vast majority of normal runs, there is no verified user message in the session history (since verification replays only happen under specific high-risk conditions).

This leads to three major issues:

  1. Bypassed Optimization in build_canonical_state: build_canonical_state does not use latest_verified_user_idx at all, yet it calls scan_canonical_inputs which is forced to scan the entire message history because is_complete() will never return true.
  2. Bypassed Optimization in apply_verify_and_replan: Even when we do want to find the verified message, if none exists, the loop will scan the entire history instead of early-exiting once the goal and 4 facts are found.
  3. Broken Test: The test scan_early_exits_when_complete actually does not early-exit in the current PR code because there is no verified message in the test input, meaning it scans all 1000 messages.

Solution

Introduce a find_verified: bool parameter to scan_canonical_inputs and is_complete. When find_verified is false (such as in build_canonical_state), we can skip searching for the verified message and allow the loop to early-exit immediately once the goal and 4 facts are collected.

impl CanonicalStateScan {
    /// `true` once every collector is satisfied. The single-pass
    /// caller can use this to break out of the reverse iteration.
    fn is_complete(&self, find_verified: bool) -> bool {
        self.goal.is_some()
            && (!find_verified || self.latest_verified_user_idx.is_some())
            && self.facts_collected >= CANONICAL_SCAN_MAX_FACTS
    }
}

/// Walk `messages` once (in reverse) and collect everything the canonical
/// state and re-plan paths need. Replaces the previous pattern of three
/// independent reverse scans: one for the goal, one for confirmed facts,
/// and one for the latest verified user message.
fn scan_canonical_inputs(messages: &[Message], find_verified: bool) -> CanonicalStateScan {
    let mut scan = CanonicalStateScan::default();
    for (idx, msg) in messages.iter().enumerate().rev() {
        if msg.role == "user" {
            if scan.goal.is_none() {
                if let Some(text) = msg.content.iter().find_map(|b| match b {
                    ContentBlock::Text { text, .. } => Some(text.as_str()),
                    _ => None,
                }) {
                    scan.goal = Some(summarize_text(text, 220));
                    scan.latest_user_text_idx = Some(idx);
                }
            }
            if find_verified && scan.latest_verified_user_idx.is_none() {
                let verified = msg.content.iter().any(|b| match b {
                    ContentBlock::ToolResult { content, .. } => {
                        content.contains("[verification replay]")
                    }
                    _ => false,
                });
                if verified {
                    scan.latest_verified_user_idx = Some(idx);
                }
            }
        }
        if scan.facts_collected < CANONICAL_SCAN_MAX_FACTS {
            for block in &msg.content {
                if let ContentBlock::ToolResult { content, .. } = block
                    && !content.starts_with("Error:")
                {
                    scan.confirmed_facts.push(summarize_text(content, 180));
                    scan.facts_collected = scan.facts_collected.saturating_add(1);
                    if scan.facts_collected >= CANONICAL_SCAN_MAX_FACTS {
                        break;
                    }
                }
            }
        }
        if scan.is_complete(find_verified) {
            break;
        }
    }
    scan
}
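To make the effect of the flag concrete, here is a small model that counts how many messages a reverse scan visits before it stops. It is a sketch under assumptions rather than the PR's code: `Msg` reduces each message to three flags, and `visited` and `history` are hypothetical helpers. Only the completion condition mirrors the suggested `is_complete(find_verified)`.

```rust
const MAX_FACTS: usize = 4;

/// Simplified message: flags instead of real content blocks (assumption).
#[derive(Clone, Copy, Default)]
struct Msg {
    has_user_text: bool,
    has_fact: bool,
    has_verified_marker: bool,
}

/// Reverse scan that returns the number of messages visited before stopping.
fn visited(messages: &[Msg], find_verified: bool) -> usize {
    let (mut goal, mut verified, mut facts) = (false, false, 0usize);
    let mut count = 0;
    for m in messages.iter().rev() {
        count += 1;
        goal |= m.has_user_text;
        verified |= m.has_verified_marker;
        if m.has_fact && facts < MAX_FACTS {
            facts += 1;
        }
        // Same shape as the suggested `is_complete(find_verified)`.
        let complete = goal && (!find_verified || verified) && facts >= MAX_FACTS;
        if complete {
            break;
        }
    }
    count
}

/// `n` fact-bearing messages followed by one user goal, with no
/// verification replay anywhere in the history.
fn history(n: usize) -> Vec<Msg> {
    let mut messages = vec![Msg { has_fact: true, ..Msg::default() }; n];
    messages.push(Msg { has_user_text: true, ..Msg::default() });
    messages
}
```

With `history(1000)`, `visited(&h, true)` walks all 1001 messages because the verified marker never appears, while `visited(&h, false)` stops after 5 (the goal message plus four facts). A counter of this kind is also one way a test could assert that the early exit actually fires, instead of only checking the collected values.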

tool_result_msg("b"),
user_text_msg("third"),
];
let scan = scan_canonical_inputs(&messages);

high

Update the test call to pass false for find_verified.

Suggested change
- let scan = scan_canonical_inputs(&messages);
+ let scan = scan_canonical_inputs(&messages, false);

tool_result_msg("fact-D"),
tool_result_msg("fact-E"),
];
let scan = scan_canonical_inputs(&messages);

high

Update the test call to pass false for find_verified.

Suggested change
- let scan = scan_canonical_inputs(&messages);
+ let scan = scan_canonical_inputs(&messages, false);

tool_result_msg("Error: bad"),
tool_result_msg("good-B"),
];
let scan = scan_canonical_inputs(&messages);

high

Update the test call to pass false for find_verified.

Suggested change
- let scan = scan_canonical_inputs(&messages);
+ let scan = scan_canonical_inputs(&messages, false);

user_with_verified_replay("verified"),
user_text_msg("third"),
];
let scan = scan_canonical_inputs(&messages);

high

Update the test call to pass true for find_verified since this test specifically verifies locating the latest verified user message.

Suggested change
- let scan = scan_canonical_inputs(&messages);
+ let scan = scan_canonical_inputs(&messages, true);


#[test]
fn scan_handles_empty_input() {
    let scan = scan_canonical_inputs(&[]);

high

Update the test call to pass false for find_verified.

Suggested change
- let scan = scan_canonical_inputs(&[]);
+ let scan = scan_canonical_inputs(&[], false);

.collect();
// Most recent user message comes last.
messages.push(user_text_msg("goal"));
let scan = scan_canonical_inputs(&messages);

high

Update the test call to pass false for find_verified. This ensures the early-exit optimization is actually triggered and tested (previously, it scanned all 1000 messages because no verified message was present).

Suggested change
- let scan = scan_canonical_inputs(&messages);
+ let scan = scan_canonical_inputs(&messages, false);

@Hmbown
Owner

Hmbown commented Jun 3, 2026

Thanks @HUQIANTAO. I checked this during the v0.8.52 release freeze. The platform tests are green, but lint is failing on cargo fmt --all -- --check in capacity_flow.rs. Since this is a perf cleanup rather than a release-fix regression, I am keeping it out of 0.8.52 and will treat it as next-pass review once formatting is clean.

HUQIANTAO added 2 commits June 3, 2026 19:54
… pass

build_canonical_state previously did two independent reverse walks of
session.messages — one to extract the most recent user goal, and one
to collect up to four confirmed-fact snippets. apply_verify_and_replan
then added a third and fourth reverse scan to locate the latest user
message and the latest [verification replay] user message for the
re-plan path.

All four reverse scans collect disjoint facts about the same most-
recent-first view of the conversation. This PR folds them into a
single helper, scan_canonical_inputs, that walks messages once in
reverse, fills a CanonicalStateScan, and short-circuits as soon as
every collector is satisfied. The helper exposes the latest-message
indices so apply_verify_and_replan can clone the full Message values
after the scan (eliminating the two independent find().cloned() walks).

The output CanonicalState is byte-identical to the prior
implementation: same goal, same confirmed facts (newest first, errors
filtered), same fallback string when no user text exists. The re-plan
path's keep-messages set is identical: latest user + latest verified.

Tests: 6 new unit tests cover the goal lookup, fact cap, error-result
filter, verified-marker scan, empty input, and the early-exit
condition. The full engine test suite (153 tests) still passes.
…-user lookup

The build_canonical_state path never reads
CanonicalStateScan::latest_verified_user_idx, but the previous patch
required is_complete() to find a verified user message before it would
short-circuit. On a long history with no verification replay — the
common case — the scan walked the entire message list looking for a
match that could not exist.

Add a find_verified: bool parameter to scan_canonical_inputs and
CanonicalStateScan::is_complete. build_canonical_state now passes
false, so the loop stops as soon as the goal and CANONICAL_SCAN_MAX_FACTS
facts are found. The replan path (apply_verify_and_replan) keeps the
existing true behavior so it still locates the latest verified user
message.

Test calls are updated to match; no behavior change for any test.
@HUQIANTAO HUQIANTAO force-pushed the perf/canonical-single-pass branch from 281b9d7 to a2474f7 Compare June 3, 2026 12:00