Skip to content

perf(history): cache output_rows and selected_output_indices per cell#2635

Open
HUQIANTAO wants to merge 2 commits into
Hmbown:mainfrom
HUQIANTAO:perf/output-rows-cache
Open

perf(history): cache output_rows and selected_output_indices per cell#2635
HUQIANTAO wants to merge 2 commits into
Hmbown:mainfrom
HUQIANTAO:perf/output-rows-cache

Conversation

@HUQIANTAO
Copy link
Copy Markdown
Contributor

Summary

output_rows (in tui::history) walks the raw tool output, ANSI-strips each line, classifies path/URL-like rows, and wraps the rest to the current viewport width. selected_output_indices then computes the head/tail/importance subset that the compact Live view shows. Both functions are pure, but they are called on every render frame for every visible tool cell. For a 4 KB tool output on a 120 FPS render loop that is 2-6 redundant walks per frame, per cell.

This PR adds tui::output_rows_cache, a thread-local, content-addressed cache keyed on (content_hash, width) for the rows and (content_hash, width, line_limit) for the indices. The cache stores the wrapped Vec<OutputRow> plus a per-line-limit map of selected indices on a single entry, so a single key lookup satisfies both render steps.

render_preserved_output_mode now consults the cache for both the rows and the indices; on a hit, neither the per-line ANSI strip nor the importance-ranking pass runs. The cache is bounded (default capacity 256) with insertion-order eviction.

Why thread-local

The TUI render loop runs on a single thread. A thread_local! RefCell<OutputRowsCacheInner> is the simplest and contention-free design — no Mutex or RwLock, no global mutable state visible to background workers, and no test-poisoning footguns. The cache is per-thread, so a background task that calls into the same module gets its own cold cache (acceptable: that path is rare and not on the render hot path).

Changes

File Change
crates/tui/src/tui/output_rows_cache.rs (new, 344 LOC) Thread-local cache, content hasher, 6 unit tests.
crates/tui/src/tui/history.rs OutputRow becomes pub with PartialEq + Eq so the cache can store and hash it. render_preserved_output_mode consults the cache for both rows and indices.
crates/tui/src/tui/mod.rs Register the new module.

Tests

running 6 tests
test tui::output_rows_cache::tests::cache_hit_returns_cached_rows ... ok
test tui::output_rows_cache::tests::different_width_invalidates_rows ... ok
test tui::output_rows_cache::tests::different_output_invalidates_rows ... ok
test tui::output_rows_cache::tests::indices_cached_per_line_limit ... ok
test tui::output_rows_cache::tests::capacity_evicts_oldest ... ok
test tui::output_rows_cache::tests::hash_str_stable_for_identical_input ... ok

test result: ok. 6 passed; 0 failed

The wider tui::history test suite (68 tests) still passes.

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

Thanks @HUQIANTAO for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

@Hmbown
Copy link
Copy Markdown
Owner

Hmbown commented Jun 3, 2026

Thanks @HUQIANTAO. I took a first look while checking the post-0.8.52 queue. This is a plausible render-hot-path optimization, but it is not a corrective-release candidate: CI is already blocked on lint, macOS/Windows/mobile are still pending, and the patch makes OutputRow public while adding a new cache module around history rendering.

I am keeping it out of 0.8.52 so the release stays focused on verified regression fixes. Once CI is clean, this should get a normal review around cache invalidation, memory bounds, and whether exposing OutputRow is the right API shape. Thanks for continuing to improve the perf path; I just do not want to bury a rendering cache change in the emergency patch.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a thread-local, bounded LRU cache (output_rows_cache) to memoize the tool-output shaping pipeline in the TUI, caching wrapped output rows and selected indices to avoid redundant computations on every render frame. The review feedback points out that the rows_hash field in CacheEntry is unused, meaning the entire Vec<OutputRow> is hashed redundantly on cache misses; removing this field allows simplifying the caching signatures, deleting the unused hash_rows function, and updating the callers and tests accordingly. Additionally, using DefaultHasher to hash large tool outputs on every frame could cause performance bottlenecks, so switching to a faster, non-cryptographic hasher like FxHasher or caching the hash on the cell itself is recommended.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +51 to +68
#[derive(Debug, Clone)]
struct CacheEntry {
rows: Vec<OutputRow>,
rows_hash: u64,
/// Map of `line_limit -> selected indices`. Bounded by the
/// distinct line limits passed in by the renderer (typically 1–3).
selected_by_limit: HashMap<usize, Vec<usize>>,
}

impl CacheEntry {
fn new(rows: Vec<OutputRow>, rows_hash: u64) -> Self {
Self {
rows,
rows_hash,
selected_by_limit: HashMap::new(),
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The rows_hash field in CacheEntry is completely unused. Computing it requires hashing the entire Vec<OutputRow> (every row's text and intact status) on every cache miss, which introduces significant redundant CPU overhead. Removing this field simplifies the struct and avoids unnecessary hashing.

#[derive(Debug, Clone)]
struct CacheEntry {
    rows: Vec<OutputRow>,
    /// Map of line_limit -> selected indices. Bounded by the
    /// distinct line limits passed in by the renderer (typically 1-3).
    selected_by_limit: HashMap<usize, Vec<usize>>,
}

impl CacheEntry {
    fn new(rows: Vec<OutputRow>) -> Self {
        Self {
            rows,
            selected_by_limit: HashMap::new(),
        }
    }
}

Comment on lines +111 to +137
fn get_or_compute_rows<F>(
&mut self,
content_hash: u64,
width: u16,
compute: F,
) -> (Vec<OutputRow>, u64)
where
F: FnOnce() -> Vec<OutputRow>,
{
let key = RowsKey { content_hash, width };
if let Some(entry) = self.by_key.get(&key) {
return (entry.rows.clone(), entry.rows_hash);
}

let rows = compute();
let rows_hash = hash_rows(&rows);
let entry = CacheEntry::new(rows.clone(), rows_hash);

if self.by_key.len() >= self.capacity
&& let Some(oldest) = self.insertion_order.pop_front()
{
self.by_key.remove(&oldest);
}
self.by_key.insert(key, entry);
self.insertion_order.push_back(key);
(rows, rows_hash)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Since rows_hash is unused, we can simplify get_or_compute_rows to return only Vec<OutputRow> instead of a tuple, eliminating the redundant hash_rows call on cache misses.

    fn get_or_compute_rows<F>(
        &mut self,
        content_hash: u64,
        width: u16,
        compute: F,
    ) -> Vec<OutputRow>
    where
        F: FnOnce() -> Vec<OutputRow>,
    {
        let key = RowsKey { content_hash, width };
        if let Some(entry) = self.by_key.get(&key) {
            return entry.rows.clone();
        }

        let rows = compute();
        let entry = CacheEntry::new(rows.clone());

        if self.by_key.len() >= self.capacity
            && let Some(oldest) = self.insertion_order.pop_front()
        {
            self.by_key.remove(&oldest);
        }
        self.by_key.insert(key, entry);
        self.insertion_order.push_back(key);
        rows
    }

Comment thread crates/tui/src/tui/output_rows_cache.rs Outdated
Comment on lines +188 to +194
pub fn get_or_compute_rows<F>(output: &str, width: u16, compute: F) -> (Vec<OutputRow>, u64)
where
F: FnOnce() -> Vec<OutputRow>,
{
let content_hash = hash_str(output);
GLOBAL_CACHE.with(|c| c.borrow_mut().get_or_compute_rows(content_hash, width, compute))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update the public get_or_compute_rows wrapper to return only Vec<OutputRow> instead of the tuple.

Suggested change
pub fn get_or_compute_rows<F>(output: &str, width: u16, compute: F) -> (Vec<OutputRow>, u64)
where
F: FnOnce() -> Vec<OutputRow>,
{
let content_hash = hash_str(output);
GLOBAL_CACHE.with(|c| c.borrow_mut().get_or_compute_rows(content_hash, width, compute))
}
pub fn get_or_compute_rows<F>(output: &str, width: u16, compute: F) -> Vec<OutputRow>
where
F: FnOnce() -> Vec<OutputRow>,
{
let content_hash = hash_str(output);
GLOBAL_CACHE.with(|c| c.borrow_mut().get_or_compute_rows(content_hash, width, compute))
}

Comment thread crates/tui/src/tui/output_rows_cache.rs Outdated
Comment on lines +221 to +231
/// Content hash of an `OutputRow` slice. Computed once on cache miss;
/// reused for the indices-cache key.
fn hash_rows(rows: &[OutputRow]) -> u64 {
let mut hasher = DefaultHasher::new();
rows.len().hash(&mut hasher);
for row in rows {
row.text.hash(&mut hasher);
row.intact.hash(&mut hasher);
}
hasher.finish()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The hash_rows function is no longer needed since rows_hash has been removed. Deleting it cleans up dead code and avoids redundant hashing.

Comment thread crates/tui/src/tui/history.rs Outdated
Comment on lines +2640 to +2644
let (all_lines, _rows_hash) = crate::tui::output_rows_cache::get_or_compute_rows(
output,
width,
|| output_rows(output, width),
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update the caller in render_preserved_output_mode to match the simplified get_or_compute_rows signature that returns only Vec<OutputRow>.

Suggested change
let (all_lines, _rows_hash) = crate::tui::output_rows_cache::get_or_compute_rows(
output,
width,
|| output_rows(output, width),
);
let all_lines = crate::tui::output_rows_cache::get_or_compute_rows(
output,
width,
|| output_rows(output, width),
);

Comment thread crates/tui/src/tui/output_rows_cache.rs Outdated
Comment on lines +251 to +259
let (a, hash_a) = get_or_compute_rows("payload", 80, compute);
let (b, hash_b) = get_or_compute_rows("payload", 80, || {
calls.set(calls.get() + 1);
vec![row("hello"), row("world")]
});
assert_eq!(calls.get(), 1, "second call should hit the cache");
assert_eq!(a, b);
assert_eq!(hash_a, hash_b);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update the cache_hit_returns_cached_rows unit test to match the simplified get_or_compute_rows signature.

Suggested change
let (a, hash_a) = get_or_compute_rows("payload", 80, compute);
let (b, hash_b) = get_or_compute_rows("payload", 80, || {
calls.set(calls.get() + 1);
vec![row("hello"), row("world")]
});
assert_eq!(calls.get(), 1, "second call should hit the cache");
assert_eq!(a, b);
assert_eq!(hash_a, hash_b);
}
let a = get_or_compute_rows("payload", 80, compute);
let b = get_or_compute_rows("payload", 80, || {
calls.set(calls.get() + 1);
vec![row("hello"), row("world")]
});
assert_eq!(calls.get(), 1, "second call should hit the cache");
assert_eq!(a, b);

Comment thread crates/tui/src/tui/output_rows_cache.rs Outdated
Comment on lines +295 to +297
let (rows, _rows_hash) = get_or_compute_rows("payload", 80, || {
vec![row("a"), row("b"), row("c"), row("d"), row("e")]
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update the indices_cached_per_line_limit unit test to match the simplified get_or_compute_rows signature.

Suggested change
let (rows, _rows_hash) = get_or_compute_rows("payload", 80, || {
vec![row("a"), row("b"), row("c"), row("d"), row("e")]
});
let rows = get_or_compute_rows("payload", 80, || {
vec![row("a"), row("b"), row("c"), row("d"), row("e")]
});

Comment thread crates/tui/src/tui/output_rows_cache.rs Outdated
Comment on lines +214 to +219
/// Cheap 64-bit content hash for a tool output string.
pub fn hash_str(s: &str) -> u64 {
let mut hasher = DefaultHasher::new();
s.hash(&mut hasher);
hasher.finish()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Hashing the entire output string on every single render frame using DefaultHasher (SipHasher 1-3) can become a performance bottleneck for large tool outputs (e.g., tens or hundreds of kilobytes) at high frame rates (120 FPS). Since this is a local TUI cache, HashDoS protection is not required. Consider using a faster, non-cryptographic hasher like FxHasher (from rustc-hash) or AHasher (from ahash), or caching the hash on the cell itself (e.g., ExecCell or GenericToolCell) when the output is first populated/updated.

output_rows (in tui::history) walks the raw tool output, ANSI-strips
each line, classifies path/URL-like rows, and wraps the rest to the
current viewport width. selected_output_indices then computes the
head/tail/importance subset that the compact Live view shows. Both
functions are pure, but they are called on every render frame for
every visible tool cell. For a 4 KB tool output on a 120 FPS render
loop that is 2-6 redundant walks per frame, per cell, and the
function is called from a non-trivial number of cells across
exec, tool, command, and review history.

Add tui::output_rows_cache, a thread-local, content-addressed cache
keyed on (content_hash, width) for the rows and (content_hash, width,
line_limit) for the indices. The cache stores the wrapped
Vec<OutputRow> plus a per-line-limit map of selected indices on a
single entry, so a single key lookup satisfies both render steps.

render_preserved_output_mode now consults the cache for both the
rows and the indices; on a hit, neither the per-line ANSI strip nor
the importance-ranking pass runs. The cache is bounded (default
capacity 256) with insertion-order eviction. The OutputRow struct
gains PartialEq + Eq + pub fields so the cache module can store and
hash it without exposing private internals.

Tests: 6 new unit tests cover the hit/miss path, width invalidation,
content invalidation, indices per-line_limit caching, capacity
eviction, and hash stability. The wider tui::history test suite (68
tests) still passes.
@HUQIANTAO HUQIANTAO force-pushed the perf/output-rows-cache branch from bda8cbe to e16b530 Compare June 3, 2026 12:06
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Three follow-ups to the previous perf commit:

1. Drop the rows_hash field on CacheEntry. The field was computed and
   stored but never read on the hot path; tests exercised it only to
   assert the cache returned a stable hash. After this change
   get_or_compute_rows returns just Vec<OutputRow>, halving the
   tuple-return ABI and removing one DefaultHasher::write pass on
   every cache miss.

2. Replace DefaultHasher (SipHash) with a hand-rolled FNV-1a 64-bit
   hash. SipHash is per-process-keyed and ~5-10x slower than FNV on
   the small-to-medium tool output strings we see at 120 FPS. FNV-1a
   has no per-process key, fits in 20 lines of pure-Rust, and a 64-bit
   collision space is more than wide enough for the per-process LRU's
   expected <= a few hundred entries. The cache is a correctness
   optimization, not a security boundary; collisions only cause a
   false miss, never wrong data.

3. Caller in tui::history::render_preserved_output_mode updated to
   the new Vec<OutputRow>-only signature. Two new tests cover the
   FNV-1a properties (length-suffix sensitivity, empty-input
   stability).
@HUQIANTAO HUQIANTAO force-pushed the perf/output-rows-cache branch from e16b530 to 563e077 Compare June 3, 2026 12:42
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants