perf(history): cache output_rows and selected_output_indices per cell by HUQIANTAO · Pull Request #2635 · Hmbown/CodeWhale

HUQIANTAO · 2026-06-03T11:05:47Z

Summary

output_rows (in tui::history) walks the raw tool output, ANSI-strips each line, classifies path/URL-like rows, and wraps the rest to the current viewport width. selected_output_indices then computes the head/tail/importance subset that the compact Live view shows. Both functions are pure, but they are called on every render frame for every visible tool cell. For a 4 KB tool output on a 120 FPS render loop that is 2-6 redundant walks per frame, per cell.

This PR adds tui::output_rows_cache, a thread-local, content-addressed cache keyed on (content_hash, width) for the rows and (content_hash, width, line_limit) for the indices. The cache stores the wrapped Vec<OutputRow> plus a per-line-limit map of selected indices on a single entry, so a single key lookup satisfies both render steps.

render_preserved_output_mode now consults the cache for both the rows and the indices; on a hit, neither the per-line ANSI strip nor the importance-ranking pass runs. The cache is bounded (default capacity 256) with insertion-order eviction.

Why thread-local

The TUI render loop runs on a single thread. A thread_local! RefCell<OutputRowsCacheInner> is the simplest and contention-free design — no Mutex or RwLock, no global mutable state visible to background workers, and no test-poisoning footguns. The cache is per-thread, so a background task that calls into the same module gets its own cold cache (acceptable: that path is rare and not on the render hot path).

Changes

File	Change
`crates/tui/src/tui/output_rows_cache.rs` (new, 344 LOC)	Thread-local cache, content hasher, 6 unit tests.
`crates/tui/src/tui/history.rs`	`OutputRow` becomes `pub` with `PartialEq + Eq` so the cache can store and hash it. `render_preserved_output_mode` consults the cache for both rows and indices.
`crates/tui/src/tui/mod.rs`	Register the new module.

Tests

running 6 tests
test tui::output_rows_cache::tests::cache_hit_returns_cached_rows ... ok
test tui::output_rows_cache::tests::different_width_invalidates_rows ... ok
test tui::output_rows_cache::tests::different_output_invalidates_rows ... ok
test tui::output_rows_cache::tests::indices_cached_per_line_limit ... ok
test tui::output_rows_cache::tests::capacity_evicts_oldest ... ok
test tui::output_rows_cache::tests::hash_str_stable_for_identical_input ... ok

test result: ok. 6 passed; 0 failed

The wider tui::history test suite (68 tests) still passes.

greptile-apps

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

github-actions · 2026-06-03T11:05:58Z

Thanks @HUQIANTAO for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

Hmbown · 2026-06-03T11:07:02Z

Thanks @HUQIANTAO. I took a first look while checking the post-0.8.52 queue. This is a plausible render-hot-path optimization, but it is not a corrective-release candidate: CI is already blocked on lint, macOS/Windows/mobile are still pending, and the patch makes OutputRow public while adding a new cache module around history rendering.

I am keeping it out of 0.8.52 so the release stays focused on verified regression fixes. Once CI is clean, this should get a normal review around cache invalidation, memory bounds, and whether exposing OutputRow is the right API shape. Thanks for continuing to improve the perf path; I just do not want to bury a rendering cache change in the emergency patch.

gemini-code-assist

Code Review

This pull request introduces a thread-local, bounded LRU cache (output_rows_cache) to memoize the tool-output shaping pipeline in the TUI, caching wrapped output rows and selected indices to avoid redundant computations on every render frame. The review feedback points out that the rows_hash field in CacheEntry is unused, meaning the entire Vec<OutputRow> is hashed redundantly on cache misses; removing this field allows simplifying the caching signatures, deleting the unused hash_rows function, and updating the callers and tests accordingly. Additionally, using DefaultHasher to hash large tool outputs on every frame could cause performance bottlenecks, so switching to a faster, non-cryptographic hasher like FxHasher or caching the hash on the cell itself is recommended.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-03T11:07:28Z

+#[derive(Debug, Clone)]
+struct CacheEntry {
+    rows: Vec<OutputRow>,
+    rows_hash: u64,
+    /// Map of `line_limit -> selected indices`. Bounded by the
+    /// distinct line limits passed in by the renderer (typically 1–3).
+    selected_by_limit: HashMap<usize, Vec<usize>>,
+}
+
+impl CacheEntry {
+    fn new(rows: Vec<OutputRow>, rows_hash: u64) -> Self {
+        Self {
+            rows,
+            rows_hash,
+            selected_by_limit: HashMap::new(),
+        }
+    }
+}


The rows_hash field in CacheEntry is completely unused. Computing it requires hashing the entire Vec<OutputRow> (every row's text and intact status) on every cache miss, which introduces significant redundant CPU overhead. Removing this field simplifies the struct and avoids unnecessary hashing.

#[derive(Debug, Clone)] struct CacheEntry { rows: Vec<OutputRow>, /// Map of line_limit -> selected indices. Bounded by the /// distinct line limits passed in by the renderer (typically 1-3). selected_by_limit: HashMap<usize, Vec<usize>>, } impl CacheEntry { fn new(rows: Vec<OutputRow>) -> Self { Self { rows, selected_by_limit: HashMap::new(), } } }

gemini-code-assist · 2026-06-03T11:07:28Z

+    fn get_or_compute_rows<F>(
+        &mut self,
+        content_hash: u64,
+        width: u16,
+        compute: F,
+    ) -> (Vec<OutputRow>, u64)
+    where
+        F: FnOnce() -> Vec<OutputRow>,
+    {
+        let key = RowsKey { content_hash, width };
+        if let Some(entry) = self.by_key.get(&key) {
+            return (entry.rows.clone(), entry.rows_hash);
+        }
+
+        let rows = compute();
+        let rows_hash = hash_rows(&rows);
+        let entry = CacheEntry::new(rows.clone(), rows_hash);
+
+        if self.by_key.len() >= self.capacity
+            && let Some(oldest) = self.insertion_order.pop_front()
+        {
+            self.by_key.remove(&oldest);
+        }
+        self.by_key.insert(key, entry);
+        self.insertion_order.push_back(key);
+        (rows, rows_hash)
+    }


Since rows_hash is unused, we can simplify get_or_compute_rows to return only Vec<OutputRow> instead of a tuple, eliminating the redundant hash_rows call on cache misses.

fn get_or_compute_rows<F>( &mut self, content_hash: u64, width: u16, compute: F, ) -> Vec<OutputRow> where F: FnOnce() -> Vec<OutputRow>, { let key = RowsKey { content_hash, width }; if let Some(entry) = self.by_key.get(&key) { return entry.rows.clone(); } let rows = compute(); let entry = CacheEntry::new(rows.clone()); if self.by_key.len() >= self.capacity && let Some(oldest) = self.insertion_order.pop_front() { self.by_key.remove(&oldest); } self.by_key.insert(key, entry); self.insertion_order.push_back(key); rows }

gemini-code-assist · 2026-06-03T11:07:29Z

+pub fn get_or_compute_rows<F>(output: &str, width: u16, compute: F) -> (Vec<OutputRow>, u64)
+where
+    F: FnOnce() -> Vec<OutputRow>,
+{
+    let content_hash = hash_str(output);
+    GLOBAL_CACHE.with(|c| c.borrow_mut().get_or_compute_rows(content_hash, width, compute))
+}


Update the public get_or_compute_rows wrapper to return only Vec<OutputRow> instead of the tuple.

Suggested change

pub fn get_or_compute_rows<F>(output: &str, width: u16, compute: F) -> (Vec<OutputRow>, u64)

where

F: FnOnce() -> Vec<OutputRow>,

{

let content_hash = hash_str(output);

GLOBAL_CACHE.with(|c| c.borrow_mut().get_or_compute_rows(content_hash, width, compute))

}

pub fn get_or_compute_rows<F>(output: &str, width: u16, compute: F) -> Vec<OutputRow>

where

F: FnOnce() -> Vec<OutputRow>,

{

let content_hash = hash_str(output);

GLOBAL_CACHE.with(|c| c.borrow_mut().get_or_compute_rows(content_hash, width, compute))

}

gemini-code-assist · 2026-06-03T11:07:29Z

+/// Content hash of an `OutputRow` slice. Computed once on cache miss;
+/// reused for the indices-cache key.
+fn hash_rows(rows: &[OutputRow]) -> u64 {
+    let mut hasher = DefaultHasher::new();
+    rows.len().hash(&mut hasher);
+    for row in rows {
+        row.text.hash(&mut hasher);
+        row.intact.hash(&mut hasher);
+    }
+    hasher.finish()
+}


The hash_rows function is no longer needed since rows_hash has been removed. Deleting it cleans up dead code and avoids redundant hashing.

gemini-code-assist · 2026-06-03T11:07:29Z

+    let (all_lines, _rows_hash) = crate::tui::output_rows_cache::get_or_compute_rows(
+        output,
+        width,
+        || output_rows(output, width),
+    );


Update the caller in render_preserved_output_mode to match the simplified get_or_compute_rows signature that returns only Vec<OutputRow>.

Suggested change

let (all_lines, _rows_hash) = crate::tui::output_rows_cache::get_or_compute_rows(

output,

width,

|| output_rows(output, width),

);

let all_lines = crate::tui::output_rows_cache::get_or_compute_rows(

output,

width,

|| output_rows(output, width),

);

gemini-code-assist · 2026-06-03T11:07:29Z

+        let (a, hash_a) = get_or_compute_rows("payload", 80, compute);
+        let (b, hash_b) = get_or_compute_rows("payload", 80, || {
+            calls.set(calls.get() + 1);
+            vec![row("hello"), row("world")]
+        });
+        assert_eq!(calls.get(), 1, "second call should hit the cache");
+        assert_eq!(a, b);
+        assert_eq!(hash_a, hash_b);
+    }


Update the cache_hit_returns_cached_rows unit test to match the simplified get_or_compute_rows signature.

Suggested change

let (a, hash_a) = get_or_compute_rows("payload", 80, compute);

let (b, hash_b) = get_or_compute_rows("payload", 80, || {

calls.set(calls.get() + 1);

vec![row("hello"), row("world")]

});

assert_eq!(calls.get(), 1, "second call should hit the cache");

assert_eq!(a, b);

assert_eq!(hash_a, hash_b);

}

let a = get_or_compute_rows("payload", 80, compute);

let b = get_or_compute_rows("payload", 80, || {

calls.set(calls.get() + 1);

vec![row("hello"), row("world")]

});

assert_eq!(calls.get(), 1, "second call should hit the cache");

assert_eq!(a, b);

gemini-code-assist · 2026-06-03T11:07:29Z

+        let (rows, _rows_hash) = get_or_compute_rows("payload", 80, || {
+            vec![row("a"), row("b"), row("c"), row("d"), row("e")]
+        });


Update the indices_cached_per_line_limit unit test to match the simplified get_or_compute_rows signature.

Suggested change

let (rows, _rows_hash) = get_or_compute_rows("payload", 80, || {

vec![row("a"), row("b"), row("c"), row("d"), row("e")]

});

let rows = get_or_compute_rows("payload", 80, || {

vec![row("a"), row("b"), row("c"), row("d"), row("e")]

});

gemini-code-assist · 2026-06-03T11:07:29Z

+/// Cheap 64-bit content hash for a tool output string.
+pub fn hash_str(s: &str) -> u64 {
+    let mut hasher = DefaultHasher::new();
+    s.hash(&mut hasher);
+    hasher.finish()
+}


Hashing the entire output string on every single render frame using DefaultHasher (SipHasher 1-3) can become a performance bottleneck for large tool outputs (e.g., tens or hundreds of kilobytes) at high frame rates (120 FPS). Since this is a local TUI cache, HashDoS protection is not required. Consider using a faster, non-cryptographic hasher like FxHasher (from rustc-hash) or AHasher (from ahash), or caching the hash on the cell itself (e.g., ExecCell or GenericToolCell) when the output is first populated/updated.

output_rows (in tui::history) walks the raw tool output, ANSI-strips each line, classifies path/URL-like rows, and wraps the rest to the current viewport width. selected_output_indices then computes the head/tail/importance subset that the compact Live view shows. Both functions are pure, but they are called on every render frame for every visible tool cell. For a 4 KB tool output on a 120 FPS render loop that is 2-6 redundant walks per frame, per cell, and the function is called from a non-trivial number of cells across exec, tool, command, and review history. Add tui::output_rows_cache, a thread-local, content-addressed cache keyed on (content_hash, width) for the rows and (content_hash, width, line_limit) for the indices. The cache stores the wrapped Vec<OutputRow> plus a per-line-limit map of selected indices on a single entry, so a single key lookup satisfies both render steps. render_preserved_output_mode now consults the cache for both the rows and the indices; on a hit, neither the per-line ANSI strip nor the importance-ranking pass runs. The cache is bounded (default capacity 256) with insertion-order eviction. The OutputRow struct gains PartialEq + Eq + pub fields so the cache module can store and hash it without exposing private internals. Tests: 6 new unit tests cover the hit/miss path, width invalidation, content invalidation, indices per-line_limit caching, capacity eviction, and hash stability. The wider tui::history test suite (68 tests) still passes.

greptile-apps

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Three follow-ups to the previous perf commit: 1. Drop the rows_hash field on CacheEntry. The field was computed and stored but never read on the hot path; tests exercised it only to assert the cache returned a stable hash. After this change get_or_compute_rows returns just Vec<OutputRow>, halving the tuple-return ABI and removing one DefaultHasher::write pass on every cache miss. 2. Replace DefaultHasher (SipHash) with a hand-rolled FNV-1a 64-bit hash. SipHash is per-process-keyed and ~5-10x slower than FNV on the small-to-medium tool output strings we see at 120 FPS. FNV-1a has no per-process key, fits in 20 lines of pure-Rust, and a 64-bit collision space is more than wide enough for the per-process LRU's expected <= a few hundred entries. The cache is a correctness optimization, not a security boundary; collisions only cause a false miss, never wrong data. 3. Caller in tui::history::render_preserved_output_mode updated to the new Vec<OutputRow>-only signature. Two new tests cover the FNV-1a properties (length-suffix sensitivity, empty-input stability).

greptile-apps

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

greptile-apps Bot reviewed Jun 3, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 3, 2026

View reviewed changes

Hmbown mentioned this pull request Jun 3, 2026

[v0.8.51] Release checklist — carry the branch to a published release #2600

Closed

16 tasks

HUQIANTAO force-pushed the perf/output-rows-cache branch from bda8cbe to e16b530 Compare June 3, 2026 12:06

greptile-apps Bot reviewed Jun 3, 2026

View reviewed changes

HUQIANTAO force-pushed the perf/output-rows-cache branch from e16b530 to 563e077 Compare June 3, 2026 12:42

greptile-apps Bot reviewed Jun 3, 2026

View reviewed changes

Conversation

HUQIANTAO commented Jun 3, 2026

Summary

Why thread-local

Changes

Tests

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

Hmbown commented Jun 3, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants