perf(history): cache output_rows and selected_output_indices per cell#2635
perf(history): cache output_rows and selected_output_indices per cell#2635HUQIANTAO wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
Thanks @HUQIANTAO for taking the time to contribute. This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in Please read |
|
Thanks @HUQIANTAO. I took a first look while checking the post-0.8.52 queue. This is a plausible render-hot-path optimization, but it is not a corrective-release candidate: CI is already blocked on lint, macOS/Windows/mobile are still pending, and the patch makes I am keeping it out of 0.8.52 so the release stays focused on verified regression fixes. Once CI is clean, this should get a normal review around cache invalidation, memory bounds, and whether exposing |
There was a problem hiding this comment.
Code Review
This pull request introduces a thread-local, bounded LRU cache (output_rows_cache) to memoize the tool-output shaping pipeline in the TUI, caching wrapped output rows and selected indices to avoid redundant computations on every render frame. The review feedback points out that the rows_hash field in CacheEntry is unused, meaning the entire Vec<OutputRow> is hashed redundantly on cache misses; removing this field allows simplifying the caching signatures, deleting the unused hash_rows function, and updating the callers and tests accordingly. Additionally, using DefaultHasher to hash large tool outputs on every frame could cause performance bottlenecks, so switching to a faster, non-cryptographic hasher like FxHasher or caching the hash on the cell itself is recommended.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| #[derive(Debug, Clone)] | ||
| struct CacheEntry { | ||
| rows: Vec<OutputRow>, | ||
| rows_hash: u64, | ||
| /// Map of `line_limit -> selected indices`. Bounded by the | ||
| /// distinct line limits passed in by the renderer (typically 1–3). | ||
| selected_by_limit: HashMap<usize, Vec<usize>>, | ||
| } | ||
|
|
||
| impl CacheEntry { | ||
| fn new(rows: Vec<OutputRow>, rows_hash: u64) -> Self { | ||
| Self { | ||
| rows, | ||
| rows_hash, | ||
| selected_by_limit: HashMap::new(), | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The rows_hash field in CacheEntry is completely unused. Computing it requires hashing the entire Vec<OutputRow> (every row's text and intact status) on every cache miss, which introduces significant redundant CPU overhead. Removing this field simplifies the struct and avoids unnecessary hashing.
#[derive(Debug, Clone)]
struct CacheEntry {
rows: Vec<OutputRow>,
/// Map of line_limit -> selected indices. Bounded by the
/// distinct line limits passed in by the renderer (typically 1-3).
selected_by_limit: HashMap<usize, Vec<usize>>,
}
impl CacheEntry {
fn new(rows: Vec<OutputRow>) -> Self {
Self {
rows,
selected_by_limit: HashMap::new(),
}
}
}| fn get_or_compute_rows<F>( | ||
| &mut self, | ||
| content_hash: u64, | ||
| width: u16, | ||
| compute: F, | ||
| ) -> (Vec<OutputRow>, u64) | ||
| where | ||
| F: FnOnce() -> Vec<OutputRow>, | ||
| { | ||
| let key = RowsKey { content_hash, width }; | ||
| if let Some(entry) = self.by_key.get(&key) { | ||
| return (entry.rows.clone(), entry.rows_hash); | ||
| } | ||
|
|
||
| let rows = compute(); | ||
| let rows_hash = hash_rows(&rows); | ||
| let entry = CacheEntry::new(rows.clone(), rows_hash); | ||
|
|
||
| if self.by_key.len() >= self.capacity | ||
| && let Some(oldest) = self.insertion_order.pop_front() | ||
| { | ||
| self.by_key.remove(&oldest); | ||
| } | ||
| self.by_key.insert(key, entry); | ||
| self.insertion_order.push_back(key); | ||
| (rows, rows_hash) | ||
| } |
There was a problem hiding this comment.
Since rows_hash is unused, we can simplify get_or_compute_rows to return only Vec<OutputRow> instead of a tuple, eliminating the redundant hash_rows call on cache misses.
fn get_or_compute_rows<F>(
&mut self,
content_hash: u64,
width: u16,
compute: F,
) -> Vec<OutputRow>
where
F: FnOnce() -> Vec<OutputRow>,
{
let key = RowsKey { content_hash, width };
if let Some(entry) = self.by_key.get(&key) {
return entry.rows.clone();
}
let rows = compute();
let entry = CacheEntry::new(rows.clone());
if self.by_key.len() >= self.capacity
&& let Some(oldest) = self.insertion_order.pop_front()
{
self.by_key.remove(&oldest);
}
self.by_key.insert(key, entry);
self.insertion_order.push_back(key);
rows
}| pub fn get_or_compute_rows<F>(output: &str, width: u16, compute: F) -> (Vec<OutputRow>, u64) | ||
| where | ||
| F: FnOnce() -> Vec<OutputRow>, | ||
| { | ||
| let content_hash = hash_str(output); | ||
| GLOBAL_CACHE.with(|c| c.borrow_mut().get_or_compute_rows(content_hash, width, compute)) | ||
| } |
There was a problem hiding this comment.
Update the public get_or_compute_rows wrapper to return only Vec<OutputRow> instead of the tuple.
| pub fn get_or_compute_rows<F>(output: &str, width: u16, compute: F) -> (Vec<OutputRow>, u64) | |
| where | |
| F: FnOnce() -> Vec<OutputRow>, | |
| { | |
| let content_hash = hash_str(output); | |
| GLOBAL_CACHE.with(|c| c.borrow_mut().get_or_compute_rows(content_hash, width, compute)) | |
| } | |
| pub fn get_or_compute_rows<F>(output: &str, width: u16, compute: F) -> Vec<OutputRow> | |
| where | |
| F: FnOnce() -> Vec<OutputRow>, | |
| { | |
| let content_hash = hash_str(output); | |
| GLOBAL_CACHE.with(|c| c.borrow_mut().get_or_compute_rows(content_hash, width, compute)) | |
| } |
| /// Content hash of an `OutputRow` slice. Computed once on cache miss; | ||
| /// reused for the indices-cache key. | ||
| fn hash_rows(rows: &[OutputRow]) -> u64 { | ||
| let mut hasher = DefaultHasher::new(); | ||
| rows.len().hash(&mut hasher); | ||
| for row in rows { | ||
| row.text.hash(&mut hasher); | ||
| row.intact.hash(&mut hasher); | ||
| } | ||
| hasher.finish() | ||
| } |
| let (all_lines, _rows_hash) = crate::tui::output_rows_cache::get_or_compute_rows( | ||
| output, | ||
| width, | ||
| || output_rows(output, width), | ||
| ); |
There was a problem hiding this comment.
Update the caller in render_preserved_output_mode to match the simplified get_or_compute_rows signature that returns only Vec<OutputRow>.
| let (all_lines, _rows_hash) = crate::tui::output_rows_cache::get_or_compute_rows( | |
| output, | |
| width, | |
| || output_rows(output, width), | |
| ); | |
| let all_lines = crate::tui::output_rows_cache::get_or_compute_rows( | |
| output, | |
| width, | |
| || output_rows(output, width), | |
| ); |
| let (a, hash_a) = get_or_compute_rows("payload", 80, compute); | ||
| let (b, hash_b) = get_or_compute_rows("payload", 80, || { | ||
| calls.set(calls.get() + 1); | ||
| vec![row("hello"), row("world")] | ||
| }); | ||
| assert_eq!(calls.get(), 1, "second call should hit the cache"); | ||
| assert_eq!(a, b); | ||
| assert_eq!(hash_a, hash_b); | ||
| } |
There was a problem hiding this comment.
Update the cache_hit_returns_cached_rows unit test to match the simplified get_or_compute_rows signature.
| let (a, hash_a) = get_or_compute_rows("payload", 80, compute); | |
| let (b, hash_b) = get_or_compute_rows("payload", 80, || { | |
| calls.set(calls.get() + 1); | |
| vec![row("hello"), row("world")] | |
| }); | |
| assert_eq!(calls.get(), 1, "second call should hit the cache"); | |
| assert_eq!(a, b); | |
| assert_eq!(hash_a, hash_b); | |
| } | |
| let a = get_or_compute_rows("payload", 80, compute); | |
| let b = get_or_compute_rows("payload", 80, || { | |
| calls.set(calls.get() + 1); | |
| vec![row("hello"), row("world")] | |
| }); | |
| assert_eq!(calls.get(), 1, "second call should hit the cache"); | |
| assert_eq!(a, b); |
| let (rows, _rows_hash) = get_or_compute_rows("payload", 80, || { | ||
| vec![row("a"), row("b"), row("c"), row("d"), row("e")] | ||
| }); |
There was a problem hiding this comment.
Update the indices_cached_per_line_limit unit test to match the simplified get_or_compute_rows signature.
| let (rows, _rows_hash) = get_or_compute_rows("payload", 80, || { | |
| vec![row("a"), row("b"), row("c"), row("d"), row("e")] | |
| }); | |
| let rows = get_or_compute_rows("payload", 80, || { | |
| vec![row("a"), row("b"), row("c"), row("d"), row("e")] | |
| }); |
| /// Cheap 64-bit content hash for a tool output string. | ||
| pub fn hash_str(s: &str) -> u64 { | ||
| let mut hasher = DefaultHasher::new(); | ||
| s.hash(&mut hasher); | ||
| hasher.finish() | ||
| } |
There was a problem hiding this comment.
Hashing the entire output string on every single render frame using DefaultHasher (SipHasher 1-3) can become a performance bottleneck for large tool outputs (e.g., tens or hundreds of kilobytes) at high frame rates (120 FPS). Since this is a local TUI cache, HashDoS protection is not required. Consider using a faster, non-cryptographic hasher like FxHasher (from rustc-hash) or AHasher (from ahash), or caching the hash on the cell itself (e.g., ExecCell or GenericToolCell) when the output is first populated/updated.
output_rows (in tui::history) walks the raw tool output, ANSI-strips each line, classifies path/URL-like rows, and wraps the rest to the current viewport width. selected_output_indices then computes the head/tail/importance subset that the compact Live view shows. Both functions are pure, but they are called on every render frame for every visible tool cell. For a 4 KB tool output on a 120 FPS render loop that is 2-6 redundant walks per frame, per cell, and the function is called from a non-trivial number of cells across exec, tool, command, and review history. Add tui::output_rows_cache, a thread-local, content-addressed cache keyed on (content_hash, width) for the rows and (content_hash, width, line_limit) for the indices. The cache stores the wrapped Vec<OutputRow> plus a per-line-limit map of selected indices on a single entry, so a single key lookup satisfies both render steps. render_preserved_output_mode now consults the cache for both the rows and the indices; on a hit, neither the per-line ANSI strip nor the importance-ranking pass runs. The cache is bounded (default capacity 256) with insertion-order eviction. The OutputRow struct gains PartialEq + Eq + pub fields so the cache module can store and hash it without exposing private internals. Tests: 6 new unit tests cover the hit/miss path, width invalidation, content invalidation, indices per-line_limit caching, capacity eviction, and hash stability. The wider tui::history test suite (68 tests) still passes.
bda8cbe to
e16b530
Compare
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
Three follow-ups to the previous perf commit: 1. Drop the rows_hash field on CacheEntry. The field was computed and stored but never read on the hot path; tests exercised it only to assert the cache returned a stable hash. After this change get_or_compute_rows returns just Vec<OutputRow>, halving the tuple-return ABI and removing one DefaultHasher::write pass on every cache miss. 2. Replace DefaultHasher (SipHash) with a hand-rolled FNV-1a 64-bit hash. SipHash is per-process-keyed and ~5-10x slower than FNV on the small-to-medium tool output strings we see at 120 FPS. FNV-1a has no per-process key, fits in 20 lines of pure-Rust, and a 64-bit collision space is more than wide enough for the per-process LRU's expected <= a few hundred entries. The cache is a correctness optimization, not a security boundary; collisions only cause a false miss, never wrong data. 3. Caller in tui::history::render_preserved_output_mode updated to the new Vec<OutputRow>-only signature. Two new tests cover the FNV-1a properties (length-suffix sensitivity, empty-input stability).
e16b530 to
563e077
Compare
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
Summary
output_rows(intui::history) walks the raw tool output, ANSI-strips each line, classifies path/URL-like rows, and wraps the rest to the current viewport width.selected_output_indicesthen computes the head/tail/importance subset that the compact Live view shows. Both functions are pure, but they are called on every render frame for every visible tool cell. For a 4 KB tool output on a 120 FPS render loop that is 2-6 redundant walks per frame, per cell.This PR adds
tui::output_rows_cache, a thread-local, content-addressed cache keyed on(content_hash, width)for the rows and(content_hash, width, line_limit)for the indices. The cache stores the wrappedVec<OutputRow>plus a per-line-limit map of selected indices on a single entry, so a single key lookup satisfies both render steps.render_preserved_output_modenow consults the cache for both the rows and the indices; on a hit, neither the per-line ANSI strip nor the importance-ranking pass runs. The cache is bounded (default capacity 256) with insertion-order eviction.Why thread-local
The TUI render loop runs on a single thread. A
thread_local!RefCell<OutputRowsCacheInner>is the simplest and contention-free design — noMutexorRwLock, no global mutable state visible to background workers, and no test-poisoning footguns. The cache is per-thread, so a background task that calls into the same module gets its own cold cache (acceptable: that path is rare and not on the render hot path).Changes
crates/tui/src/tui/output_rows_cache.rs(new, 344 LOC)crates/tui/src/tui/history.rsOutputRowbecomespubwithPartialEq + Eqso the cache can store and hash it.render_preserved_output_modeconsults the cache for both rows and indices.crates/tui/src/tui/mod.rsTests
The wider
tui::historytest suite (68 tests) still passes.