perf(project-context): cache load_project_context_with_parents by mtime signature#2636
perf(project-context): cache load_project_context_with_parents by mtime signature#2636HUQIANTAO wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
Thanks @HUQIANTAO for taking the time to contribute. This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in Please read |
There was a problem hiding this comment.
Code Review
This pull request introduces a process-local cache for project context loading to optimize performance on hot paths. The feedback focuses on preventing potential cache collisions when workspace canonicalization fails by using a non-optional PathBuf fallback, canonicalizing the workspace path early in the loader, and refactoring the cache implementation to use a single thread-local struct with a VecDeque for more efficient
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| #[derive(Debug, Clone, PartialEq, Eq, Hash)] | ||
| pub struct CacheKey { | ||
| /// Canonicalized workspace path. `None` when canonicalization fails | ||
| /// (rare, e.g. cwd removed) — such entries are never shared between | ||
| /// callers. | ||
| pub canonical_workspace: Option<PathBuf>, | ||
| /// Cheap content fingerprint: sorted list of `(path, mtime)` for | ||
| /// every candidate file the loader would inspect. | ||
| pub signature: MtimeSignature, | ||
| } |
There was a problem hiding this comment.
If canonical_workspace is None (e.g., if canonicalization fails), different workspaces that both fail to canonicalize and have no context files will produce the same CacheKey (since None == None). This can cause a cache collision where one workspace incorrectly returns the cached ProjectContext of another, leading to an incorrect project_root being used.
To prevent this, we should store the workspace path as a non-optional PathBuf that falls back to the raw path if canonicalization fails.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CacheKey {
/// Canonicalized workspace path, falling back to the raw path if canonicalization fails.
pub workspace: PathBuf,
/// Cheap content fingerprint: sorted list of `(path, mtime)` for
/// every candidate file the loader would inspect.
pub signature: MtimeSignature,
}| pub fn load_project_context_with_parents(workspace: &Path) -> ProjectContext { | ||
| load_project_context_with_parents_and_home(workspace, dirs::home_dir().as_deref()) | ||
| let home_dir = dirs::home_dir(); | ||
| // Compute a content signature (canonical workspace + per-file mtimes) | ||
| // and consult the process-local cache before doing any file reads. | ||
| // Repeated calls within a session — common during the layered | ||
| // context checkpoint and prompt-assembly paths — short-circuit when | ||
| // none of the candidate files have been written since the last call. | ||
| let cache_key = crate::project_context_cache::compute_cache_key(workspace, home_dir.as_deref()); | ||
| if let Some(ctx) = crate::project_context_cache::lookup(&cache_key) { | ||
| return ctx; | ||
| } | ||
| let ctx = load_project_context_with_parents_and_home(workspace, home_dir.as_deref()); | ||
| crate::project_context_cache::store(cache_key, ctx.clone()); | ||
| ctx | ||
| } |
There was a problem hiding this comment.
Canonicalizing the workspace path at the entry of load_project_context_with_parents ensures that parent directory traversal behaves consistently and correctly, even if a relative path is passed. This also guarantees that the cache signature and the loader inspect the exact same canonicalized paths.
| pub fn load_project_context_with_parents(workspace: &Path) -> ProjectContext { | |
| load_project_context_with_parents_and_home(workspace, dirs::home_dir().as_deref()) | |
| let home_dir = dirs::home_dir(); | |
| // Compute a content signature (canonical workspace + per-file mtimes) | |
| // and consult the process-local cache before doing any file reads. | |
| // Repeated calls within a session — common during the layered | |
| // context checkpoint and prompt-assembly paths — short-circuit when | |
| // none of the candidate files have been written since the last call. | |
| let cache_key = crate::project_context_cache::compute_cache_key(workspace, home_dir.as_deref()); | |
| if let Some(ctx) = crate::project_context_cache::lookup(&cache_key) { | |
| return ctx; | |
| } | |
| let ctx = load_project_context_with_parents_and_home(workspace, home_dir.as_deref()); | |
| crate::project_context_cache::store(cache_key, ctx.clone()); | |
| ctx | |
| } | |
| pub fn load_project_context_with_parents(workspace: &Path) -> ProjectContext { | |
| let canonical_workspace = std::fs::canonicalize(workspace).unwrap_or_else(|_| workspace.to_path_buf()); | |
| let home_dir = dirs::home_dir(); | |
| // Compute a content signature (canonical workspace + per-file mtimes) | |
| // and consult the process-local cache before doing any file reads. | |
| // Repeated calls within a session — common during the layered | |
| // context checkpoint and prompt-assembly paths — short-circuit when | |
| // none of the candidate files have been written since the last call. | |
| let cache_key = crate::project_context_cache::compute_cache_key(&canonical_workspace, home_dir.as_deref()); | |
| if let Some(ctx) = crate::project_context_cache::lookup(&cache_key) { | |
| return ctx; | |
| } | |
| let ctx = load_project_context_with_parents_and_home(&canonical_workspace, home_dir.as_deref()); | |
| crate::project_context_cache::store(cache_key, ctx.clone()); | |
| ctx | |
| } |
| thread_local! { | ||
| static CACHE: RefCell<HashMap<CacheKey, ProjectContext>> = RefCell::new(HashMap::new()); | ||
| /// FIFO order for eviction. Mirrors the `VecDeque<CacheKey>` pattern | ||
| /// used in the other caches. | ||
| static ORDER: RefCell<Vec<CacheKey>> = RefCell::new(Vec::new()); | ||
| } | ||
|
|
||
| /// Look up a `ProjectContext` by key. Returns `Some` clone on hit. | ||
| pub fn lookup(key: &CacheKey) -> Option<ProjectContext> { | ||
| CACHE.with(|c| c.borrow().get(key).cloned()) | ||
| } | ||
|
|
||
| /// Store a `ProjectContext` under `key`, evicting the oldest entry if | ||
| /// the cache is at capacity. The stored value is the same | ||
| /// `ProjectContext` instance the caller already has — no extra clone. | ||
| pub fn store(key: CacheKey, value: ProjectContext) { | ||
| CACHE.with(|c| { | ||
| let mut cache = c.borrow_mut(); | ||
| if cache.insert(key.clone(), value).is_none() { | ||
| ORDER.with(|o| o.borrow_mut().push(key.clone())); | ||
| } | ||
| }); | ||
| evict_if_needed(); | ||
| } | ||
|
|
||
| /// Drop every cached entry. Used by tests and `/clear` paths. | ||
| #[cfg(test)] | ||
| pub fn clear() { | ||
| CACHE.with(|c| c.borrow_mut().clear()); | ||
| ORDER.with(|o| o.borrow_mut().clear()); | ||
| } | ||
|
|
||
| fn evict_if_needed() { | ||
| CACHE.with(|c| { | ||
| let mut cache = c.borrow_mut(); | ||
| ORDER.with(|o| { | ||
| let mut order = o.borrow_mut(); | ||
| while cache.len() > DEFAULT_CAPACITY { | ||
| if let Some(oldest) = order.first().cloned() { | ||
| cache.remove(&oldest); | ||
| order.remove(0); | ||
| } else { | ||
| break; | ||
| } | ||
| } | ||
| }); | ||
| }); | ||
| } |
There was a problem hiding this comment.
Managing CACHE and ORDER as separate thread-local variables requires nested .with calls and manual synchronization. Additionally, using Vec::remove(0) for FIFO eviction is an
We can refactor this into a single thread-local struct wrapping both the HashMap and a VecDeque. This simplifies the eviction logic, eliminates nested borrows, and uses the idiomatic double-ended queue for
struct WorkspaceCache {
map: HashMap<CacheKey, ProjectContext>,
order: std::collections::VecDeque<CacheKey>,
}
thread_local! {
static CACHE: RefCell<WorkspaceCache> = RefCell::new(WorkspaceCache {
map: HashMap::new(),
order: std::collections::VecDeque::new(),
});
}
/// Look up a `ProjectContext` by key. Returns `Some` clone on hit.
pub fn lookup(key: &CacheKey) -> Option<ProjectContext> {
CACHE.with(|c| c.borrow().map.get(key).cloned())
}
/// Store a `ProjectContext` under `key`, evicting the oldest entry if
/// the cache is at capacity. The stored value is the same
/// `ProjectContext` instance the caller already has — no extra clone.
pub fn store(key: CacheKey, value: ProjectContext) {
CACHE.with(|c| {
let mut state = c.borrow_mut();
if state.map.insert(key.clone(), value).is_none() {
state.order.push_back(key);
}
while state.map.len() > DEFAULT_CAPACITY {
if let Some(oldest) = state.order.pop_front() {
state.map.remove(&oldest);
} else {
break;
}
}
});
}
/// Drop every cached entry. Used by tests and `/clear` paths.
#[cfg(test)]
pub fn clear() {
CACHE.with(|c| {
let mut state = c.borrow_mut();
state.map.clear();
state.order.clear();
});
}| /// Compute the cache key for a `load_project_context_with_parents` call. | ||
| /// `home_dir` may be `None`; the signature still resolves correctly. | ||
| #[must_use] | ||
| pub fn compute_cache_key(workspace: &Path, home_dir: Option<&Path>) -> CacheKey { | ||
| CacheKey { | ||
| canonical_workspace: std::fs::canonicalize(workspace).ok(), | ||
| signature: MtimeSignature::for_loader(workspace, home_dir), | ||
| } | ||
| } |
There was a problem hiding this comment.
Update compute_cache_key to match the new CacheKey structure by falling back to the raw path if canonicalization fails.
| /// Compute the cache key for a `load_project_context_with_parents` call. | |
| /// `home_dir` may be `None`; the signature still resolves correctly. | |
| #[must_use] | |
| pub fn compute_cache_key(workspace: &Path, home_dir: Option<&Path>) -> CacheKey { | |
| CacheKey { | |
| canonical_workspace: std::fs::canonicalize(workspace).ok(), | |
| signature: MtimeSignature::for_loader(workspace, home_dir), | |
| } | |
| } | |
| /// Compute the cache key for a `load_project_context_with_parents` call. | |
| /// `home_dir` may be `None`; the signature still resolves correctly. | |
| #[must_use] | |
| pub fn compute_cache_key(workspace: &Path, home_dir: Option<&Path>) -> CacheKey { | |
| CacheKey { | |
| workspace: std::fs::canonicalize(workspace).unwrap_or_else(|_| workspace.to_path_buf()), | |
| signature: MtimeSignature::for_loader(workspace, home_dir), | |
| } | |
| } |
| fn cache_round_trip() { | ||
| let _ = TempDir::new(); // discard the previous one | ||
| clear(); | ||
| let key = CacheKey { | ||
| canonical_workspace: None, | ||
| signature: MtimeSignature::default(), | ||
| }; | ||
| let ctx = ProjectContext::empty(PathBuf::from("/tmp/whatever")); | ||
| store(key.clone(), ctx.clone()); | ||
| let got = lookup(&key).expect("hit"); | ||
| assert_eq!(got.project_root, ctx.project_root); | ||
| } | ||
|
|
||
| #[test] | ||
| fn store_does_not_grow_unbounded() { | ||
| clear(); | ||
| // Insert `DEFAULT_CAPACITY + 4` distinct keys. The oldest | ||
| // entries should be evicted on each insert. | ||
| for i in 0..(DEFAULT_CAPACITY + 4) { | ||
| let mut sig = MtimeSignature::default(); | ||
| sig.entries.push(MtimeEntry { | ||
| path: PathBuf::from(format!("/synthetic/{i}")), | ||
| mtime: None, | ||
| }); | ||
| let key = CacheKey { canonical_workspace: None, signature: sig }; | ||
| store(key, ProjectContext::empty(PathBuf::from("/tmp"))); | ||
| } | ||
| // After all the inserts, the cache should hold at most | ||
| // DEFAULT_CAPACITY entries. | ||
| let count = CACHE.with(|c| c.borrow().len()); | ||
| assert!(count <= DEFAULT_CAPACITY, "cache held {count} entries"); | ||
| } |
There was a problem hiding this comment.
Update the unit tests to align with the refactored CacheKey structure and the single thread-local CACHE state.
#[test]
fn cache_round_trip() {
let _ = TempDir::new(); // discard the previous one
clear();
let key = CacheKey {
workspace: PathBuf::from("/tmp/whatever"),
signature: MtimeSignature::default(),
};
let ctx = ProjectContext::empty(PathBuf::from("/tmp/whatever"));
store(key.clone(), ctx.clone());
let got = lookup(&key).expect("hit");
assert_eq!(got.project_root, ctx.project_root);
}
#[test]
fn store_does_not_grow_unbounded() {
clear();
// Insert `DEFAULT_CAPACITY + 4` distinct keys. The oldest
// entries should be evicted on each insert.
for i in 0..(DEFAULT_CAPACITY + 4) {
let mut sig = MtimeSignature::default();
sig.entries.push(MtimeEntry {
path: PathBuf::from(format!("/synthetic/{i}")),
mtime: None,
});
let key = CacheKey {
workspace: PathBuf::from("/tmp"),
signature: sig,
};
store(key, ProjectContext::empty(PathBuf::from("/tmp")));
}
// After all the inserts, the cache should hold at most
// DEFAULT_CAPACITY entries.
let count = CACHE.with(|c| c.borrow().map.len());
assert!(count <= DEFAULT_CAPACITY, "cache held {count} entries");
}…me signature load_project_context_with_parents walks the workspace and every parent directory, checks for six project-context filenames in each, then consults three global fallback paths under the user's home directory. The per-file work is cheap (a metadata() call plus a single read on the first match), but the call is on the engine's hot path: it runs from Session::new, the layered-context checkpoint, build_system_prompt_with_session_context, and the TUI context inspector. A long session can re-invoke the loader dozens of times per turn without any of the candidate files having changed. Add a thread-local cache keyed on the canonical workspace path plus a cheap MtimeSignature: an ordered list of (path, SystemTime) pairs for every candidate file the loader would inspect. The signature is computed by metadata() only — no file reads. On a hit the cached ProjectContext is returned without any I/O beyond the metadata calls. On a miss the loader runs and the result is stored. The cache is bounded (default capacity 8 workspaces) and uses insertion-order eviction. The signature is sorted to make it independent of iteration order; it changes when any candidate file appears, is overwritten, or is removed (mtime delta). Files that vanish contribute a None mtime that still differs from a stable SystemTime, so the cache invalidates correctly. PROJECT_CONTEXT_FILES gains pub(crate) visibility so the cache module can read the same file list the loader uses. Tests: 5 new unit tests cover signature stability, mtime invalidation, new-file detection, cache round-trip, and bounded eviction. The wider project_context test suite (30 tests) still passes.
…FO eviction Three follow-ups to the previous perf commit: 1. Correctness: replace the Option<PathBuf> canonical_workspace field on CacheKey with a plain PathBuf that falls back to the raw input on canonicalize failure. The previous layout let two distinct workspaces whose canonicalization both failed (deleted cwd, non-Unix platform) collide under CacheKey::eq — both produce None, so lookup returned the wrong ProjectContext. Now compute_cache_key unwrap_or_else's the raw path, so distinct workspaces always have distinct keys. 2. Performance: collapse the two thread-locals (CACHE: HashMap + ORDER: Vec) into a single WorkspaceCache struct that owns both the HashMap and a VecDeque<CacheKey> insertion order. The earlier patch used Vec::remove(0) for FIFO eviction — O(n) on every evict — and nested two RefCell borrows per call. The new layout gives O(1) pop_front, one borrow per call, and matches the strategy used in tui::transcript_cache and tui::output_rows_cache. 3. The loader now canonicalizes the workspace at entry so a caller passing a relative path produces a stable cache key independent of cwd. Tests are updated to the new CacheKey shape; a new distinct_workspaces_do_not_collide test pins the None-collision fix.
2467bb6 to
a21a70f
Compare
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
Thanks @HUQIANTAO. This missed the emergency v0.8.52 lane because we kept that release to publish/provenance fixes only, but it is queued for the v0.8.53 PR sweep in #2645. Sorry for the messy release window; this should get a normal performance-review pass now that the registries are coherent. |
Summary
load_project_context_with_parentswalks the workspace and every parent directory, checks for six project-context filenames in each, then consults three global fallback paths under the user's home directory. The per-file work is cheap (ametadata()call plus a single read on the first match), but the call is on the engine's hot path: it runs fromSession::new, the layered-context checkpoint,build_system_prompt_with_session_context, and the TUI context inspector. A long session can re-invoke the loader dozens of times per turn without any of the candidate files having changed.This PR adds a thread-local cache keyed on the canonical workspace path plus a cheap
MtimeSignature: an ordered list of(path, SystemTime)pairs for every candidate file the loader would inspect. The signature is computed bymetadata()only — no file reads. On a hit the cachedProjectContextis returned without any I/O beyond the metadata calls. On a miss the loader runs and the result is stored.The cache is bounded (default capacity 8 workspaces) and uses insertion-order eviction, matching the strategy in
tui::transcript_cacheandtui::output_rows_cache. The signature is sorted to make it independent of iteration order; it changes when any candidate file appears, is overwritten, or is removed (mtime delta). Files that vanish contribute aNonemtime that still differs from a stableSystemTime, so the cache invalidates correctly.Changes
crates/tui/src/project_context_cache.rs(new, 260 LOC)MtimeSignaturebuilder, 5 unit tests.crates/tui/src/project_context.rsload_project_context_with_parentsconsults the cache first.PROJECT_CONTEXT_FILESbecomespub(crate)so the cache module reads the same file list.crates/tui/src/main.rsTests
The wider
project_contexttest suite (30 tests) still passes.