Cache piercing with backpropagation repair #2403
mooskagh wants to merge 3 commits into LeelaChessZero:master
When a visit encounters a node whose NN result is already in cache, instead of stopping and queuing it for batch evaluation, the visit materializes the node immediately and continues deeper through it. This is repeated up to --cache-piercing times per visit, allowing a single visit to traverse multiple cached layers before hitting an uncached position.

The intermediate nodes created this way have their evaluation set from cache but are left with N=0 (no completed visits). During backpropagation, these nodes are "repaired": the cached evaluation is promoted to a real visit (N=1), then the value returning from the leaf is folded in as a second update. The resulting averaged value, which blends the node's own cached evaluation with the subtree result, is what propagates further toward the root. Each repaired node also increments the visit count seen by all ancestors, so the tree's visit statistics remain consistent.

The n-in-flight counters are temporarily inflated before each finalize call to compensate for the increased multivisit, keeping virtual loss accounting balanced under the backup write lock.
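The repair step described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyNode`, `FinalizeUpdate`, and `BackupWithRepair` are hypothetical names, not the PR's code; the per-ply sign flip of the value and the draw and moves-left terms are left out; and every node on the path is assumed to hold exactly one in-flight visit from selection.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical toy node, not lc0's Node class.
struct ToyNode {
  float wl = 0.0f;       // running average value
  int n = 0;             // completed visits
  int n_in_flight = 0;   // virtual-loss counter
  bool pierced = false;  // value came from cache, n is still 0
};

// Folds `multivisit` visits of value `v` into the running average and
// releases the same amount of virtual loss.
void FinalizeUpdate(ToyNode& node, float v, int multivisit) {
  node.wl += multivisit * (v - node.wl) / (node.n + multivisit);
  node.n += multivisit;
  node.n_in_flight -= multivisit;
}

// Backs a leaf value up along `path` (leaf first, root last).
// Returns the multivisit that reached the root.
int BackupWithRepair(const std::vector<ToyNode*>& path, float leaf_value) {
  float v = leaf_value;
  int multivisit = 1;
  for (ToyNode* node : path) {
    // Inflate n-in-flight so the finalize call below releases it to zero.
    node->n_in_flight += multivisit - 1;
    if (node->pierced) {
      node->n = 1;  // promote the cached evaluation to a real visit
      node->pierced = false;
      FinalizeUpdate(*node, v, multivisit);
      v = node->wl;  // the blended value continues toward the root
      ++multivisit;  // ancestors see one extra visit
    } else {
      FinalizeUpdate(*node, v, multivisit);
    }
  }
  return multivisit;
}
```

In this sketch, propagating the blended average together with the incremented multivisit gives each ancestor the same value sum as adding the cached evaluation and the leaf value as separate visits, which is one way to read "visit statistics remain consistent".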
Pull request overview
Adds an optional “cache piercing” mode to classic search so a single visit can traverse through multiple cached NN layers, then repairs the intermediate nodes during backup to keep visit statistics coherent.
Changes:
- Introduces the --cache-piercing / CachePiercing option and plumbs it into classic search params.
- Extends ProcessPickedTask to materialize cache-hit nodes immediately and continue selection deeper (up to the configured limit).
- Updates backup logic to “repair” cache-pierced intermediate nodes and adjust playout counters accordingly.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/search/classic/search.cc | Implements cache piercing in processing and adds repair logic during backup. |
| src/search/classic/params.h | Exposes GetCachePiercing() and stores the cached param value. |
| src/search/classic/params.cc | Registers the new option and initializes kCachePiercing. |
| src/search/classic/node.h | Adds node helpers for setting cached values and promoting them to a real visit. |
        search_->network_evaluations_++;
      }
    - search_->cum_depth_ += node_to_process.depth * node_to_process.multivisit;
    + search_->cum_depth_ += node_to_process.depth * multivisit;
total_playouts_ is incremented by multivisit + extra_multivisit, but cum_depth_ is only incremented by depth * multivisit (excluding extra_multivisit). This makes average_depth = cum_depth_/total_playouts_ systematically smaller whenever cache piercing repairs occur, which can break depth-based stopping (DepthStopper) and misreport depth statistics. Update cum_depth_ to account for the additional (repaired) visits (using an appropriate depth for each extra visit), or avoid counting repaired visits in total_playouts_ if they’re not meant to affect depth-based metrics.
Suggested change:

    - search_->cum_depth_ += node_to_process.depth * multivisit;
    + search_->cum_depth_ += node_to_process.depth * (multivisit + extra_multivisit);
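A small numeric sketch of the bias the comment describes may help. `DepthStats` and `Record` are hypothetical stand-ins for the search counters, and charging every repaired visit at the leaf's depth is only one of the options the comment leaves open.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for cum_depth_ and total_playouts_.
struct DepthStats {
  std::int64_t cum_depth = 0;
  std::int64_t total_playouts = 0;

  double AverageDepth() const {
    return total_playouts == 0
               ? 0.0
               : static_cast<double>(cum_depth) / total_playouts;
  }
};

// Records one backed-up leaf. With count_extra_depth == false the repaired
// visits are counted as playouts but contribute no depth, as in the code
// under review; with true they are charged at the leaf depth (an assumption).
void Record(DepthStats& stats, int depth, int multivisit,
            int extra_multivisit, bool count_extra_depth) {
  stats.total_playouts += multivisit + extra_multivisit;
  const int depth_visits =
      multivisit + (count_extra_depth ? extra_multivisit : 0);
  stats.cum_depth += static_cast<std::int64_t>(depth) * depth_visits;
}
```

For a leaf at depth 6 with one visit and two repaired nodes, the first variant reports an average depth of 2 and the second reports 6.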
    // Sets wl/d/m from cached NN values without changing N or NInFlight.
    void SetCachedValue(float v, float d, float m) { wl_ = v; d_ = d; m_ = m; }
    // Converts a cached value (set by SetCachedValue) into a real visit (N=1).
    void MakeCachedVisitReal() { n_ = 1; }
SetCachedValue() / MakeCachedVisitReal() directly mutate wl_/d_/m_ and n_ without any validation. If either is accidentally called on a node with N>0 (or without the intended “exclusive ownership” condition), it will silently corrupt the node’s running average / visit invariants. Consider adding debug-time asserts documenting the required preconditions (e.g., n_==0 / n_in_flight_>0 / not terminal), and/or restrict these helpers’ visibility to the specific search logic that needs them.
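One way the suggested debug-time asserts could look is sketched below. `ToyNode` and `StartVisit` are hypothetical and only mimic the member names from the snippet above; the exact preconditions the real helpers need (for example, whether a terminal check belongs here) are not confirmed by the PR.

```cpp
#include <cassert>

// Hypothetical toy node; not lc0's Node class.
class ToyNode {
 public:
  // Stand-in for selection taking ownership of the node.
  void StartVisit() { ++n_in_flight_; }

  // Sets wl/d/m from cached NN values without changing N or NInFlight.
  void SetCachedValue(float v, float d, float m) {
    assert(n_ == 0 && "would overwrite a running average");
    assert(n_in_flight_ > 0 && "caller must own the node");
    wl_ = v;
    d_ = d;
    m_ = m;
  }

  // Converts a cached value into a real visit (N=1).
  void MakeCachedVisitReal() {
    assert(n_ == 0 && "node already has completed visits");
    n_ = 1;
  }

  int GetN() const { return n_; }
  float GetWL() const { return wl_; }
  float GetD() const { return d_; }
  float GetM() const { return m_; }

 private:
  float wl_ = 0.0f;
  float d_ = 0.0f;
  float m_ = 0.0f;
  int n_ = 0;
  int n_in_flight_ = 0;
};
```

Because `assert` compiles away when `NDEBUG` is defined, checks of this kind cost nothing in release builds.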
    picked_node.eval->m);
    auto best_edge = node->Edges().begin();
    Node* child = best_edge.GetOrSpawnNode(node);
    child->TryStartScoreUpdate();
Cache piercing spawns/uses child and calls TryStartScoreUpdate() but ignores the return value. If it returns false (e.g., another thread already started updating this child), the code still continues as if the node is exclusively owned, which can violate the assumptions in ExtendNode() (N=0, N-in-flight=1) and lead to incorrect virtual-loss / backup behavior. Handle the false case explicitly (treat as a collision / stop piercing and restore state accordingly) before continuing deeper.
Suggested change:

    - child->TryStartScoreUpdate();
    + if (!child->TryStartScoreUpdate()) {
    +   // Another thread is already updating this child; treat as a
    +   // collision and stop cache piercing for this node.
    +   break;
    + }
Introduces four strategies for how cache-pierced intermediate nodes (which have cached NN values but N=0) are handled during backpropagation:
- "none": the cached value is overwritten by the leaf value, and a single visit propagates as it did before cache piercing repair was introduced.
- "accumulate": each intermediate node's cached value counts as an additional real visit, increasing the multivisit count for all ancestors and blending the cached evaluation into the propagated average.
- "node": the node's own cached evaluation replaces the propagated value entirely, so the parent sees the nearest cached evaluation rather than the distant leaf.
- "blend": at each cached layer, the propagated value is averaged 50/50 with the node's cached value, giving exponentially decaying weight to the leaf (its contribution halves per cached layer).

Default is "none" to preserve existing behavior.
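The four strategies can be compared with a sketch of what one cached layer passes on to its parent. The enum, struct, and function names are hypothetical; only the strategy names and their described effect on the propagated value come from the commit message, and the update of the node's own stored statistics is not modelled.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical names; the commit message only gives the string values.
enum class RepairMode { kNone, kAccumulate, kNode, kBlend };

// Value and visit count travelling toward the root.
struct Propagated {
  float v;
  int multivisit;
};

// Returns what a cache-pierced node with evaluation `cached` hands to its
// parent, given what arrived from below.
Propagated RepairStep(RepairMode mode, float cached, Propagated in) {
  switch (mode) {
    case RepairMode::kNone:
      return in;  // leaf value passes through unchanged
    case RepairMode::kAccumulate:
      // The cached value counts as one more visit in the average.
      return Propagated{
          (cached + in.multivisit * in.v) / (in.multivisit + 1),
          in.multivisit + 1};
    case RepairMode::kNode:
      return Propagated{cached, in.multivisit};  // nearest cached eval wins
    case RepairMode::kBlend:
      return Propagated{0.5f * (cached + in.v), in.multivisit};
  }
  return in;
}
```

Applying the "blend" step through two cached layers that both evaluate to 0 turns a leaf value of 1.0 into 0.5 and then 0.25, which is the halving per layer that the description mentions.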
When cache piercing exhausts its budget on a cached node, the node was previously evaluated out of order (OOO): it was backed up immediately and removed from the batch. This released virtual loss mid-gathering, making the same subtree eligible for re-picking within the same batch. Now these nodes stay in the minibatch and go through normal backprop at the end, preserving virtual loss during the entire gathering phase. Terminals are unaffected and are still OOO-evaluated.
Vibe coded with Claude Code (Opus 4.6).