
Cache piercing with backpropagation repair #2403

Draft
mooskagh wants to merge 3 commits into LeelaChessZero:master from mooskagh:cache-piercing-20260324

Conversation

@mooskagh
Member

When a visit encounters a node whose NN result is already in cache, instead of stopping and queuing it for batch evaluation, the visit materializes the node immediately and continues deeper through it. This is repeated up to --cache-piercing times per visit, allowing a single visit to traverse multiple cached layers before hitting an uncached position.
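As a rough illustration of the piercing loop described above, here is a minimal sketch. It assumes the positions a visit would select are already known as a list of hashes and models the NN cache as a plain set; `ToyCache`, `PierceResult` and `PierceThroughCache` are hypothetical names, not code from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names come from the PR.
using PositionHash = std::uint64_t;
using ToyCache = std::unordered_set<PositionHash>;

struct PierceResult {
  int pierced_layers;  // Cached nodes materialized and passed through.
  bool needs_nn_eval;  // True if the visit stopped on an uncached position.
};

// Walks the positions a single visit would select, in order. Each cached
// position is pierced until the budget is spent; the first uncached position
// ends the visit and would be queued for batch evaluation.
PierceResult PierceThroughCache(const std::vector<PositionHash>& path,
                                const ToyCache& cache, int budget) {
  int pierced = 0;
  for (PositionHash hash : path) {
    if (cache.count(hash) == 0) return {pierced, true};  // Uncached: stop.
    if (pierced == budget) return {pierced, false};      // Budget exhausted.
    ++pierced;  // Cached and budget left: materialize and go deeper.
  }
  return {pierced, false};
}
```

With a budget of 0 the sketch stops at the first cached node, which corresponds to the behavior without cache piercing.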

The intermediate nodes created this way have their evaluation set from cache but are left with N=0 (no completed visits). During backpropagation, these nodes are "repaired": the cached evaluation is promoted to a real visit (N=1), then the value returning from the leaf is folded in as a second update. The resulting averaged value — blending the node's own cached evaluation with the subtree result — is what propagates further toward the root. Each repaired node also increments the visit count seen by all ancestors, so the tree's visit statistics remain consistent.
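The repair step can be sketched as follows. This is a simplified model rather than the PR's implementation: `ToyNode` and `BackupWithRepair` are hypothetical, only the win/loss value is tracked, and the per-ply sign flip that real backpropagation applies is left out to keep the arithmetic visible.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical toy node; the real Node class tracks more state.
struct ToyNode {
  float wl = 0.0f;       // Running average value.
  int n = 0;             // Completed visits.
  bool pierced = false;  // Value came from cache, no completed visit yet.

  // Folds `multivisit` visits of value `v` into the running average.
  void Update(float v, int multivisit) {
    wl += multivisit * (v - wl) / static_cast<float>(n + multivisit);
    n += multivisit;
  }
};

// Backs up `leaf_value` along `path`, ordered from the node nearest the leaf
// up to the root. Returns the multivisit count that reaches the root.
int BackupWithRepair(const std::vector<ToyNode*>& path, float leaf_value) {
  float v = leaf_value;
  int multivisit = 1;
  for (ToyNode* node : path) {
    if (node->pierced && node->n == 0) {
      node->n = 1;                  // Promote the cached value to a real visit.
      node->Update(v, multivisit);  // Fold in the value coming from below.
      node->pierced = false;
      v = node->wl;                 // The averaged value propagates upward.
      ++multivisit;                 // Ancestors see one extra visit.
    } else {
      node->Update(v, multivisit);
    }
  }
  return multivisit;
}
```

In this model a repaired node with cached value 0.2 that receives a leaf value of 0.6 ends up with N=2 and an average of 0.4, and its ancestors are updated with two visits of 0.4, so the sum of values they absorb equals cached plus leaf.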

The n-in-flight counters are temporarily inflated before each finalize call to compensate for the increased multivisit, keeping virtual loss accounting balanced under the backup write lock.
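A minimal sketch of why the inflation is needed, assuming that finalizing a score update adds `multivisit` to N and subtracts the same amount from N-in-flight. `ToyNode` and `FinalizeWithRepairs` are hypothetical names, and locking is omitted.

```cpp
#include <cassert>

// Hypothetical toy node holding only the two counters of interest.
struct ToyNode {
  int n = 0;
  int n_in_flight = 0;

  void IncrementNInFlight(int by) { n_in_flight += by; }

  // Assumed behavior: completed visits move from "in flight" to "done".
  void FinalizeScoreUpdate(int multivisit) {
    assert(n_in_flight >= multivisit);  // Would underflow otherwise.
    n += multivisit;
    n_in_flight -= multivisit;
  }
};

// Gathering reserved only `multivisit` in-flight slots, but repair adds
// `extra_multivisit` visits, so the counter is topped up before finalizing.
void FinalizeWithRepairs(ToyNode& node, int multivisit, int extra_multivisit) {
  node.IncrementNInFlight(extra_multivisit);
  node.FinalizeScoreUpdate(multivisit + extra_multivisit);
}
```

Without the top-up, finalizing three visits against one reserved slot would drive the in-flight counter negative in this model.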

Vibe coded with Claude Code (Opus 4.6).


Copilot AI left a comment

Pull request overview

Adds an optional “cache piercing” mode to classic search so a single visit can traverse through multiple cached NN layers, then repairs the intermediate nodes during backup to keep visit statistics coherent.

Changes:

  • Introduces --cache-piercing / CachePiercing option and plumbs it into classic search params.
  • Extends ProcessPickedTask to materialize cache-hit nodes immediately and continue selection deeper (up to the configured limit).
  • Updates backup logic to “repair” cache-pierced intermediate nodes and adjust playout counters accordingly.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/search/classic/search.cc | Implements cache piercing in processing and adds repair logic during backup. |
| src/search/classic/params.h | Exposes GetCachePiercing() and stores the cached param value. |
| src/search/classic/params.cc | Registers the new option and initializes kCachePiercing. |
| src/search/classic/node.h | Adds node helpers for setting cached values and promoting them to a real visit. |

  search_->network_evaluations_++;
  }
- search_->cum_depth_ += node_to_process.depth * node_to_process.multivisit;
+ search_->cum_depth_ += node_to_process.depth * multivisit;

Copilot AI Mar 24, 2026

total_playouts_ is incremented by multivisit + extra_multivisit, but cum_depth_ is only incremented by depth * multivisit (excluding extra_multivisit). This makes average_depth = cum_depth_/total_playouts_ systematically smaller whenever cache piercing repairs occur, which can break depth-based stopping (DepthStopper) and misreport depth statistics. Update cum_depth_ to account for the additional (repaired) visits (using an appropriate depth for each extra visit), or avoid counting repaired visits in total_playouts_ if they’re not meant to affect depth-based metrics.

Suggested change
- search_->cum_depth_ += node_to_process.depth * multivisit;
+ search_->cum_depth_ += node_to_process.depth * (multivisit + extra_multivisit);
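To make the size of the bias concrete, the sketch below replays both accounting variants on a single visit. `ToyDepthStats`, `RecordAsInPr` and `RecordAsSuggested` are hypothetical names; only the two update formulas are taken from the lines above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical container for the two counters discussed in the comment.
struct ToyDepthStats {
  std::uint64_t cum_depth = 0;
  std::uint64_t total_playouts = 0;

  double AverageDepth() const {
    return total_playouts == 0
               ? 0.0
               : static_cast<double>(cum_depth) / total_playouts;
  }
};

// Accounting as in the reviewed line: extra visits count as playouts but
// contribute no depth.
void RecordAsInPr(ToyDepthStats& s, unsigned depth, unsigned multivisit,
                  unsigned extra_multivisit) {
  s.total_playouts += multivisit + extra_multivisit;
  s.cum_depth += depth * multivisit;
}

// Accounting as in the suggested change: extra visits are credited with the
// leaf depth as well.
void RecordAsSuggested(ToyDepthStats& s, unsigned depth, unsigned multivisit,
                       unsigned extra_multivisit) {
  s.total_playouts += multivisit + extra_multivisit;
  s.cum_depth += depth * (multivisit + extra_multivisit);
}
```

For one visit at depth 10 with two repaired nodes, the first variant reports an average depth of 10/3 while the second reports 10. Since repaired nodes sit above the leaf, crediting them with the leaf depth slightly overstates the average; using each node's own depth would be a third option.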

Comment thread src/search/classic/node.h
Comment on lines +212 to +215
// Sets wl/d/m from cached NN values without changing N or NInFlight.
void SetCachedValue(float v, float d, float m) { wl_ = v; d_ = d; m_ = m; }
// Converts a cached value (set by SetCachedValue) into a real visit (N=1).
void MakeCachedVisitReal() { n_ = 1; }

Copilot AI Mar 24, 2026

SetCachedValue() / MakeCachedVisitReal() directly mutate wl_/d_/m_ and n_ without any validation. If either is accidentally called on a node with N>0 (or without the intended “exclusive ownership” condition), it will silently corrupt the node’s running average / visit invariants. Consider adding debug-time asserts documenting the required preconditions (e.g., n_==0 / n_in_flight_>0 / not terminal), and/or restrict these helpers’ visibility to the specific search logic that needs them.
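One possible shape for such debug-time checks is sketched here on a stripped-down node. The class `ToyNode` and its `terminal_` flag are assumptions for illustration, and the exact preconditions would need to match what the search code actually guarantees.

```cpp
#include <cassert>

// Hypothetical reduced node with only the fields the helpers touch.
class ToyNode {
 public:
  // Reserves the node for one in-flight visit.
  void StartVisit() { ++n_in_flight_; }

  // Sets wl/d/m from cached NN values without changing N or NInFlight.
  // Precondition (assumed): unvisited, owned by an in-flight visit, and not
  // terminal.
  void SetCachedValue(float v, float d, float m) {
    assert(n_ == 0);
    assert(n_in_flight_ > 0);
    assert(!terminal_);
    wl_ = v;
    d_ = d;
    m_ = m;
  }

  // Converts a cached value into a real visit (N=1).
  // Precondition (assumed): the node still has no completed visits.
  void MakeCachedVisitReal() {
    assert(n_ == 0);
    n_ = 1;
  }

  int GetN() const { return n_; }
  float GetWL() const { return wl_; }

 private:
  float wl_ = 0.0f;
  float d_ = 0.0f;
  float m_ = 0.0f;
  int n_ = 0;
  int n_in_flight_ = 0;
  bool terminal_ = false;
};
```

The asserts compile away in release builds (with `NDEBUG` defined), so they document the invariants without adding cost to the search hot path.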

picked_node.eval->m);
auto best_edge = node->Edges().begin();
Node* child = best_edge.GetOrSpawnNode(node);
child->TryStartScoreUpdate();

Copilot AI Mar 24, 2026

Cache piercing spawns/uses child and calls TryStartScoreUpdate() but ignores the return value. If it returns false (e.g., another thread already started updating this child), the code still continues as if the node is exclusively owned, which can violate the assumptions in ExtendNode() (N=0, N-in-flight=1) and lead to incorrect virtual-loss / backup behavior. Handle the false case explicitly (treat as a collision / stop piercing and restore state accordingly) before continuing deeper.

Suggested change
- child->TryStartScoreUpdate();
+ if (!child->TryStartScoreUpdate()) {
+   // Another thread is already updating this child; treat as a
+   // collision and stop cache piercing for this node.
+   break;
+ }
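The effect of checking the return value can be seen in a small single-threaded model. It assumes, in line with the comment above, that `TryStartScoreUpdate()` refuses a node that is unvisited but already in flight; `ToyNode` and `PierceChildren` are hypothetical, and the state restoration the comment asks for is not modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy node with the assumed ownership rule.
struct ToyNode {
  int n = 0;
  int n_in_flight = 0;

  // Fails when another visit already owns this unvisited node.
  bool TryStartScoreUpdate() {
    if (n == 0 && n_in_flight > 0) return false;
    ++n_in_flight;
    return true;
  }
};

// Tries to pierce through `children` in order, stopping at the first
// collision or when `budget` is spent. Returns how many were pierced.
int PierceChildren(const std::vector<ToyNode*>& children, int budget) {
  int pierced = 0;
  for (ToyNode* child : children) {
    if (pierced == budget) break;
    if (!child->TryStartScoreUpdate()) break;  // Collision: stop piercing.
    ++pierced;
  }
  return pierced;
}
```

In this model a child that another visit has already claimed keeps its in-flight count of 1, and nodes below it are never touched.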

Introduces four strategies for how cache-pierced intermediate nodes
(which have cached NN values but N=0) are handled during backpropagation:

- "none": the cached value is overwritten by the leaf value and a single
  visit propagates, as it did before cache piercing repair was introduced.
- "accumulate": each intermediate node's cached value counts as an
  additional real visit, increasing the multivisit count for all
  ancestors and blending the cached evaluation into the propagated
  average.
- "node": the node's own cached evaluation replaces the propagated
  value entirely, so the parent sees the nearest cached evaluation
  rather than the distant leaf.
- "blend": at each cached layer, the propagated value is averaged
  50/50 with the node's cached value, giving exponentially decaying
  weight to the leaf (contribution halves per cached layer).

Default is "none" to preserve existing behavior.
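The four strategies can be compared side by side with a small sketch of what one cached layer does to the value travelling toward the root. `RepairMode`, `Propagated` and `RepairStep` are hypothetical names, the node's own stored statistics are not modeled, and the per-ply sign flip is omitted.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical names; the PR's actual option values are the quoted strings.
enum class RepairMode { kNone, kAccumulate, kNode, kBlend };

struct Propagated {
  float value;     // Value passed on to the parent.
  int multivisit;  // Number of visits the parent is credited with.
};

// Applies one cached layer with evaluation `cached` to the incoming value.
Propagated RepairStep(RepairMode mode, float cached, Propagated in) {
  switch (mode) {
    case RepairMode::kNone:
      return in;  // Cached value is discarded; the leaf value passes through.
    case RepairMode::kAccumulate: {
      int total = in.multivisit + 1;  // Cached value counts as a real visit.
      float avg = (cached + in.value * in.multivisit) / total;
      return {avg, total};
    }
    case RepairMode::kNode:
      return {cached, in.multivisit};  // Nearest cached evaluation wins.
    case RepairMode::kBlend:
      return {0.5f * (cached + in.value), in.multivisit};  // 50/50 average.
  }
  return in;  // Unreachable; keeps compilers quiet.
}
```

With a leaf value of 1.0 and two cached layers that both evaluate to 0.0, "blend" passes 0.5 and then 0.25 upward (the leaf's weight halves per layer), "accumulate" passes 0.5 with two visits and then 1/3 with three visits, "node" passes 0.0, and "none" passes 1.0 unchanged.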

Previously, when cache piercing exhausted its budget on a cached node,
the node was evaluated out of order (OOO): it was backed up immediately
and removed from the batch. This released virtual loss mid-gathering,
making the same subtree eligible for re-picking within the same batch.

Now these nodes stay in the minibatch and go through normal backprop at
the end, preserving virtual loss during the entire gathering phase.
Terminals are unaffected and still get OOO-evaluated.
2 participants