Cache piercing with backpropagation repair by mooskagh · Pull Request #2403 · LeelaChessZero/lc0

mooskagh · 2026-03-24T19:03:00Z

When a visit encounters a node whose NN result is already in cache, instead of stopping and queuing it for batch evaluation, the visit materializes the node immediately and continues deeper through it. This is repeated up to --cache-piercing times per visit, allowing a single visit to traverse multiple cached layers before hitting an uncached position.
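The descent can be sketched with a toy model. This is a minimal illustration, not the PR's code. It assumes the visit's path is already fixed as a list of position hashes and that the NN cache can be reduced to a set of hashes with stored results. The names `PierceResult` and `DescendWithPiercing` are hypothetical, and `max_piercing` stands in for `--cache-piercing`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical outcome of one visit descending along a fixed path.
struct PierceResult {
  int pierced_layers;      // cached nodes materialized and passed through
  std::size_t stop_index;  // index in `path` where the visit stops
  bool needs_nn_eval;      // true if the stop node must be queued for the NN
};

// Toy model, not lc0 code: `path` lists position hashes in descent order,
// `cache` holds hashes that already have an NN result, and `max_piercing`
// plays the role of --cache-piercing.
PierceResult DescendWithPiercing(const std::vector<std::uint64_t>& path,
                                 const std::unordered_set<std::uint64_t>& cache,
                                 int max_piercing) {
  assert(!path.empty());
  int pierced = 0;
  for (std::size_t i = 0; i < path.size(); ++i) {
    if (cache.count(path[i]) == 0) {
      // First uncached position: stop and queue it for batched evaluation.
      return {pierced, i, true};
    }
    if (pierced == max_piercing || i + 1 == path.size()) {
      // Budget exhausted (or no deeper node in the toy path): stop on this
      // cached node, whose result comes straight from the cache.
      return {pierced, i, false};
    }
    // Cached and budget remains: materialize this node and keep descending.
    ++pierced;
  }
  return {pierced, path.size() - 1, false};  // not reached: loop always returns
}
```

With `max_piercing = 0` the toy visit stops at the first cached node, which roughly corresponds to running with piercing disabled; larger budgets let one visit pass through several cached layers.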

The intermediate nodes created this way have their evaluation set from cache but are left with N=0 (no completed visits). During backpropagation, these nodes are "repaired": the cached evaluation is promoted to a real visit (N=1), then the value returning from the leaf is folded in as a second update. The resulting averaged value — blending the node's own cached evaluation with the subtree result — is what propagates further toward the root. Each repaired node also increments the visit count seen by all ancestors, so the tree's visit statistics remain consistent.
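The repair arithmetic amounts to a running-average update. This is a simplified sketch, not lc0's `Node`: `Stats`, `Update`, and `BackupWithRepair` are hypothetical. Values are assumed to be expressed from one fixed perspective (the real backup also flips the sign at every ply), and the path is ordered from the leaf up to the root.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified per-node statistics (not lc0's Node class).
struct Stats {
  int n = 0;              // completed visits
  double q = 0.0;         // running average of backed-up values
  bool pierced = false;   // holds only a cached eval, with N=0
  double cached_q = 0.0;  // cached NN value of a pierced node
};

// Folds `multivisit` copies of value `v` into the running average.
void Update(Stats& s, double v, int multivisit) {
  s.n += multivisit;
  s.q += multivisit * (v - s.q) / s.n;
}

// Backs up `leaf_v` along `path`, ordered leaf first and root last.
// Per-ply sign flips are omitted for brevity.
void BackupWithRepair(std::vector<Stats>& path, double leaf_v) {
  double v = leaf_v;
  int multivisit = 1;
  for (Stats& s : path) {
    if (!s.pierced) {
      Update(s, v, multivisit);
      continue;
    }
    assert(s.n == 0);
    s.n = 1;  // promote the cached eval to a real visit
    s.q = s.cached_q;
    s.pierced = false;
    Update(s, v, multivisit);  // fold in the value returning from below
    v = s.q;                   // the blended average is what moves upward
    ++multivisit;              // every ancestor also counts the repaired visit
  }
}
```

In this model, each cached layer above a leaf adds one visit for every ancestor, so two repaired layers make the root gain three visits from a single leaf evaluation.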

The n-in-flight counters are temporarily inflated before each finalize call to compensate for the increased multivisit, keeping virtual loss accounting balanced under the backup write lock.
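The balancing step can be shown in isolation. In this hypothetical sketch, `FinalizeVisits` subtracts its multivisit from the in-flight counter, which is what a finalize step is assumed to do. Repaired visits were never added to that counter at pick time, so the caller first inflates it by the extra amount. `Counters`, `FinalizeVisits`, and `FinalizeWithRepairs` are illustrative names only.

```cpp
#include <cassert>

// Hypothetical visit counters of one node (not lc0's Node class).
struct Counters {
  int n = 0;            // completed visits
  int n_in_flight = 0;  // visits picked but not yet backed up (virtual loss)
};

// Completes `multivisit` in-flight visits.
void FinalizeVisits(Counters& c, int multivisit) {
  assert(c.n_in_flight >= multivisit);  // would go negative otherwise
  c.n_in_flight -= multivisit;
  c.n += multivisit;
}

// Finalizes `picked` visits that were counted in flight at pick time plus
// `extra` repaired visits that were not.
void FinalizeWithRepairs(Counters& c, int picked, int extra) {
  c.n_in_flight += extra;  // temporary inflation so the books balance
  FinalizeVisits(c, picked + extra);
}
```

Without the inflation step, finalizing three visits against an in-flight count of one would drive the counter negative (or wrap an unsigned one), corrupting the virtual loss seen by later picks.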

Vibe coded with Claude Code (Opus 4.6).


Copilot

Pull request overview

Adds an optional “cache piercing” mode to classic search so a single visit can traverse through multiple cached NN layers, then repairs the intermediate nodes during backup to keep visit statistics coherent.

Changes:

- Introduces `--cache-piercing` / `CachePiercing` option and plumbs it into classic search params.
- Extends `ProcessPickedTask` to materialize cache-hit nodes immediately and continue selection deeper (up to the configured limit).
- Updates backup logic to “repair” cache-pierced intermediate nodes and adjust playout counters accordingly.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/search/classic/search.cc | Implements cache piercing in processing and adds repair logic during backup. |
| src/search/classic/params.h | Exposes `GetCachePiercing()` and stores the cached param value. |
| src/search/classic/params.cc | Registers the new option and initializes `kCachePiercing`. |
| src/search/classic/node.h | Adds node helpers for setting cached values and promoting them to a real visit. |


Copilot · 2026-03-24T19:10:03Z

```diff
    search_->network_evaluations_++;
  }
-  search_->cum_depth_ += node_to_process.depth * node_to_process.multivisit;
+  search_->cum_depth_ += node_to_process.depth * multivisit;
```


total_playouts_ is incremented by multivisit + extra_multivisit, but cum_depth_ is only incremented by depth * multivisit (excluding extra_multivisit). This makes average_depth = cum_depth_/total_playouts_ systematically smaller whenever cache piercing repairs occur, which can break depth-based stopping (DepthStopper) and misreport depth statistics. Update cum_depth_ to account for the additional (repaired) visits (using an appropriate depth for each extra visit), or avoid counting repaired visits in total_playouts_ if they’re not meant to affect depth-based metrics.

Suggested change

```diff
-  search_->cum_depth_ += node_to_process.depth * multivisit;
+  search_->cum_depth_ += node_to_process.depth * (multivisit + extra_multivisit);
```
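The size of the bias is easy to quantify. The hypothetical sketch below records one visit whose leaf sits at depth 10 with three repaired nodes above it, under three accounting rules: the PR's current line, Copilot's suggested change, and one possible reading of "an appropriate depth for each extra visit". That last rule assumes the repaired nodes are the leaf's immediate ancestors at depths 9, 8 and 7. `DepthStats`, `DepthRule`, and `Record` are illustrative names, not lc0's.

```cpp
#include <cassert>
#include <cstdint>

struct DepthStats {
  std::int64_t cum_depth = 0;
  std::int64_t total_playouts = 0;
  double AverageDepth() const {
    return total_playouts == 0
               ? 0.0
               : static_cast<double>(cum_depth) / total_playouts;
  }
};

enum class DepthRule {
  kLeafVisitsOnly,   // current line: depth * multivisit
  kLeafDepthForAll,  // suggested change: depth * (multivisit + extra)
  kPerNodeDepth,     // each repaired node contributes its own depth
};

// Records a visit reaching a leaf at `depth` with `multivisit` leaf visits
// and `extra` repaired nodes, assumed to sit at depths depth-1 ... depth-extra.
void Record(DepthStats& s, int depth, int multivisit, int extra,
            DepthRule rule) {
  assert(extra >= 0 && extra <= depth);  // repaired ancestors exist
  s.total_playouts += multivisit + extra;  // playouts include repaired visits
  switch (rule) {
    case DepthRule::kLeafVisitsOnly:
      s.cum_depth += depth * multivisit;
      break;
    case DepthRule::kLeafDepthForAll:
      s.cum_depth += depth * (multivisit + extra);
      break;
    case DepthRule::kPerNodeDepth:
      s.cum_depth += depth * multivisit;
      for (int k = 1; k <= extra; ++k) s.cum_depth += depth - k;
      break;
  }
}
```

The first rule reports an average depth of 2.5 for a visit that reached depth 10, which is the systematic underestimate the review describes. The suggested change gives 10, and the per-node rule gives 8.5.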

Copilot · 2026-03-24T19:10:04Z

```diff
+  // Sets wl/d/m from cached NN values without changing N or NInFlight.
+  void SetCachedValue(float v, float d, float m) { wl_ = v; d_ = d; m_ = m; }
+  // Converts a cached value (set by SetCachedValue) into a real visit (N=1).
+  void MakeCachedVisitReal() { n_ = 1; }
```


SetCachedValue() / MakeCachedVisitReal() directly mutate wl_/d_/m_ and n_ without any validation. If either is accidentally called on a node with N>0 (or without the intended “exclusive ownership” condition), it will silently corrupt the node’s running average / visit invariants. Consider adding debug-time asserts documenting the required preconditions (e.g., n_==0 / n_in_flight_>0 / not terminal), and/or restrict these helpers’ visibility to the specific search logic that needs them.
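One way to follow this suggestion is to state the preconditions as debug-time asserts. The class below is a hypothetical stand-in that mirrors the field names in the diff. The preconditions it checks (N=0, at least one visit in flight, non-terminal) are the ones the review proposes, not rules confirmed by the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the relevant parts of a search node; not lc0's
// class.
class ToyNode {
 public:
  // Records that a visit is in flight through this node.
  void StartVisit() { ++n_in_flight_; }
  void MakeTerminal() { terminal_ = true; }

  // Sets wl/d/m from cached NN values without changing N or NInFlight.
  void SetCachedValue(float v, float d, float m) {
    assert(n_ == 0 && "would overwrite a real running average");
    assert(n_in_flight_ > 0 && "caller must own the node");
    assert(!terminal_ && "terminal values must not come from the cache");
    wl_ = v;
    d_ = d;
    m_ = m;
  }

  // Converts a cached value (set by SetCachedValue) into a real visit (N=1).
  void MakeCachedVisitReal() {
    assert(n_ == 0 && "node already has real visits");
    assert(n_in_flight_ > 0 && "caller must own the node");
    n_ = 1;
  }

  std::uint32_t GetN() const { return n_; }
  float GetWL() const { return wl_; }
  float GetD() const { return d_; }
  float GetM() const { return m_; }

 private:
  float wl_ = 0.0f;
  float d_ = 0.0f;
  float m_ = 0.0f;
  std::uint32_t n_ = 0;
  std::uint32_t n_in_flight_ = 0;
  bool terminal_ = false;
};
```

In release builds the asserts compile away, so they document the contract at no runtime cost. In debug builds, calling either helper on a node with N>0 fails loudly instead of silently corrupting the running average.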

Copilot · 2026-03-24T19:10:04Z

```diff
+                           picked_node.eval->m);
+      auto best_edge = node->Edges().begin();
+      Node* child = best_edge.GetOrSpawnNode(node);
+      child->TryStartScoreUpdate();
```


Cache piercing spawns/uses child and calls TryStartScoreUpdate() but ignores the return value. If it returns false (e.g., another thread already started updating this child), the code still continues as if the node is exclusively owned, which can violate the assumptions in ExtendNode() (N=0, N-in-flight=1) and lead to incorrect virtual-loss / backup behavior. Handle the false case explicitly (treat as a collision / stop piercing and restore state accordingly) before continuing deeper.

Suggested change

```diff
-      child->TryStartScoreUpdate();
+      if (!child->TryStartScoreUpdate()) {
+        // Another thread is already updating this child; treat as a
+        // collision and stop cache piercing for this node.
+        break;
+      }
```
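The collision handling can be illustrated with a toy piercing loop. The `TryStartScoreUpdate` below assumes the semantics the review describes: starting fails when the node is unvisited and another visit is already in flight through it. `PNode`, `PierceOutcome`, `PierceStop`, and `Pierce` are hypothetical. Rolling back the in-flight counts already taken along the path is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node with just the state the piercing loop needs.
struct PNode {
  std::uint32_t n = 0;
  std::uint32_t n_in_flight = 0;
  bool cached = false;  // NN result available in the cache

  // Assumed semantics: an unvisited node with a visit already in flight is
  // being extended by someone else, so this visit may not start on it.
  bool TryStartScoreUpdate() {
    if (n == 0 && n_in_flight > 0) return false;
    ++n_in_flight;
    return true;
  }
};

enum class PierceOutcome { kReachedUncached, kStoppedOnCached, kCollision };

struct PierceStop {
  PierceOutcome outcome;
  std::size_t index;  // position in `line` where piercing ended
};

// `line[0]` is a node this visit already owns; each step moves to the next
// node of the line and spends one unit of `budget`.
PierceStop Pierce(std::vector<PNode>& line, int budget) {
  assert(!line.empty());
  std::size_t i = 0;
  while (true) {
    if (!line[i].cached) return {PierceOutcome::kReachedUncached, i};
    if (budget == 0 || i + 1 == line.size()) {
      return {PierceOutcome::kStoppedOnCached, i};
    }
    if (!line[i + 1].TryStartScoreUpdate()) {
      // Another visit is already updating this child: treat it as a
      // collision and stop piercing here.
      return {PierceOutcome::kCollision, i + 1};
    }
    --budget;
    ++i;
  }
}
```

Returning a distinct outcome instead of continuing keeps the N=0 / N-in-flight=1 assumption intact for every node the visit goes on to extend, which is the concern the review raises about `ExtendNode()`.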

Introduces four strategies for how cache-pierced intermediate nodes (which have cached NN values but N=0) are handled during backpropagation:

- "none": the cached value is overwritten by the leaf value, and a single visit propagates as it did before cache piercing repair was introduced.
- "accumulate": each intermediate node's cached value counts as an additional real visit, increasing the multivisit count for all ancestors and blending the cached evaluation into the propagated average.
- "node": the node's own cached evaluation replaces the propagated value entirely, so the parent sees the nearest cached evaluation rather than the distant leaf.
- "blend": at each cached layer, the propagated value is averaged 50/50 with the node's cached value, giving exponentially decaying weight to the leaf (its contribution halves per cached layer).

Default is "none" to preserve existing behavior.
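The four strategies differ in what value and visit count leave each cached layer on the way up. The sketch below models only that step. It assumes "none", "node" and "blend" leave the multivisit unchanged (the commit message states this only for "none"), ignores per-ply sign flips, and uses hypothetical names (`RepairMode`, `Carry`, `PassCachedLayer`, `PassAll`).

```cpp
#include <cassert>
#include <vector>

enum class RepairMode { kNone, kAccumulate, kNode, kBlend };

// Value and visit count carried from one node to its parent.
struct Carry {
  double v;
  int multivisit;
};

// Applies one cache-pierced layer whose cached value is `cached_q`.
Carry PassCachedLayer(RepairMode mode, Carry in, double cached_q) {
  switch (mode) {
    case RepairMode::kNone:
      return in;  // cached value discarded; leaf value passes through
    case RepairMode::kAccumulate:
      // Cached eval counts as one more real visit in the running average.
      return {(cached_q + in.multivisit * in.v) / (in.multivisit + 1),
              in.multivisit + 1};
    case RepairMode::kNode:
      return {cached_q, in.multivisit};  // nearest cached eval wins
    case RepairMode::kBlend:
      return {0.5 * in.v + 0.5 * cached_q, in.multivisit};  // 50/50 mix
  }
  assert(false && "unknown mode");
  return in;
}

// Carries a leaf value up through cached layers ordered bottom to top.
Carry PassAll(RepairMode mode, double leaf_v,
              const std::vector<double>& cached) {
  Carry c{leaf_v, 1};
  for (double q : cached) c = PassCachedLayer(mode, c, q);
  return c;
}
```

With a leaf value of 1.0 and cached values 0.25 then 0.75, the parent receives 1.0 under "none", 0.75 under "node", and 0.6875 under "blend" (the leaf's weight has halved twice, to 0.25). Under "accumulate" it receives 2/3 spread over three visits.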

When cache piercing exhausts its budget on a cached node, that node was previously evaluated out of order (OOO): backed up immediately and removed from the batch. This released its virtual loss mid-gathering, making the same subtree eligible for re-picking within the same batch. Now these nodes stay in the minibatch and go through normal backpropagation at the end, preserving virtual loss during the entire gathering phase. Terminals are unaffected and are still OOO-evaluated.
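The re-picking effect can be seen in a toy gathering loop. The scoring rule here (value minus the number of visits in flight) is a deliberately crude stand-in for lc0's real selection formula. `Candidate`, `PickBest`, and `GatherBatch` are hypothetical, and only the ordering effect is meant to carry over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical child with a fixed value estimate and a virtual-loss counter.
struct Candidate {
  double q;
  int n_in_flight;
};

// Toy selection: highest value after subtracting one unit per visit in flight.
std::size_t PickBest(const std::vector<Candidate>& c) {
  assert(!c.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < c.size(); ++i) {
    if (c[i].q - c[i].n_in_flight > c[best].q - c[best].n_in_flight) best = i;
  }
  return best;
}

// Gathers `batch_size` picks. With `release_early`, each pick is backed up
// immediately (as OOO evaluation did), so its virtual loss disappears before
// the next pick; otherwise it stays until the whole batch is backed up.
std::vector<std::size_t> GatherBatch(std::vector<Candidate> c, int batch_size,
                                     bool release_early) {
  std::vector<std::size_t> picks;
  for (int k = 0; k < batch_size; ++k) {
    std::size_t i = PickBest(c);
    picks.push_back(i);
    ++c[i].n_in_flight;  // virtual loss applied at pick time
    if (release_early) --c[i].n_in_flight;
  }
  return picks;
}
```

With early release the same child is picked twice in one batch, which is the re-picking the commit message describes. Keeping the virtual loss until gathering ends spreads the batch across both children.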

mooskagh requested a review from Copilot March 24, 2026 19:03

Copilot started reviewing on behalf of mooskagh March 24, 2026 19:03

Copilot AI reviewed Mar 24, 2026


mooskagh added 2 commits March 24, 2026 21:13

mooskagh wants to merge 3 commits into LeelaChessZero:master from mooskagh:cache-piercing-20260324


2 participants
