Policy decay to child value by Menkib64 · Pull Request #2394 · LeelaChessZero/lc0

Menkib64 · 2026-03-01T20:23:46Z

This is an attempt to decay the prior policy toward a value-defined policy. The default values are guesses, which might not be completely stupid. They would have to be tuned to work well.

Copilot

Pull request overview

This PR introduces a “policy decay” mechanism that blends the prior policy with a value-derived policy as node visits increase, configurable via new search parameters. It applies to both classic and dag_classic search paths and also surfaces the decayed policy in verbose stats.

Changes:

Add PolicyDecay(...) and apply it during child selection (classic + dag_classic).
Add new UCI/CLI options for decay configuration (temperature, visit horizon, value share).
Enhance dag_classic verbose stats to display decayed policy (PD) and use it in U/S reporting.
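As a rough illustration of the blend described in the overview, the sketch below mixes a prior policy with a softmax over child Q values, with the mixing weight ramping up linearly until a visit horizon is reached. The struct, the function names, the linear ramp, and the softmax form are all assumptions made for illustration; they are not taken from the PR's code, and a later commit in this PR replaces softmax with a polynomial function.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical parameters mirroring the three new options; the names are
// illustrative and are not the PR's actual fields.
struct DecayParams {
  float tau;           // temperature of the value-derived policy
  float decay_visits;  // visit horizon over which the blend ramps up
  float value_share;   // maximum weight of the value-derived policy, in [0, 1]
};

// Softmax over child Q values with temperature tau (assumed form).
std::vector<float> ValuePolicy(const std::vector<float>& q, float tau) {
  std::vector<float> out(q.size());
  if (q.empty()) return out;
  const float max_q = *std::max_element(q.begin(), q.end());
  float sum = 0.0f;
  for (std::size_t i = 0; i < q.size(); ++i) {
    out[i] = std::exp((q[i] - max_q) / tau);  // subtract max for stability
    sum += out[i];
  }
  for (float& p : out) p /= sum;
  return out;
}

// Blends prior and value-derived policy. The weight grows linearly with
// visits and is capped at value_share. prior and q must have equal size.
std::vector<float> BlendPolicy(const std::vector<float>& prior,
                               const std::vector<float>& q, float visits,
                               const DecayParams& params) {
  const float ramp = std::min(1.0f, visits / params.decay_visits);
  const float w = params.value_share * ramp;
  const std::vector<float> vp = ValuePolicy(q, params.tau);
  std::vector<float> out(prior.size());
  for (std::size_t i = 0; i < prior.size(); ++i) {
    out[i] = (1.0f - w) * prior[i] + w * vp[i];
  }
  return out;
}
```

With zero visits the result equals the prior; once the horizon is reached, the value-derived policy contributes exactly `value_share` of the mass, and the output still sums to 1 because both inputs do.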

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| src/search/dag_classic/search.cc | Adds PolicyDecay and uses decayed policy in verbose stats and U/S calculations. |
| src/search/dag_classic/node.h | Adds `EdgeAndNode::GetIndex()` helper for mapping edges to array indices. |
| src/search/classic/search.cc | Adds PolicyDecay and applies it to `current_pol` during node picking. |
| src/search/classic/params.h | Exposes new decay/temperature getters on search params. |
| src/search/classic/params.cc | Registers new options and stores decay parameters. |

Copilot · 2026-03-04T15:02:06Z

+      kPolicyValueTemperature(1.0f / options.Get<float>(kPolicyValueTemperatureId)),
+      kPolicyDecayVisits(options.Get<int>(kPolicyDecayVisitsId)),
+      kPolicyDecayValueShare(options.Get<float>(kPolicyDecayValueShareId) / 100.0f),


kPolicyValueTemperature is initialized as 1.0f / PolicyValueTemperature, but the getter is named GetPolicyValueTemperature() and the option help refers to "Tau". This inversion is easy to misread later; consider renaming the cached field/getter to indicate it's an inverse temperature, or store the raw tau and perform the inversion at the call site.

Copilot · 2026-03-04T15:02:07Z

+  unsigned GetIndex(const Node* parent) const {
+    return edge_ - parent->GetLowNode()->GetEdges();
+  }


edge_ - parent->GetLowNode()->GetEdges() is a ptrdiff_t; returning it as unsigned can hide negative values and may trigger sign-conversion warnings. Consider returning size_t/uint16_t and static_casting from the pointer difference after an assertion that the pointers are in the same array.
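A minimal sketch of the suggested shape follows, using a stand-in `Edge` type rather than the real lc0 classes. The free function `EdgeIndex` and its `num_edges` argument are hypothetical and exist only so the sanity check can be expressed.

```cpp
#include <cassert>
#include <cstddef>

// Minimal stand-in for an edge; the real lc0 Edge type is not reproduced.
struct Edge {
  float policy;
};

// Returns the index of `edge` within the array that starts at `edges`.
// `num_edges` is a hypothetical extra argument used only for the check.
inline std::size_t EdgeIndex(const Edge* edge, const Edge* edges,
                             std::size_t num_edges) {
  const std::ptrdiff_t diff = edge - edges;  // signed pointer difference
  // Sanity check: the index must be non-negative and inside the array.
  assert(diff >= 0 && static_cast<std::size_t>(diff) < num_edges);
  (void)num_edges;  // avoids an unused-parameter warning under NDEBUG
  return static_cast<std::size_t>(diff);
}
```

Subtracting pointers that do not belong to the same array is undefined behavior in C++, so the assertion catches likely mistakes in debug builds but cannot prove the precondition.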

Menkib64 · 2026-03-04T15:06:23Z

The proposed defaults are based on a little local exploration. They reach about -10 Elo against SF in ultra-bullet. The best configurations without decay reach -5 Elo. My search has already found a few configurations up to +18 Elo when decay is enabled. I think the default tune needs to be updated. The update should be a separate pull request once there is more data about good configurations. My tuning model hasn't yet learned the parameter interactions, which are fairly complicated.

Using parent nodes to decay policy can completely suppress moves with low policy values as soon as their first visit evaluates them badly. That likely does not give enough exploration, and I suspect it caused the problems with 100% decay in the earlier test. Using child visits aims to suppress a move only after it has had a fair chance to prove the early evaluation wrong.

zz4032 · 2026-03-22T13:12:44Z

Gauntlet: SF vs. ba4dbdf (outdated by now) and its master base 702d4b8, both with parameters tuned:

RANK  NAME                      :  ELO  ERROR        LOS(%)  PAIRS
==================================================================
1     lc0_master_b38ec9e        :  7.5  (-4.5/+5.0)    89.0   2500
2     lc0_PR2394_ba4dbdf        :  3.3  (-5.1/+4.7)    90.3   2500
3     stockfish_master_702d4b8  :  0.0  (-0.0/+0.0)     0.0   5000

Network: 791556, MinibatchSize=384 (2x GPU), Backend=roundrobin
lc0_master_b38ec9e:
CPuct=2.027, FpuValue=0.413, PolicyTemperature=1.295
NPM (median): 103505
lc0_PR2394_ba4dbdf:
CPuct=1.523, FpuValue=0.416, PolicyTemperature=1.397, PolicyValueTemperature=1.114, PolicyDecayValueShare=18.4, PolicyDecayVisits=58300
NPM (median): 93297 (-9.9% to master base)

Menkib64 · 2026-03-22T13:22:09Z

> Network: 791556, MinibatchSize=384 (2x GPU), Backend=roundrobin
> lc0_master_b38ec9e:
> CPuct=2.027, FpuValue=0.413, PolicyTemperature=1.295
> NPM (median): 103505
> lc0_PR2394_ba4dbdf:
> CPuct=1.523, FpuValue=0.416, PolicyTemperature=1.397, PolicyValueTemperature=1.114, PolicyDecayValueShare=18.4, PolicyDecayVisits=58300
> NPM (median): 93297 (-9.9% to master base)

Your tune wants to reduce search width more than my tune for the older version did. I can see a similar performance drop when my tune tests similar configurations. I'm thinking that depth-first prefetching might be an important feature for improving search without losing performance.

I have made a few new iterations since the initial version. The current version seems to produce more consistent policy shapes across different types of positions. I'm still working on finding the best configuration.

Policy decay to child value

34f3b52

Menkib64 force-pushed the policy-decay-based-on-value branch from 2c66bb5 to 34f3b52 March 1, 2026 20:29

Menkib64 added 4 commits March 4, 2026 16:27

Fix windows build

1cf0ac0

Simplify policy decay code

f66bb3e

Dag shows both original and decayed policy

3510f24

Improve default guesses

00234d1

Menkib64 marked this pull request as ready for review March 4, 2026 14:55

Copilot AI review requested due to automatic review settings March 4, 2026 14:55

Copilot started reviewing on behalf of Menkib64 March 4, 2026 14:56

Copilot AI reviewed Mar 4, 2026

Menkib64 added 16 commits March 4, 2026 17:09

Fix policy decay visit lower limit

e074660

Uncertanty -> Uncertainty

3cdab06

Fix potential error if all visited children are terminal

0450728

Improve help grammar

b9f3853

Fix maximum policy value calculation for terminals

b40a6cf

Correct policy value temperature to the test point

6624b25

Defaults updated to improved tune values

e131ec7

Merge remote-tracking branch 'origin/master' into policy-decay-based-on-value

ba4dbdf

Maximum value difference adjust value temperature

73f4c98

Avoid too extremely sharp when values are close

35cc1d9

Use both child and parent visit for interpolation.

0066c31

Fix uci option name

89366a3

Replace softmax with polynomial function

fa5ecb3

Improve defaults based on tune

80bbb26

Normalize value to make policy sharpness more uniform

3d88d53

Allow sharper policy when both sides can win in the position

5ff190c

Menkib64 added 4 commits March 29, 2026 18:52

Guess a potential good configuration

01852e8

Merge remote-tracking branch 'origin/master' into policy-decay-based-on-value

d090799

Merge remote-tracking branch 'origin/master' into policy-decay-based-on-value

5ffee82

Adjust defaults

5c5e3f9

Menkib64 marked this pull request as draft April 11, 2026 13:24

Menkib64 added 3 commits April 12, 2026 22:59

Delay policy decay if value predicts move is bad

bb3357b

Scale reduction delay based on prior policy

18ff9b2

Scale policy decay based on prior and reduce type conversions.

9ba5e2b

Menkib64 wants to merge 29 commits into LeelaChessZero:master from Menkib64:policy-decay-based-on-value
