Use the v6 field played_q to do a more direct blunder rescoring #5

Naphthalin · 2021-01-09T16:18:37Z

Draft of how to use the new played_q field from the v6 training data (LeelaChessZero#1468) in rescoring:

* compare best_q and played_q
* if their difference is above a threshold, a proven win was missed, or a proven loss was played without being forced, rescore all positions up to and including that position
* rescore q, d and m (the last one is still missing)

There is currently no code dealing with the special case of playing a longer/shorter proven line which gives inaccurate m but accurate q and d.
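The steps above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Position` struct, its boolean flags, and the function names `IsBlunder` and `Deblunder` are hypothetical simplifications of the v6 record, and the moves-left target m is left out because it is still missing in the draft. The sketch walks the game backwards and assumes q is always stored from the perspective of the side to move, so the sign flips at every ply.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily reduced stand-in for one v6 training record.
struct Position {
  float best_q = 0.0f;    // Q of the best move found by search.
  float best_d = 0.0f;    // Draw probability of the best move.
  float played_q = 0.0f;  // Q of the move that was actually played.
  float result_q = 0.0f;  // Value target, initially the game outcome.
  float result_d = 0.0f;  // Draw target, initially the game outcome.
  bool best_is_proven_win = false;     // Assumed flag: best move is a proven win.
  bool played_is_proven_loss = false;  // Assumed flag: played move is a proven loss.
};

// A blunder is a move that is worse than the best move by more than
// `threshold`, misses a proven win, or enters a proven loss unforced.
inline bool IsBlunder(const Position& p, float threshold) {
  const bool above_threshold = p.best_q - p.played_q > threshold;
  const bool missed_win = p.best_is_proven_win && p.played_q < 1.0f;
  const bool unforced_loss = p.played_is_proven_loss && p.best_q > -1.0f;
  return above_threshold || missed_win || unforced_loss;
}

// Walks the game from the last position to the first. From each blunder
// backwards, result_q/result_d are replaced by the best_q/best_d of that
// blunder position. Returns the number of blunders found.
inline int Deblunder(std::vector<Position>& game, float threshold) {
  int blunders = 0;
  bool active = false;
  float q = 0.0f;
  float d = 0.0f;
  for (std::size_t i = game.size(); i-- > 0;) {
    Position& p = game[i];
    if (IsBlunder(p, threshold)) {
      q = p.best_q;
      d = p.best_d;
      active = true;
      ++blunders;
    }
    if (active) {
      p.result_q = q;
      p.result_d = d;
    }
    q = -q;  // The previous ply is seen from the other side.
  }
  return blunders;
}
```

Positions after the last blunder keep their original targets. With several blunders in one game, each earlier blunder overrides the target for the positions before it.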

In general, the current code base is split between the final position, for which no subsequent position exists, and all other positions, for which one does. I suppose this split isn't needed for the v6 rescoring following https://github.com/LeelaChessZero/lc0/issue/1308/, though I worked with it here.


masterkni6 · 2021-02-02T02:02:26Z

Is the rescorer supposed to change result_q and result_d of the blunder move to best_q and best_d? I don't really understand the training all too well but doesn't that make it think the blunder is a good move?

Naphthalin · 2021-02-02T10:58:58Z

> Is the rescorer supposed to change result_q and result_d of the blunder move to best_q and best_d? I don't really understand the training all too well but doesn't that make it think the blunder is a good move?

There might be a small misconception on your side about the training: Lc0 trains on positions, not on played moves; the played move is only relevant for the game outcome. Changing result_q/result_d when a blunder occurs is the equivalent of a human "experimenting" in a position, i.e. "the current position is slightly better for white. This move looks interesting, let's see what happens even though I think it's a bad move." It doesn't make the blunder itself look good, only the position in which the blunder happened because of temperature exploration.
Does that make sense?

Tilps · 2021-02-02T11:02:16Z

> Is the rescorer supposed to change result_q and result_d of the blunder move to best_q and best_d? I don't really understand the training all too well but doesn't that make it think the blunder is a good move?

No - it avoids having the game outcome after the blunder affect the training target before the blunder.
Nothing about that causes the policy of the bad move to increase, or the training target after the blunder to be changed. (Unless there are multiple blunders in the game... but the logic still holds there.)
The main risk of this change is deblundering too often per game. Unless there are runs of multiple positions between blunders, this change becomes almost like training with a high q ratio, which is known to be a tiny bit weaker than what we already have. The hypothesis is that there exists some happy medium where runs of training targets are left unchanged for long enough that learnings propagate back through the game effectively, without the network also learning to expect blunders so much.
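The comparison with a high q ratio can be written out. As a sketch, assuming the value target $t$ is the usual blend of the game outcome $z$ and the search value $q$ of the position, with $r$ denoting the q ratio (the symbols are chosen here for illustration and do not appear in the thread):

$$
t = (1 - r)\, z + r\, q, \qquad 0 \le r \le 1
$$

Deblundering replaces $z$ for the positions before a blunder by $\pm q^{\text{best}}_{b}$, the search value of the best move at the blunder position $b$, with the sign following the side to move. If nearly every position is closely followed by a blunder, the targets are driven almost entirely by search values rather than by game outcomes, which is the regime $r \to 1$ described above.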


* Add a mode to turn lc0 into a chunk data rescorer powered by Tablebase.
* Add some stats.
* Add secondary rescoring using wdl to reduce back propigation of blunders a bit.
* Add policy distribution adjust support to rescorer.
* Track the game outcomes, and the change to the start of the game
* Add DTZ based assistance for secondary rescoring.
* Change move count to a moves remaining to potentially use for modulating target value.
* Use DTZ for pawnless 3 piece positions as a substitue for DTM to adjust move_count to be more correct
* another fix.
* More fixing.
* Getting things compiling again.
* Make rescorer more obvious.
* reorder to match struct order.
* Actually update the version when converting to v4 format.
* Implement the threading support.
* Fix compilation issues on some compilers.
* More compilation fixing.
* Fix off by one.
* Add support for root probe policy boosting for minimum dtz in winning positions.
* Fix test compile.
* Fix missing option.
* Add a counter.
* Log if policy boost is for a move labelled illegal.
* Add a histogram for total amount of boosted policy per boosted position.
* Distribute boost rather than apply to all - also log before and after dists.
* Add gaviotatb code for later use in dtm_boost
* Fix compile issue on linux.
* Prepare logic for dtm policy boost.
* Load gaviota tb if specified.
* Probe gaviota to decide which 'safe' moves are most deserving of boost based on dtm.
* First attempt at supporting arbitrary starting point training data for rescorer.
* Fix missing brackets.
* Some fixes.
* Avoid crashes from walking history before start of provided game information.
* Some more merge fixes.
* Fix some formatting.
* Only process .gz files, don't crash out on invalid files, don't create output until input has been read.
* Don't keep partially valid files.
* Add basic range validation for input data.
* Don't create writer any earlier than needed.
* Fix decoding castling moves for the new Move format.
* Validate game moves for legality.
* Also log illegal move if it passes probability check but fails the real check.
* Fix another merge error.
* Compile fix for linux.
* Plies left in rescorer (#1)
* Rescore move_count using Gaviota TBs
* Fix lczero-common commit
* Add condition for Gaviota move_count rescoring
* Post merge fixup for the kings/knights change in board.
* Rescore tb v5 (#2)
* Make lc0 output v5 training data.
* Finish merge of v5 data into rescorer tb.
* Fixes for rescoring v4 data.
* Revert some unneeded formatting changes.
* Support FRC input_format in rescoring.
* Add some very important missing break statements...
* Fix merge.
* Change movement decode to not rely on there being any history planes filled in. Since that will not always be the case for input type 3.
* Minimum changes to make it compile again post merge.
* Input format 3 support.
* Fix data range checks were incorrect for format 3 and 2.
* Fix up bugs with chess 960 castle moves that leave a rook or king in place.
* Post merge compile fixups for renames.
* Add support for hectoplies and hectoplies armageddon to validate, and fixup the merge of latest code.
* More fixes for type 4 and 132.
* Add input format conversion support to rescorer.
* Better match for training.
* Add canonical v2 format to rescorer.
* Add a utility for substituting policy from higher quality data into main data.
* Fix missing option and add some commented out diagnostic code.
* More cleanup in comments.
* Handle empty policy-substitutions dir and input dir better.
* Don't keep chunks that are marked as not for training.
* More fixes for handling files with placeholder chunks.
* Add 'deblunderer' Completely untested...
* Fix some bugs in deblunder.
* simplify windows rescorer build (#4)
  Co-authored-by: borg323
* Tweak windows build file.
* Some updates for writer.h/cc for v6
* Update rescorer loop.cc for V6.
* Some additional validations to do with played_idx/best_idx.
* make appveyor build the rescorer (#7)
  Co-authored-by: borg323
* subproject for gaviota tb files (#8)
  Co-authored-by: borg323
* 'Fix' for build on windows Probably should be fixed some other way...
* Fix my breakage. (#9)
* Update loop.cc
* Update meson.build
* Use the v6 field played_q to do a more direct blunder rescoring (#5)
* included the issue 1308 deblunder mechanism in loop.cc
* blunder detection now acts on missed proven wins and unforced proven losses
* added comment on missing activeM
* removed probabilistic randomization of result rescorer and worked with v6 data instead
* included moves left rescore, removed unneeded options
* doubled code not needed as final positions aren't special
* changed appveyor script to hopefully build rescorer.sln
* reverted failed attempt at fixing appveyor
* included minimal std::cout for blunders
* included blunder counter, added comment to visits v6 data checking
* checking for bit 3 of invariance info to make sure best_q is a proven win
* Fix v5 upgrading for decisive games.
* Additional safety.
* Add missing brackets.
* don't keep the first TB position for the deblundering pass. (#10)
* included the issue 1308 deblunder mechanism in loop.cc
* blunder detection now acts on missed proven wins and unforced proven losses
* added comment on missing activeM
* removed probabilistic randomization of result rescorer and worked with v6 data instead
* included moves left rescore, removed unneeded options
* doubled code not needed as final positions aren't special
* changed appveyor script to hopefully build rescorer.sln
* reverted failed attempt at fixing appveyor
* included minimal std::cout for blunders
* included blunder counter, added comment to visits v6 data checking
* checking for bit 3 of invariance info to make sure best_q is a proven win
* don't keep the first TB position for rescorer
* change recorer logo (#11)
  Co-authored-by: borg323
* Make the deblunder transition soft through a width parameter (#13)
* included the issue 1308 deblunder mechanism in loop.cc
* blunder detection now acts on missed proven wins and unforced proven losses
* added comment on missing activeM
* removed probabilistic randomization of result rescorer and worked with v6 data instead
* included moves left rescore, removed unneeded options
* doubled code not needed as final positions aren't special
* changed appveyor script to hopefully build rescorer.sln
* reverted failed attempt at fixing appveyor
* included minimal std::cout for blunders
* included blunder counter, added comment to visits v6 data checking
* checking for bit 3 of invariance info to make sure best_q is a proven win
* don't keep the first TB position for rescorer
* added a deblunder width parameter to allow a soft transition
* clang formatting
* resolve merge conflict
* Add nnue plain file output (#12)
* GetFen() from pr834
* first version of nnue output
* flag to delete fils
* address review comments
* support pre v6 data
* fix sign
* correct nnue data misunderstanding
  Co-authored-by: borg323
* fix copy-paste error (#15)
  Co-authored-by: borg323
* add -t flag (#16)
  Co-authored-by: borg323
* Post merge fixes.
* Missed cleanup.
* Fix input format change bug that can corrupt played_idx and best_idx
* Post merge fixes.
* fix merge
* remove unnecessary options
* split out rescore loop
* minimize rescorer build
* merge rescorer with master
* minimize syzygy diff

---------

Co-authored-by: Tilps
Co-authored-by: Henrik Forstén
Co-authored-by: borg323
Co-authored-by: Naphthalin

Naphthalin and others added 10 commits January 6, 2021 00:57

included the issue 1308 deblunder mechanism in loop.cc

3fc0425

blunder detection now acts on missed proven wins and unforced proven losses

6531e5d

added comment on missing activeM

d159f75

Merge branch 'rescore_tb' into rescore_tb

97edb41

removed probabilistic randomization of result rescorer and worked with v6 data instead

e3cacc6

included moves left rescore, removed unneeded options

99866d2

doubled code not needed as final positions aren't special

9a6b5c5

changed appveyor script to hopefully build rescorer.sln

2ea7031

Merge pull request #34 from Tilps/rescore_tb

909f6d6

fix AppVeyor build

reverted failed attempt at fixing appveyor

4eae4b8

Tilps reviewed Feb 1, 2021

src/selfplay/loop.cc (resolved)

src/selfplay/loop.cc (resolved)

included minimal std::cout for blunders

8700ffb

included blunder counter, added comment to visits v6 data checking

92ef36c

checking for bit 3 of invariance info to make sure best_q is a proven win

e75d887

Tilps approved these changes Feb 2, 2021


Tilps merged commit 1235fa0 into Tilps:rescore_tb Feb 2, 2021
