Randomly play opening moves based on priors #873
As a sanity check, I let the random openings play their random number of plies (up to 30) and then immediately adjudicated:
Those total per-ply counts do match up pretty closely to 1 / (30 + 1) * 10000 ~= 323
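The expected count above can be reproduced with a small simulation. This is only a sketch: it assumes the number of opening plies is drawn uniformly from 0 to 30 inclusive (which is what the 1 / (30 + 1) factor suggests), and `simulate_opening_lengths` is a hypothetical helper, not code from this PR.

```python
import random
from collections import Counter


def simulate_opening_lengths(num_games=10_000, max_plies=30, seed=0):
    """Count how often each opening length occurs (assumes a uniform draw)."""
    rng = random.Random(seed)
    # randint is inclusive on both ends, so there are max_plies + 1 outcomes.
    return Counter(rng.randint(0, max_plies) for _ in range(num_games))


counts = simulate_opening_lengths()
expected = 10_000 / (30 + 1)
print(f"expected games per ply count: {expected:.1f}")
print(f"observed min={min(counts.values())}, max={max(counts.values())}")
```

The observed counts scatter around the expected value because of sampling variation, so "pretty closely" rather than exactly is what one should see.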
I think we'll want a temperature parameter for this pre-training period, and maybe some other parameters. But I'm in favor of the concept.
I can understand the desire for additional parameters like temperature, but do we have a concrete example of the behavior we want beyond controlling the max? And even then, should that be done in a follow-up? To be clear, the existing move temperature options can still be used after the random opening, e.g., if the random opening picked 0 plies, regular visit-ful search can still use the 6 existing options: Temperature, TempDecayMoves, TempCutoffMove, TempEndgame, TempValueCutoff and TempVisitOffset.
Adding a temp to this seems key. At least T40 has very extreme policies in main lines, where this wouldn't help exploration at all.
The priors used for the random opening are after noise has been applied to a newly extended root node. So with something like #267, there would be a way to control the priors a bit instead of the fixed 25% epsilon -- although that would affect noise everywhere and not just the opening…
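For readers unfamiliar with the "25% epsilon", here is a rough sketch of how root noise is typically mixed into priors in AlphaZero-style engines. The function name, the Dirichlet `alpha=0.3` value, and the stdlib-based sampling are assumptions for illustration; only the 25% epsilon comes from the comment above.

```python
import random


def add_dirichlet_noise(priors, epsilon=0.25, alpha=0.3, rng=None):
    """Mix Dirichlet noise into a list of priors (illustrative sketch only)."""
    if rng is None:
        rng = random.Random()
    # A Dirichlet sample is a set of independent gamma draws, normalized.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    noise = [g / total for g in gammas]
    # Each prior keeps (1 - epsilon) of its mass; epsilon comes from the noise.
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


noisy = add_dirichlet_noise([0.9, 0.05, 0.05], rng=random.Random(1))
print(noisy)
```

With a fixed epsilon, a move with a 90% prior can never drop below 67.5% after noise, which is why a separate flattening knob was being discussed.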
It does seem easier to me to just increase the temp value than to do a temp over the noise. That way you don't have to tune noise for this specifically. Having two sources of randomness interact with each other so directly doesn't seem beneficial.
Noise will certainly help here - maybe that is good enough - but I expect some additional flattening would be quite likely to be useful. We already use temp > 1 for our first moves - while they have visits and this does not, it does seem intuitive that the temperature used before the first moves of the game should be even higher.
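To make "flattening" concrete, here is a minimal sketch of one common way to apply a temperature to priors: raise each prior to the power 1/T and renormalize. `apply_temperature` is a hypothetical name, and this exponent form is an assumption about how such a parameter would behave, not a description of the PR's code.

```python
def apply_temperature(priors, temperature):
    """Return priors rescaled by p ** (1 / T) and renormalized to sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [p ** (1.0 / temperature) for p in priors]
    total = sum(scaled)
    return [s / total for s in scaled]


sharp_policy = [0.9, 0.05, 0.05]  # an "extreme" main-line policy
print(apply_temperature(sharp_policy, 1.0))  # unchanged
print(apply_temperature(sharp_policy, 2.0))  # flatter: top move loses mass
print(apply_temperature(sharp_policy, 0.5))  # sharper: top move gains mass
```

With T > 1 the dominant move gives up probability to the alternatives, which is the extra exploration being asked for in very one-sided positions.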
Updated PR with two options:
Another sanity check testing temperature for first ply using 53200 with 10k games:
And similarly except forcing e2e4 as the opening move:
Force-pushed from 05f4ef6 to 02f70e9.
I just wanted to point out that the matter discussed in this PR is within the expected behavior of using #918 in training with
r? @Tilps or @mooskagh
Fixes #342 by adding the selfplay option --random-opening-max-plies, which plays a random number of plies, each doing a single visit and randomly picking a move in proportion to the (noised) priors, without saving training data for these early moves. You can also change --random-opening-temperature from 1.0 to a larger value to flatten the priors or a smaller value to sharpen them.
Not explicitly mentioned in the issue but implemented here: adjudication is skipped during the random opening, to hopefully allow more diverse opening positions that would otherwise have been overlooked by assuming the opponent will play the best move and resigning.
This doesn't change the existing default behavior of 0 prior-based random moves, but it allows the server to change it, even in combination with the regular temperature settings.
If the game ends while playing randomly, it's treated like an abort.
This includes/supersedes #773, as I needed to move the MakeMove call earlier anyway (but still after getting training data).
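The flow described above can be sketched roughly as follows. This is an illustrative Python outline, not the PR's C++ code: `play_random_opening`, `pick_move`, and the `get_priors` callback are hypothetical names, the ply count is assumed to be drawn uniformly from 0 to the maximum, and the temperature is assumed to act as an exponent of 1/T on the priors.

```python
import random


def pick_move(priors, temperature, rng):
    """Sample one move with probability proportional to prior ** (1 / T)."""
    moves = list(priors)
    weights = [priors[m] ** (1.0 / temperature) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


def play_random_opening(get_priors, max_plies, temperature=1.0, rng=None):
    """Play a random number of prior-weighted plies; no training data is kept.

    get_priors(history) returns {move: prior} after a single visit, or an
    empty dict when the game is over. Returns (moves, game_ended).
    """
    if rng is None:
        rng = random.Random()
    num_plies = rng.randint(0, max_plies)  # 0 means no random opening at all
    history = []
    for _ in range(num_plies):
        priors = get_priors(history)
        if not priors:
            # Game ended during the random opening: caller treats it as an abort.
            return history, True
        history.append(pick_move(priors, temperature, rng))
    return history, False


# Toy usage: every position offers the same two moves.
toy_priors = lambda history: {"e2e4": 0.6, "d2d4": 0.4}
moves, ended = play_random_opening(toy_priors, max_plies=30, rng=random.Random(0))
print(len(moves), ended)
```

After this function returns without an abort, regular search with the existing temperature options would take over from the resulting position, and only moves from that point on would produce training data.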