Randomly play opening moves based on priors #873

Closed
wants to merge 1 commit into from

Conversation

Mardak
Contributor

@Mardak Mardak commented Jun 3, 2019

r? @Tilps or @mooskagh. Fixes #342 by adding the selfplay option --random-opening-max-plies. Each game draws a random number of opening plies up to that maximum; for each of those plies a single visit is made and a move is picked at random in proportion to the (noised) priors. No training data is saved for these early moves. --random-opening-temperature can also be changed from its default of 1.0: larger values flatten the priors and smaller values sharpen them.

Not explicitly mentioned in the issue, but implemented here: adjudication is skipped during the random opening. This should allow more diverse openings, including positions that would otherwise be overlooked because the game is resigned on the assumption that the opponent will play the best move.

The existing default of 0 prior-based random moves is unchanged, but the server can now raise it, and the random opening can be combined with the regular temperature settings.

If the game ends while playing randomly, it's treated like an abort.

This includes/supersedes #773 as I needed to move the MakeMove earlier anyway (but still after getting training data).
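
For illustration only, the behavior described above can be sketched in Python. This is not the lc0 implementation: `get_noised_priors` is a hypothetical callback standing in for a single-visit evaluation, and applying temperature as `prior ** (1 / temperature)` is an assumption that happens to be consistent with "larger flattens, smaller sharpens".

```python
import random


def apply_temperature(priors, temperature):
    """Raise each prior to 1/temperature and renormalise (assumed formula)."""
    weights = {move: p ** (1.0 / temperature) for move, p in priors.items()}
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}


def pick_random_opening(get_noised_priors, max_plies, temperature=1.0, rng=random):
    """Return the moves played before regular search takes over.

    get_noised_priors(moves) is a hypothetical callback: it returns
    {move: prior} for the position reached after `moves`, or an empty dict
    if the game is already over. Returns None when the game ends during the
    random opening, which the PR treats like an abort.
    """
    num_plies = rng.randint(0, max_plies)  # uniform over 0..max_plies inclusive
    moves = []
    for _ in range(num_plies):
        priors = get_noised_priors(moves)
        if not priors:
            return None  # game over while playing randomly: abort
        dist = apply_temperature(priors, temperature)
        choice = rng.choices(list(dist), weights=list(dist.values()))[0]
        moves.append(choice)  # no training data is saved for these plies
    return moves
```

With `max_plies=0` the function always returns an empty list, which corresponds to the unchanged default behavior.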

@Mardak
Contributor Author

Mardak commented Jun 3, 2019

As a sanity check, I let the random opening play a random number of plies up to 30 and then adjudicated immediately:

./lc0 selfplay -w 53200 --visits=1 --games=10000 --resign-percentage=100 --max-random-opening-plies=30

# 326 games that had 0-ply openings (followed by highest prior move to end the game)
 326 e2e4

# 300 games that had 1-ply openings (followed by highest prior move to end the game)
# the distribution of the first ply does match up with noised 53200 priors
   1 a2a4 g8f6
   1 e2e3 d7d5
   2 b2b3 e7e5
   2 b2b4 e7e5
   3 a2a4 e7e5
   4 b1c3 d7d5
   4 h2h3 e7e5
   5 f2f4 g8f6
   5 g2g4 d7d5
   5 h2h4 e7e5
   6 b1a3 e7e5
   7 e2e3 e7e6
   7 f2f3 e7e5
   7 g1h3 d7d5
   7 g2g3 e7e5
   8 a2a3 e7e5
   8 c2c3 g8f6
   8 c2c4 e7e5
   9 d2d3 d7d5
  10 d2d4 g8f6
  12 g1f3 g8f6
 179 e2e4 e7e6

# 318 games that had 2-ply openings (followed by highest prior move to end the game)
…
   1 h2h4 g8h6 e2e4
   2 a2a4 d7d5 g1f3
   2 b1a3 e7e5 a3c4
   2 b2b3 e7e6 c1b2
   2 b2b4 e7e5 c1b2
   2 c2c4 c7c5 g1f3
   2 d2d3 d7d5 g1f3
   2 e2e3 c7c5 d2d4
   2 e2e3 e7e6 g1f3
   2 e2e4 b7b5 f1b5
   2 e2e4 e7e6 d2d4
   2 e2e4 g8f6 e4e5
   3 a2a3 e7e5 c2c4
   3 c2c4 e7e5 g2g3
   3 d2d4 d7d5 c2c4
   3 e2e4 g8h6 d2d4
   3 e2e4 h7h6 d2d4
   3 f2f4 g8f6 e2e3
   3 g1f3 d7d5 d2d4
   3 g1f3 g8f6 d2d4
   4 d2d4 g8f6 g1f3
   4 e2e4 b8a6 g1f3
   4 e2e4 f7f5 e4f5
   4 e2e4 f7f6 d2d4
   4 e2e4 h7h5 d2d4
   4 g1h3 d7d5 g2g3
   6 b1c3 d7d5 d2d4
   6 e2e4 d7d6 d2d4
   6 h2h3 e7e5 c2c4
   7 e2e4 b7b6 d2d4
   7 e2e4 b8c6 d2d4
   8 e2e4 a7a6 d2d4
   8 e2e4 d7d5 e4d5
   9 e2e4 c7c6 d2d4
  11 e2e4 g7g6 d2d4
  16 e2e4 e7e5 g1f3
  25 e2e4 c7c5 g1f3
  81 e2e4 e7e6 b1c3

# 325 for 3-ply
# 337 for 4-ply
# 307 for 5-ply
# 352 for 6-ply
# 325 for 7-ply
# 319 for 8-ply
# 339 for 9-ply
# 306 for 10-ply
# 328 for 11-ply
# 315 for 12-ply
# 354 for 13-ply
# 300 for 14-ply
# 331 for 15-ply
# 327 for 16-ply
# 320 for 17-ply
# 315 for 18-ply
# 325 for 19-ply
# 308 for 20-ply
# 300 for 21-ply
# 357 for 22-ply
# 312 for 23-ply
# 325 for 24-ply
# 304 for 25-ply
# 323 for 26-ply
# 354 for 27-ply
# 315 for 28-ply
# 283 for 29-ply
# 298 for 30-ply

Those total per-ply counts do match up pretty closely to 1 / (30 + 1) * 10000 ~= 323
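
If anyone wants to repeat that comparison, here is a small sketch that computes the expected count per opening length and a chi-square statistic against a uniform distribution. The counts are copied from the list above; the expected value is derived from the sum of the listed counts rather than the nominal 10000 games, and the choice of a chi-square statistic is mine, not something the run itself reported.

```python
# Games observed per opening length, 0-ply through 30-ply, copied from the run above.
counts = [
    326, 300, 318, 325, 337, 307, 352, 325, 319, 339,
    306, 328, 315, 354, 300, 331, 327, 320, 315, 325,
    308, 300, 357, 312, 325, 304, 323, 354, 315, 283,
    298,
]

total = sum(counts)
expected = total / len(counts)  # uniform over the 31 possible opening lengths

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((c - expected) ** 2 / expected for c in counts)

print(f"games counted: {total}, expected per opening length: {expected:.1f}")
print(f"chi-square statistic ({len(counts) - 1} degrees of freedom): {chi_square:.1f}")
```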

@Tilps
Contributor

Tilps commented Jun 3, 2019

I think we'll want a temperature parameter for this pre-training period. And maybe some other parameters. But I am favorable to the concept.

@Mardak
Contributor Author

Mardak commented Jun 3, 2019

I can understand the desire to have additional parameters like temperature, but do we have a concrete example of the behavior we want beyond controlling the max? And even then, should that be done in a followup?

And to be clear, those existing move temperature options can be used after the random opening, e.g., if random opening picked 0-ply, regular visit-ful search can still use the 6 existing options: Temperature, TempDecayMoves, TempCutoffMove, TempEndgame, TempValueCutoff and TempVisitOffset.

@aartdappel

Adding a temp to this seems key. At least T40 has very extreme policies in main lines, where this wouldn't help exploration at all.

@Mardak
Contributor Author

Mardak commented Jun 3, 2019

The priors used for the random opening are after noise has been applied to a newly extended root node. So with something like #267, there would be a way to control the priors a bit instead of the fixed 25% epsilon -- although that would affect noise everywhere and not just the opening…
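
As a rough sketch of the noise step mentioned above, this is how Dirichlet noise is typically mixed into root priors with a 25% epsilon. The function name and `alpha=0.3` are assumptions for illustration and are not taken from the lc0 source.

```python
import random


def add_dirichlet_noise(priors, epsilon=0.25, alpha=0.3, rng=random):
    """Blend priors with a symmetric Dirichlet sample.

    alpha=0.3 is an assumed value for illustration; epsilon=0.25 is the
    fixed 25% mentioned in this thread.
    """
    moves = list(priors)
    # A Dirichlet sample is a set of independent gamma draws, normalised.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in moves]
    total = sum(gammas)
    return {
        move: (1.0 - epsilon) * priors[move] + epsilon * g / total
        for move, g in zip(moves, gammas)
    }
```

Because the noise is mixed in before any temperature is applied, every move keeps at least `(1 - epsilon)` of its original prior, so the noise alone cannot flatten a very sharp policy.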

@aartdappel

It does seem easier to me to just increase the temp value than to do a temp over the noise. That way you don't have to tune noise for this specifically. Having two sources of randomness interact with each other so directly doesn't seem beneficial.

@Tilps
Contributor

Tilps commented Jun 3, 2019

Noise will certainly help here - maybe that is good enough - but I expect some additional flattening would be quite likely to be useful. We already use temp > 1 for our first moves - while they have visits and this does not, it does seem intuitive that the temperature used before the first moves of the game should be even higher.

@Mardak
Contributor Author

Mardak commented Jun 4, 2019

Updated PR with two options: --random-opening-max-plies and --random-opening-temperature (the latter can be set separately for player1. and player2.)

@Mardak
Contributor Author

Mardak commented Jun 4, 2019

Another sanity check, testing temperature on the first ply using network 53200 with 10k games:

--policy-softmax-temp=1 (without noise)
g2g4 P:  0.29%
f2f3 P:  0.34%
g1h3 P:  0.37%
b1a3 P:  0.38%
h2h4 P:  0.41%
f2f4 P:  0.46%
b2b4 P:  0.47%
a2a4 P:  0.53%
h2h3 P:  0.56%
a2a3 P:  0.67%
d2d3 P:  0.72%
b1c3 P:  0.77%
b2b3 P:  0.82%
c2c3 P:  0.86%
e2e3 P:  1.29%
g2g3 P:  1.57%
c2c4 P:  1.67%
g1f3 P:  2.72%
d2d4 P:  2.86%
e2e4 P: 82.23%

--random-opening-temperature=100
 460 h2h4
 461 a2a3
 463 b1c3
 464 c2c4
 468 b2b4
 486 g2g4
 491 f2f3
 492 b1a3
 496 g1f3
 496 g1h3
 498 f2f4
 501 c2c3
 503 d2d4
 504 e2e3
 532 a2a4
 532 e2e4
 536 g2g3
 536 h2h3
 538 b2b3
 543 d2d3

--random-opening-temperature=10
 450 g1h3
 455 f2f3
 459 f2f4
 460 g2g3
 470 b2b3
 473 d2d3
 475 h2h4
 476 g2g4
 479 b1c3
 482 a2a4
 484 b2b4
 488 c2c3
 490 a2a3
 494 b1a3
 507 h2h3
 515 d2d4
 531 c2c4
 536 e2e3
 544 g1f3
 732 e2e4

--random-opening-temperature=1
 134 g2g4
 148 g1h3
 149 a2a3
 149 b1a3
 153 h2h3
 154 h2h4
 170 a2a4
 173 f2f3
 179 b1c3
 181 f2f4
 183 b2b4
 191 b2b3
 196 d2d3
 210 c2c3
 220 e2e3
 239 c2c4
 257 g2g3
 328 g1f3
 329 d2d4
6257 e2e4

--random-opening-temperature=0.5
  10 g2g4
  11 f2f3
  12 a2a3
  13 h2h3
  14 d2d3
  16 b2b4
  17 c2c3
  17 f2f4
  19 a2a4
  19 g1h3
  21 b1a3
  22 h2h4
  23 b2b3
  23 e2e3
  28 c2c4
  29 g2g3
  37 b1c3
  39 d2d4
  44 g1f3
9586 e2e4

--random-opening-temperature=0.1
10000 e2e4

And similarly except forcing e2e4 as the opening move:

--policy-softmax-temp=1 (without noise)
g7g5 P:  0.28%
b7b5 P:  0.29%
f7f5 P:  0.29%
f7f6 P:  0.35%
g8h6 P:  0.40%
h7h5 P:  0.46%
b8a6 P:  0.51%
a7a5 P:  0.68%
b7b6 P:  0.82%
h7h6 P:  0.88%
d7d5 P:  1.08%
g8f6 P:  1.10%
g7g6 P:  2.07%
d7d6 P:  2.49%
b8c6 P:  2.71%
a7a6 P:  2.82%
c7c6 P:  4.97%
e7e5 P:  9.17%
c7c5 P: 17.83%
e7e6 P: 50.81%

--random-opening-temperature=100
 453 e2e4 b8a6
 460 e2e4 d7d5
 466 e2e4 h7h6
 472 e2e4 f7f6
 475 e2e4 a7a6
 493 e2e4 g8f6
 495 e2e4 g8h6
 496 e2e4 c7c6
 500 e2e4 h7h5
 502 e2e4 b7b6
 507 e2e4 b8c6
 507 e2e4 g7g6
 511 e2e4 f7f5
 512 e2e4 g7g5
 518 e2e4 a7a5
 518 e2e4 c7c5
 523 e2e4 b7b5
 524 e2e4 d7d6
 525 e2e4 e7e6
 543 e2e4 e7e5

--random-opening-temperature=10
 412 e2e4 h7h5
 421 e2e4 f7f5
 439 e2e4 g8h6
 447 e2e4 f7f6
 454 e2e4 g8f6
 455 e2e4 b7b6
 465 e2e4 b7b5
 466 e2e4 b8a6
 470 e2e4 g7g5
 478 e2e4 d7d5
 488 e2e4 d7d6
 491 e2e4 b8c6
 493 e2e4 h7h6
 496 e2e4 a7a5
 517 e2e4 a7a6
 539 e2e4 g7g6
 559 e2e4 c7c6
 596 e2e4 e7e5
 620 e2e4 c7c5
 694 e2e4 e7e6

--random-opening-temperature=1
 136 e2e4 g7g5
 138 e2e4 g8h6
 146 e2e4 b7b5
 153 e2e4 f7f5
 155 e2e4 a7a5
 166 e2e4 h7h6
 171 e2e4 b8a6
 174 e2e4 h7h5
 179 e2e4 f7f6
 196 e2e4 b7b6
 202 e2e4 d7d5
 231 e2e4 g8f6
 283 e2e4 g7g6
 284 e2e4 d7d6
 301 e2e4 b8c6
 332 e2e4 a7a6
 501 e2e4 c7c6
 774 e2e4 e7e5
1561 e2e4 c7c5
3917 e2e4 e7e6

--random-opening-temperature=0.5
  29 e2e4 a7a5
  31 e2e4 b7b5
  31 e2e4 h7h5
  34 e2e4 b8a6
  34 e2e4 d7d5
  36 e2e4 g7g5
  37 e2e4 f7f6
  37 e2e4 g8h6
  43 e2e4 f7f5
  43 e2e4 g8f6
  45 e2e4 h7h6
  46 e2e4 b7b6
  58 e2e4 g7g6
  66 e2e4 d7d6
  83 e2e4 b8c6
  84 e2e4 a7a6
 143 e2e4 c7c6
 332 e2e4 e7e5
1042 e2e4 c7c5
7746 e2e4 e7e6

--random-opening-temperature=0.1
10000 e2e4 e7e6
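
To get a feel for what those temperatures do, here is a sketch that applies each one to the unnoised first-move priors listed above. It assumes temperature is applied as `prior ** (1 / temperature)` followed by renormalisation, which is my reading and not confirmed against the code. The observed counts above were sampled from noised priors, so they are expected to differ from these noise-free shares, most visibly at temperature 1 and below.

```python
# Unnoised first-move priors (percent) for network 53200, copied from the listing above.
PRIORS = {
    "g2g4": 0.29, "f2f3": 0.34, "g1h3": 0.37, "b1a3": 0.38, "h2h4": 0.41,
    "f2f4": 0.46, "b2b4": 0.47, "a2a4": 0.53, "h2h3": 0.56, "a2a3": 0.67,
    "d2d3": 0.72, "b1c3": 0.77, "b2b3": 0.82, "c2c3": 0.86, "e2e3": 1.29,
    "g2g3": 1.57, "c2c4": 1.67, "g1f3": 2.72, "d2d4": 2.86, "e2e4": 82.23,
}


def tempered(priors, temperature):
    """Raise each prior to 1/temperature and renormalise (assumed formula)."""
    weights = {move: p ** (1.0 / temperature) for move, p in priors.items()}
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}


for t in (100, 10, 1, 0.5, 0.1):
    share = tempered(PRIORS, t)["e2e4"]
    print(f"temperature {t}: e2e4 expected in {share:.1%} of games (no noise)")
```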

@Mardak
Contributor Author

Mardak commented Sep 13, 2019

@Tilps I rebased on latest master, resolving adjacent-line conflicts with #821. Is the following set of options sufficient for the server to control the behavior, at least for a first attempt?

  • random-opening-temperature
  • random-opening-max-plies

@Naphthalin
Contributor

I just wanted to point out that the matter discussed in this PR is within the expected behavior of using #918 in training with temp=1.0, where the % of each line is given by the equilibrium policies.

@Mardak
Contributor Author

Mardak commented Jan 1, 2020

#342 was fixed with #964 so closing this.

@Mardak Mardak closed this Jan 1, 2020

Successfully merging this pull request may close these issues.

Separate exploration from training feedback (alternate method of lowering T)
4 participants