Fix policy softmax accuracy if masking is enabled. #912
Conversation
CUDA fp16 now looks much more sane. Tested with net 60020, which gives a completely flat policy for version 0.23.1 on startpos with fp16. This PR seems to fix it, and the policies for fp32 and fp16 are almost identical, as can be seen below. However, a 1000-node search from startpos does not give the same best move; maybe this is just an expected side effect of the very flat policies of 60020. Used args: search results for go nodes 1000:
If it works for CUDA fp16 and regular, I'd say merge now and deal with potential OpenCL problems later. This will let us train with masking, which has given us nice gains in SL.
Could you take some small network (e.g. 10b) and check on a fast RTX GPU that it didn't become slower? It probably didn't, but it's worth checking.
FWIW, a quick benchmark (best of three). Note that for practical backends, slightly less time needs to be spent on NN eval. So far, I had no big success in using an approximate exp() (from an accuracy standpoint, that is; it does go faster).
Ok, now it works better. With the FastExp calculations (which are in line with @borg323's fast approximation for softmax policy temperature corrections), results are better now. Best of three (under current laptop load conditions) with the random backend gives 238537 nps (master), 231485 nps (with approx.), and 228647 nps (without approx.). The slight loss in accuracy is marginal:
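FastExp-style routines are typically Schraudolph-style bit-manipulation approximations of exp(): a linear function of the input is written directly into the bits of an IEEE-754 float so that it lands in the exponent field. A minimal sketch of that technique (the function name and the exact constants below are illustrative, not necessarily those used in lc0's FastExp):

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Schraudolph-style fast exp sketch: construct a float whose bit pattern
// is a linear function of x, so the integer part of the scaled input
// lands in the IEEE-754 exponent field. Accuracy is within a few percent
// over moderate inputs; constants here are illustrative.
inline float FastExpSketch(float x) {
  // 2^23 / ln(2): scales x so that adding 1 to x doubles the result.
  const float kScale = 12102203.0f;
  // Exponent bias (127 << 23) with a correction term that reduces
  // the average relative error of the approximation.
  const int32_t kBias = 1064866805;
  int32_t i = static_cast<int32_t>(kScale * x) + kBias;
  float result;
  std::memcpy(&result, &i, sizeof(result));  // safe type punning
  return result;
}
```

The speed win comes from replacing a transcendental-function call with one multiply, one add, and a bit copy, at the cost of roughly 1-3% relative error, which is consistent with the "slight loss in accuracy is marginal" observation above.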
This reverts commit 9fb73d0.
I wonder if the random backend should change the values it outputs for policy in random mode, to have a potentially greater range with this change.
The result of GetPValue as of this change can be anything: any positive or negative float value. Subtracting the max moves the range to be non-positive, with at least one value being 0.
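The max-subtraction trick described above is the standard way to make softmax numerically stable, which matters especially for fp16: after the shift, every exponent is in (-inf, 0], so exp() cannot overflow, and at least one term is exactly 1 before normalization. A sketch of softmax applied only to the legal-move logits (the function name is hypothetical; this illustrates the technique, not lc0's actual code):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax over the raw policy logits of the legal
// moves only. Subtracting the maximum logit shifts all exponents into
// (-inf, 0], so exp() never overflows, and the largest logit maps to
// exp(0) = 1 before normalization.
std::vector<float> SoftmaxOverLegalMoves(const std::vector<float>& logits) {
  float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> probs(logits.size());
  float total = 0.0f;
  for (size_t i = 0; i < logits.size(); ++i) {
    probs[i] = std::exp(logits[i] - max_logit);
    total += probs[i];
  }
  for (float& p : probs) p /= total;  // normalize so probabilities sum to 1
  return probs;
}
```

Because the masking step already selects the legal moves before this runs, the illegal-move logits never enter the sum, which is what removes the accuracy problem the backends had when they normalized over all 1858 policy outputs.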
Is there an easy way to make sure it works? Is there supposed to be an Elo gain with any net that was trained with policy masking, or is it more like all-or-nothing, where it gains 200 Elo if it happened to be a broken net?
…On Tue, Jul 30, 2019, 3:01 PM Dieter Dobbelaere ***@***.***> wrote:
Perform softmax outside backends on set of legal moves.
This should fix the limited accuracy issues observed with cuda backend
FP16 in combination with policy masking enabled in training.
Blas backend has been tested.
Testing of CUDA and OpenCL backend much appreciated!
Commit Summary
- Do softmax outside backend on set of legal moves.
- Remove policy softmax from blas backend.
- Remove policy softmax from CUDA backend.
- Remove policy softmax from OpenCL backend.
- Remove policy softmax from TensorFlow backend.
File Changes
- *M* src/mcts/search.cc
<https://github.com/LeelaChessZero/lc0/pull/912/files#diff-0> (10)
- *M* src/neural/blas/network_blas.cc
<https://github.com/LeelaChessZero/lc0/pull/912/files#diff-1> (9)
- *M* src/neural/cuda/network_cudnn.cc
<https://github.com/LeelaChessZero/lc0/pull/912/files#diff-2> (44)
- *M* src/neural/network_tf.cc
<https://github.com/LeelaChessZero/lc0/pull/912/files#diff-3> (3)
- *M* src/neural/opencl/network_opencl.cc
<https://github.com/LeelaChessZero/lc0/pull/912/files#diff-4> (7)
I've fused the softmax and softmax temperature steps, so the random backend now has just about the same performance: best of three yields 240698 nps (master) and 239228 nps (this PR), even though this PR has to do more work (namely the softmax). Accuracy doesn't seem to be an issue:
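The fusion mentioned here rests on an identity: raising softmax probabilities to the power 1/T and renormalizing is the same as computing softmax of the logits divided by T, so one exp() per move suffices instead of a softmax pass followed by a pow() pass. A sketch under that assumption (illustrative only; lc0's actual implementation may differ):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Fused softmax + policy softmax temperature. Uses the identity
//   softmax(x)^(1/T) / sum(...) == softmax(x / T),
// so the temperature correction folds into the exponent and only one
// exp() per move is needed. Max-subtraction keeps it stable as before.
std::vector<float> SoftmaxWithTemperature(const std::vector<float>& logits,
                                          float temperature) {
  float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> probs(logits.size());
  float total = 0.0f;
  for (size_t i = 0; i < logits.size(); ++i) {
    probs[i] = std::exp((logits[i] - max_logit) / temperature);
    total += probs[i];
  }
  for (float& p : probs) p /= total;  // renormalize
  return probs;
}
```

This explains why the extra softmax work is nearly free relative to master: master already paid for a pow() per move in the temperature correction, and the fused version replaces both steps with a single exp().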
@Tilps I have modified the distribution of the random backend policy values. Note that it is difficult to compare the two cases (master and this PR) with the random backend directly, as the policy distributions are different, as you mentioned on Discord. Therefore, a speed comparison with a fast GPU, as @mooskagh proposed, seems highly advisable and interesting at this stage.
Actually, a fair comparison is possible with the uniform random backend. It is safe to say that there is no significant regression in terms of nps. NN evals are expected to be faster (because of the dropped softmax layer), so the balance might even be positive, although speedup was not the goal of this PR in any case.
No speed difference with real nets using an RTX 2070. Ten samples of goodgyal-5 (48x5), cudnn-fp16, go nodes 1000000 from startpos:
Same for net 42850 but with 100000 nodes:
- Do softmax outside backend on set of legal moves.
- Remove policy softmax from blas backend.
- Remove policy softmax from CUDA backend.
- Remove policy softmax from OpenCL backend.
- Remove policy softmax from TensorFlow backend.
- Use FastExp for policy softmax calculations.
- Fix for negative exponentials.
- Revert "Fix for negative exponentials." This reverts commit 9fb73d0.
- Fuse softmax with softmax temperature.
- Modify random backend policy value distribution.
- Comment improvements.
Perform policy softmax outside backends on set of legal moves.
This should fix the limited accuracy issues observed with CUDA backend FP16 in combination with policy masking enabled in training.
Blas backend has been tested.
Testing of CUDA and OpenCL backend much appreciated!