Early exit if backend is idle and there is work to do. #1503


Merged
Tilps merged 5 commits into LeelaChessZero:master on Jan 30, 2021

Conversation

@Tilps (Contributor) commented on Jan 26, 2021

Inspired by investigations into the position fen 1r1qr1k1/6bp/1n4p1/2pPpp2/1n3P2/1QN5/1P1N2PP/R4RBK w - - 0 24

In that position on my machine this change almost doubles NPS at one point during the search (a few million nodes in).
In my 'standard benchmark run' it may be worth a couple of percent.

I think this heuristic can probably be improved upon, but it seems a pretty good start.

Since it's an early exit, so long as the NPS stays high I think it's a clear win (the early exit itself is potentially worth Elo at fixed nodes, though that's untested).

I'm unclear whether there are any scenarios where this will be a net loss of NPS, but I haven't thought of any of significance so far.
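
For readers skimming this, the rough shape of the idea (going by the description and the original PR title) is that the minibatch-gathering loop stops early once the backend is sitting idle and the batch already contains something worth evaluating. Below is a minimal sketch of that shape only; the names (Node, PickNodeToExtend, backend_waiting_counter, GatherMinibatch) are illustrative placeholders rather than the actual lc0 identifiers, and the real selection/collision handling is elided.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

struct Node {};            // stand-in for a search-tree node
Node* PickNodeToExtend();  // stand-in for the PUCT selection step (elided)

// Incremented while a backend thread is blocked waiting for a batch,
// decremented once it picks one up.
std::atomic<int> backend_waiting_counter{0};

std::vector<Node*> GatherMinibatch(std::size_t target_batch_size) {
  std::vector<Node*> batch;
  while (batch.size() < target_batch_size) {
    // Early exit: the backend is idle and we already have work gathered,
    // so dispatching a short batch now beats waiting to fill it up.
    if (!batch.empty() &&
        backend_waiting_counter.load(std::memory_order_relaxed) > 0) {
      break;
    }
    Node* node = PickNodeToExtend();
    if (node == nullptr) break;  // nothing more to gather this pass
    batch.push_back(node);
  }
  return batch;
}
```

The payoff, as I read the description, comes when the GPU finishes its previous batch while the gatherer is still grinding through collisions: rather than leaving the backend idle until a full batch is assembled, the partial batch is sent immediately.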

@Tilps (Contributor, Author) commented on Jan 26, 2021

Worth mentioning that this only works for multigather, but I've put the backend idle counter in unconditionally since it should be very cheap, and maybe a similar heuristic could be invented for non-multigather if we wanted to bother doing so.
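
For illustration, the "very cheap" bookkeeping could look something like the sketch below, wrapping the point where an evaluation thread waits for work. The names (Batch, WaitForNextBatch, ComputeEvaluations, BackendWorkerLoop) are hypothetical, not the actual lc0 code.

```cpp
#include <atomic>

struct Batch {};                        // stand-in for a gathered minibatch
Batch WaitForNextBatch();               // blocks until the search queues a batch (elided)
void ComputeEvaluations(const Batch&);  // runs the NN on the batch (elided)
bool StopRequested();                   // elided

extern std::atomic<int> backend_waiting_counter;  // the counter the gatherer reads

void BackendWorkerLoop() {
  while (!StopRequested()) {
    // The counter is > 0 exactly while this thread is idle waiting for work;
    // that is the signal the gatherer uses to cut a batch short.
    backend_waiting_counter.fetch_add(1, std::memory_order_relaxed);
    Batch batch = WaitForNextBatch();
    backend_waiting_counter.fetch_sub(1, std::memory_order_relaxed);
    ComputeEvaluations(batch);
  }
}
```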

@Tilps (Contributor, Author) commented on Jan 26, 2021

Umm, this PR is 'broken' since I put the counter in the wrong spot. Surprised that it still gives the NPS improvement...

@Tilps changed the title from "Early exit gathering if all collisions and backend is idle." to "Early exit if backend is idle and there is work to do." on Jan 26, 2021
@Tilps (Contributor, Author) commented on Jan 26, 2021

Having fixed the broken logic, I retested a more aggressive idea that had previously turned out badly and found it vastly superior.
The new version shows up 6% faster on my 'standard benchmark'. It seems harder to find any scenario where it's slower. The speedup seems to apply even for positions with high out-of-order rates, not just positions with super high collision rates.
On the problem position, even at the full go nodes 20000000 final position it's about 30% faster (as opposed to maybe 20% before). The 'peak gain' on the problem position is eyeballed as similar, probably slightly better than before.

@Tilps (Contributor, Author) commented on Jan 26, 2021

I realized that this optimization, while very cool, interacts very poorly if there is only one search worker, so I've disabled it in that case.
Some more thought suggests it probably isn't as good as it could be for multi-GPU users of the roundrobin or multiplexing backends, since the number of concurrent tasks the backend can perform in that case exceeds 1. For them it would probably make sense to cut batches short even when the number of pending backend tasks is higher than 0. The current behaviour should still be better than nothing for that case, and very good for our normal single-GPU users and demux backend users using the default 2 threads.
If we want to extend it to multi-GPU roundrobin or multiplexing, we'll need the backend to expose its 'minimal desired concurrent batch count'. I'll leave that as an exercise for the reader for now though :P
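
If the backend did expose such a 'minimal desired concurrent batch count', the cut-short condition would presumably generalize from "backend is completely idle" to "fewer batches in flight than the backend wants to keep busy". A hypothetical sketch of that condition (none of these names exist in lc0 today):

```cpp
#include <cstddef>

// Hypothetical generalization of the early-exit test: cut the batch short
// whenever the number of batches still in flight is below what the backend
// wants to keep busy (1 for a single GPU, >1 for roundrobin/multiplexing
// across several GPUs).
bool ShouldCutBatchShort(int batches_in_flight,
                         int min_desired_concurrent_batches,
                         std::size_t gathered_so_far) {
  return gathered_so_far > 0 &&
         batches_in_flight < min_desired_concurrent_batches;
}
```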

@Tilps (Contributor, Author) commented on Jan 26, 2021

Fixed nodes test:

Score of lc0testdev vs lc0testdev-Basline: 4314 - 3873 - 11813 [0.511]
...      lc0testdev playing White: 2776 - 1325 - 5899  [0.573] 10000
...      lc0testdev playing Black: 1538 - 2548 - 5914  [0.450] 10000
...      White vs Black: 5324 - 2863 - 11813  [0.562] 20000
Elo difference: 7.7 +/- 3.1, LOS: 100.0 %, DrawRatio: 59.1 %
20000 of 20000 games finished.

@Tilps merged commit 0e1c34d into LeelaChessZero:master on Jan 30, 2021