Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree #14361
Co-Authored-by: expani <[email protected]>
Should we floor to a multiple of 16 instead of 8, so that the second loop is also a perfect fit for AVX-512? (By the way, which of your machines produced the above benchmark results?) Otherwise, the change makes sense to me.
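To make the 8-versus-16 question concrete, here is the arithmetic for one full leaf block. This is a sketch that assumes a 512-doc block (matching the "512 / 3" in the PR description) and three doc IDs per packed long. The name floorToMultipleOf8 mirrors the PR, but floorToMultipleOf16, the FloorDemo class and both method bodies are hypothetical illustrations, not the Lucene code.

```java
/** Hypothetical illustration of flooring the bulk-loop length; not Lucene code. */
class FloorDemo {
  // Clears the low 3 bits: largest multiple of 8 that is <= n (for n >= 0).
  static int floorToMultipleOf8(int n) {
    return n & ~7;
  }

  // Clears the low 4 bits: largest multiple of 16 that is <= n (for n >= 0).
  static int floorToMultipleOf16(int n) {
    return n & ~15;
  }

  public static void main(String[] args) {
    int count = 512;          // doc IDs in a full leaf block (assumed)
    int oneThird = count / 3; // 170 longs if each long holds three IDs
    int bulk8 = floorToMultipleOf8(oneThird);   // 168
    int bulk16 = floorToMultipleOf16(oneThird); // 160
    // IDs not covered by the bulk loop are left for a scalar tail.
    System.out.println("multiple of 8:  " + bulk8 + " longs, tail = " + (count - 3 * bulk8));
    System.out.println("multiple of 16: " + bulk16 + " longs, tail = " + (count - 3 * bulk16));
  }
}
```

Running main prints a tail of 8 IDs for the multiple-of-8 variant and 32 IDs for the multiple-of-16 variant. The trade-off: a trip count that is a multiple of 16 lines up with the 16 int lanes of a 512-bit register (versus 8 lanes at 256 bits), at the cost of a longer scalar tail.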
Thanks for the feedback.
That is what I thought initially. But my AVX-512 machine (hopefully it is one) somehow only handles 256 bits at once. Its
CPU flags:
The luceneutil results were obtained on the Intel chip (Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz (AVX 512)).
OK, I get the expected result (a multiple of 16 is faster than a multiple of 8) when I force
This PR tries another way to implement the idea of #13521, taking advantage of an auto-vectorized loop to decode ints, as we did for bpv24 in #14203.
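A minimal sketch of the general idea, under stated assumptions: every doc ID in the block is below 2^21, three IDs are packed into each 64-bit long (bits 42-62, 21-41 and 0-20), the number of longs is floored to a multiple of 8, and any leftover IDs are kept as plain ints. This is a hypothetical layout for illustration, not the encoding used in this PR, and the class and method names are invented. The decode loop is a simple counted loop with independent iterations, the shape HotSpot's C2 compiler may auto-vectorize; whether it actually does depends on the JDK version and CPU.

```java
import java.util.Arrays;

/** Hypothetical BPV-21 layout: three 21-bit doc IDs per long. Not the Lucene encoding. */
class Bpv21Sketch {
  static final long MASK21 = (1L << 21) - 1; // low 21 bits

  static int floorToMultipleOf8(int n) {
    return n & ~7;
  }

  /** Number of longs in the packed part for a block of count IDs. */
  static int packedLength(int count) {
    return floorToMultipleOf8(count / 3);
  }

  /** Packs the first 3 * k IDs, three per long; the remaining IDs are stored separately. */
  static long[] pack(int[] docIds, int count) {
    int k = packedLength(count);
    long[] packed = new long[k];
    for (int i = 0; i < k; i++) {
      packed[i] = ((long) docIds[i] << 42)
          | ((long) docIds[k + i] << 21)
          | docIds[2 * k + i];
    }
    return packed;
  }

  /** Decodes the packed longs plus the plain-int tail back into count doc IDs. */
  static int[] unpack(long[] packed, int[] tail, int count) {
    int k = packed.length;
    int[] out = new int[count];
    // Bulk loop: independent iterations, trip count is a multiple of 8.
    for (int i = 0; i < k; i++) {
      long l = packed[i];
      out[i] = (int) (l >>> 42);
      out[k + i] = (int) ((l >>> 21) & MASK21);
      out[2 * k + i] = (int) (l & MASK21);
    }
    // Remainder: IDs beyond 3 * k are copied from the plain tail.
    System.arraycopy(tail, 0, out, 3 * k, count - 3 * k);
    return out;
  }

  public static void main(String[] args) {
    int count = 512;
    int[] docIds = new int[count];
    for (int i = 0; i < count; i++) {
      docIds[i] = (i * 4093) & (int) MASK21; // arbitrary values below 2^21
    }
    long[] packed = pack(docIds, count);
    int[] tail = Arrays.copyOfRange(docIds, 3 * packed.length, count);
    int[] decoded = unpack(packed, tail, count);
    System.out.println(packed.length + " longs + " + tail.length + " tail ints, round trip ok = "
        + Arrays.equals(docIds, decoded));
  }
}
```

For a 512-ID block, main reports 168 packed longs plus an 8-int tail and a successful round trip. Splitting the block into three contiguous thirds (rather than interleaving IDs) keeps each output write a plain indexed store, which is what makes the bulk loop a candidate for auto-vectorization.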
One thing that needs pointing out is that the remainder loop does not get vectorized (again!) since 512 / 3 = 170 is not a multiple of 8; hence the floorToMultipleOf8 trick.

Mac M2
Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz (AVX 512)
cc @expani, who raised this neat idea.