GH-46788: [C++] Enable SIMD for byte stream split with 2 streams #46789
base: main
Benchmark results on my MacBook Pro M3:
@ursabot please benchmark
Benchmark runs are scheduled for commit ca35643. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Here are the benchmark numbers on my local machine (AMD Ryzen 9 3900X, gcc 14.3):
Now that we have SIMD optimizations for this, can we make sure the benchmarks cover the different cases: scalar and the various SIMD kinds (SSE, AVX2, Neon)?
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit ca35643. There were 72 benchmark results with an error:
There were 18 benchmark results indicating a performance regression:
The full Conbench report has more details.
I've done so with |
Rationale for this change
Performance improvements for byte stream split encoding with two streams. Two-byte types such as f16 are often used in machine learning, for instance.
What changes are included in this PR?
ByteStreamSplitDecodeSimd128 was a straightforward beneficial change.
ByteStreamSplitEncodeSimd128 was significantly refactored to make it more generic. With the new implementation, we can investigate merging it with the avx2 version.
Are these changes tested?
Yes, with existing tests.
Are there any user-facing changes?
No.
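For readers unfamiliar with the layout, here is a minimal scalar sketch of byte stream split with two streams, i.e. for 2-byte values such as f16 bit patterns. It is not the code in this PR and not Arrow's API: the names SplitEncode2 and SplitDecode2 are hypothetical, and std::vector is used only to keep the example self-contained. Byte k of value i is written to position k * num_values + i, so all first bytes end up contiguous, followed by all second bytes. The SIMD kernels discussed above would compute the same layout, only several values at a time.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of Arrow): scalar byte stream split encode
// for 2-byte values. Byte 0 of every value goes to the first stream and
// byte 1 to the second stream; the streams are stored back to back.
std::vector<uint8_t> SplitEncode2(const std::vector<uint16_t>& values) {
  const std::size_t n = values.size();
  std::vector<uint8_t> out(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    uint8_t bytes[2];
    std::memcpy(bytes, &values[i], sizeof(bytes));  // in-memory byte order
    out[i] = bytes[0];      // stream 0
    out[n + i] = bytes[1];  // stream 1
  }
  return out;
}

// Hypothetical helper (not part of Arrow): inverse of SplitEncode2.
// Assumes encoded.size() is even.
std::vector<uint16_t> SplitDecode2(const std::vector<uint8_t>& encoded) {
  const std::size_t n = encoded.size() / 2;
  std::vector<uint16_t> values(n);
  for (std::size_t i = 0; i < n; ++i) {
    const uint8_t bytes[2] = {encoded[i], encoded[n + i]};
    std::memcpy(&values[i], bytes, sizeof(bytes));
  }
  return values;
}
```

A round trip through SplitEncode2 and SplitDecode2 returns the input unchanged, which is a handy property to check when comparing a SIMD kernel against a scalar reference.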