flate: improve huffman flate hcode spatial locality #46007

Closed
wants to merge 1 commit into from

@teivah teivah commented May 6, 2021

Improve the spatial locality of the Huffman hcode and optimize accesses when iterating over len. For example:

for i := range cgnl {
	cgnl[i] = uint8(litEnc.codes[i].len)
}

I propose that, instead of huffmanEncoder holding a slice of hcode, it hold a single hcode struct containing a slice of code values and a slice of len values. Loops that read only len would then touch contiguous memory, which makes better use of CPU cache lines.
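The two layouts can be compared side by side. This is only a minimal sketch of the idea, not the code from the change itself: the `hcodes` type and both helper functions are hypothetical names introduced for illustration, and the `uint16` field types are an assumption.

```go
package main

import "fmt"

// hcode is the array-of-structs layout: each element keeps a code and its
// bit length next to each other (field types are assumed for this sketch).
type hcode struct {
	code, len uint16
}

// hcodes is a hypothetical struct-of-slices layout: all codes are stored
// together and all lengths are stored together.
type hcodes struct {
	code []uint16
	len  []uint16
}

// lengthsAoS copies the lengths out of a slice of hcode. Every iteration
// also pulls the unused code field into the cache.
// dst must not be longer than codes.
func lengthsAoS(codes []hcode, dst []uint8) {
	for i := range dst {
		dst[i] = uint8(codes[i].len)
	}
}

// lengthsSoA copies the lengths out of the struct-of-slices layout. Only
// the contiguous len slice is read. dst must not be longer than codes.len.
func lengthsSoA(codes hcodes, dst []uint8) {
	for i := range dst {
		dst[i] = uint8(codes.len[i])
	}
}

func main() {
	aos := []hcode{{code: 0b0, len: 1}, {code: 0b10, len: 2}, {code: 0b110, len: 3}}
	soa := hcodes{
		code: []uint16{0b0, 0b10, 0b110},
		len:  []uint16{1, 2, 3},
	}

	a := make([]uint8, 3)
	b := make([]uint8, 3)
	lengthsAoS(aos, a)
	lengthsSoA(soa, b)
	fmt.Println(a, b) // both layouts yield the same lengths
}
```

Both functions produce the same result; the difference is only in which bytes the loop has to load. Whether that matters in practice depends on the access pattern, so it needs to be measured.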

Here are the results I get when comparing the two benchmark runs locally (x86) with benchstat:

name                         old time/op    new time/op    delta
Decode/Digits/Huffman/1e4-4     112µs ± 0%     123µs ± 0%   ~     (p=1.000 n=1+1)
Decode/Digits/Huffman/1e5-4    1.07ms ± 0%    1.06ms ± 0%   ~     (p=1.000 n=1+1)
Decode/Digits/Huffman/1e6-4    10.7ms ± 0%    10.5ms ± 0%   ~     (p=1.000 n=1+1)
Decode/Newton/Huffman/1e4-4     131µs ± 0%     124µs ± 0%   ~     (p=1.000 n=1+1)
Decode/Newton/Huffman/1e5-4    1.21ms ± 0%    1.17ms ± 0%   ~     (p=1.000 n=1+1)
Decode/Newton/Huffman/1e6-4    11.5ms ± 0%    11.4ms ± 0%   ~     (p=1.000 n=1+1)
Encode/Digits/Huffman/1e4-4    45.5µs ± 0%    34.1µs ± 0%   ~     (p=1.000 n=1+1)
Encode/Digits/Huffman/1e5-4     445µs ± 0%     322µs ± 0%   ~     (p=1.000 n=1+1)
Encode/Digits/Huffman/1e6-4    4.32ms ± 0%    3.16ms ± 0%   ~     (p=1.000 n=1+1)
Encode/Newton/Huffman/1e4-4    59.6µs ± 0%    50.0µs ± 0%   ~     (p=1.000 n=1+1)
Encode/Newton/Huffman/1e5-4     507µs ± 0%     390µs ± 0%   ~     (p=1.000 n=1+1)
Encode/Newton/Huffman/1e6-4    5.00ms ± 0%    3.86ms ± 0%   ~     (p=1.000 n=1+1)

name                         old speed      new speed      delta
Decode/Digits/Huffman/1e4-4  89.6MB/s ± 0%  81.5MB/s ± 0%   ~     (p=1.000 n=1+1)
Decode/Digits/Huffman/1e5-4  93.2MB/s ± 0%  94.7MB/s ± 0%   ~     (p=1.000 n=1+1)
Decode/Digits/Huffman/1e6-4  93.4MB/s ± 0%  95.2MB/s ± 0%   ~     (p=1.000 n=1+1)
Decode/Newton/Huffman/1e4-4  76.5MB/s ± 0%  81.0MB/s ± 0%   ~     (p=1.000 n=1+1)
Decode/Newton/Huffman/1e5-4  82.9MB/s ± 0%  85.5MB/s ± 0%   ~     (p=1.000 n=1+1)
Decode/Newton/Huffman/1e6-4  86.7MB/s ± 0%  87.8MB/s ± 0%   ~     (p=1.000 n=1+1)
Encode/Digits/Huffman/1e4-4   220MB/s ± 0%   293MB/s ± 0%   ~     (p=1.000 n=1+1)
Encode/Digits/Huffman/1e5-4   225MB/s ± 0%   311MB/s ± 0%   ~     (p=1.000 n=1+1)
Encode/Digits/Huffman/1e6-4   232MB/s ± 0%   316MB/s ± 0%   ~     (p=1.000 n=1+1)
Encode/Newton/Huffman/1e4-4   168MB/s ± 0%   200MB/s ± 0%   ~     (p=1.000 n=1+1)
Encode/Newton/Huffman/1e5-4   197MB/s ± 0%   257MB/s ± 0%   ~     (p=1.000 n=1+1)
Encode/Newton/Huffman/1e6-4   200MB/s ± 0%   259MB/s ± 0%   ~     (p=1.000 n=1+1)

name                         old alloc/op   new alloc/op   delta
Decode/Digits/Huffman/1e4-4    40.5kB ± 0%    40.5kB ± 0%   ~     (all equal)
Decode/Digits/Huffman/1e5-4    40.5kB ± 0%    40.5kB ± 0%   ~     (all equal)
Decode/Digits/Huffman/1e6-4    40.6kB ± 0%    40.6kB ± 0%   ~     (all equal)
Decode/Newton/Huffman/1e4-4    41.2kB ± 0%    41.2kB ± 0%   ~     (all equal)
Decode/Newton/Huffman/1e5-4    45.0kB ± 0%    45.0kB ± 0%   ~     (p=1.000 n=1+1)
Decode/Newton/Huffman/1e6-4    79.9kB ± 0%    79.9kB ± 0%   ~     (all equal)
Encode/Digits/Huffman/1e4-4     0.00B          0.00B        ~     (all equal)
Encode/Digits/Huffman/1e5-4     1.00B ± 0%     1.00B ± 0%   ~     (all equal)
Encode/Digits/Huffman/1e6-4     16.0B ± 0%     12.0B ± 0%   ~     (p=1.000 n=1+1)
Encode/Newton/Huffman/1e4-4     0.00B          0.00B        ~     (all equal)
Encode/Newton/Huffman/1e5-4     1.00B ± 0%     1.00B ± 0%   ~     (all equal)
Encode/Newton/Huffman/1e6-4     19.0B ± 0%     14.0B ± 0%   ~     (p=1.000 n=1+1)

name                         old allocs/op  new allocs/op  delta
Decode/Digits/Huffman/1e4-4      5.00 ± 0%      5.00 ± 0%   ~     (all equal)
Decode/Digits/Huffman/1e5-4      5.00 ± 0%      5.00 ± 0%   ~     (all equal)
Decode/Digits/Huffman/1e6-4      5.00 ± 0%      5.00 ± 0%   ~     (all equal)
Decode/Newton/Huffman/1e4-4      14.0 ± 0%      14.0 ± 0%   ~     (all equal)
Decode/Newton/Huffman/1e5-4      23.0 ± 0%      23.0 ± 0%   ~     (all equal)
Decode/Newton/Huffman/1e6-4       161 ± 0%       161 ± 0%   ~     (all equal)
Encode/Digits/Huffman/1e4-4      0.00           0.00        ~     (all equal)
Encode/Digits/Huffman/1e5-4      0.00           0.00        ~     (all equal)
Encode/Digits/Huffman/1e6-4      0.00           0.00        ~     (all equal)
Encode/Newton/Huffman/1e4-4      0.00           0.00        ~     (all equal)
Encode/Newton/Huffman/1e5-4      0.00           0.00        ~     (all equal)
Encode/Newton/Huffman/1e6-4      0.00           0.00        ~     (all equal)

google-cla bot commented May 6, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

google-cla bot added the cla: no label May 6, 2021

teivah commented May 6, 2021

@googlebot I signed it!

google-cla bot added the cla: yes label and removed the cla: no label May 6, 2021
@gopherbot

This PR (HEAD: 05bf781) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/317789 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot

Message from Go Bot:

Patch Set 1:

Congratulations on opening your first change. Thank you for your contribution!

Next steps:
A maintainer will review your change and provide feedback. See
https://golang.org/doc/contribute.html#review for more info and tips to get your
patch through code review.

Most changes in the Go project go through a few rounds of revision. This can be
surprising to people new to the project. The careful, iterative review process
is our way of helping mentor contributors and ensuring that their contributions
have a lasting impact.

During May-July and Nov-Jan the Go project is in a code freeze, during which
little code gets reviewed or merged. If a reviewer responds with a comment like
R=go1.11 or adds a tag like "wait-release", it means that this CL will be
reviewed as part of the next development cycle. See https://golang.org/s/release
for more details.


Please don’t reply on this GitHub thread. Visit golang.org/cl/317789.
After addressing review feedback, remember to publish your drafts!

teivah changed the title from "flate: improve flate hcode spatial locality" to "flate: improve huffman flate hcode spatial locality" May 6, 2021
@gopherbot

Message from Joe Tsai:

Patch Set 3:

(1 comment)



@gopherbot

Message from Teiva Harsanyi:

Patch Set 3:

(1 comment)



klauspost added a commit to klauspost/compress that referenced this pull request Jun 7, 2022
Experiment by adding golang/go#46007 from @teivah

Before/after:

```
file	out	level	insize	outsize	millis	mb/s
github-ranks-backup.bin	gzkp	1	1862623243	458201422	6979	254.51
github-ranks-backup.bin	gzkp	1	1862623243	458201422	7273	244.22

enwik9	gzkp	1	1000000000	382781160	5805	164.26
enwik9	gzkp	1	1000000000	382781160	5976	159.57

github-ranks-backup.bin	gzkp	-2	1862623243	1298789681	5592	317.65
github-ranks-backup.bin	gzkp	-2	1862623243	1298789681	5420	327.70

```

Slower for general compression (level 1), but faster for Huffman-only compression (level -2).
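
The size of the difference can be worked out from the millis column. This is a minimal sketch, assuming the first row of each pair is the "before" run and the second is the "after" run (the commit message only says "Before/after"). `pctChange` is a hypothetical helper and not part of klauspost/compress.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
// A positive value means the run took longer, i.e. it got slower.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Elapsed milliseconds copied from the table above; the before/after
	// ordering within each pair is an assumption.
	runs := []struct {
		name          string
		before, after float64
	}{
		{"github-ranks-backup.bin level 1", 6979, 7273},
		{"enwik9 level 1", 5805, 5976},
		{"github-ranks-backup.bin level -2", 5592, 5420},
	}
	for _, r := range runs {
		fmt.Printf("%-34s %+.2f%%\n", r.name, pctChange(r.before, r.after))
	}
}
```

Under that assumption, the two level 1 runs take roughly 3 to 4 percent longer and the level -2 run takes about 3 percent less time. Each figure comes from a single run, so treat it as indicative only.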
@teivah teivah closed this Aug 2, 2022