Faster and dependency-free MurmurHash3_32 implementation #5
Thanks for the PR, but unfortunately this fails the tests. I'm not sure how much this would buy in general, since it looks like it replaces some stuff that exists for speed ("these 8 bytes are a uint64") with a bunch of slice indexing (which causes mem barriers), and replaces the switch (fast) with a loop (not as fast). |
Hi twmb, I am afraid that your tests will also test something wrong here. Your implementation uses the See https://en.wikipedia.org/wiki/MurmurHash
To verify this, you can simply replace line 38 in my PR like following (other indexing in the
... and let the tests run again. Do you know a solution to make the Finally: Sorry for my bad English :) |
I see the endianness point, and now see it in the canonical source as well as in your quote from Wikipedia. I would say that the tests are based around little endianness, though, since my laptop is little endian and the tests test against the canonical source. The unsafe conversion just says four bytes are now a uint32, and then all the math works on that. Your PR converts four contiguous bytes to a uint32 piecemeal, reading those bytes as if they were encoded in big-endian order.

I think the confusion here really comes from the fact that murmur3 doesn't define an endianness for its hashing. Your PR assumes that we have to convert blocks of four bytes from big-endian encoded order to CPU native order. The murmur3 canonical source doesn't define that such a conversion needs to happen. The murmur2 source explicitly documents that the resulting hashes vary depending on endianness. Even that Wikipedia quote says that the results vary with endianness. The murmur3 source does have opening documentation that x86 and x64 produce different results, which in a way implies that hashes are not consistent across platforms.

I did some looking and found a conversation about the Dovecot(??) source around this same problem. The path they took to resolve this was to read the input as little endian (see As @dignifiedquire points out in this discussion of the same problem with @spaolacci's murmur3, murmur3 hashing is underspecified. His linked Rust murmur3 library also favors reading blocks of four as little-endian uint32s. Other systems also seem to have this same endian bug, with the common report that tests fail on big-endian systems: Python cassandra driver, even a bug report on the canonical code.

Lastly, although this is likely a non-issue, changing the reading from platform-dependent to always-big-endian is a behavior change that may break users, especially considering that little-endian systems are the most common.
I think a better approach here would be to define new APIs:
NewNeutral
SeedNewNeutral
SeedNeutralStringSum
SeedNeutralSum
NeutralStringSum
NeutralSum
for 32, 64, and, where relevant, 128. I do think it's unfortunate that this would basically double the API surface, which makes Alternatively, since big endian is likely pretty uncommon, we could just define the hashing based off of little endian blocks. But, in short, I favor little endian since it's the most common. Thoughts? |
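The three ways of interpreting a four-byte block that this comment distinguishes (CPU-native order, always little endian, always big endian) can be sketched as follows. This is only an illustration, not code from the PR or from the library; the names `nativeUint32`, `leUint32`, and `beUint32` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// nativeUint32 (hypothetical name) interprets the first four bytes of b in
// the host CPU's byte order. The bytes are copied into an aligned uint32, so
// no misaligned pointer is ever dereferenced.
func nativeUint32(b []byte) uint32 {
	var v uint32
	copy((*[4]byte)(unsafe.Pointer(&v))[:], b[:4])
	return v
}

// leUint32 always treats b[0] as the least significant byte.
func leUint32(b []byte) uint32 { return binary.LittleEndian.Uint32(b) }

// beUint32 always treats b[0] as the most significant byte.
func beUint32(b []byte) uint32 { return binary.BigEndian.Uint32(b) }

func main() {
	b := []byte{0x01, 0x02, 0x03, 0x04}
	// The native result matches exactly one of the other two, depending on
	// the machine this runs on.
	fmt.Printf("native: 0x%08x\n", nativeUint32(b))
	fmt.Printf("little: 0x%08x\n", leUint32(b))
	fmt.Printf("big:    0x%08x\n", beUint32(b))
}
```

Because the block value feeds directly into the hash arithmetic, a hash built on the native read gives different results on little-endian and big-endian hosts, while either fixed-order read gives the same result everywhere.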
Great respect for your research! I favor little-endian too because, as you wrote, "it's the most common."
On the other hand, my PR handles the data as little-endian encoded:
What do you think about an endianness-dependent solution? (The tests would then have to be duplicated for both byte orders.) Unfortunately, I still haven't found a reliable way in Go to check the system endianness, except to execute a native
I hope you understand what I mean. |
I've done this in the past with some unsafe code, e.g. https://play.golang.org/p/wyamz1LFWxL. What do you mean by an endianness dependent solution? |
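One common way to detect the host byte order with `unsafe` is to store a known multi-byte value and look at its first byte in memory. The sketch below is an assumption about what such a check can look like; it is not the code behind the playground link, and `isLittleEndian` and `hostOrderName` are hypothetical names.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isLittleEndian (hypothetical name) reports whether the host stores the
// least significant byte of a multi-byte integer at the lowest address.
func isLittleEndian() bool {
	var x uint16 = 0x0102
	// Reading a single byte through the pointer has no alignment requirement.
	return *(*byte)(unsafe.Pointer(&x)) == 0x02
}

// hostOrderName (hypothetical name) returns a human-readable label.
func hostOrderName() string {
	if isLittleEndian() {
		return "little-endian"
	}
	return "big-endian"
}

func main() {
	fmt.Println("host byte order:", hostOrderName())
}
```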
The On a little-endian machine I would expect
I think that is exactly why the tests in my PR fail.
Like the following?
|
You write "On a little-endian machine I would expect But ultimately, this is a question of how to read four contiguous bytes and parse them into a number. I think keeping the current behavior is best, whereas your proposed solution looks to reverse the order of reading numbers, changing all tests. But it also looks to try to get the same hash across both big- and little-endian platforms, which is what I proposed at the end of this comment. I think that this may suffice to do the same thing by making big-endian platforms conform to little endian:
Although, I think the logic in the tail code may also break things. I used to have a test matrix for this code; maybe it'd be worth setting that up again with GitHub Actions and adding big and little endian. |
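The idea of making big-endian platforms conform to little endian can be sketched as a native read followed by a byte swap that only runs on big-endian hosts. The original snippet is not preserved here, so the following is an assumption about the general shape, with hypothetical names (`hostIsBigEndian`, `nativeUint32`, `blockLE`).

```go
package main

import (
	"fmt"
	"math/bits"
	"unsafe"
)

// hostIsBigEndian is computed once at startup (hypothetical helper).
var hostIsBigEndian = func() bool {
	var x uint16 = 0x0102
	return *(*byte)(unsafe.Pointer(&x)) == 0x01
}()

// nativeUint32 reads four bytes in host order via an aligned copy.
func nativeUint32(b []byte) uint32 {
	var v uint32
	copy((*[4]byte)(unsafe.Pointer(&v))[:], b[:4])
	return v
}

// blockLE returns the block as a little-endian value on every host: the
// native read is already correct on little-endian machines and is byte
// swapped on big-endian ones.
func blockLE(b []byte) uint32 {
	k := nativeUint32(b)
	if hostIsBigEndian {
		k = bits.ReverseBytes32(k)
	}
	return k
}

func main() {
	fmt.Printf("0x%08x\n", blockLE([]byte{0x01, 0x02, 0x03, 0x04}))
}
```

The branch is taken the same way on every call for a given machine, so only big-endian hosts pay for the swap.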
Right, I made a mistake in my reasoning.
I think the simplest solution is to use
|
Unfortunately, just doing that results in a nearly 40% hit to throughput on my laptop, likely due to the compiler being unable to prove that the One can be eliminated by doing
so that does one bounds check (then the four indexes into Another benefit of my suggested code is that the conditional is likely predicted 100% of the time, meaning it costs nothing in performance (no change on my machine). Unfortunately, it's a 58% hit if the flip is needed. I can't come up with a way that keeps the performance while working the same on both endiannesses, which leads me to believe that, if we want such behavior, it's worth it to add new APIs and document that these APIs return hashes that are not portable across architectures. |
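A widely used Go idiom for paying a single bounds check per block is to touch the highest index first, so the compiler can treat the lower indexes as already proven in range. The sketch below shows that idiom in a manual little-endian read; it is an illustration with hypothetical names (`le32`, `blocks`), not the exact snippet from the comment above.

```go
package main

import "fmt"

// le32 (hypothetical name) reads a little-endian uint32 from b.
func le32(b []byte) uint32 {
	_ = b[3] // single bounds check up front; b[0..2] are then known to be in range
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

// blocks splits data into little-endian 4-byte blocks and returns the
// leftover tail (0 to 3 bytes), mirroring how a block hash walks its input.
func blocks(data []byte) ([]uint32, []byte) {
	var out []uint32
	for len(data) >= 4 {
		out = append(out, le32(data))
		data = data[4:]
	}
	return out, data
}

func main() {
	ks, tail := blocks([]byte{1, 2, 3, 4, 5, 6, 7, 8, 9})
	fmt.Printf("%#x %v\n", ks, tail)
}
```

Which checks the compiler actually removes can be inspected with `go build -gcflags=-d=ssa/check_bce/debug=1`; the outcome may differ between Go versions.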
We're in "luck". Due to failures on go1.14, I've converted everything to read little-endian numbers, making the hashes portable across architectures. The speed is not as fast, but we do fix some alignment problems, so, pros and cons. It's close to being as fast. |
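For reference, a 32-bit murmur3 that always reads its blocks and tail as little endian can be sketched as below. This is a minimal illustration of the approach described in the last comment, written from the public MurmurHash3_x86_32 algorithm; it is not the library's actual code, and `le32` and `sum32` are hypothetical names.

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	c1 uint32 = 0xcc9e2d51
	c2 uint32 = 0x1b873593
)

// le32 reads four bytes as a little-endian uint32 without unsafe, so it has
// no alignment requirements and behaves the same on every architecture.
func le32(b []byte) uint32 {
	_ = b[3]
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

// sum32 (hypothetical name) computes MurmurHash3_x86_32 of data with seed.
func sum32(data []byte, seed uint32) uint32 {
	h := seed
	p := data
	for len(p) >= 4 { // body: one little-endian block at a time
		k := le32(p)
		p = p[4:]
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
		h = bits.RotateLeft32(h, 13)
		h = h*5 + 0xe6546b64
	}
	var k uint32
	switch len(p) { // tail: remaining 1 to 3 bytes, also little endian
	case 3:
		k ^= uint32(p[2]) << 16
		fallthrough
	case 2:
		k ^= uint32(p[1]) << 8
		fallthrough
	case 1:
		k ^= uint32(p[0])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
	}
	h ^= uint32(len(data)) // finalization mix
	h ^= h >> 16
	h *= 0x85ebca6b
	h ^= h >> 13
	h *= 0xc2b2ae35
	h ^= h >> 16
	return h
}

func main() {
	fmt.Printf("0x%08x\n", sum32([]byte("hello"), 0))
}
```

Since every byte is placed explicitly, the result does not depend on the host's byte order; the trade-off noted above is that this can be somewhat slower than a native unsafe read.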