-
Notifications
You must be signed in to change notification settings - Fork 602
Fix bug in MannWhitneyU on some JVMs. #5371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -525,7 +529,7 @@ public double permutationTest(final double[] series1, final double[] series2, fi | |||
assert (series2End == n2); | |||
|
|||
double newU = MathUtils.sum(newSeries1) - ((n1 * (n1 + 1)) / 2.0); | |||
histo.increment(newU); | |||
histo.increment(FastMath.round(2 * newU)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking, but you shouldn't need to round, is that right? 2 * newU
should always be an integer so casting or floor would also work here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried that with finite precision 2 * newU
would end up slightly below the correct integer, in which case casting or floor would give the wrong value. And maybe this isn't supposed to happen, but I'm paranoid because this whole bug is from somebody's JVM doing things that the JVM isn't supposed to do.
src/main/java/org/broadinstitute/hellbender/utils/MannWhitneyU.java
Outdated
Show resolved
Hide resolved
Co-Authored-By: davidbenjamin <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #5371 +/- ##
===============================================
- Coverage 86.931% 86.929% -0.001%
+ Complexity 30322 30320 -2
===============================================
Files 1849 1849
Lines 140743 140743
Branches 15476 15476
===============================================
- Hits 122349 122347 -2
- Misses 12786 12788 +2
Partials 5608 5608
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidbenjamin One question for you, otherwise looks good.
for (final Histogram.Bin<Double> bin : histo.values()) { | ||
if (bin.getId() < testStatU) sumOfAllSmallerBins += bin.getValue(); | ||
for (final Histogram.Bin<Long> bin : histo.values()) { | ||
if (bin.getId() < FastMath.round(2 * testStatU)) sumOfAllSmallerBins += bin.getValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing something here, but can you explain why bin.getValue()
is not divided by 2.0 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the value is just the count, and we only modified the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I got confused because the code above is purposefully taking "half of the count in the observed bin".
👍 merge away
Closes #4290.
Since the Mann-Whitney U statistic is always integer or half-integer, we don't need to store a histogram of
Double
s, which causes issues on some JVMs. Instead we can multiply U by two and store an integer key for the histogram, which avoids the issue.@droazen I'll assign you and also @meganshand to sign off on the statistics.