Skip to content

Fix bug in MannWhitneyU on some JVMs. #5371

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 30, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

import htsjdk.samtools.util.Histogram;
import org.apache.commons.math3.distribution.NormalDistribution;
import org.apache.commons.math3.util.FastMath;
import org.apache.log4j.Logger;


Expand Down Expand Up @@ -484,7 +485,10 @@ Set<List<Integer>> getPermutations(final Integer[] listToPermute, int numOfPermu
* @return P-value based on histogram with u calculated for every possible permutation of group tag.
*/
public double permutationTest(final double[] series1, final double[] series2, final double testStatU) {
final Histogram<Double> histo = new Histogram<>();

// note that Mann-Whitney U stats are always integer or half-integer (this is true even in the case of ties)
// thus for safety we store a histogram of twice the Mann-Whitney values
final Histogram<Long> histo = new Histogram<>();
final int n1 = series1.length;
final int n2 = series2.length;

Expand Down Expand Up @@ -525,7 +529,7 @@ public double permutationTest(final double[] series1, final double[] series2, fi
assert (series2End == n2);

double newU = MathUtils.sum(newSeries1) - ((n1 * (n1 + 1)) / 2.0);
histo.increment(newU);
histo.increment(FastMath.round(2 * newU));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking, but you shouldn't need to round, is that right? 2 * newU should always be an integer so casting or floor would also work here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that with finite precision 2 * newU would end up slightly below the correct integer, in which case casting or floor would give the wrong value. And maybe this isn't supposed to happen, but I'm paranoid because this whole bug is from somebody's JVM doing things that the JVM isn't supposed to do.

}

/**
Expand All @@ -534,10 +538,10 @@ public double permutationTest(final double[] series1, final double[] series2, fi
* and dividing by the total count of everything in the histogram. Just using getCumulativeDistribution() gives
* a p-value of 1 in the most extreme case which doesn't result in a usable z-score.
*/
double sumOfAllSmallerBins = histo.get(testStatU).getValue() / 2.0;
double sumOfAllSmallerBins = histo.get(FastMath.round(2 * testStatU)).getValue() / 2.0;

for (final Histogram.Bin<Double> bin : histo.values()) {
if (bin.getId() < testStatU) sumOfAllSmallerBins += bin.getValue();
for (final Histogram.Bin<Long> bin : histo.values()) {
if (bin.getId() < FastMath.round(2 * testStatU)) sumOfAllSmallerBins += bin.getValue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something here, but can you explain why bin.getValue() is not divided by 2.0 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the value is just the count, and we only modified the key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I got confused because the code above is purposefully taking "half of the count in the observed bin".

👍 merge away

}

return sumOfAllSmallerBins / histo.getCount();
Expand Down