Skip to content

fixed rare infinite recursion in KBestHaplotypeFinder #5786

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 12, 2019
Merged

Conversation

davidbenjamin
Copy link
Contributor

This fixes a bug that @meganshand found. Here's the background:

By construction we do not assemble graphs with cycles (although @kvg has something to say about this). However, in rare cases recovering a dangling end may create a cycle. Although it's debatable whether this is an issue for the new, Dijkstra's algorithm-based best haplotype finding algorithm, we remove cycles before finding best haplotypes. It seems that the code for removing cycles can go into an infinite loop when, as in Mutect2's mitochondria mode, we allow for the recovery of forked dangling ends.

This PR deletes a single line. parentVertices is the set of previously visited vertices in the depth-first search. When an edge is incident on one of these vertices it creates a cycle and we mark it for removal. My best guess (@ldgauthier could you be an extra set of brain? @droazen you're welcome to look, too.) is that the idea behind removing a currentVertex from parentVertices once all its edges were processed was to optimize the O(log n) cost of subsequent parentVertices.contains calls. Since it's a depth-first search, you would think that currentVertex will never be seen again and that this is innocuous. However, if some other branch of the depth-first search that is not descended from currentVertex also leads to a cycle that goes through currentVertex, forgetting that it has been visited creates a huge problem. I believe that forked dangling ends create this possibility.

Removing the line in question will incur a tiny performance cost, if any. By the time we get here the graph has been zipped into a SeqGraph, so it doesn't have very many vertices. In any case, Set.contains is not an expensive operation. We might even save runtime by eliminating all the Set.remove.

I have tested this on several WGS samples and it does no harm.

@codecov-io
Copy link

codecov-io commented Mar 12, 2019

Codecov Report

Merging #5786 into master will increase coverage by 0.005%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##              master     #5786       +/-   ##
===============================================
+ Coverage     87.003%   87.007%   +0.005%     
- Complexity     32091     32094        +3     
===============================================
  Files           1975      1975               
  Lines         147184    147183        -1     
  Branches       16228     16228               
===============================================
+ Hits          128054    128060        +6     
+ Misses         13225     13218        -7     
  Partials        5905      5905
Impacted Files Coverage Δ Complexity Δ
...s/haplotypecaller/graphs/KBestHaplotypeFinder.java 95.455% <ø> (-0.068%) 23 <0> (ø)
...ithwaterman/SmithWatermanIntelAlignerUnitTest.java 90% <0%> (+30%) 2% <0%> (ø) ⬇️
...utils/smithwaterman/SmithWatermanIntelAligner.java 90% <0%> (+40%) 3% <0%> (+2%) ⬆️

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good justification for the old way, so if this fixes the bug then I'm in favor.

@davidbenjamin davidbenjamin merged commit 913f24d into master Mar 12, 2019
@davidbenjamin davidbenjamin deleted the db_kbest branch March 13, 2019 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants