fixed rare infinite recursion in KBestHaplotypeFinder #5786

davidbenjamin · 2019-03-12T04:47:19Z

This fixes a bug that @meganshand found. Here's the background:

By construction, we do not assemble graphs with cycles (although @kvg has something to say about this). However, in rare cases recovering a dangling end can create a cycle. Although it's debatable whether cycles are a problem for the new best-haplotype-finding algorithm based on Dijkstra's algorithm, we remove them before finding the best haplotypes. It seems that the code for removing cycles can recurse indefinitely when, as in Mutect2's mitochondria mode, we allow the recovery of forked dangling ends.

This PR deletes a single line. parentVertices is the set of previously visited vertices in the depth-first search. When an edge points to one of these vertices, it creates a cycle, and we mark the edge for removal. My best guess is that removing currentVertex from parentVertices once all of its edges had been processed was meant to reduce the O(log n) cost of subsequent parentVertices.contains calls (@ldgauthier, could you be an extra set of brains? @droazen, you're welcome to look, too). Since this is a depth-first search, you would expect currentVertex never to be seen again, which would make the removal innocuous. However, if another branch of the depth-first search that is not descended from currentVertex also leads to a cycle through currentVertex, forgetting that currentVertex was visited creates a serious problem. I believe that forked dangling ends make this possible.
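
To make the change concrete, here is a minimal Java sketch of a depth-first search in the style described above, assuming a toy graph of String vertices stored as an adjacency map. The class CycleEdgeSketch and the method findEdgesToRemove are hypothetical and far simpler than the real KBestHaplotypeFinder code; only the name parentVertices comes from this PR. The sketch reflects the behavior after the deleted line is gone: a vertex is never removed from parentVertices, so any edge into an already-visited vertex is flagged rather than followed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CycleEdgeSketch {

    // Hypothetical toy version of the cycle-removal search (not the GATK implementation).
    // graph maps each vertex to its successors; an edge is stored as List.of(source, target).
    static void findEdgesToRemove(Map<String, List<String>> graph,
                                  String currentVertex,
                                  Set<String> parentVertices,
                                  Set<List<String>> edgesToRemove) {
        parentVertices.add(currentVertex);
        for (String child : graph.getOrDefault(currentVertex, List.of())) {
            if (parentVertices.contains(child)) {
                // The target was visited earlier in the search: flag the edge instead of recursing.
                edgesToRemove.add(List.of(currentVertex, child));
            } else {
                findEdgesToRemove(graph, child, parentVertices, edgesToRemove);
            }
        }
        // Deliberately no parentVertices.remove(currentVertex): once visited, a vertex
        // stays in the set, so no vertex is ever expanded twice.
    }

    public static void main(String[] args) {
        // A -> B -> C -> D, plus C -> B, which closes the cycle B -> C -> B.
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("A", List.of("B"));
        graph.put("B", List.of("C"));
        graph.put("C", List.of("B", "D"));

        Set<List<String>> edgesToRemove = new LinkedHashSet<>();
        findEdgesToRemove(graph, "A", new HashSet<>(), edgesToRemove);
        System.out.println(edgesToRemove); // prints [[C, B]]
    }
}
```

Because the set only grows, the search expands each vertex at most once, which bounds the recursion by the size of the graph. Note that the sketch also flags an edge whose target was first reached through a different branch, even if that edge does not close a cycle; the toy graph above has no such edge.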

Removing the line in question will incur a tiny performance cost, if any. By the time we get here, the graph has been zipped into a SeqGraph, so it has relatively few vertices. In any case, Set.contains is not an expensive operation, and we might even save runtime by eliminating all the Set.remove calls.

I have tested this on several WGS samples, and it does no harm.

codecov-io · 2019-03-12T05:27:51Z

Codecov Report

Merging #5786 into master will increase coverage by 0.005%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##              master     #5786       +/-   ##
===============================================
+ Coverage     87.003%   87.007%   +0.005%     
- Complexity     32091     32094        +3     
===============================================
  Files           1975      1975               
  Lines         147184    147183        -1     
  Branches       16228     16228               
===============================================
+ Hits          128054    128060        +6     
+ Misses         13225     13218        -7     
  Partials        5905      5905

Impacted Files	Coverage Δ	Complexity Δ
...s/haplotypecaller/graphs/KBestHaplotypeFinder.java	`95.455% <ø> (-0.068%)`	`23 <0> (ø)`
...ithwaterman/SmithWatermanIntelAlignerUnitTest.java	`90% <0%> (+30%)`	`2% <0%> (ø)`	⬇️
...utils/smithwaterman/SmithWatermanIntelAligner.java	`90% <0%> (+40%)`	`3% <0%> (+2%)`	⬆️

ldgauthier

I don't have a good justification for the old way, so if this fixes the bug then I'm in favor.

removed unnecessary line in KBestHaplotypeFinder that could cause infinite recursion

58ac048

davidbenjamin added the Mutect label Mar 12, 2019

davidbenjamin added this to the Mutect 3 milestone Mar 12, 2019

davidbenjamin assigned ldgauthier and meganshand Mar 12, 2019

davidbenjamin requested review from ldgauthier and meganshand March 12, 2019 04:47

ldgauthier approved these changes Mar 12, 2019

davidbenjamin merged commit 913f24d into master Mar 12, 2019

davidbenjamin deleted the db_kbest branch March 13, 2019 15:07
