fixed rare infinite recursion in KBestHaplotypeFinder #5786
This fixes a bug that @meganshand found. Here's the background:
By construction we do not assemble graphs with cycles (although @kvg has something to say about this). However, in rare cases recovering a dangling end may create a cycle. Although it's debatable whether cycles are a problem for the new best-haplotype-finding algorithm based on Dijkstra's algorithm, we remove cycles before finding the best haplotypes. It seems that the code for removing cycles can go into an infinite loop when, as in Mutect2's mitochondria mode, we allow for the recovery of forked dangling ends.
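For readers less familiar with the cycle-removal step, here is a minimal sketch of the textbook way to break cycles with a depth-first search. An edge whose target is still on the current DFS path (a back edge) closes a cycle and is collected for removal. Edges into vertices that were already fully explored through another branch are left alone. This is not the GATK code: the graph representation (int vertices in an adjacency list) and the names `CycleBreaker` and `findBackEdges` are assumptions made for illustration.

```java
import java.util.*;

/**
 * Sketch only: break every cycle in a directed graph by collecting the back
 * edges found by a depth-first search. Vertices are ints 0..n-1 in an adjacency
 * list; CycleBreaker and findBackEdges are hypothetical names, not GATK's API.
 */
public class CycleBreaker {
    private enum Color { WHITE, GRAY, BLACK } // unvisited, on the current DFS path, finished

    /** Returns back edges as {from, to}; deleting them leaves the reachable part acyclic. */
    public static List<int[]> findBackEdges(List<List<Integer>> adj, List<Integer> sources) {
        Color[] color = new Color[adj.size()];
        Arrays.fill(color, Color.WHITE);
        List<int[]> backEdges = new ArrayList<>();
        for (int s : sources) {
            if (color[s] == Color.WHITE) {
                dfs(adj, s, color, backEdges);
            }
        }
        return backEdges;
    }

    private static void dfs(List<List<Integer>> adj, int v, Color[] color, List<int[]> backEdges) {
        color[v] = Color.GRAY; // v is now on the current path
        for (int child : adj.get(v)) {
            if (color[child] == Color.GRAY) {
                backEdges.add(new int[] {v, child}); // child is an ancestor: this edge closes a cycle
            } else if (color[child] == Color.WHITE) {
                dfs(adj, child, color, backEdges);
            }
            // BLACK: child was finished via another branch (forward/cross edge), so the edge is kept
        }
        color[v] = Color.BLACK; // off the path, but remembered as finished
    }

    public static void main(String[] args) {
        // 0 -> 1 -> 2 -> 3, plus a cycle edge 2 -> 1 and a shortcut 0 -> 3
        List<List<Integer>> adj = List.of(List.of(1, 3), List.of(2), List.of(3, 1), List.of());
        for (int[] e : findBackEdges(adj, List.of(0))) {
            System.out.println(e[0] + " -> " + e[1]); // prints "2 -> 1"
        }
    }
}
```

The three colors keep two facts apart: whether a vertex is on the current path (which decides if an edge closes a cycle) and whether it has already been explored (which decides if it needs to be searched again).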
This PR deletes a single line.
`parentVertices` is the set of previously visited vertices in the depth-first search. When an edge leads into one of these vertices, it creates a cycle and we mark it for removal. My best guess (@ldgauthier could you be an extra set of brains? @droazen you're welcome to look, too.) is that the idea behind removing `currentVertex` from `parentVertices` once all its edges were processed was to optimize the O(log n) cost of subsequent `parentVertices.contains` calls. Since it's a depth-first search, you would think that `currentVertex` will never be seen again and that this is innocuous. However, if some other branch of the depth-first search that is not descended from `currentVertex` also leads to a cycle that goes through `currentVertex`, forgetting that it has been visited creates a huge problem. I believe that forked dangling ends create this possibility.

Removing the line in question will incur a tiny performance cost, if any. By the time we get here the graph has been zipped into a `SeqGraph`, so it doesn't have very many vertices. In any case, `Set.contains` is not an expensive operation. We might even save runtime by eliminating all the `Set.remove` calls. I have tested this on several WGS samples and it does no harm.
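To make the cost of forgetting visited vertices concrete, the following hypothetical sketch counts how many times a DFS enters a vertex on a chain of k "diamonds" (each top vertex branches in two and the branches rejoin at the next top vertex). When each vertex is dropped from the tracking set once its edges are processed, every reconvergent path is re-explored, so the number of entries grows as 4·2^k - 3. When vertices are kept in the set, each one is entered exactly once (3k + 1 entries). The class, the graph, and the method names are assumptions; this is not the GATK implementation, and it only counts entries rather than deciding which edges to remove.

```java
import java.util.*;

/**
 * Sketch: count DFS entries into vertices when a vertex is (a) dropped from the
 * tracking set once its edges are processed, or (b) kept for the rest of the
 * search. RevisitDemo and its methods are hypothetical, not GATK code.
 */
public class RevisitDemo {
    /** Builds a chain of k diamonds: 3i -> 3i+1, 3i -> 3i+2, and both -> 3i+3. */
    public static List<List<Integer>> diamondChain(int k) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v <= 3 * k; v++) {
            adj.add(new ArrayList<>());
        }
        for (int i = 0; i < k; i++) {
            int top = 3 * i;
            adj.get(top).add(top + 1);
            adj.get(top).add(top + 2);
            adj.get(top + 1).add(top + 3);
            adj.get(top + 2).add(top + 3);
        }
        return adj;
    }

    public static long countEntries(List<List<Integer>> adj, int source, boolean forgetWhenDone) {
        return dfs(adj, source, new HashSet<>(), forgetWhenDone);
    }

    private static long dfs(List<List<Integer>> adj, int v, Set<Integer> seen, boolean forgetWhenDone) {
        long entries = 1; // count this entry into v
        seen.add(v);
        for (int child : adj.get(v)) {
            if (!seen.contains(child)) {
                entries += dfs(adj, child, seen, forgetWhenDone);
            }
        }
        if (forgetWhenDone) {
            seen.remove(v); // v can be entered again through another path
        }
        return entries;
    }

    public static void main(String[] args) {
        List<List<Integer>> adj = diamondChain(10);
        System.out.println("forget:   " + countEntries(adj, 0, true));  // 4093
        System.out.println("remember: " + countEntries(adj, 0, false)); // 31
    }
}
```

With k = 10 the forgetful search makes 4093 entries versus 31 when finished vertices stay in the set, which is one way keeping them can save work on graphs with many reconvergent paths.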