-
Notifications
You must be signed in to change notification settings - Fork 603
Fix non-determinism in KbestHaplotypeFinding/SeqGraphZippingCode #6195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…each assembly region
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to you @jamesemery with some comments
private AssemblyResult cleanupSeqGraph(final SeqGraph seqGraph) { | ||
printDebugGraphTransform(seqGraph, "sequenceGraph.1.dot"); | ||
private AssemblyResult cleanupSeqGraph(final SeqGraph seqGraph, final Haplotype refHaplotype) { | ||
printDebugGraphTransform(seqGraph, refHaplotype.getLocation() + "-sequenceGraph."+seqGraph.getKmerSize()+".1.0.non_ref_removed.dot"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you refactor so that we don't construct these Strings at all if we're not in debugging mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added guarding if statements to the outside of every call to this method.
} | ||
|
||
/** | ||
* Get the set of sink vertices of this graph | ||
* @return a non-null set | ||
*/ | ||
public final Set<V> getSinks() { | ||
return vertexSet().stream().filter(v -> isSink(v)).collect(Collectors.toSet()); | ||
return vertexSet().stream().filter(v -> isSink(v)).collect(Collectors.toCollection(LinkedHashSet::new)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment here (and at similar lines below) about why the LinkedHashSet
is important here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One or two minor remaining nits, @jamesemery , then this is good to go.
try { | ||
assemblyDebugOutStream = new PrintStream(Files.newOutputStream(IOUtils.getPath(hcArgs.assemblyStateOutput))); | ||
} catch (IOException e) { | ||
e.printStackTrace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throw a UserException.CouldNotCreateOutputFile
here instead of printing the stack trace.
} else { | ||
graph.subsetToRefSource(10).printGraph(outputFile, pruneFactor); | ||
} | ||
if ( PRINT_FULL_GRAPH_FOR_DEBUGGING ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you removed the if ( debugGraphTransformations ) {
guard from this method, did you check all usages of it to ensure that the callsites are all now checking the debugGraphTransformations
flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Every usage. I will add a comment to be doubly sure that its someone elses fault if that causes problems in the future.
@droazen responded to your comments. Added a better exception and commented the method that lost its defensive check to shift the blame to others when mistakes are made in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments
@@ -130,6 +130,12 @@ protected ReadThreadingAssemblerArgumentCollection getReadThreadingAssemblerArgu | |||
@Argument(fullName = "just-determine-active-regions", doc = "Just determine ActiveRegions, don't perform assembly or calling", optional = true) | |||
public boolean justDetermineActiveRegions = false; | |||
|
|||
|
|||
@Hidden |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike hidden options, but /shrug
|
||
@Hidden | ||
@Advanced | ||
@Argument(fullName="debug-assembly-region-state", doc="Write output files for assembled regions with read summaries and called haplotypes", optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc should explain what the the value is supposed to be, it sounds like it should be a boolean.
@@ -71,6 +73,8 @@ | |||
|
|||
private AssemblyRegionTrimmer trimmer = new AssemblyRegionTrimmer(); | |||
|
|||
private PrintStream assemblyDebugOutStream; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final?
@lbergelson @droazen and here i was thinking this simple fix would for whatever reason be easy and painless to get in... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Latest version looks good, merging!
After some painstaking debug work with @tlangs we managed to isolate it to the graphing code and furthermore find what methods in our part of the graph code produce non-determinisitc output. This doesn't address the underlying issue which was that the order in the graph zipping can make a significant difference in the weights of edges in the graph code which effects the found haplotypes. This does however apparently make the haplotype caller outputs more deterministic though this still doesn't quite explain the apparent connection to NIO.
Fixes #6003