Skip to content

Fix non-determinism in KbestHaplotypeFinding/SeqGraphZippingCode #6195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 7, 2019

Conversation

jamesemery
Copy link
Collaborator

@jamesemery jamesemery commented Oct 3, 2019

After some painstaking debug work with @tlangs we managed to isolate it to the graphing code and furthermore find what methods in our part of the graph code produce non-determinisitc output. This doesn't address the underlying issue which was that the order in the graph zipping can make a significant difference in the weights of edges in the graph code which effects the found haplotypes. This does however apparently make the haplotype caller outputs more deterministic though this still doesn't quite explain the apparent connection to NIO.

Fixes #6003

@jamesemery jamesemery requested a review from lbergelson October 3, 2019 14:00
@droazen droazen self-requested a review October 3, 2019 16:52
@droazen droazen self-assigned this Oct 3, 2019
Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to you @jamesemery with some comments

private AssemblyResult cleanupSeqGraph(final SeqGraph seqGraph) {
printDebugGraphTransform(seqGraph, "sequenceGraph.1.dot");
private AssemblyResult cleanupSeqGraph(final SeqGraph seqGraph, final Haplotype refHaplotype) {
printDebugGraphTransform(seqGraph, refHaplotype.getLocation() + "-sequenceGraph."+seqGraph.getKmerSize()+".1.0.non_ref_removed.dot");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you refactor so that we don't construct these Strings at all if we're not in debugging mode?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added guarding if statements to the outside of every call to this method.

}

/**
* Get the set of sink vertices of this graph
* @return a non-null set
*/
public final Set<V> getSinks() {
return vertexSet().stream().filter(v -> isSink(v)).collect(Collectors.toSet());
return vertexSet().stream().filter(v -> isSink(v)).collect(Collectors.toCollection(LinkedHashSet::new));
Copy link
Contributor

@droazen droazen Oct 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment here (and at similar lines below) about why the LinkedHashSet is important here.

@droazen droazen assigned jamesemery and unassigned droazen Oct 3, 2019
Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One or two minor remaining nits, @jamesemery , then this is good to go.

try {
assemblyDebugOutStream = new PrintStream(Files.newOutputStream(IOUtils.getPath(hcArgs.assemblyStateOutput)));
} catch (IOException e) {
e.printStackTrace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throw a UserException.CouldNotCreateOutputFile here instead of printing the stack trace.

} else {
graph.subsetToRefSource(10).printGraph(outputFile, pruneFactor);
}
if ( PRINT_FULL_GRAPH_FOR_DEBUGGING ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you removed the if ( debugGraphTransformations ) { guard from this method, did you check all usages of it to ensure that the callsites are all now checking the debugGraphTransformations flag?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Every usage. I will add a comment to be doubly sure that its someone elses fault if that causes problems in the future.

@jamesemery
Copy link
Collaborator Author

@droazen responded to your comments. Added a better exception and commented the method that lost its defensive check to shift the blame to others when mistakes are made in the future.

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments

@@ -130,6 +130,12 @@ protected ReadThreadingAssemblerArgumentCollection getReadThreadingAssemblerArgu
@Argument(fullName = "just-determine-active-regions", doc = "Just determine ActiveRegions, don't perform assembly or calling", optional = true)
public boolean justDetermineActiveRegions = false;


@Hidden
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike hidden options, but /shrug


@Hidden
@Advanced
@Argument(fullName="debug-assembly-region-state", doc="Write output files for assembled regions with read summaries and called haplotypes", optional = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc should explain what the the value is supposed to be, it sounds like it should be a boolean.

@@ -71,6 +73,8 @@

private AssemblyRegionTrimmer trimmer = new AssemblyRegionTrimmer();

private PrintStream assemblyDebugOutStream;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final?

@jamesemery
Copy link
Collaborator Author

@lbergelson @droazen and here i was thinking this simple fix would for whatever reason be easy and painless to get in...

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Latest version looks good, merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nondeterministic behaviour in graph processing
3 participants