docs/local_assembly.tex +6 -1
@@ -75,7 +75,12 @@ \section{Building the graph} \label{graph-assembly}
 \section{Cleaning the Graph} \label{graph-cleaning}
 Before deciding on candidate haplotypes, the assembler simplifies the graph with the following heuristics to remove spurious paths and to merge variant paths that diverge from the reference.
 \begin{itemize}
-\item pruning: The assembler finds all maximal non-branching subgraphs and removes those that 1) do not share an edge with the reference path and 2) contain no edges with sufficient multiplicity\footnote{By default 2. This is controlled by the \code{minPruning} argument.}. While the default multiplicity threshold of 2 is quite permissive, it \textit{does} cause \Mutect~ to lose sensitivity for deletions occurring in a single read\footnote{While an SNV occurring on a single read would not yield a confident somatic variant call, a long deletion in a non-STR context could easily be supported by a single read due to the tiny probability of its arising from sequencing error.}.
+\item pruning: The assembler finds all maximal non-branching subgraphs (``chains'') and removes those that 1) do not share an edge with the reference path and 2) contain no edges with sufficient multiplicity\footnote{By default 2. This is controlled by the \code{minPruning} argument.}. While the default multiplicity threshold of 2 is quite permissive, it \textit{does} cause \Mutect~ to lose sensitivity for deletions occurring in a single read\footnote{While an SNV occurring on a single read would not yield a confident somatic variant call, a long deletion in a non-STR context could easily be supported by a single read due to the tiny probability of its arising from sequencing error.}.
+
+The command-line flag \code{--adaptive-pruning} turns on an adaptive pruning algorithm that adjusts itself to both the local depth of coverage and the observed sequencing error rate, and removes chains based on a likelihood score. The score of a chain is the maximum of a left score and a right score. The left (right) score is the active region determination log likelihood from the \code{Mutect2Engine}, computed by treating the reads supporting the first (last) edge of the chain as potential variant reads and the reads supporting all other outgoing (incoming) edges of the chain's first (last) vertex as reference reads. The algorithm runs in two passes: the first pass identifies likely errors, from which it derives an empirical estimate of the error rate.
+
+The adaptive pruning option is especially useful for samples with high coverage, such as mitochondria and targeted panels, and for samples with variable coverage, such as exomes and RNA.
+
 \item dangling tails: The assembler only outputs haplotypes that start and end with a reference kmer, so it attempts to rescue paths in the graph that do not. To rescue a ``dangling tail'' (a path that ends in a non-reference kmer vertex), the assembler first traverses the graph backwards from this vertex to a reference vertex. If during traversal it encounters a vertex with more than one incoming edge, it gives up\footnote{As opposed to, e.g., performing a depth-first search of all possible paths back to the reference.}. It also gives up if it encounters a vertex with more than one outgoing edge, that is, if the path branches again after diverging from the reference\footnote{It seems like this could be changed to increase sensitivity.}. It then computes the Smith-Waterman alignment of the branching path against the reference path after the vertex at which they diverge. If the alignment's CIGAR contains three or fewer elements, that is, if the alignment has at most one indel, the assembly engine attempts to merge the dangling tail back into the reference.
 
 To merge the dangling tail back into the reference path, the assembler finds the beginning of the maximal common suffix of the dangling path and the reference path, that is, the point at which the sequences converge\footnote{This is \textit{not} where the \textit{paths in the graph} converge (they don't), because kmers in the suffix disagree with the reference at upstream bases.}, and adds an edge between the dangling path's vertex and the reference path's vertex at this position. This means that the graph is no longer a valid de Bruijn graph, because the dangling vertex kmer and its succeeding reference vertex kmer do not overlap by $k - 1$ bases. Nonetheless, this graph yields valid haplotypes when we later ``zip'' the graph's chains (see below) by accumulating the last base of each kmer.
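The fixed-threshold pruning rule in the pruning bullet above can be illustrated with a minimal Java sketch. The `Edge` and `Chain` records are hypothetical simplifications (each chain reduced to its edges' read multiplicities and reference flags), not GATK's graph classes, and `minPruning` stands in for the argument of the same name with its default of 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of fixed-threshold chain pruning; not GATK's implementation. */
public final class ChainPruningSketch {
    /** An edge of the assembly graph: how many reads support it and whether it lies on the reference path. */
    record Edge(int multiplicity, boolean isReference) {}

    /** A maximal non-branching path ("chain"), reduced to its edges. */
    record Chain(List<Edge> edges) {
        boolean sharesEdgeWithReference() {
            return edges.stream().anyMatch(Edge::isReference);
        }

        boolean hasEdgeWithMultiplicityAtLeast(int threshold) {
            return edges.stream().anyMatch(e -> e.multiplicity() >= threshold);
        }
    }

    /** Removes chains that are off the reference AND have no edge with multiplicity >= minPruning. */
    static List<Chain> prune(List<Chain> chains, int minPruning) {
        List<Chain> kept = new ArrayList<>();
        for (Chain chain : chains) {
            boolean prunable = !chain.sharesEdgeWithReference()
                    && !chain.hasEdgeWithMultiplicityAtLeast(minPruning);
            if (!prunable) {
                kept.add(chain);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Chain reference = new Chain(List.of(new Edge(40, true), new Edge(38, true)));
        Chain singleReadDeletion = new Chain(List.of(new Edge(1, false), new Edge(1, false)));
        Chain supportedVariant = new Chain(List.of(new Edge(1, false), new Edge(5, false)));
        List<Chain> kept = prune(List.of(reference, singleReadDeletion, supportedVariant), 2);
        System.out.println(kept.size()); // 2: only the single-read deletion is pruned
    }
}
```

The single-read chain is removed regardless of how improbable it would be as a sequencing error, which is the sensitivity loss for single-read deletions noted in the text.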
src/main/java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/AssemblyBasedCallerArgumentCollection.java +4 -1
@@ -29,8 +29,11 @@ public abstract class AssemblyBasedCallerArgumentCollection extends StandardCall
-if ( hcArgs.genotypingOutputMode == GenotypingOutputMode.GENOTYPE_GIVEN_ALLELES && hcArgs.assemblerArgs.consensusMode ) {
+if ( hcArgs.genotypingOutputMode == GenotypingOutputMode.GENOTYPE_GIVEN_ALLELES && hcArgs.assemblerArgs.consensusMode() ) {
 throw new UserException("HaplotypeCaller cannot be run in both GENOTYPE_GIVEN_ALLELES mode and in consensus mode at the same time. Please choose one or the other.");
 }
 
@@ -604,7 +604,7 @@ public List<VariantContext> callRegion(final AssemblyRegion region, final Featur
+* By default, the read threading assembler will attempt to recover dangling heads and tails. See the `minDanglingBranchLength` argument documentation for more details.
+*/
+@Hidden
+@Argument(fullName="do-not-recover-dangling-branches", doc="Disable dangling head and tail recovery", optional = true)
+* As of version 3.3, this argument is no longer needed because dangling end recovery is now the default behavior. See GATK 3.3 release notes for more details.
+*/
+@Deprecated
+@Argument(fullName="recover-dangling-heads", doc="This argument is deprecated since version 3.3", optional = true)
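The dangling-tail rescue that these relocated arguments control (described in docs/local_assembly.tex) rests on three string-level operations: counting CIGAR elements, finding the maximal common suffix of the dangling and reference sequences, and "zipping" a chain of kmers. The Java sketch below illustrates each under simplifying assumptions: paths are plain strings, the CIGAR is a string rather than an htsjdk object, the kmer list is non-empty, and the class and method names are hypothetical rather than GATK's.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers illustrating the dangling-tail checks; not GATK's implementation. */
public final class DanglingTailSketch {
    // One CIGAR element: a length followed by an operator, e.g. "10M" or "2D".
    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Counts the elements of a CIGAR string, e.g. "10M2D5M" has three. */
    static int cigarElementCount(String cigar) {
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** A merge is attempted only if the alignment has at most three elements (at most one indel). */
    static boolean isMergeCandidate(String cigar) {
        return cigarElementCount(cigar) <= 3;
    }

    /** Length of the maximal common suffix of the dangling-path and reference-path sequences. */
    static int commonSuffixLength(String dangling, String reference) {
        int n = 0;
        while (n < dangling.length() && n < reference.length()
                && dangling.charAt(dangling.length() - 1 - n) == reference.charAt(reference.length() - 1 - n)) {
            n++;
        }
        return n;
    }

    /**
     * "Zips" a path of kmers: the first kmer in full, then only the last base of each later kmer.
     * No k-1 overlap check is made, so a merge edge between non-overlapping kmers does not break it.
     */
    static String zip(List<String> kmers) {
        StringBuilder sb = new StringBuilder(kmers.get(0));
        for (int i = 1; i < kmers.size(); i++) {
            String kmer = kmers.get(i);
            sb.append(kmer.charAt(kmer.length() - 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(isMergeCandidate("10M2D5M"));               // true: one deletion
        System.out.println(isMergeCandidate("5M1I4M2D6M"));            // false: two indels
        System.out.println(commonSuffixLength("ACGTTTGCA", "ACGGCA")); // 3 (shared suffix "GCA")
        System.out.println(zip(List.of("ACG", "CGT", "GTA")));         // ACGTA
    }
}
```

In the assembler, the common-suffix length locates the reference vertex at which the new edge is added; here it is returned as a plain count.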
src/main/java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/ReadThreadingAssemblerArgumentCollection.java
@@ -48,20 +49,6 @@ public final class ReadThreadingAssemblerArgumentCollection implements Serializa
 @Argument(fullName="num-pruning-samples", doc="Number of samples that must pass the minPruning threshold", optional = true)
 public int numPruningSamples = 1;
 
-/**
-* As of version 3.3, this argument is no longer needed because dangling end recovery is now the default behavior. See GATK 3.3 release notes for more details.
-*/
-@Deprecated
-@Argument(fullName="recover-dangling-heads", doc="This argument is deprecated since version 3.3", optional = true)
-* By default, the read threading assembler will attempt to recover dangling heads and tails. See the `minDanglingBranchLength` argument documentation for more details.
-*/
-@Hidden
-@Argument(fullName="do-not-recover-dangling-branches", doc="Disable dangling head and tail recovery", optional = true)
 * The assembly graph can be quite complex, and could imply a very large number of possible haplotypes. Each haplotype
@@ -91,13 +72,6 @@ public final class ReadThreadingAssemblerArgumentCollection implements Serializa
 @Argument(fullName="max-num-haplotypes-in-population", doc="Maximum number of haplotypes to consider for your population", optional = true)
 public int maxNumHaplotypesInPopulation = 128;
 
-/**
-* Enabling this argument may cause fundamental problems with the assembly graph itself.
-*/
-@Hidden
-@Argument(fullName="error-correct-kmers", doc = "Use an exploratory algorithm to error correct the kmers used during assembly", optional = true)
-public boolean errorCorrectKmers = false;
-
 /**
 * Paths with fewer supporting kmers than the specified threshold will be pruned from the graph.
 *
@@ -111,6 +85,28 @@ public final class ReadThreadingAssemblerArgumentCollection implements Serializa
 @Argument(fullName="min-pruning", doc = "Minimum support to not prune paths in the graph", optional = true)
 public int minPruneFactor = 2;
 
+/**
+* Initial base error rate guess for the probabilistic adaptive pruning model. Results are not very sensitive to this
+* parameter because it is only a starting point from which the algorithm discovers the true error rate.
+*/
+@Advanced
+@Argument(fullName="adaptive-pruning-initial-error-rate", doc = "Initial base error rate estimate for adaptive pruning", optional = true)
+public double initialErrorRateForPruning = 0.001;
+
+/**
+* Log-10 likelihood ratio threshold for adaptive pruning algorithm.
+*/
+@Advanced
+@Argument(fullName="pruning-lod-threshold", doc = "Log-10 likelihood ratio threshold for adaptive pruning algorithm", optional = true)
+public double pruningLog10OddsThreshold = 1.0;
+
+/**
+* The maximum number of variants in graph the adaptive pruner will allow
+*/
+@Advanced
+@Argument(fullName="max-unpruned-variants", doc = "Maximum number of variants in graph the adaptive pruner will allow", optional = true)
+public int maxUnprunedVariants = 100;
+
 @Hidden
 @Argument(fullName="debug-graph-transformations", doc="Write DOT formatted graph files out of the assembler for only this graph size", optional = true)
 public boolean debugGraphTransformations = false;
@@ -137,4 +133,8 @@ public final class ReadThreadingAssemblerArgumentCollection implements Serializa
 @Hidden
 @Argument(fullName="min-observations-for-kmer-to-be-solid", doc = "A k-mer must be seen at least these times for it considered to be solid", optional = true)
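The adaptive chain score described in docs/local_assembly.tex, together with the new `adaptive-pruning-initial-error-rate` (default 0.001) and `pruning-lod-threshold` (default 1.0) arguments, can be sketched as follows. This is a deliberately simplified stand-in and an assumption throughout: the likelihood compares a real variant (allele fraction integrated over a flat prior) against every alternate read being a sequencing error, which is not the actual `Mutect2Engine` active-region likelihood. The `ChainEnds` record and all method names are hypothetical, and neither the two-pass error-rate re-estimation nor `max-unpruned-variants` is modeled.

```java
/**
 * Hypothetical sketch of an adaptive chain score. The likelihood below is a simplified stand-in,
 * NOT the active-region likelihood actually used by Mutect2Engine.
 */
public final class AdaptivePruningSketch {
    /** log10(n!) by direct summation; adequate for illustration at moderate depths. */
    static double log10Factorial(int n) {
        double sum = 0.0;
        for (int i = 2; i <= n; i++) {
            sum += Math.log10(i);
        }
        return sum;
    }

    /**
     * log10 odds that altCount reads out of altCount + refCount reflect a real variant (allele fraction
     * integrated over a flat prior) versus every alt read being a sequencing error at rate errorRate.
     */
    static double log10Odds(int altCount, int refCount, double errorRate) {
        // Variant: integral over f in [0, 1] of f^alt * (1 - f)^ref = alt! * ref! / (alt + ref + 1)!
        double variant = log10Factorial(altCount) + log10Factorial(refCount)
                - log10Factorial(altCount + refCount + 1);
        // Error: each alt read is an error with probability errorRate; each ref read is correct.
        double error = altCount * Math.log10(errorRate) + refCount * Math.log10(1 - errorRate);
        return variant - error;
    }

    /** Read counts at the two ends of a chain (hypothetical summary of the graph). */
    record ChainEnds(int firstEdgeCount, int otherOutgoingCount, int lastEdgeCount, int otherIncomingCount) {}

    /** Chain score: the larger of the left-end and right-end scores. */
    static double chainScore(ChainEnds chain, double errorRate) {
        double left = log10Odds(chain.firstEdgeCount(), chain.otherOutgoingCount(), errorRate);
        double right = log10Odds(chain.lastEdgeCount(), chain.otherIncomingCount(), errorRate);
        return Math.max(left, right);
    }

    /** A chain is pruned when its score falls below the log10 odds threshold. */
    static boolean shouldPrune(ChainEnds chain, double errorRate, double log10OddsThreshold) {
        return chainScore(chain, errorRate) < log10OddsThreshold;
    }

    public static void main(String[] args) {
        double errorRate = 0.001; // default of adaptive-pruning-initial-error-rate
        double threshold = 1.0;   // default of pruning-lod-threshold
        // 3 supporting reads against 10,000 at a very deep site: consistent with sequencing error.
        System.out.println(shouldPrune(new ChainEnds(3, 10000, 3, 10000), errorRate, threshold)); // true
        // 3 supporting reads against 30 at a shallow site: kept.
        System.out.println(shouldPrune(new ChainEnds(3, 30, 3, 30), errorRate, threshold));       // false
    }
}
```

Both example chains have three supporting reads, so the fixed `min-pruning` threshold of 2 would keep both; the likelihood-based score instead prunes the chain at 10,000x depth, which is the depth sensitivity the adaptive option is meant to provide.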