-
Notifications
You must be signed in to change notification settings - Fork 603
Blood Biopsy Annotations and Filters #5317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
maddyduran
commented
Oct 17, 2018
- an annotation to count the number of Ns at a variant site
- a strict strand bias filter
- a filter based on the ratio of N count to alt count
Codecov Report
@@ Coverage Diff @@
## master #5317 +/- ##
===============================================
+ Coverage 86.793% 86.794% +0.001%
- Complexity 30082 30124 +42
===============================================
Files 1842 1843 +1
Lines 139291 139480 +189
Branches 15358 15376 +18
===============================================
+ Hits 120895 121060 +165
- Misses 12810 12827 +17
- Partials 5586 5593 +7
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madduran very minor comments. Back to you.
/** | ||
* Apply an annotation that reports the number of Ns seen at a given site. This is intended for use on consensus called data. | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove newline
|
||
|
||
/** | ||
* Apply an annotation that reports the number of Ns seen at a given site. This is intended for use on consensus called data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should mention here this is a read based annotation, not a fragment based annotation.
|
||
public static final String FILTERING_STATS_LONG_NAME = "stats"; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove newline here
@@ -327,6 +327,49 @@ private void applyFilteredHaplotypeFilter(final M2FiltersArgumentCollection MTFA | |||
} | |||
} | |||
|
|||
private void applyStrictStrandFilter(final M2FiltersArgumentCollection MTFAC, final VariantContext vc, final FilterResult filterResult) { | |||
|
|||
if (! MTFAC.strictStrandBias) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra whitespace between ) {
} | ||
final int[] strandBiasCounts = GATKProtectedVariantContextUtils.getAttributeAsIntArray(tumorGenotype, GATKVCFConstants.STRAND_BIAS_BY_SAMPLE_KEY, ()->null, -1); | ||
|
||
final int ALT_FWD_INDEX = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are constants that should have wider scope. I suggest putting them with other constants around line 22.
@@ -77,7 +77,8 @@ private static void addFilterLine(final VCFFilterHeaderLine line) { | |||
addFilterLine(new VCFFilterHeaderLine(DUPLICATED_EVIDENCE_FILTER_NAME, "evidence for alt allele is overrepresented by apparent duplicates")); | |||
addFilterLine(new VCFFilterHeaderLine(READ_ORIENTATION_ARTIFACT_FILTER_NAME, "orientation bias detected by the orientation bias mixture model")); | |||
addFilterLine(new VCFFilterHeaderLine(BAD_HAPLOTYPE_FILTER_NAME, "Variant near filtered variant on same haplotype.")); | |||
|
|||
addFilterLine(new VCFFilterHeaderLine(STRICT_STRAND_BIAS_FILTER_NAME, "evidence for alt allele is not represented in both directions")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize Evidence
@@ -77,7 +77,8 @@ private static void addFilterLine(final VCFFilterHeaderLine line) { | |||
addFilterLine(new VCFFilterHeaderLine(DUPLICATED_EVIDENCE_FILTER_NAME, "evidence for alt allele is overrepresented by apparent duplicates")); | |||
addFilterLine(new VCFFilterHeaderLine(READ_ORIENTATION_ARTIFACT_FILTER_NAME, "orientation bias detected by the orientation bias mixture model")); | |||
addFilterLine(new VCFFilterHeaderLine(BAD_HAPLOTYPE_FILTER_NAME, "Variant near filtered variant on same haplotype.")); | |||
|
|||
addFilterLine(new VCFFilterHeaderLine(STRICT_STRAND_BIAS_FILTER_NAME, "evidence for alt allele is not represented in both directions")); | |||
addFilterLine(new VCFFilterHeaderLine(N_RATIO_FILTER_NAME, "ratio of N to alt exceeds specified ratio")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize R in ratio
@@ -119,7 +120,7 @@ private static void addFilterLine(final VCFFilterHeaderLine line) { | |||
addInfoLine(new VCFInfoHeaderLine(NON_DIPLOID_RATIO_KEY, 1, VCFHeaderLineType.Float, "Overall non-diploid ratio (alleles/(alleles+non-alleles))")); | |||
addInfoLine(new VCFInfoHeaderLine(BASE_COUNTS_KEY, 4, VCFHeaderLineType.Integer, "Counts of each base")); | |||
addInfoLine(new VCFInfoHeaderLine(LOW_MQ_KEY, 3, VCFHeaderLineType.Float, "3-tuple: <fraction of reads with MQ=0>,<fraction of reads with MQ<=10>,<total number of reads>")); | |||
addInfoLine(new VCFInfoHeaderLine(N_BASE_COUNT_KEY, 1, VCFHeaderLineType.Float, "Percentage of N bases in the pileup")); | |||
addInfoLine(new VCFInfoHeaderLine(N_COUNT_KEY, 1, VCFHeaderLineType.Float, "Percentage of N bases in the pileup")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a count or a percentage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be count
final SAMFileHeader samHeader = M2TestingUtils.createSamHeader(); | ||
final SAMFileGATKReadWriter writer = M2TestingUtils.getBareBonesSamWriter(samFile, samHeader); | ||
|
||
// create 2 Ts with strand bias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the 2Ts here.... am I missing something or is this a spurious comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was leftover from before...
} | ||
|
||
@Test | ||
public void testLiquidBiopsy() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really a test of liquid biopsy, could you rename the test?
Something like, testStrictStrandBiasAndN?
@takutosato or @davidbenjamin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comments, nothing bigger than a minor refactoring.
@@ -113,6 +115,12 @@ | |||
@Argument(fullName = UNIQUE_ALT_READ_COUNT_LONG_NAME, shortName = "unique", optional = true, doc = "Filter a variant if a site contains fewer than this many unique (i.e. deduplicated) reads supporting the alternate allele") | |||
public int uniqueAltReadCount = 0; | |||
|
|||
@Argument(fullName = N_RATIO_LONG_NAME, optional = true, doc = "Filter a variant if the ratio of Ns in the pileup. A nRatio of 0 will not apply filter.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Filter a variant if the ratio of Ns in the pileup _______________"
@@ -113,6 +115,12 @@ | |||
@Argument(fullName = UNIQUE_ALT_READ_COUNT_LONG_NAME, shortName = "unique", optional = true, doc = "Filter a variant if a site contains fewer than this many unique (i.e. deduplicated) reads supporting the alternate allele") | |||
public int uniqueAltReadCount = 0; | |||
|
|||
@Argument(fullName = N_RATIO_LONG_NAME, optional = true, doc = "Filter a variant if the ratio of Ns in the pileup. A nRatio of 0 will not apply filter.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just say "A ratio" in the second sentence.
@@ -27,6 +27,8 @@ | |||
private final Optional<String> normalSample; | |||
final OverlapDetector<MinorAlleleFractionRecord> tumorSegments; | |||
public static final String FILTERING_STATUS_VCF_KEY = "filtering_status"; | |||
private final int ALT_FWD_INDEX = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than explicitly declare array indices here it would be better to have a method like public static int getAltForwardCountsFromFlattenedContingencyTable(final int[] contingencyTable)
inside StrandBiasBySample
.
@@ -705,6 +704,80 @@ public void testReadBasedAnnotations() throws IOException { | |||
Assert.assertFalse(vc.get().getFilters().contains(GATKVCFConstants.STRAND_ARTIFACT_FILTER_NAME)); | |||
} | |||
|
|||
|
|||
public File createSamWithNsandStrandBias(final int numAlts, final int numNs, final int numRefs) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be private
|
||
gb.attribute(GATKVCFConstants.N_COUNT_KEY, Count); | ||
} | ||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put empty lines between methods.
public List<String> getKeyNames() { return Arrays.asList(GATKVCFConstants.N_COUNT_KEY); } | ||
private Boolean doesReadHaveN(final GATKRead read, final VariantContext vc) { | ||
final int offset = ReadUtils.getReadCoordinateForReferenceCoordinate(read.getSoftStart(), read.getCigar(), vc.getStart(), ReadUtils.ClippingTail.RIGHT_TAIL, true); | ||
if (offset == ReadUtils.CLIPPING_GOAL_NOT_REACHED || AlignmentUtils.isInsideDeletion(read.getCigar(), offset)) { return false;} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not combine these lines as return offset != ReadUtils.CLIPPING_GOAL_NOT_REACHED && !AlignmentUtils.isInsideDeletion(read.getCigar(), offset) && read.getBase(offset) == 'N'
?
@davidbenjamin back to you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
@@ -119,6 +121,12 @@ | |||
@Argument(fullName = UNIQUE_ALT_READ_COUNT_LONG_NAME, shortName = "unique", optional = true, doc = "Filter a variant if a site contains fewer than this many unique (i.e. deduplicated) reads supporting the alternate allele") | |||
public int uniqueAltReadCount = 0; | |||
|
|||
@Argument(fullName = N_RATIO_LONG_NAME, optional = true, doc = "Filter a variant if the ratio of Ns in the pileup is greater or equal to this value. A ratio of 0 will not apply filter.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Ratio of 0 will not apply the filter" - Is this true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's not true anymore, a ratio of 0 will apply the filter to everything
I'm just going to remove this
* adding bloodbiopsy filters/annotations * getting test to pass * adding annotation to variant annotator integration test file * fixing annotation * responding to mark * responding to david * forgot to delete old variable * fixing filter argument description