Skip to content

Blood Biopsy Annotations and Filters #5317

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 18, 2018
Merged

Blood Biopsy Annotations and Filters #5317

merged 9 commits into from
Oct 18, 2018

Conversation

maddyduran
Copy link
Contributor

  • an annotation to count the number of Ns at a variant site
  • a strict strand bias filter
  • a filter based on the ratio of N count to alt count

@maddyduran maddyduran self-assigned this Oct 17, 2018
@maddyduran maddyduran requested a review from fleharty October 17, 2018 14:27
@codecov-io
Copy link

codecov-io commented Oct 17, 2018

Codecov Report

Merging #5317 into master will increase coverage by 0.001%.
The diff coverage is 89.535%.

@@               Coverage Diff               @@
##              master     #5317       +/-   ##
===============================================
+ Coverage     86.793%   86.794%   +0.001%     
- Complexity     30082     30124       +42     
===============================================
  Files           1842      1843        +1     
  Lines         139291    139480      +189     
  Branches       15358     15376       +18     
===============================================
+ Hits          120895    121060      +165     
- Misses         12810     12827       +17     
- Partials        5586      5593        +7
Impacted Files Coverage Δ Complexity Δ
...ute/hellbender/utils/variant/GATKVCFConstants.java 80% <ø> (ø) 4 <0> (ø) ⬇️
...e/hellbender/utils/variant/GATKVCFHeaderLines.java 95.783% <100%> (+0.051%) 10 <0> (ø) ⬇️
...ls/walkers/mutect/M2FiltersArgumentCollection.java 100% <100%> (ø) 1 <0> (ø) ⬇️
...er/tools/walkers/annotator/StrandBiasBySample.java 86.957% <100%> (+1.242%) 9 <2> (+2) ⬆️
...r/tools/walkers/mutect/Mutect2IntegrationTest.java 94.369% <100%> (+0.588%) 71 <3> (+3) ⬆️
...te/hellbender/tools/walkers/annotator/CountNs.java 75% <75%> (ø) 8 <8> (?)
...r/tools/walkers/mutect/Mutect2FilteringEngine.java 85.232% <76%> (-1.089%) 63 <4> (+4)
...nstitute/hellbender/utils/gcs/BucketUtilsTest.java 56.303% <0%> (-3.103%) 13% <0%> (+1%)
...ls/genomicsdb/GenomicsDBImportIntegrationTest.java 86.682% <0%> (-2.388%) 77% <0%> (ø)
...tmutpileup/ValidateBasicSomaticShortMutations.java 80.172% <0%> (-0.529%) 19% <0%> (ø)
... and 9 more

Copy link
Contributor

@fleharty fleharty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madduran very minor comments. Back to you.

/**
* Apply an annotation that reports the number of Ns seen at a given site. This is intended for use on consensus called data.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove newline



/**
* Apply an annotation that reports the number of Ns seen at a given site. This is intended for use on consensus called data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should mention here this is a read based annotation, not a fragment based annotation.


public static final String FILTERING_STATS_LONG_NAME = "stats";


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove newline here

@@ -327,6 +327,49 @@ private void applyFilteredHaplotypeFilter(final M2FiltersArgumentCollection MTFA
}
}

private void applyStrictStrandFilter(final M2FiltersArgumentCollection MTFAC, final VariantContext vc, final FilterResult filterResult) {

if (! MTFAC.strictStrandBias) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra whitespace between ) {

}
final int[] strandBiasCounts = GATKProtectedVariantContextUtils.getAttributeAsIntArray(tumorGenotype, GATKVCFConstants.STRAND_BIAS_BY_SAMPLE_KEY, ()->null, -1);

final int ALT_FWD_INDEX = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are constants that should have wider scope. I suggest putting them with other constants around line 22.

@@ -77,7 +77,8 @@ private static void addFilterLine(final VCFFilterHeaderLine line) {
addFilterLine(new VCFFilterHeaderLine(DUPLICATED_EVIDENCE_FILTER_NAME, "evidence for alt allele is overrepresented by apparent duplicates"));
addFilterLine(new VCFFilterHeaderLine(READ_ORIENTATION_ARTIFACT_FILTER_NAME, "orientation bias detected by the orientation bias mixture model"));
addFilterLine(new VCFFilterHeaderLine(BAD_HAPLOTYPE_FILTER_NAME, "Variant near filtered variant on same haplotype."));

addFilterLine(new VCFFilterHeaderLine(STRICT_STRAND_BIAS_FILTER_NAME, "evidence for alt allele is not represented in both directions"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize Evidence

@@ -77,7 +77,8 @@ private static void addFilterLine(final VCFFilterHeaderLine line) {
addFilterLine(new VCFFilterHeaderLine(DUPLICATED_EVIDENCE_FILTER_NAME, "evidence for alt allele is overrepresented by apparent duplicates"));
addFilterLine(new VCFFilterHeaderLine(READ_ORIENTATION_ARTIFACT_FILTER_NAME, "orientation bias detected by the orientation bias mixture model"));
addFilterLine(new VCFFilterHeaderLine(BAD_HAPLOTYPE_FILTER_NAME, "Variant near filtered variant on same haplotype."));

addFilterLine(new VCFFilterHeaderLine(STRICT_STRAND_BIAS_FILTER_NAME, "evidence for alt allele is not represented in both directions"));
addFilterLine(new VCFFilterHeaderLine(N_RATIO_FILTER_NAME, "ratio of N to alt exceeds specified ratio"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize R in ratio

@@ -119,7 +120,7 @@ private static void addFilterLine(final VCFFilterHeaderLine line) {
addInfoLine(new VCFInfoHeaderLine(NON_DIPLOID_RATIO_KEY, 1, VCFHeaderLineType.Float, "Overall non-diploid ratio (alleles/(alleles+non-alleles))"));
addInfoLine(new VCFInfoHeaderLine(BASE_COUNTS_KEY, 4, VCFHeaderLineType.Integer, "Counts of each base"));
addInfoLine(new VCFInfoHeaderLine(LOW_MQ_KEY, 3, VCFHeaderLineType.Float, "3-tuple: <fraction of reads with MQ=0>,<fraction of reads with MQ<=10>,<total number of reads>"));
addInfoLine(new VCFInfoHeaderLine(N_BASE_COUNT_KEY, 1, VCFHeaderLineType.Float, "Percentage of N bases in the pileup"));
addInfoLine(new VCFInfoHeaderLine(N_COUNT_KEY, 1, VCFHeaderLineType.Float, "Percentage of N bases in the pileup"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a count or a percentage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be count

final SAMFileHeader samHeader = M2TestingUtils.createSamHeader();
final SAMFileGATKReadWriter writer = M2TestingUtils.getBareBonesSamWriter(samFile, samHeader);

// create 2 Ts with strand bias
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the 2Ts here.... am I missing something or is this a spurious comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was leftover from before...

}

@Test
public void testLiquidBiopsy() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really a test of liquid biopsy, could you rename the test?
Something like, testStrictStrandBiasAndN?

@maddyduran
Copy link
Contributor Author

@takutosato or @davidbenjamin
would one of you mind looking at this?

Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comments, nothing bigger than a minor refactoring.

@@ -113,6 +115,12 @@
@Argument(fullName = UNIQUE_ALT_READ_COUNT_LONG_NAME, shortName = "unique", optional = true, doc = "Filter a variant if a site contains fewer than this many unique (i.e. deduplicated) reads supporting the alternate allele")
public int uniqueAltReadCount = 0;

@Argument(fullName = N_RATIO_LONG_NAME, optional = true, doc = "Filter a variant if the ratio of Ns in the pileup. A nRatio of 0 will not apply filter.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Filter a variant if the ratio of Ns in the pileup _______________"

@@ -113,6 +115,12 @@
@Argument(fullName = UNIQUE_ALT_READ_COUNT_LONG_NAME, shortName = "unique", optional = true, doc = "Filter a variant if a site contains fewer than this many unique (i.e. deduplicated) reads supporting the alternate allele")
public int uniqueAltReadCount = 0;

@Argument(fullName = N_RATIO_LONG_NAME, optional = true, doc = "Filter a variant if the ratio of Ns in the pileup. A nRatio of 0 will not apply filter.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just say "A ratio" in the second sentence.

@@ -27,6 +27,8 @@
private final Optional<String> normalSample;
final OverlapDetector<MinorAlleleFractionRecord> tumorSegments;
public static final String FILTERING_STATUS_VCF_KEY = "filtering_status";
private final int ALT_FWD_INDEX = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than explicitly declare array indices here it would be better to have a method like public static int getAltForwardCountsFromFlattenedContingencyTable(final int[] contingencyTable) inside StrandBiasBySample.

@@ -705,6 +704,80 @@ public void testReadBasedAnnotations() throws IOException {
Assert.assertFalse(vc.get().getFilters().contains(GATKVCFConstants.STRAND_ARTIFACT_FILTER_NAME));
}


public File createSamWithNsandStrandBias(final int numAlts, final int numNs, final int numRefs) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be private


gb.attribute(GATKVCFConstants.N_COUNT_KEY, Count);
}
@Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put empty lines between methods.

public List<String> getKeyNames() { return Arrays.asList(GATKVCFConstants.N_COUNT_KEY); }
private Boolean doesReadHaveN(final GATKRead read, final VariantContext vc) {
final int offset = ReadUtils.getReadCoordinateForReferenceCoordinate(read.getSoftStart(), read.getCigar(), vc.getStart(), ReadUtils.ClippingTail.RIGHT_TAIL, true);
if (offset == ReadUtils.CLIPPING_GOAL_NOT_REACHED || AlignmentUtils.isInsideDeletion(read.getCigar(), offset)) { return false;}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not combine these lines as return offset != ReadUtils.CLIPPING_GOAL_NOT_REACHED && !AlignmentUtils.isInsideDeletion(read.getCigar(), offset) && read.getBase(offset) == 'N'?

@maddyduran
Copy link
Contributor Author

@davidbenjamin back to you

Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@@ -119,6 +121,12 @@
@Argument(fullName = UNIQUE_ALT_READ_COUNT_LONG_NAME, shortName = "unique", optional = true, doc = "Filter a variant if a site contains fewer than this many unique (i.e. deduplicated) reads supporting the alternate allele")
public int uniqueAltReadCount = 0;

@Argument(fullName = N_RATIO_LONG_NAME, optional = true, doc = "Filter a variant if the ratio of Ns in the pileup is greater or equal to this value. A ratio of 0 will not apply filter.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Ratio of 0 will not apply the filter" - Is this true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's not true anymore, a ratio of 0 will apply the filter to everything
I'm just going to remove this

@maddyduran maddyduran merged commit f95b6fe into master Oct 18, 2018
@maddyduran maddyduran deleted the md_bloodbiopsy2 branch October 18, 2018 18:49
EdwardDixon pushed a commit to EdwardDixon/gatk that referenced this pull request Nov 9, 2018
* adding bloodbiopsy filters/annotations

* getting test to pass

* adding annotation to variant annotator integration test file

* fixing annotation

* responding to mark

* responding to david

* forgot to delete old variable

* fixing filter argument description
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants