
Seems VariantsToTable not properly handle AD greater than 100 #6115


Closed
bshifaw opened this issue Aug 23, 2019 · 6 comments

@bshifaw

bshifaw commented Aug 23, 2019

The user provided input files, which I tested. One of the AD values did get concatenated, but not every AD value greater than 100 was affected.

User post


I am trying to extract info from a VCF file using the following command and encountered a problem:

```
gatk VariantsToTable -R $REF -V final_SNP.vcf -F CHROM -F POS -F REF -F ALT -F QUAL -GF AD -GF GQ -GF PL -GF GT -O snpPE_final.tsv
```
For SNPs with AD values less than 100, the results are fine, but for SNPs with an AD value greater than 100, VariantsToTable just concatenates the two AD values. Here is an entry in the VCF file:
```
1   15880   .   G   A   3785.46 PASS    AC=2;AF=0.500;AN=4;BaseQRankSum=6.325;DP=296;ExcessHet=4.7712;FS=3.153;MLEAC=2;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=12.79;ReadPosRankSum=-1.165;SOR=0.888    GT:AD:DP:GQ:PL  0/1:58,35:93:99:895,0,2296  0/1:98,105:203:99:2900,0,3782
```
And here is the corresponding row in the TSV file:
```
CHROM   POS REF ALT QUAL    S1.AD   S1.GQ   S1.PL   S1.GT   S2.AD   S2.GQ   S2.PL   S2.GT
1   15880   G   A   3785.46 58,35   99  895,0,2296  G/A 98105   99  2900,0,3782 G/A
```
The AD values in the S2.AD column should be 98,105, not 98105. I use GATK4-4.1.2.0-1 and openjdk 1.8.0_152-release on Ubuntu 18.04.

This Issue was generated from your [forums] 
[forums]: https://gatkforums.broadinstitute.org/gatk/discussion/24368/seems-variantstotable-not-properly-handle-ad-greater-than-100/p1
@lbergelson
Member

Huh. Something dumb is happening here. Definitely a bug on our end.

lbergelson added the bug label Aug 23, 2019
@meganshand
Contributor

@bshifaw I couldn't recreate this with the latest version or 4.1.2.0 when I made a VCF with that line in it. Can you please attach the actual input VCF that had the problem?

asmirnov239 self-assigned this Aug 26, 2019
@asmirnov239
Collaborator

asmirnov239 commented Aug 26, 2019

The files are in the user-liaison channel. I can take care of this bug.

@dblhlx

dblhlx commented Aug 28, 2019

I am the OP in the GATK forum. This bug affects only SNPs with a three-digit ALT AD value, where the two AD values of a SNP are separated by a comma. If VariantsToTable, LibreOffice, or Excel treats the comma as a thousands separator, the AD values would be concatenated. Is this possible?

@dblhlx

dblhlx commented Aug 28, 2019

Yes, it turns out this is not strictly a GATK bug. I opened the TSV file with Visual Studio Code and found that the comma was there between the two AD values for the affected SNPs. So it is LibreOffice or Excel, not VariantsToTable, that treats the comma as a thousands separator. IMHO, using something other than a comma to separate the two AD values should solve the problem on the GATK end.
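For anyone who wants to check this without a spreadsheet, here is a minimal Python sketch. It assumes the row from the original report (with the comma that the raw file actually contains). The helpers `parse_ad` and `as_thousands` are hypothetical names; `as_thousands` only imitates how an importer that reads `,` as a thousands separator might interpret a cell, and is not the actual LibreOffice or Excel logic.

```python
import csv
import io

# Row from the report, as it appears in the raw TSV (tab-separated).
TSV = (
    "CHROM\tPOS\tREF\tALT\tQUAL\tS1.AD\tS1.GQ\tS1.PL\tS1.GT\t"
    "S2.AD\tS2.GQ\tS2.PL\tS2.GT\n"
    "1\t15880\tG\tA\t3785.46\t58,35\t99\t895,0,2296\tG/A\t"
    "98,105\t99\t2900,0,3782\tG/A\n"
)


def parse_ad(cell):
    """Split an AD cell such as '98,105' into per-allele integer depths."""
    return [int(x) for x in cell.split(",")]


def as_thousands(cell):
    """Hypothetical imitation of a thousands-separator reading of a cell.

    The cell is merged into one number only if it looks like valid digit
    grouping: 1-3 leading digits, then groups of exactly three digits.
    """
    parts = cell.split(",")
    looks_grouped = (
        len(parts) > 1
        and parts[0].isdigit()
        and 1 <= len(parts[0]) <= 3
        and all(p.isdigit() and len(p) == 3 for p in parts[1:])
    )
    return "".join(parts) if looks_grouped else cell


# Read the table as plain text so no cell is reinterpreted as a number.
row = next(csv.DictReader(io.StringIO(TSV), delimiter="\t"))

print(row["S2.AD"])                # 98,105
print(parse_ad(row["S2.AD"]))      # [98, 105]
print(as_thousands(row["S2.AD"]))  # 98105
print(as_thousands(row["S1.AD"]))  # 58,35
```

Under this assumption, only cells whose trailing groups have exactly three digits are merged, which would match the observation that `98,105` is affected while `58,35` and `2900,0,3782` are not. Importing the AD columns as text in the spreadsheet should avoid the problem as well.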

@ldgauthier
Contributor

In the VCF 4.3 spec (http://samtools.github.io/hts-specs/VCFv4.3.pdf) AD is now a reserved key for the FORMAT field giving a list of values with length equal to the number of alleles including the reference. Given that this is included in the spec now, we can't change the delimiter while still using the AD key.

You may find some of the changes I introduced in #5697 to be helpful. If you split multi-allelics (--split-multi-allelic) and specify AD as being allele-specific (with -ASGF AD), then you should get a line for each allele, each with a scalar for depth.
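If rerunning VariantsToTable is not convenient, a similar "one scalar per cell" result can be approximated by post-processing the existing table. The sketch below is an assumption-laden alternative, not what the GATK options above do internally: `split_ad_columns` and the `_0`, `_1` column suffixes are hypothetical, and it expects every row to have the same number of alleles (for example, biallelic sites only) so that the header stays consistent.

```python
def split_ad_columns(header, row, suffix=".AD"):
    """Expand each '<sample>.AD' cell into one column per allele.

    Hypothetical helper: 'S2.AD' = '98,105' becomes
    'S2.AD_0' = '98' (REF depth) and 'S2.AD_1' = '105' (ALT depth).
    """
    new_header, new_row = [], []
    for name, cell in zip(header, row):
        if name.endswith(suffix):
            for i, depth in enumerate(cell.split(",")):
                new_header.append(f"{name}_{i}")
                new_row.append(depth)
        else:
            new_header.append(name)
            new_row.append(cell)
    return new_header, new_row


# Subset of the columns from the reported row, for illustration.
header = ["CHROM", "POS", "S1.AD", "S2.AD"]
row = ["1", "15880", "58,35", "98,105"]

new_header, new_row = split_ad_columns(header, row)
print("\t".join(new_header))
print("\t".join(new_row))
```

Because no cell in the resulting table contains a comma, a spreadsheet has nothing to misread as a thousands separator.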
