SelectVariants should perform site-level filtering before genotype-level filtering #7497

Open
droazen opened this issue Oct 7, 2021 · 3 comments
Labels: learn GATK (Suitable for GATK beginners), VariantWalker

Comments

@droazen
Contributor

droazen commented Oct 7, 2021

When there are many samples in a VCF and decoding the genotypes is an expensive operation, it makes sense to do all the site-level filtering first before fully decoding the record and examining the genotypes.

Unfortunately, SelectVariants currently does some of the genotype-level filtering before the site-level filtering, which can cause performance issues with large callsets.
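To make the ordering concrete, here is a minimal sketch of "site-level first, genotype-level second". It is not the SelectVariants code and does not use the htsjdk API: `LazyRecord`, `SiteFirstFilterSketch`, and `select` are hypothetical names, and the record is a stand-in whose genotypes are decoded only on first access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical stand-in for a VCF record with lazily decoded genotypes.
// This is NOT htsjdk's VariantContext; every name here is illustrative.
final class LazyRecord {
    final double qual;                                  // site-level field, cheap to read
    private final Supplier<List<String>> genotypeDecoder;
    private List<String> genotypes;                     // null until first access

    LazyRecord(double qual, Supplier<List<String>> genotypeDecoder) {
        this.qual = qual;
        this.genotypeDecoder = genotypeDecoder;
    }

    // The (assumed expensive) decode runs once, on first access.
    List<String> getGenotypes() {
        if (genotypes == null) {
            genotypes = genotypeDecoder.get();
        }
        return genotypes;
    }

    boolean isDecoded() {
        return genotypes != null;
    }
}

final class SiteFirstFilterSketch {
    // Run the cheap site-level test first; only survivors reach the genotype-level test.
    static List<LazyRecord> select(List<LazyRecord> records,
                                   Predicate<LazyRecord> siteFilter,
                                   Predicate<LazyRecord> genotypeFilter) {
        List<LazyRecord> kept = new ArrayList<>();
        for (LazyRecord record : records) {
            if (!siteFilter.test(record)) {
                continue;                               // rejected without decoding genotypes
            }
            if (genotypeFilter.test(record)) {          // this is where the decode is triggered
                kept.add(record);
            }
        }
        return kept;
    }
}
```

With a site filter such as `r -> r.qual >= 30.0` and a genotype filter such as `r -> r.getGenotypes().contains("0/1")`, any record that fails the QUAL test is dropped while `isDecoded()` is still false. Reversing the two tests would decode every record, which is the cost this issue is about.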

@droazen
Contributor Author

droazen commented Oct 7, 2021

From a discussion with @LeeTL1220

@takutosato
Contributor

@LeeTL1220, did moving the jexl filtering up as you did in your branch above resolve your issue?

The only potential issue is that you are now applying the JEXL filter to the variant context before subsetting any genotypes, but JEXL expressions are only for INFO fields, right?

@takutosato
Contributor

After talking with the engine team, it seems a JEXL expression can refer to genotype fields. There might be some work on the JEXL parser (which @lbergelson mentioned) that will eventually allow for a better solution, but for now the interim solution is to add a separate JEXL argument (something like --select-info-expression) that applies only to INFO fields, and to do this filtering before unpacking the genotypes. I still need to double-check that the JEXL parsing code (in htsjdk, see VariantJEXLContext.java) does not already unpack genotypes, which would defeat the purpose.
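As a rough illustration of the interim idea, the sketch below models an INFO-only expression as a predicate that receives only the INFO map, so it cannot reach the genotypes at all. This is an assumption-laden stand-in, not GATK or htsjdk code: `InfoOnlySelectSketch`, `CountingSupplier`, and `keep` are hypothetical names, and plain Java predicates stand in for JEXL expressions. The counter shows how one could check whether evaluation triggers a genotype decode.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch, not GATK or htsjdk code. INFO-only expressions see the INFO map
// and nothing else, so evaluating them cannot force genotypes to be decoded.
final class InfoOnlySelectSketch {

    // Wraps a genotype list and counts how often it is requested, to verify laziness.
    static final class CountingSupplier implements Supplier<List<String>> {
        int calls = 0;
        private final List<String> genotypes;

        CountingSupplier(List<String> genotypes) {
            this.genotypes = genotypes;
        }

        @Override
        public List<String> get() {
            calls++;
            return genotypes;
        }
    }

    // Stage 1: INFO-only expressions. Stage 2: expressions that may inspect genotypes.
    static boolean keep(Map<String, Object> info,
                        Supplier<List<String>> genotypes,
                        List<Predicate<Map<String, Object>>> infoExpressions,
                        List<Predicate<List<String>>> genotypeExpressions) {
        for (Predicate<Map<String, Object>> expression : infoExpressions) {
            if (!expression.test(info)) {
                return false;                       // genotypes were never requested
            }
        }
        if (genotypeExpressions.isEmpty()) {
            return true;                            // nothing needs genotypes, so skip the decode
        }
        List<String> decoded = genotypes.get();     // the expensive step, deferred until here
        for (Predicate<List<String>> expression : genotypeExpressions) {
            if (!expression.test(decoded)) {
                return false;
            }
        }
        return true;
    }
}
```

Splitting the arguments by type makes the guarantee structural: an INFO-only expression has no handle on the genotypes, so there is nothing to audit in the expression itself. Whether the real JEXL context in htsjdk behaves this way is exactly the open question above.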
