When there are many samples in a VCF and decoding the genotypes is an expensive operation, it makes sense to do all the site-level filtering before fully decoding the record and examining the genotypes.
Unfortunately, SelectVariants currently does some of the genotype-level filtering before the site-level filtering, which can cause performance issues with large callsets.
@LeeTL1220, did moving the jexl filtering up as you did in your branch above resolve your issue?
The only potential issue is that you are now applying the JEXL filter to the variant context before subsetting any genotypes, but JEXL expressions are only for INFO fields, right?
After talking with the engine team, it seems the JEXL expression can refer to genotype fields. There might be some work on the JEXL parser (which @lbergelson mentioned) that will eventually allow for a better solution, but for now the interim solution is to add a separate JEXL argument (something like --select-info-expression) that applies only to INFO fields, and to do this filtering before unpacking the genotypes. I will need to double-check, though, that the JEXL parsing code (in htsjdk, see VariantJEXLContext.java) does not already unpack genotypes, which would defeat the purpose.
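To make the ordering concrete, here is a rough sketch of the idea, not the actual SelectVariants or htsjdk code. `LazyRecord`, `select`, and both predicates are hypothetical names invented for illustration; the only assumption is that genotype decoding is deferred until the genotypes are first requested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class SiteFirstFilterSketch {

    /** Hypothetical stand-in for a record whose genotypes are decoded lazily. */
    public static final class LazyRecord {
        public final Map<String, Object> info;            // cheap, site-level fields
        private final Supplier<List<String>> decoder;     // expensive genotype decoding
        private List<String> genotypes;                   // null until first requested

        public LazyRecord(Map<String, Object> info, Supplier<List<String>> decoder) {
            this.info = info;
            this.decoder = decoder;
        }

        public List<String> getGenotypes() {
            if (genotypes == null) {
                genotypes = decoder.get();                // decode once, on demand
            }
            return genotypes;
        }
    }

    /** Applies the INFO-only filter first, so rejected sites never decode genotypes. */
    public static List<LazyRecord> select(List<LazyRecord> records,
                                          Predicate<Map<String, Object>> infoFilter,
                                          Predicate<List<String>> genotypeFilter) {
        List<LazyRecord> kept = new ArrayList<>();
        for (LazyRecord record : records) {
            if (!infoFilter.test(record.info)) {
                continue;                                 // site-level reject, no decoding
            }
            if (genotypeFilter.test(record.getGenotypes())) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] decodes = {0};                              // counts how often decoding runs
        List<LazyRecord> records = List.of(
            new LazyRecord(Map.of("DP", 5), () -> { decodes[0]++; return List.of("0/1"); }),
            new LazyRecord(Map.of("DP", 50), () -> { decodes[0]++; return List.of("0/1", "0/0"); }));
        List<LazyRecord> kept = select(records,
            info -> (Integer) info.get("DP") > 10,        // stands in for an INFO-only expression
            gts -> gts.contains("0/1"));                  // stands in for a genotype-level filter
        System.out.println("kept=" + kept.size() + " decodes=" + decodes[0]);
    }
}
```

The point of the sketch is only the ordering: if the INFO-only expression itself touches the genotypes (the concern about VariantJEXLContext.java above), the saving disappears.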