Added a force output sites argument to GenotypeGVCFs #6263
6c8b780 to 3fb979f
As far as I can tell, the failures are just Travis acting up.
@davidbenjamin What's the difference between |
@ldgauthier I have not had a chance to check out this code in detail (I'll try to get someone in my lab to do this today); however, I would hope that the difference is that --force-output-intervals would output at those intervals (whether variant or not) AND include any additional variant sites. The latter, --include-non-variants and -L, would only output the intervals from -L.
Yes, thank you for jogging my memory @bbimber. @davidbenjamin adding something to that effect in the BadArgumentException message would be helpful.
@ldgauthier I wrote a detailed error message. Anything else?
Looks good!
@davidbenjamin Hello, we started using this feature more seriously and have a question. The output from GenotypeGVCFs with --force-output-sites seems to include sites with <NON_REF> as the ALT allele. Is this expected? I am guessing that these sites used to be filtered (no passing evidence of variation), and the point of this feature is to include sites based on a whitelist? The problem is that some downstream tools don't know what to do with this. There is probably some subtlety here, but I would think that a given sample either has callable variation, is REF, or is a no-call? Is <NON_REF> in the output by design?
Yes, it seems to. It seems this was a simple bug where GenotypeGVCFsEngine.removeNonRefAlleles() wasn't working as intended for multi-allelic sites; however, I'm not sure I understand the entire genotyping process well enough to be certain of this. We can iterate on #6406.
Closes #6239.
@ldgauthier @bbimber Here's what I settled on:
GenotypeGVCFs gets a new argument for force-genotyping intervals. This is parsed exactly like any other interval argument, so the following are all valid: