Funcotator / Clinical Pipeline should move from ExAC to gnomAD #5259
Comments
This will seemingly make the clinical pipeline data sources so large that they are completely unusable. The gnomAD data for the whole genome are apparently 106 GB, and the exome data are 16 GB (the exome size would be OK, but it is at the upper limit) (http://gnomad.broadinstitute.org/downloads). @LeeTL1220, what are your thoughts?
Sent an email to Niall and Alyssa asking about subsetting the gnomAD files.
Also, the …
One potential plan:
To facilitate downloading gnomAD records, I removed all INFO annotations that had nothing to do with … The WDLs/JSONs are now in the … All variants were kept, even those that were filtered. The network I/O slows Funcotator down significantly, but not enough to make it unusable. For this reason, and because gnomAD requires an internet connection, the gnomAD data sources are disabled by default.
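The subsetting step described above (drop unneeded INFO annotations, keep every variant including filtered ones) can be sketched roughly as follows. This is a minimal illustration, not the script actually used: it assumes the gnomAD data are plain-text VCF lines, and the set of retained INFO keys (`KEEP_INFO_KEYS`) is hypothetical, since the comment does not say which annotations were kept.

```python
from typing import Iterable, Iterator, Set

# Hypothetical set of INFO keys to retain; the original comment does not say
# which annotations were kept, so these allele-count/frequency keys are
# illustrative only.
KEEP_INFO_KEYS: Set[str] = {"AC", "AN", "AF"}


def strip_info(info: str, keep: Set[str]) -> str:
    """Return a VCF INFO string containing only the keys in `keep`."""
    if info == ".":
        return info
    # Flag fields have no "=", so split("=", 1)[0] still yields the key.
    kept = [field for field in info.split(";") if field.split("=", 1)[0] in keep]
    # VCF uses "." for an empty INFO column.
    return ";".join(kept) if kept else "."


def subset_vcf(lines: Iterable[str], keep: Set[str] = KEEP_INFO_KEYS) -> Iterator[str]:
    """Drop unwanted INFO annotations while keeping every variant record,
    including records whose FILTER column is not PASS."""
    prefix = "##INFO=<ID="
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(prefix):
            # Keep header definitions only for retained INFO keys.
            key = line[len(prefix):].split(",", 1)[0]
            if key in keep:
                yield line
        elif line.startswith("#"):
            # All other meta-information lines and the column header pass through.
            yield line
        else:
            cols = line.split("\t")
            # INFO is column 8 (index 7); FILTER (index 6) is left untouched,
            # so filtered variants are kept as-is.
            cols[7] = strip_info(cols[7], keep)
            yield "\t".join(cols)
```

For a bgzipped gnomAD VCF, the same generator could be fed from `gzip.open(path, "rt")`, although the output would then need to be re-compressed with bgzip and re-indexed before use as a data source.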
Feature request
Tool(s) or class(es) involved
Funcotator
Description
Currently, the data sources for the clinical pipeline work include ExAC. They must be updated to use gnomAD.
This change will require a new release of the data sources, which must be connected to the data source downloader tool.
Additionally, these new data sources must be validated in four ways: