Apparently SelectVariants is ~10x slower when the samples in the VCF header are not sorted, due to the need to reorder the genotypes on output. We should at least warn the user when unsorted sample names are detected in the input.
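A minimal sketch of what detecting unsorted sample names could look like, written in plain Java with no GATK or HTSJDK dependency. The class name `SampleOrderCheck`, both method names, and the warning text are hypothetical, and it assumes "sorted" means natural `String` ordering of the header's sample names, which may not match exactly what HTSJDK compares against.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of GATK or HTSJDK.
final class SampleOrderCheck {

    // True if each name is >= its predecessor under natural String ordering
    // (an assumption about what "sorted" means for the VCF header).
    static boolean isSorted(List<String> sampleNames) {
        for (int i = 1; i < sampleNames.size(); i++) {
            if (sampleNames.get(i - 1).compareTo(sampleNames.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    // Returns a warning message when the names are unsorted, otherwise empty.
    static Optional<String> warningFor(List<String> sampleNames) {
        if (isSorted(sampleNames)) {
            return Optional.empty();
        }
        return Optional.of("Sample names in the VCF header are not sorted ("
                + sampleNames.size() + " samples); this may be much slower.");
    }

    public static void main(String[] args) {
        // Example header order in which "HG00096" follows "NA12878".
        List<String> header = List.of("NA12878", "HG00096", "NA12891");
        warningFor(header).ifPresent(System.err::println);
    }
}
```

In a real tool the list would come from the VCF header's sample names in file order, and the message would go through the tool's logger rather than `System.err`.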
Thanks for opening this issue. I hope that this performance issue can be fixed in HTSJDK soon, but I agree a warning would be useful in the intervening time.
To clarify, I believe this applies to any tool that loads a VCF but does not need to parse genotypes - not just SelectVariants. For instance, I saw a 5-10x slowdown in SVAnnotate with unsorted sample IDs for a VCF with ~2500 samples.
Another note: GATK did not reorder the sample IDs in the output VCF during my tests of SVAnnotate, but did reorder IDs during SelectVariants.
A warning message would be helpful, although I doubt most Terra users read their logs unless there's an error. What are the chances this can get addressed in htsjdk?
(Issue discovered by @epiercehoffman.)