Sample name is not recognized in "get_sample_ploidy_metadata" method when generating "segments" VCF file using PostprocessGermlineCNVCalls #4724

Closed
shulik7 opened this issue May 1, 2018 · 5 comments · Fixed by #5490


shulik7 commented May 1, 2018

Hi,

I am trying to call germline CNVs for a set of samples. After running DetermineGermlineContigPloidy and GermlineCNVCaller, I am using PostprocessGermlineCNVCalls to generate the VCF files with CNV calls. The "interval" VCF files are generated successfully, but I get the following error message when segmenting contigs:

org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
python exited with 1
Command Line: python /tmp/shulik7/segment_gcnv_calls.2338024416841754264.py --ploidy_calls_path /scratch/users/shulik7/test_GATK_CNV/Postprocess/../DetermineGermlineContigPloidy/model/test_run-calls/ --model_shards /scratch/shulik7/test_GATK_CNV/Postprocess/../GermlineCNVCaller/cnvs/test_run-model --calls_shards /scratch/shulik7/test_GATK_CNV/Postprocess/../GermlineCNVCaller/cnvs/test_run-calls --output_path /tmp/shulik7/gcnv-segmented-calls28280883609685538 --sample_index 0
Stdout: 11:32:16.728 INFO segment_gcnv_calls - Loading ploidy calls...
11:32:16.729 INFO gcnvkernel.io.io_metadata - Loading germline contig ploidy and global read depth metadata...
11:32:16.730 INFO segment_gcnv_calls - Instantiating the Viterbi segmentation engine...
11:32:18.585 INFO gcnvkernel.postprocess.viterbi_segmentation - Assembling interval list and copy-number class posterior from model shards...
11:32:25.158 INFO gcnvkernel.structs.metadata - Generating intervals metadata...
11:32:27.543 INFO gcnvkernel.postprocess.viterbi_segmentation - Compiling theano forward-backward function...
11:32:34.406 INFO gcnvkernel.postprocess.viterbi_segmentation - Compiling theano Viterbi function...
11:32:40.598 INFO gcnvkernel.postprocess.viterbi_segmentation - Compiling theano variational HHMM...
11:32:42.862 INFO gcnvkernel.postprocess.viterbi_segmentation - Processing sample index: 0, sample name: test_sample_0...
11:32:43.631 INFO gcnvkernel.postprocess.viterbi_segmentation - Segmenting contig (1/24) (contig name: 1)...

Stderr: Traceback (most recent call last):
File "/tmp/shulik7/segment_gcnv_calls.2338024416841754264.py", line 73, in <module>
viterbi_engine.write_copy_number_segments_for_single_sample(args.sample_index)
File "/home/shulik7/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/postprocess/viterbi_segmentation.py", line 265, in write_copy_number_segments_for_single_sample
for segment in self._viterbi_segments_generator_for_single_sample(sample_index):
File "/home/shulik7/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/postprocess/viterbi_segmentation.py", line 160, in _viterbi_segments_generator_for_single_sample
.get_sample_ploidy_metadata(sample_name)
File "/home/shulik7/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/structs/metadata.py", line 278, in get_sample_ploidy_metadata
return self.sample_ploidy_metadata_dict[sample_name]
KeyError: 'test_sample_0'

    at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
    at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:126)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:170)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:151)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:121)
    at org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCalls.executeSegmentGermlineCNVCallsPythonScript(PostprocessGermlineCNVCalls.java:500)
    at org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCalls.generateSegmentsVCFFileFromAllShards(PostprocessGermlineCNVCalls.java:436)
    at org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCalls.traverse(PostprocessGermlineCNVCalls.java:297)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

The sample_name.txt file in the ../DetermineGermlineContigPloidy/model/test_run-calls/SAMPLE_0/ folder has the sample name in it:
$ cat ../DetermineGermlineContigPloidy/model/test_run-calls/SAMPLE_0/sample_name.txt
test_sample_0

The version of GATK4 I am running is 4.0.3.0

Contributor

mbabadi commented May 1, 2018

Dear @shulik7, according to the error message, gcnvkernel is expecting ploidy calls at the following path: /scratch/users/shulik7/test_GATK_CNV/Postprocess/../DetermineGermlineContigPloidy/model/test_run-calls/.

Could you please confirm that the above path is indeed the ploidy calls path, and if so, whether it contains test_sample_0 under one of the SAMPLE_x subdirectories? If the path is valid, please try running PostprocessGermlineCNVCalls with an absolute path and report back.
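
A minimal sketch of that check, assuming the layout described in this thread (SAMPLE_x subdirectories, each holding a sample_name.txt). The helper name map_sample_names is hypothetical and is not part of GATK or gcnvkernel; it only reports which sample name each subdirectory declares.

```python
from pathlib import Path
from typing import Dict


def map_sample_names(ploidy_calls_path: str) -> Dict[str, str]:
    """Return {sample name: subdirectory name} for each SAMPLE_* directory found.

    Hypothetical helper for inspection only; the directory layout is assumed
    from the paths quoted in this thread.
    """
    # Resolve the path first so that any ".." components are collapsed.
    calls_dir = Path(ploidy_calls_path).resolve()
    names: Dict[str, str] = {}
    for sample_dir in sorted(calls_dir.glob("SAMPLE_*")):
        name_file = sample_dir / "sample_name.txt"
        if name_file.is_file():
            # The file is expected to contain just the sample name.
            names[name_file.read_text().strip()] = sample_dir.name
    return names
```

If the path points at the ploidy calls directory, the returned dictionary should have test_sample_0 among its keys. An empty dictionary suggests the path does not contain any SAMPLE_x subdirectories.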

Author

shulik7 commented May 1, 2018

Hi @mbabadi,
Yes, there are subdirectories SAMPLE_0 to SAMPLE_39 under the path I provided. I tried using the absolute path and the problem was resolved. Thanks for the suggestion!

@samuelklee
Contributor

Thanks for bringing this issue to our attention, @shulik7. @mbabadi, does it resolve if we pass a canonical path, rather than just an absolute path?

Contributor

mbabadi commented May 3, 2018

@samuelklee we have to use .getCanonicalPath() instead of .getAbsolutePath() when passing paths to Python. It looks like Python does not handle paths like foo/../../bar well.
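
To illustrate the distinction: Java's File.getAbsolutePath() only makes a path absolute and leaves any ".." components in place, while File.getCanonicalPath() also resolves "." and ".." components and symbolic links. The sketch below shows the analogous canonical form in Python using os.path.realpath from the standard library. The helper name canonical is hypothetical, and this is not the change made in GATK; it only shows what a canonicalized path looks like.

```python
import os


def canonical(path: str) -> str:
    """Resolve "." and ".." components and symlinks.

    Similar in spirit to Java's File.getCanonicalPath(); hypothetical helper
    for illustration, not code from GATK or gcnvkernel.
    """
    return os.path.realpath(path)


if __name__ == "__main__":
    # A path shaped like the one in the report; the directories need not
    # exist for realpath to collapse the ".." component.
    raw = os.path.join(
        "test_GATK_CNV", "Postprocess", "..",
        "DetermineGermlineContigPloidy", "model", "test_run-calls",
    )
    print("as given: ", raw)
    print("canonical:", canonical(raw))
```

Passing the canonical form to a downstream script avoids depending on how that script treats ".." components.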


hkirmak commented Jan 8, 2025

Hi, I am encountering the same KeyError even though I provide the absolute path. I tried different versions of GATK, including 4.3.0.0, 4.5.0.0, and 4.6.1.0 (which generated the output below: the same KeyError, but it uses PYTENSOR_FLAGS instead of Theano). I checked all the files; they are all present and contain the sample name S29.

2025-01-08 13:56 INFO: CNV case call: data/cnv_raw/S29_germlinecnvcaller/S29_germlinecnvcaller-calls
2025-01-08 13:56 INFO: CNV model: hg38_acnv_models/roche_4100_ces/cnv_model/model
2025-01-08 13:56 INFO: Contig ploidy call: data/S29_determine_ploidy/S29_determine_ploidy-calls/SAMPLE_0
2025-01-08 13:56 INFO: gatk PostprocessGermlineCNVCalls --calls-shard-path data/cnv_raw/S29_germlinecnvcaller/S29_germlinecnvcaller-calls --model-shard-path hg38_acnv_models/roche_4100_ces/cnv_model/model --sample-index 0 --autosomal-ref-copy-number 2 --allosomal-contig chrX --allosomal-contig chrY --contig-ploidy-calls data/S29_determine_ploidy/S29_determine_ploidy-calls/SAMPLE_0 --output-genotyped-intervals data/cnv_call/S29_genotyped_intervals.vcf --output-genotyped-segments data/cnv_call/S29_genotyped_segments.vcf --output-denoised-copy-ratios data/cnv_call/S29_genotyped_denoised_copy_ratios.vcf
2025-01-08 13:57 INFO: Using GATK jar .snakemake/conda/febadccea00892907b6e487236c1170a_/share/gatk4-4.6.1.0-0/gatk-package-4.6.1.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar .snakemake/conda/febadccea00892907b6e487236c1170a_/share/gatk4-4.6.1.0-0/gatk-package-4.6.1.0-local.jar PostprocessGermlineCNVCalls --calls-shard-path data/cnv_raw/S29_germlinecnvcaller/S29_germlinecnvcaller-calls --model-shard-path hg38_acnv_models/roche_4100_ces/cnv_model/model --sample-index 0 --autosomal-ref-copy-number 2 --allosomal-contig chrX --allosomal-contig chrY --contig-ploidy-calls data/S29_determine_ploidy/S29_determine_ploidy-calls/SAMPLE_0 --output-genotyped-intervals data/cnv_call/S29_genotyped_intervals.vcf --output-genotyped-segments data/cnv_call/S29_genotyped_segments.vcf --output-denoised-copy-ratios data/cnv_call/S29_genotyped_denoised_copy_ratios.vcf
13:56:42.145 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:.snakemake/conda/febadccea00892907b6e487236c1170a_/share/gatk4-4.6.1.0-0/gatk-package-4.6.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
SLF4J(W): Class path contains multiple SLF4J providers.
SLF4J(W): Found provider [org.apache.logging.slf4j.SLF4JServiceProvider@682e422c]
SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@5bb8e6fc]
SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J(I): Actual provider is of type [org.apache.logging.slf4j.SLF4JServiceProvider@682e422c]
13:56:42.191 INFO  PostprocessGermlineCNVCalls - ------------------------------------------------------------
13:56:42.192 INFO  PostprocessGermlineCNVCalls - The Genome Analysis Toolkit (GATK) v4.6.1.0
13:56:42.192 INFO  PostprocessGermlineCNVCalls - For support and documentation go to https://software.broadinstitute.org/gatk/
13:56:42.192 INFO  PostprocessGermlineCNVCalls - Executing as hatice@hatice on Linux v6.8.0-51-generic amd64
13:56:42.192 INFO  PostprocessGermlineCNVCalls - Java runtime: OpenJDK 64-Bit Server VM v17.0.11-internal+0-adhoc..src
13:56:42.192 INFO  PostprocessGermlineCNVCalls - Start Date/Time: January 8, 2025 at 1:56:42 PM TRT
13:56:42.192 INFO  PostprocessGermlineCNVCalls - ------------------------------------------------------------
13:56:42.192 INFO  PostprocessGermlineCNVCalls - ------------------------------------------------------------
13:56:42.192 INFO  PostprocessGermlineCNVCalls - HTSJDK Version: 4.1.3
13:56:42.193 INFO  PostprocessGermlineCNVCalls - Picard Version: 3.3.0
13:56:42.193 INFO  PostprocessGermlineCNVCalls - Built for Spark Version: 3.5.0
13:56:42.193 INFO  PostprocessGermlineCNVCalls - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:56:42.193 INFO  PostprocessGermlineCNVCalls - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:56:42.194 INFO  PostprocessGermlineCNVCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:56:42.194 INFO  PostprocessGermlineCNVCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:56:42.194 INFO  PostprocessGermlineCNVCalls - Deflater: IntelDeflater
13:56:42.194 INFO  PostprocessGermlineCNVCalls - Inflater: IntelInflater
13:56:42.194 INFO  PostprocessGermlineCNVCalls - GCS max retries/reopens: 20
13:56:42.194 INFO  PostprocessGermlineCNVCalls - Requester pays: disabled
13:56:42.194 INFO  PostprocessGermlineCNVCalls - Initializing engine
13:56:55.662 INFO  PostprocessGermlineCNVCalls - Done initializing engine
13:56:55.848 INFO  ProgressMeter - Starting traversal
13:56:55.848 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Records Processed   Records/Minute
13:56:55.848 INFO  ProgressMeter -             unmapped              0.0                     0              NaN
13:56:55.848 INFO  ProgressMeter - Traversal complete. Processed 0 total records in 0.0 minutes.
13:56:55.848 INFO  PostprocessGermlineCNVCalls - Generating intervals VCF file...
13:56:55.967 INFO  PostprocessGermlineCNVCalls - Writing intervals VCF file to data/cnv_call/S29_genotyped_intervals.vcf...
13:56:55.967 INFO  PostprocessGermlineCNVCalls - Analyzing shard 1 / 1...
13:56:56.555 INFO  PostprocessGermlineCNVCalls - Generating segments...
13:57:38.724 INFO  PostprocessGermlineCNVCalls - Shutting down engine
[January 8, 2025 at 1:57:38 PM TRT] org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCalls done. Elapsed time: 0.94 minutes.
Runtime.totalMemory()=1207959552
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException: 
python exited with 1
Command Line: python /tmp/segment_gcnv_calls.1199447359357923802.py --ploidy_calls_path data/S29_determine_ploidy/S29_determine_ploidy-calls/SAMPLE_0 --model_shards hg38_acnv_models/roche_4100_ces/cnv_model/model --calls_shards data/cnv_raw/S29_germlinecnvcaller/S29_germlinecnvcaller-calls --output_path /tmp/gcnv-segmented-calls14400794845073966734 --sample_index 0
Stdout: 13:57:05.312 INFO segment_gcnv_calls - PYTENSOR_FLAGS environment variable has been set to: device=cpu,floatX=float64,optimizer=fast_run,compute_test_value=ignore,openmp=true,blas__ldflags=-lmkl_rt,openmp_elemwise_minsize=10,exception_verbosity=high
13:57:05.312 INFO segment_gcnv_calls - Loading ploidy calls...
13:57:05.312 INFO gcnvkernel.io.io_metadata - Loading germline contig ploidy and global read depth metadata...
13:57:05.312 INFO segment_gcnv_calls - Instantiating the Viterbi segmentation engine...
13:57:05.370 INFO gcnvkernel.postprocess.viterbi_segmentation - Assembling interval list and copy-number class posterior from model shards...
13:57:05.536 INFO gcnvkernel.io.io_intervals_and_counts - The given interval list provides the following interval annotations: {'GC_CONTENT'}
13:57:05.730 INFO gcnvkernel.structs.metadata - Generating intervals metadata...
13:57:05.799 INFO gcnvkernel.postprocess.viterbi_segmentation - Compiling pytensor forward-backward function...
13:57:22.603 INFO gcnvkernel.postprocess.viterbi_segmentation - Compiling pytensor Viterbi function...
13:57:28.831 INFO gcnvkernel.postprocess.viterbi_segmentation - Compiling pytensor variational HHMM...
13:57:38.394 INFO gcnvkernel.postprocess.viterbi_segmentation - Processing sample index: 0, sample name: S29...
13:57:38.422 INFO gcnvkernel.postprocess.viterbi_segmentation - Segmenting contig (1/24) (contig name: chr1)...

Stderr: Traceback (most recent call last):
  File "/tmp/segment_gcnv_calls.1199447359357923802.py", line 93, in <module>
    viterbi_engine.write_copy_number_segments()
  File ".snakemake/conda/febadccea00892907b6e487236c1170a_/lib/python3.10/site-packages/gcnvkernel-0.9-py3.10.egg/gcnvkernel/postprocess/viterbi_segmentation.py", line 256, in write_copy_number_segments
  File ".snakemake/conda/febadccea00892907b6e487236c1170a_/lib/python3.10/site-packages/gcnvkernel-0.9-py3.10.egg/gcnvkernel/postprocess/viterbi_segmentation.py", line 141, in _viterbi_segments_generator
  File ".snakemake/conda/febadccea00892907b6e487236c1170a_/lib/python3.10/site-packages/gcnvkernel-0.9-py3.10.egg/gcnvkernel/structs/metadata.py", line 263, in get_sample_ploidy_metadata
KeyError: 'S29'

	at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
	at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:112)
	at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:193)
	at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:168)
	at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:139)
	at org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCalls.executeSegmentGermlineCNVCallsPythonScript(PostprocessGermlineCNVCalls.java:739)
	at org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCalls.generateSegmentsVCFFileFromAllShards(PostprocessGermlineCNVCalls.java:485)
	at org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCalls.onTraversalSuccess(PostprocessGermlineCNVCalls.java:456)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1123)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:150)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:203)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:222)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
	at org.broadinstitute.hellbender.Main.main(Main.java:306)