Add ability to save assembly-mapped reads #494

Merged: 2 commits, Aug 16, 2023
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -7,13 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- - [#395](https://github.com/nf-core/mag/pull/395) - Add support for fast domain-level classification of bins using Tiara, to allow bins to be separated into eukaryotic and prokaryotic-specific processes.
+ - [#395](https://github.com/nf-core/mag/pull/395) - Adds support for fast domain-level classification of bins using Tiara, to allow bins to be separated into eukaryotic and prokaryotic-specific processes.
- [#422](https://github.com/nf-core/mag/pull/422) - Adds support for normalization of read depth with BBNorm (added by @erikrikarddaniel and @fabianegli)
- [#439](https://github.com/nf-core/mag/pull/439) - Adds ability to enter the pipeline at the binning stage by providing a CSV of pre-computed assemblies (by @prototaxites)
- [#459](https://github.com/nf-core/mag/pull/459) - Adds ability to skip damage correction step in the ancient DNA workflow and just run pyDamage (by @jfy133)
- [#364](https://github.com/nf-core/mag/pull/364) - Adds geNomad nf-core modules for identifying viruses in assemblies (by @PhilPalmer and @CarsonJM)
- [#481](https://github.com/nf-core/mag/pull/481) - Adds MetaEuk for annotation of eukaryotic MAGs, and MMSeqs2 to enable downloading databases for MetaEuk (by @prototaxites)
- [#437](https://github.com/nf-core/mag/pull/429) - `--gtdb_db` also now supports directory input of an pre-uncompressed GTDB archive directory (reported by @alneberg, fix by @jfy133)
+ - [#494](https://github.com/nf-core/mag/pull/494) - Adds support for saving the BAM files from Bowtie2 mapping of input reads back to assembly (fix by @jfy133)

### `Changed`

@@ -38,7 +39,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#449](https://github.com/nf-core/mag/pull/447) - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133)
- [#470](https://github.com/nf-core/mag/pull/470) - Fix binning preparation from running even when binning was requested to be skipped (reported by @prototaxites, fix by @jfy133)
- [#480](https://github.com/nf-core/mag/pull/480) - Improved `-resume` reliability through better meta map preservation (reported by @prototaxites, fix by @jfy133)
- - [#493](https://github.com/nf-core/mag/pull/493) - Update `METABAT2` nf-core module so that it reduced the number of unnecessary file moves, enabling virtual filesystems, fix by @adamrtalbot)
+ - [#493](https://github.com/nf-core/mag/pull/493) - Update `METABAT2` nf-core module so that it reduced the number of unnecessary file moves, enabling virtual filesystems (fix by @adamrtalbot)

### `Dependencies`

14 changes: 11 additions & 3 deletions conf/modules.config
@@ -305,9 +305,17 @@ process {
ext.args = params.bowtie2_mode ? params.bowtie2_mode : params.ancient_dna ? '--very-sensitive -N 1' : ''
ext.prefix = { "${meta.id}.assembly" }
publishDir = [
- path: { "${params.outdir}/Assembly/${assembly_meta.assembler}/QC/${assembly_meta.id}" },
- mode: params.publish_dir_mode,
- pattern: "*.log"
+ [
+ path: { "${params.outdir}/Assembly/${assembly_meta.assembler}/QC/${assembly_meta.id}" },
+ mode: params.publish_dir_mode,
+ pattern: "*.log"
+ ],
+ [
+ path: { "${params.outdir}/Assembly/${assembly_meta.assembler}/QC/${assembly_meta.id}" },
prototaxites (Contributor), Aug 16, 2023:

Maybe `Assembly/${assembly_meta.assembler}/(bams|maps)/${assembly_meta.id}` or similar? Or even `AssemblyMaps/${assembly_meta.assembler}/${assembly_meta.id}/`? Not sure that bams/bais fit into the context of 'QC'!

Contributor:

I think QC would be okay because the alignment can be used to quality check the assemblies, but I agree that keeping it broader (like bam/maps) might work better because these BAM files have other uses as well.

Member Author:

Exactly, I'm keeping them alongside where the bowtie2 log file is saved... I don't want to rock the boat too much 😬

Member Author:

If you agree that we can reconfigure the output in the future, could you give a ✅?

+ mode: params.publish_dir_mode,
+ pattern: "*.{bam,bai}",
+ enabled: params.save_assembly_mapped_reads
+ ],
]
}
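The `publishDir` list in this hunk holds two rules: the `*.log` rule has no `enabled` key, while the `*.{bam,bai}` rule is gated on `params.save_assembly_mapped_reads`. The following is a minimal Python sketch of that selection logic, not Nextflow's implementation. It assumes a rule without `enabled` is active, approximates Nextflow's glob matching with `fnmatch` plus a single-level brace expansion, and uses hypothetical file names.

```python
import re
from fnmatch import fnmatch


def expand_braces(pattern):
    """Expand one {a,b} group into separate globs (simplified, no nesting)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if not match:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + option + tail for option in match.group(1).split(",")]


def published_files(filenames, rules):
    """Return the files matched by at least one enabled rule, in input order."""
    selected = []
    for name in filenames:
        for rule in rules:
            # Assumption: a rule with no 'enabled' key is treated as active.
            if not rule.get("enabled", True):
                continue
            if any(fnmatch(name, glob) for glob in expand_braces(rule["pattern"])):
                selected.append(name)
                break
    return selected


def make_rules(save_assembly_mapped_reads):
    """Mirror the two rules from the hunk; only the second one is gated."""
    return [
        {"pattern": "*.log"},
        {"pattern": "*.{bam,bai}", "enabled": save_assembly_mapped_reads},
    ]


# Hypothetical process outputs, for illustration only.
outputs = [
    "MEGAHIT-sample1.bowtie2.log",
    "MEGAHIT-sample1.bam",
    "MEGAHIT-sample1.bam.bai",
    "MEGAHIT-sample1.contigs.fa",
]

print(published_files(outputs, make_rules(False)))
print(published_files(outputs, make_rules(True)))
```

With the flag off, only the log file is selected. With it on, the BAM and BAI files are selected as well, and all three share the same `QC` directory because both rules use the same `path`.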

3 changes: 3 additions & 0 deletions docs/output.md
@@ -193,6 +193,7 @@ Trimmed (short) reads are assembled with both megahit and SPAdes. Hybrid assembl
- `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs
- `MEGAHIT-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set.
- `MEGAHIT-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap").
+ - `MEGAHIT-[sample].[bam/bai]`: Optionally saved BAM file of the Bowtie2 mapping of reads against the assembly.

</details>

@@ -211,6 +212,7 @@ Trimmed (short) reads are assembled with both megahit and SPAdes. Hybrid assembl
- `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs
- `SPAdes-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set.
- `SPAdes-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap").
+ - `SPAdes-[sample].[bam/bai]`: Optionally saved BAM file of the Bowtie2 mapping of reads against the assembly.

</details>

@@ -229,6 +231,7 @@ SPAdesHybrid is a part of the [SPAdes](http://cab.spbu.ru/software/spades/) soft
- `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs
- `SPAdesHybrid-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set.
- `SPAdesHybrid-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap").
+ - `SPAdesHybrid-[sample].[bam/bai]`: Optionally saved BAM file of the Bowtie2 mapping of reads against the assembly.

</details>

1 change: 1 addition & 0 deletions nextflow.config
@@ -43,6 +43,7 @@ params {
// binning options
bowtie2_mode = null
binning_map_mode = 'group'
+ save_assembly_mapped_reads = false
skip_binning = false
min_contig_size = 1500
min_length_unbinned_contigs = 1000000
17 changes: 10 additions & 7 deletions nextflow_schema.json
@@ -513,8 +513,7 @@
},
"skip_gtdbtk": {
"type": "boolean",
- "description": "Skip the running of GTDB, as well as the automatic download of the database",
- "default": "false"
+ "description": "Skip the running of GTDB, as well as the automatic download of the database"
},
"gtdb_db": {
"type": "string",
@@ -523,23 +522,23 @@
},
"gtdbtk_min_completeness": {
"type": "number",
- "default": 50.0,
+ "default": 50,
"description": "Min. bin completeness (in %) required to apply GTDB-tk classification.",
"help_text": "Completeness assessed with BUSCO analysis (100% - %Missing). Must be greater than 0 (min. 0.01) to avoid GTDB-tk errors. If too low, GTDB-tk classification results can be impaired due to not enough marker genes!",
"minimum": 0.01,
"maximum": 100
},
"gtdbtk_max_contamination": {
"type": "number",
- "default": 10.0,
+ "default": 10,
"description": "Max. bin contamination (in %) allowed to apply GTDB-tk classification.",
"help_text": "Contamination approximated based on BUSCO analysis (%Complete and duplicated). If too high, GTDB-tk classification results can be impaired due to contamination!",
"minimum": 0,
"maximum": 100
},
"gtdbtk_min_perc_aa": {
"type": "number",
- "default": 10.0,
+ "default": 10,
"description": "Min. fraction of AA (in %) in the MSA for bins to be kept.",
"minimum": 0,
"maximum": 100
@@ -553,7 +552,7 @@
},
"gtdbtk_pplacer_cpus": {
"type": "number",
- "default": 1.0,
+ "default": 1,
"description": "Number of CPUs used for the by GTDB-Tk run tool pplacer.",
"help_text": "A low number of CPUs helps to reduce the memory required/reported by GTDB-Tk. See also the [GTDB-Tk documentation](https://ecogenomics.github.io/GTDBTk/faq.html#gtdb-tk-reaches-the-memory-limit-pplacer-crashes)."
},
@@ -649,7 +648,6 @@
"properties": {
"run_virus_identification": {
"type": "boolean",
- "default": false,
"description": "Run virus identification."
},
"genomad_min_score": {
@@ -715,6 +713,11 @@
"description": "Bowtie2 alignment mode",
"help_text": "Bowtie2 alignment mode options, for example: `--very-fast` , `--very-sensitive-local -N 1` , ..."
},
+ "save_assembly_mapped_reads": {
+ "type": "boolean",
+ "description": "Save the output of mapping raw reads back to assembled contigs",
+ "help_text": "Specify to save the BAM and BAI files generated when mapping input reads back to the assembled contigs (performed in preparation for binning and contig depth estimations)."
+ },
"bin_domain_classification": {
"type": "boolean",
"description": "Enable domain-level (prokaryote or eukaryote) classification of bins using Tiara. Processes which are domain-specific will then only receive bins matching the domain requirement.",
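The `nextflow_schema.json` hunks remove a string `"false"` default from the boolean `skip_gtdbtk` parameter and add `save_assembly_mapped_reads` without a `default` key. The PR does not state the motivation. One plausible reading is that a default should agree with the declared `type`. The following is a minimal Python sketch of such a check, assuming a hand-copied fragment of the schema properties. The helper names are hypothetical and this is not the nf-core schema linter.

```python
import json

# Hand-copied fragment: the pre-PR skip_gtdbtk entry plus two post-PR entries.
SCHEMA_FRAGMENT = json.loads("""
{
  "skip_gtdbtk": {"type": "boolean", "default": "false"},
  "gtdbtk_min_completeness": {"type": "number", "default": 50},
  "save_assembly_mapped_reads": {"type": "boolean"}
}
""")


def default_matches_type(declared_type, value):
    """Check a parsed JSON value against a declared JSON Schema type."""
    if declared_type == "boolean":
        return isinstance(value, bool)
    if declared_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if declared_type == "string":
        return isinstance(value, str)
    return True  # other types are not checked in this sketch


def mismatched_defaults(properties):
    """List parameter names whose default disagrees with the declared type."""
    return [
        name
        for name, spec in properties.items()
        if "default" in spec
        and not default_matches_type(spec.get("type"), spec["default"])
    ]


print(mismatched_defaults(SCHEMA_FRAGMENT))
```

Only the old `skip_gtdbtk` entry is flagged, because its default is the string `"false"` rather than a boolean. Entries without a `default` key are skipped, and both `50` and `50.0` pass the `number` check.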