API and usage of main functions in Python programs
The main functions in codoff are codoff_main_coords and codoff_main_gbk. Both can be called as functions from Python.
codoff_main_coords(full_genome_file, focal_scaffold, focal_start_coord, focal_end_coord, outfile=None, plot_outfile=None, verbose=True)
A full genome file can be provided in either GenBank or FASTA format. If FASTA is provided, pyrodigal is used for gene calling, so this mode only works for bacteria. The user-provided coordinates of the focal region are then used to partition the locus tags of CDS features into those belonging to the focal region and those belonging to the background genome. The function calls the private function _stat_calc_and_simulation() to perform the main statistical calculations and simulations used to infer the empirical P-value.
Argument | Type | Description |
---|---|---|
full_genome_file | str | The path to the full genome file in GenBank or FASTA format. [Required]. |
focal_scaffold | str | The scaffold identifier for the focal region. [Required]. |
focal_start_coord | int | The start coordinate for the focal region. [Required]. |
focal_end_coord | int | The end coordinate for the focal region. [Required]. |
outfile | str | The path to the output file [Default is None]. |
plot_outfile | str | The path to the plot output file. If not provided, no plot will be made [Default is None]. |
verbose | bool | Whether to print progress messages to stderr [Default is True]. |
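The coordinate-based partitioning described above can be illustrated with a small sketch. This is not codoff's implementation: the CDS record and the partition_locus_tags helper are hypothetical, and the rule used here (a CDS counts as focal only if it lies entirely within the focal interval on the focal scaffold) is an assumption that may differ from what codoff does internally.

```python
from typing import NamedTuple


class CDS(NamedTuple):
    """Hypothetical minimal CDS record; not a codoff data structure."""
    locus_tag: str
    scaffold: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def partition_locus_tags(cds_features, focal_scaffold, focal_start, focal_end):
    """Split locus tags into (focal, background) sets.

    Assumption: a CDS is focal only if it is fully contained in the focal
    interval on the focal scaffold. codoff's actual rule may differ.
    """
    focal, background = set(), set()
    for cds in cds_features:
        inside = (
            cds.scaffold == focal_scaffold
            and cds.start >= focal_start
            and cds.end <= focal_end
        )
        (focal if inside else background).add(cds.locus_tag)
    return focal, background


# Made-up gene coordinates, for illustration only.
features = [
    CDS("g1", "ABC0001.1", 100, 900),
    CDS("g2", "ABC0001.1", 10100, 11000),
    CDS("g3", "ABC0001.1", 94000, 95060),
    CDS("g4", "ABC0002.1", 20000, 21000),
]
focal, background = partition_locus_tags(features, "ABC0001.1", 10051, 95060)
print("focal:", sorted(focal))
print("background:", sorted(background))
```

Genes on other scaffolds always fall into the background set, which is why the scaffold identifier is required alongside the coordinates.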
codoff_main_gbk(full_genome_file, focal_genbank_files, outfile=None, plot_outfile=None, verbose=True)
A full genome and a specific region must each be provided in GenBank format, with locus_tags overlapping. locus_tags in the focal region GenBank that are not in the full genome GenBank will be ignored. It calls the private function _stat_calc_and_simulation() to perform the main statistical calculations and simulations used for inference of the empirical P-value.
Argument | Type | Description |
---|---|---|
full_genome_file | str | The path to the full genome file in GenBank format. [Required]. |
focal_genbank_files | list | A list of paths to GenBank files corresponding to the focal region. Note that these should not be multiple independent BGCs; the ability to take multiple focal region GenBank files exists to allow for fragmented pieces of the same BGC due to assembly incompleteness. [Required]. |
outfile | str | The path to the output file [Default is None]. |
plot_outfile | str | The path to the plot output file. If not provided, no plot will be made [Default is None]. |
verbose | bool | Whether to print progress messages to stderr [Default is True]. |
Both functions return a dictionary with the following keys:
Key | Type | Value |
---|---|---|
emp_pval_freq | float | The empirical P-value indicating significance of discordance between focal region and genome-wide codon usage profiles. |
cosine_distance | float | The cosine distance between the focal region and genome-wide codon usage profiles. |
rho | float | Spearman's rho between the focal region and genome-wide codon usage profiles. |
codon_order | list of strs | A listing of codons which is in the same order as the following two lists. |
focal_region_codons | list of ints | A list of codon counts for focal region. |
background_genome_codons | list of ints | A list of codon counts for the background genome. |
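Because codon_order, focal_region_codons, and background_genome_codons share the same ordering, they can be zipped together for downstream inspection. The sketch below is one possible way to do that: codon_frequency_table is a hypothetical helper (not part of codoff), and the dictionary holds made-up counts for only three codons and only the three keys the helper reads.

```python
def codon_frequency_table(result):
    """Return (codon, focal_freq, background_freq) rows from a codoff-style dict.

    Hypothetical helper; it relies only on the three list-valued keys
    documented in the table above.
    """
    codons = result["codon_order"]
    focal = result["focal_region_codons"]
    background = result["background_genome_codons"]
    focal_total = sum(focal)
    background_total = sum(background)
    return [
        (codon, f / focal_total, b / background_total)
        for codon, f, b in zip(codons, focal, background)
    ]


# Made-up counts for illustration; a real result covers many more codons.
example_result = {
    "codon_order": ["AAA", "AAG", "GCC"],
    "focal_region_codons": [10, 20, 70],
    "background_genome_codons": [500, 300, 200],
}

table = codon_frequency_table(example_result)
# Put the codons whose usage differs most between the two profiles first.
table.sort(key=lambda row: abs(row[1] - row[2]), reverse=True)
for codon, focal_freq, background_freq in table:
    print(f"{codon}\t{focal_freq:.3f}\t{background_freq:.3f}")
```

Converting counts to relative frequencies makes the focal region and the much larger background genome directly comparable.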
Example: running codoff_main_coords on a FASTA genome with coordinates for the focal region.

    import os
    import sys
    from codoff import codoff

    genome_fna = 'Some_Genome.fna'  # nucleotide FASTA file (can be multi-FASTA)

    # provide coordinate information for the region of interest (e.g. a BGC)
    scaffold = 'ABC0001.1'  # must match the header of a sequence in the FASTA, up to the first space
    start_coord = 10051
    end_coord = 95060

    # returns a dictionary with the keys described above
    result = codoff.codoff_main_coords(genome_fna, scaffold, start_coord, end_coord)

Example: running codoff_main_gbk on a focal region GenBank file and its matching full-genome GenBank file.

    import os
    import sys
    from codoff import codoff

    antismash_bgc_gbk = 'Some_BGC_of_Interest.gbk'  # annotated BGC GenBank - e.g. one produced by antiSMASH
    full_genome_gbk = 'Matching_Full_Genome.gbk'  # annotated full-genome GenBank file - also produced by antiSMASH

    output_file = 'codoff_results.tsv'  # optional
    output_plot = 'codoff_simulation_histogram.svg'  # optional

    # the focal region is passed as a list, even when there is only one file
    result = codoff.codoff_main_gbk(full_genome_gbk,
                                    [antismash_bgc_gbk],
                                    outfile=output_file,
                                    plot_outfile=output_plot,
                                    verbose=True)
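Since the returned dictionary exposes the raw codon counts, the cosine distance between the two profiles can be recomputed independently as a sanity check. The sketch below uses the standard definition (one minus the cosine similarity) on made-up counts. Whether it reproduces codoff's cosine_distance value exactly depends on how codoff builds its vectors internally, which is an assumption here.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Made-up counts; in practice these would be result['focal_region_codons']
# and result['background_genome_codons'].
focal_counts = [10, 20, 70]
background_counts = [500, 300, 200]

distance = cosine_distance(focal_counts, background_counts)

# Cosine distance ignores vector length, so raw counts and relative
# frequencies give the same value.
focal_freqs = [c / sum(focal_counts) for c in focal_counts]
background_freqs = [c / sum(background_counts) for c in background_counts]
distance_from_freqs = cosine_distance(focal_freqs, background_freqs)

print(f"cosine distance from counts:      {distance:.4f}")
print(f"cosine distance from frequencies: {distance_from_freqs:.4f}")
```

The scale invariance matters because the focal region typically contains far fewer codons than the background genome, and the size difference alone does not inflate the distance.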