
API and usage of main functions in Python programs

Rauf Salamzade edited this page Apr 3, 2025 · 3 revisions

The main functions in codoff are codoff_main_coords and codoff_main_gbk. Both can be imported and called directly from Python programs.

API

codoff_main_coords(full_genome_file, focal_scaffold, focal_start_coord, focal_end_coord, outfile=None, plot_outfile=None, verbose=True)

A full genome file can be provided in either GenBank or FASTA format. If FASTA is provided, pyrodigal is used for gene calling, so this mode only works for bacterial genomes. The coordinates provided for the focal region of interest are then used to partition the locus tags of CDS features into those belonging to the focal region and those belonging to the background genome. The function calls the private function _stat_calc_and_simulation() to perform the main statistical calculations and simulations used for inference of the empirical P-value.

| Argument | Type | Description |
| --- | --- | --- |
| full_genome_file | str | The path to the full genome file in GenBank or FASTA format. [Required]. |
| focal_scaffold | str | The scaffold identifier for the focal region. [Required]. |
| focal_start_coord | int | The start coordinate for the focal region. [Required]. |
| focal_end_coord | int | The end coordinate for the focal region. [Required]. |
| outfile | str | The path to the output file [Default is None]. |
| plot_outfile | str | The path to the plot output file. If not provided, no plot will be made [Default is None]. |
| verbose | bool | Whether to print progress messages to stderr [Default is True]. |
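The coordinate-based partitioning described above can be pictured with a minimal sketch. This is not codoff's internal code: the function name `partition_cds`, the tuple layout for CDS records, and the rule that a CDS must lie entirely inside the region to count as focal are all assumptions made for illustration. codoff's own overlap rule may differ.

```python
from typing import Iterable, Set, Tuple

# Hypothetical CDS record: (locus_tag, scaffold, start, end), 1-based inclusive.
CDS = Tuple[str, str, int, int]


def partition_cds(
    cds_features: Iterable[CDS],
    focal_scaffold: str,
    focal_start_coord: int,
    focal_end_coord: int,
) -> Tuple[Set[str], Set[str]]:
    """Split locus tags into (focal, background) sets by coordinates."""
    focal: Set[str] = set()
    background: Set[str] = set()
    for locus_tag, scaffold, start, end in cds_features:
        # Assumption: a CDS is focal only if it is fully contained in the
        # focal region on the focal scaffold.
        inside = (
            scaffold == focal_scaffold
            and start >= focal_start_coord
            and end <= focal_end_coord
        )
        if inside:
            focal.add(locus_tag)
        else:
            background.add(locus_tag)
    return focal, background
```

Every CDS ends up in exactly one of the two sets, so the background here is simply everything that is not focal.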

codoff_main_gbk(full_genome_file, focal_genbank_files, outfile=None, plot_outfile=None, verbose=True)

Both the full genome and the focal region must be provided in GenBank format, and CDS features must share the same locus_tags across the two. Any locus_tags in the focal region GenBank that are not in the full genome GenBank will be ignored. The function calls the private function _stat_calc_and_simulation() to perform the main statistical calculations and simulations used for inference of the empirical P-value.

| Argument | Type | Description |
| --- | --- | --- |
| full_genome_file | str | The path to the full genome file in GenBank format. [Required]. |
| focal_genbank_files | list | A list of paths to GenBank files corresponding to the focal region. Note that these should not be multiple independent BGCs; multiple focal region GenBank files are accepted so that fragmented pieces of the same BGC, split by assembly incompleteness, can be provided together. [Required]. |
| outfile | str | The path to the output file [Default is None]. |
| plot_outfile | str | The path to the plot output file. If not provided, no plot will be made [Default is None]. |
| verbose | bool | Whether to print progress messages to stderr [Default is True]. |
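Because focal locus_tags missing from the full genome GenBank are ignored, it can be useful to check the overlap before running the analysis. The sketch below assumes the locus_tags have already been extracted from both files by some other means; the helper name `usable_focal_locus_tags` is hypothetical and is not part of codoff.

```python
from typing import Iterable, List, Tuple


def usable_focal_locus_tags(
    genome_locus_tags: Iterable[str],
    focal_locus_tags: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Return (kept, ignored) focal locus_tags, preserving input order."""
    genome = set(genome_locus_tags)
    kept: List[str] = []
    ignored: List[str] = []
    for locus_tag in focal_locus_tags:
        # Only tags that also exist in the full genome can be used.
        if locus_tag in genome:
            kept.append(locus_tag)
        else:
            ignored.append(locus_tag)
    return kept, ignored
```

A non-empty `ignored` list usually suggests that the two GenBank files come from different annotation runs.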

Results

Both functions return a dictionary with the following keys:

| Key | Type | Value |
| --- | --- | --- |
| emp_pval_freq | float | The empirical P-value indicating significance of discordance between focal region and genome-wide codon usage profiles. |
| cosine_distance | float | The cosine distance between the focal region and genome-wide codon usage profiles. |
| rho | float | Spearman's rho between the focal region and genome-wide codon usage profiles. |
| codon_order | list of strs | A list of codons, in the same order as the following two lists. |
| focal_region_codons | list of ints | A list of codon counts for the focal region. |
| background_genome_codons | list of ints | A list of codon counts for the background genome. |
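To make the cosine_distance value more concrete, the sketch below computes a cosine distance from two lists of codon counts, such as focal_region_codons and background_genome_codons. It uses the standard definition (one minus the cosine similarity) on raw counts. Whether codoff applies any normalization before this step is not stated here, so agreement with the returned cosine_distance is an assumption rather than a guarantee.

```python
import math
from typing import Sequence


def cosine_distance(
    focal_counts: Sequence[float],
    background_counts: Sequence[float],
) -> float:
    """Standard cosine distance (1 - cosine similarity) between two vectors."""
    if len(focal_counts) != len(background_counts):
        raise ValueError("count lists must have the same length")
    dot = sum(f * b for f, b in zip(focal_counts, background_counts))
    focal_norm = math.sqrt(sum(f * f for f in focal_counts))
    background_norm = math.sqrt(sum(b * b for b in background_counts))
    if focal_norm == 0 or background_norm == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (focal_norm * background_norm)
```

Cosine distance depends only on the direction of the two count vectors, so a small focal region and a large genome can be compared without first converting counts to frequencies.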

Usage examples

codoff_main_coords():

from codoff import codoff

genome_fna = 'Some_Genome.fna' # nucleotide FASTA file (can be multi-FASTA) 

# provide coordinate information for region of interest (e.g. BGC, etc.) 
scaffold = 'ABC0001.1' # should match the header of some sequence in the FASTA until the first space.
start_coord = 10051
end_coord = 95060

result = codoff.codoff_main_coords(genome_fna, scaffold, start_coord, end_coord)

codoff_main_gbk():

from codoff import codoff

antismash_bgc_gbk = 'Some_BGC_of_Interest.gbk' # annotated BGC GenBank - e.g. one produced by antiSMASH
full_genome_gbk = 'Matching_Full_Genome.gbk' # annotated full-genome GenBank file - also produced by antiSMASH

output_file = 'codoff_results.tsv' # optional
output_plot = 'codoff_simulation_histogram.svg' # optional

result = codoff.codoff_main_gbk(full_genome_gbk, 
                                [antismash_bgc_gbk], 
                                outfile=output_file, 
                                plot_outfile=output_plot, 
                                verbose=True)
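Once a result dictionary is returned, the three parallel lists (codon_order, focal_region_codons, background_genome_codons) can be combined into per-codon rows. The following is a minimal sketch: the helper name `codon_rows` is hypothetical, and the relative frequencies it adds are an illustrative convenience, not something codoff returns.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int, int, float, float]


def codon_rows(result: Dict[str, list]) -> List[Row]:
    """Build (codon, focal_count, background_count, focal_freq, background_freq) rows."""
    codons = result["codon_order"]
    focal = result["focal_region_codons"]
    background = result["background_genome_codons"]
    focal_total = sum(focal)
    background_total = sum(background)
    rows: List[Row] = []
    for codon, f, b in zip(codons, focal, background):
        # Guard against division by zero if a list contains only zeros.
        focal_freq = f / focal_total if focal_total else 0.0
        background_freq = b / background_total if background_total else 0.0
        rows.append((codon, f, b, focal_freq, background_freq))
    return rows
```

Sorting the rows by the difference between the two frequencies is one simple way to see which codons contribute most to the discordance.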