Sample code to accompany the L1 evolutionary dynamics across eukaryotes manuscript. It demonstrates two independent extraction methods: an iterative search of genomic data using LASTZ, and a translated nucleotide search of NCBI databases using TBLASTN. Subsequent analyses use programs such as MUSCLE, USEARCH and HMMER.
Supplementary_TexSourceFiles.zip contains the LaTeX source documents used to compile the Supplementary Material.
Order of execution: L1-extraction (LASTZ, TBLASTN) -> ORF-identification -> Dendrogram-construction -> RT-identification -> Clustering-analysis
L1-extraction (LASTZ): downloadGenome.sh -> bundle.go -> renameToSeq.sh -> lastzExtractFromGenome.sh -> confirmLastzHits.sh
L1-extraction (TBLASTN): tblastnExtractFromDatabase.sh -> getNuclSeq.sh -> confirmTblastnHits.sh -> rerun LASTZ pipeline
ORF-identification: extendFlankingRegions.sh -> confirmORF2.sh -> confirmORF1.sh -> probableORF1.sh -> annotateNuclSeqs.sh
Dendrogram-construction: cluster.sh -> alignActiveClusters.sh -> inferPhylogeny.sh
RT-identification: extractRTfromORF2.sh -> cluster, align and make tree (e.g. use scripts from Dendrogram-construction/)
Clustering-analysis: blastAndCluster.sh (tst.awk must be in the same directory)
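
The script chains above can be run by hand, one script at a time. As a convenience, a minimal sketch of a wrapper that runs a chain in order and stops at the first failure might look like the following. The function name run_stage and the RUNNER variable are hypothetical and not part of this repository. The sketch also assumes that each script sits in the current directory and runs without arguments, which this README does not state; check each script for paths or settings that need editing first.

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository.
# Runs the given scripts in order and stops at the first problem.
# Assumption: scripts are in the current directory and take no arguments.
# Assumption: scripts run under sh; set RUNNER=bash if they need bash.

run_stage() {
    stage_name=$1
    shift
    for script in "$@"; do
        # Refuse to continue if a script in the chain is missing.
        if [ ! -f "$script" ]; then
            echo "missing: $script (stage: $stage_name)" >&2
            return 1
        fi
        echo "[$stage_name] running $script"
        # Stop the chain as soon as one script exits non-zero.
        if ! "${RUNNER:-sh}" "$script"; then
            echo "failed: $script (stage: $stage_name)" >&2
            return 1
        fi
    done
}
```

For example, from inside Dendrogram-construction/, the call would be: run_stage dendrogram cluster.sh alignActiveClusters.sh inferPhylogeny.sh. Steps that are not shell scripts, such as bundle.go, are not handled by this sketch and would need to be run separately.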