1212
1313
1414###############################################################
15- # AlignmentProcessor0.11 Package
16- #
17- # Dependencies: Python 3
18- # Python 3 version of Biopython
19- # Perl (if using KaKs_Calculator)
20- # PAML (if using CodeML)
21- # R (if using CodeML)
22- # ape R package (if using CodeML)
15+ AlignmentProcessor0.11 Package
16+
17+ Dependencies: Python 3
18+ Python 3 version of Biopython
19+ Perl (if using KaKs_Calculator)
20+ PAML (if using CodeML)
21+ R (if using CodeML)
22+ ape R package (if using CodeML)
2323###############################################################
2424
25- ### Contents ###
2625 0. Introduction
2726 1. Obtaining a fasta alignment
2827 2. Running AlignmentProcessor
2928 3. Individual Scripts
3029 4. Outputs
3130 5. Test
3231
33- #-------------------------------
32+
3433# 0. Introduction
35- #-------------------------------
34+
3635
3736AlignmentProcessor is a pipeline meant to quickly convert a multi-fasta
3837alignment file into a format that can be read by KaKs_Calculator or PAML and
@@ -94,9 +93,8 @@ To install ape, enter:
9493 biocLite("ape")
9594
9695
97- #-------------------------------
9896# 1. Obtaining a fasta alignment
99- #-------------------------------
97+
10098# UCSC Fasta Alignment
10199It is possible to download CDS fasta alignments from the UCSC Table browser.
102100This does, unfortunately, limit you to currently available alignments.
@@ -144,9 +142,7 @@ Since this method offers the most flexibility for working with alignments,
144142AlignmentProcessor was written with this output format in mind and no further
145143formatting is required.
146144
147- #-------------------------------
148145# 2. Running AlignmentProcessor
149- #-------------------------------
150146
151147AlignmentProcessor is designed to convert the file into a usable format and
152148run the substitutions quickly, so everything can be run with one command. Each
@@ -286,9 +282,7 @@ invoke KaKs_Calculator on the whole directory again.
286282 python AlignmentProcessor0.11.py --phylip --codeml -% 0.6 \
287283 -r anoCar2 -i anolis_gallus.fa -o codemlOutput/
288284
289- #-------------------------------
290285# 3. Individual Scripts
291- #-------------------------------
292286
293287Each script performs one or two functions on the input file or files, and
294288saves the output to a new subdirectory. The location of the working directory
@@ -299,14 +293,14 @@ not.
299293
300294Remember that the order of the arguments does matter for these scripts.
301295
302- # 00_ConvertHeader.py
296+ 00_ConvertHeader.py
303297
304298 This script will convert headers for CDS fasta alignments from
305299 UCSC to be in the format: >"build_name"."gene_ID"
306300
307301 python convertHeader.py <path to input file>
308302
309- # 01_SplitFastaFiles.py
303+ 01_SplitFastaFiles.py
310304
311305 This script will split the input multi-fasta alignment into one file
312306 per gene. It will produce an output file for a gene if it has at least
@@ -315,15 +309,15 @@ Remember that the order of the arguments does matter for these scripts.
315309 python 01_splitFastaFiles.py <input fasta alignment> \
316310 <path to output directory>
317311
318- # 02_RemoveHeader.py
312+ 02_RemoveHeader.py
319313
320314 This program will read through a directory that contains
321315 aligned multiple FASTA files and replace FASTA headers with each
322316 species' common name.
323317
324318 python 02_RemoveHeaderOnDir.py <path to input and output directories>
325319
326- # 03_CheckFrame.py
320+ 03_CheckFrame.py
327321
328322 This script removes gaps introduced in the reference sequence by the
329323 alignment and removes corresponding sites in other species. It assumes
@@ -335,7 +329,7 @@ Remember that the order of the arguments does matter for these scripts.
335329 python 03_CheckFrameOnDir.py <path to input and output directories> \
336330 <reference_species>
337331
338- # 04_CountBases.py
332+ 04_CountBases.py
339333
340334 This program will check a multiple FASTA file to see that each species
341335 retains a certain percentage of its nucleotide sequence. If not, it
@@ -348,7 +342,7 @@ Remember that the order of the arguments does matter for these scripts.
348342 python 05_CountBasesOnDir.py <threshold percentage as a decimal> \
349343 <path to input and output directories>
350344
351- # 05_ReplaceStopCodons.py
345+ 05_ReplaceStopCodons.py
352346
353347 This program will replace internal stop codons (TAA, TAG, TGA)
354348 in the nucleotide alignment with gaps (---). Some
@@ -365,7 +359,7 @@ Remember that the order of the arguments does matter for these scripts.
365359 python 05_ReplaceStopCodonsOnDir.py \
366360 <path to input and output directories> --retainStops (optional)
367361
368- # 06_FASTAtoAXT.py
362+ 06_FASTAtoAXT.py
369363
370364 This program executes 06_parseFastaIntoAXT.pl on an entire directory,
371365 allowing all of the contents of the directory to be converted to
@@ -376,22 +370,22 @@ Remember that the order of the arguments does matter for these scripts.
376370
377371 python FASTAtoAXTonDirectory.py <path to input and output directories>
378372
379- # 06_FASTAtoPhylip.py
373+ 06_FASTAtoPhylip.py
380374
381375 This program will convert all files in an input directory
382376 from fasta format to a phylip format.
383377
384378 python 07_FASTAtoPhylip.py <number of species> \
385379 <path to input and output directories>
386380
387- # 07_KaKsonDir.py
381+ 07_KaKsonDir.py
388382
389383 This program executes KaKs_Calculator on every file in a directory.
390384
391385 python 07_KaKsonDirectory.py <path to input and output directories> \
392386 <name of reference species>
393387
394- # 07_CodeMLonDir.py
388+ 07_CodeMLonDir.py
395389
396390 This script will run codeml on every file in a directory. It requires
397391 the codeml.ctl file, and likely a tree file which it will supply to
@@ -405,7 +399,7 @@ Remember that the order of the arguments does matter for these scripts.
405399 python 07_CodeMLonDir.py <path to codeml control file> \
406400 <path to input and output directories>
407401
408- # 07_pruneTree.py
402+ 07_pruneTree.py
409403
410404 This script will dynamically trim input trees for CodeML if any sequences
411405 have been removed. Species whose sequences were removed in steps 4 or 5
@@ -415,17 +409,15 @@ Remember that the order of the arguments does matter for these scripts.
415409 python 07_pruneTree.py <path to input directory> \
416410 <list of species remaining in alignment> <path to temp output directory>
417411
418- # 08_compileKaKs.py
412+ 08_compileKaKs.py
419413
420414 This script concatenates the output from KaKs_Calculator into a text
421415 file. It adds a column for gene (or sequence) IDs, and prints the gene
422416 ID from the filename.
423417
424418 python compileCSV.py <path to input and output directories>
425419
426- #-------------------------------
427420# 4. Outputs
428- #-------------------------------
429421
430422A directory is created for each step, and, prior to the file conversion step,
431423each directory will contain a series of single-gene fasta alignments. These
@@ -452,9 +444,7 @@ the files in a separate step. AlignmentProcessor will not, however, run
452444KaKs_Calculator and CodeML simultaneously, as this could require too much
453445memory.
454446
455- #-------------------------------
456447# 5. Run AlignmentProcessor on test data
457- #-------------------------------
458448
459449# To test KaKs_Calculator:
460450Change directory into the AlignmentProcessor folder. Paste the following into