  GNU General Public License for more details.


- ###############################################################
+ ##############################################################################
 # AlignmentProcessor0.21 Package

 # Dependencies:
  Python 3
  Python 3 version of Biopython
  PAML (if using CodeML)
  PhyML (if using CodeML)
- ###############################################################
+ ##############################################################################


 ### Contents ###
@@ -155,9 +155,9 @@ invoke KaKs_Calculator on the whole directory again.

 # Example Usage:

- python AlignmentProcessor.py --ucsc --axt/phylip --kaks/codeml \
- --retainStops -% <decimal> -f <forward branch of codeml tree> \
- -r <reference species> -i <input fasta file> \
+ python AlignmentProcessor.py --ucsc --axt/phylip --kaks/codeml
+ --retainStops -% <decimal> -f <forward branch of codeml tree>
+ -r <reference species> -i <input fasta file>
  -o <path to output directory>

 # Required Arguments:
@@ -195,6 +195,9 @@ invoke KaKs_Calculator on the whole directory again.
  otherwise it will not run). Otherwise the program will quit
  after converting the files.

+ -m specifies the method KaKs_Calculator uses to calculate
+ substitution rates (see below).
+
  --codeml will run codeml on all of the files in the
  06_phylipFiles directory. You must also supply a
  control file for CodeML which must be located in the
@@ -207,10 +210,16 @@ invoke KaKs_Calculator on the whole directory again.
  AlignmentProcessor can call multiple instances of CodeML to
  shorten overall run time. (Default = 1)

- -f the build name (or common name if you use the --changeNames flag) of
- the species on the forward branch of the phylogneic tree supplied by
- PhyML. This species does not have to be the same as the reference
- species.
+ -f the first ten characters (standard phylip format truncates the species
+ names to ten characters) of the build name (or common name if you use
+ the --changeNames flag) of the species on the forward branch of the
+ phylogenetic tree supplied by PhyML. This species does not have to be
+ the same as the reference species.
+
+ --noCleanUp tells the program to keep temporary CodeML control files,
+ tree files, and other PhyML and CodeML output/temporary
+ files, which are located in the tmp directory.
+ These files are removed by default.

 # Additional Commands

@@ -223,7 +232,7 @@ invoke KaKs_Calculator on the whole directory again.
  common names. Must be run without any other arguments.

  --addNameToList will add an entry to the 02_nameList.txt file.
- e.g. python AlignmentProcessor.py -- addNameToList \
+ e.g. python AlignmentProcessor.py --addNameToList
  <build> <common name>


@@ -238,25 +247,49 @@ invoke KaKs_Calculator on the whole directory again.
  new row, followed by a tab, then the desired common name. Make sure
  there are no spaces in either name.

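A minimal Python sketch of reading and extending a name list in the format described above, assuming each row of 02_nameList.txt is a build name, exactly one tab, and a common name. The helper names parse_name_list and format_entry are hypothetical (not part of AlignmentProcessor), and the example rows are illustrative only.

```python
from __future__ import annotations


def parse_name_list(text: str) -> dict[str, str]:
    """Map build names to common names from '<build><TAB><common name>' rows."""
    names = {}
    for row_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank rows
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"row {row_no}: expected build<TAB>common name")
        build, common = fields
        if " " in build or " " in common:
            raise ValueError(f"row {row_no}: names must not contain spaces")
        names[build] = common
    return names


def format_entry(build: str, common: str) -> str:
    """Format one new row for the list (hypothetical helper)."""
    if " " in build + common or "\t" in build + common:
        raise ValueError("names must not contain spaces or tabs")
    return f"{build}\t{common}\n"


example = "anoCar2\tAnolis\ngalGal4\tChicken\n"
print(parse_name_list(example))  # {'anoCar2': 'Anolis', 'galGal4': 'Chicken'}
```

Rejecting spaces up front mirrors the README's rule, so a bad row fails loudly instead of silently producing a wrong common name.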
+ # KaKs_Calculator Method
+
+ KaKs_Calculator can calculate substitution rates in a number of different
+ ways, which can be specified to AlignmentProcessor with the "-m" flag.
+ These include estimation methods, which are generally much faster, and
+ maximum likelihood models, which should be more accurate. See the
+ KaKs_Calculator documentation in the KaKs_Calculator folder for more
+ information.
+
+ Estimations:
+ NG (default in AlignmentProcessor)
+ LWL
+ LPB
+ MLWL
+ YN
+ MYN
+
+ Maximum Likelihood:
+ GY
+ MS (recommended for maximum likelihood)
+
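A minimal sketch of validating a "-m" value against the methods listed above and building a KaKs_Calculator command for one AXT file. It assumes KaKs_Calculator takes -i (input), -o (output), and -m (method) on its command line; check this against the KaKs_Calculator documentation. The function names and the .kaks output naming are hypothetical and not necessarily what AlignmentProcessor does.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Method names exactly as listed in this README.
ESTIMATION_METHODS = {"NG", "LWL", "LPB", "MLWL", "YN", "MYN"}
MAXIMUM_LIKELIHOOD_METHODS = {"GY", "MS"}


def kaks_command(axt_file: str, method: str = "NG") -> list[str]:
    """Build a KaKs_Calculator command line for one AXT file.

    Assumes KaKs_Calculator's -i/-o/-m options; the .kaks output name is hypothetical.
    """
    if method not in ESTIMATION_METHODS | MAXIMUM_LIKELIHOOD_METHODS:
        raise ValueError(f"unknown KaKs_Calculator method: {method}")
    output = str(Path(axt_file).with_suffix(".kaks"))
    return ["KaKs_Calculator", "-i", axt_file, "-o", output, "-m", method]


def run_kaks(axt_file: str, method: str = "NG") -> None:
    """Run KaKs_Calculator (assumed to be on PATH); raise CalledProcessError on failure."""
    subprocess.run(kaks_command(axt_file, method), check=True)


print(kaks_command("gene1.axt", "MS"))
# ['KaKs_Calculator', '-i', 'gene1.axt', '-o', 'gene1.kaks', '-m', 'MS']
```

Separating command construction from execution keeps the method check testable without KaKs_Calculator installed.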
 # The CodeML control file

  Codeml requires that all of its parameters be specified in one control
  file (http://abacus.gene.ucl.ac.uk/software/pamlDOC.pdf). Provide a
  control file with your desired parameters and AlignmentProcessor will
  use it as a template.

- The control file must be titled titled “codeml.ctl”, and it must be
- located in the output directory. Examples are included with PAML and one
- is included in AlignmentProcessor's test directory.
+ The control file must have a “.ctl” extension, and it must be
+ located in the output directory. Only provide one control file in this
+ directory. Examples are included with PAML, and the controlFiles directory
+ contains example control files for branch-site, branch-specific, and
+ pairwise analyses. Copy one of these into your output directory or
+ supply your own.

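A sketch of the two steps the control-file rules above imply: finding the single .ctl file in the output directory, then filling per-gene settings into that template. seqfile, treefile, and outfile are standard CodeML control-file settings; find_control_file and fill_control_template are hypothetical names, and this is not AlignmentProcessor's own code.

```python
from __future__ import annotations

import re
from pathlib import Path


def find_control_file(output_dir: str | Path) -> Path:
    """Return the only *.ctl file in output_dir; raise if there are none or several."""
    controls = sorted(Path(output_dir).glob("*.ctl"))
    if not controls:
        raise FileNotFoundError(f"no .ctl file in {output_dir}")
    if len(controls) > 1:
        found = ", ".join(p.name for p in controls)
        raise ValueError(f"expected one .ctl file in {output_dir}, found {found}")
    return controls[0]


def fill_control_template(template: str, **settings: str) -> str:
    """Overwrite 'name = value' settings in a CodeML control file, keeping '*' comments."""
    for name, value in settings.items():
        pattern = re.compile(rf"^([ \t]*{re.escape(name)}[ \t]*=)[^*\n]*", re.MULTILINE)
        # A function replacement avoids backslash escapes in file paths.
        template, count = pattern.subn(lambda m: f"{m.group(1)} {value} ", template)
        if count == 0:
            raise KeyError(f"setting {name!r} not found in control file")
    return template


template = "      seqfile = stewart.aa * sequence data filename\n"
print(fill_control_template(template, seqfile="gene1.phylip"), end="")
#       seqfile = gene1.phylip * sequence data filename
```

Failing when zero or several .ctl files are present enforces the "only provide one control file" rule instead of silently picking one.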
 # Invoking the Ka/Ks pipeline with a UCSC alignment:

- python AlignmentProcessor0.21.py --axt --kaks --ucsc -r anoCar2 \
+ python AlignmentProcessor0.21.py --axt --kaks --ucsc -r anoCar2
  -i anolis_gallus.fa -o pairwiseKaKs/

 # Invoking the CodeML pipeline with a de novo alignment:

- python AlignmentProcessor0.21.py --phylip --codeml -% 0.6 \
+ python AlignmentProcessor0.21.py --phylip --codeml -% 0.6
  -r anoCar2 -i anolis_gallus.fa -o codemlOutput/

 -------------------------------
@@ -285,7 +318,7 @@ Remember that the order of the arguments does matter for these scripts.
  per gene. It will produce an output file for a gene if it has at least
  two sequences.

- python 01_splitFastaFiles.py <input fasta alignment> \
+ python 01_splitFastaFiles.py <input fasta alignment>
  <path to output directory>

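A minimal sketch of the per-gene split described for 01_splitFastaFiles.py, keeping only genes with at least two sequences. How the gene ID is pulled out of a FASTA header is an assumption here, so it is passed in as a function; the "<gene>_<build>" header layout in the example is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_by_gene(records: list[tuple[str, str]],
                  gene_of: Callable[[str], str]) -> dict[str, list[tuple[str, str]]]:
    """Group records by gene ID and drop genes with fewer than two sequences."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for header, seq in records:
        groups[gene_of(header)].append((header, seq))
    return {gene: recs for gene, recs in groups.items() if len(recs) >= 2}


# Hypothetical header layout "<gene>_<build>"; adapt gene_of to your headers.
fasta = ">gene1_anoCar2\nATGGCA\n>gene1_galGal4\nATG\nGCC\n>gene2_anoCar2\nATGAAA\n"
genes = split_by_gene(read_fasta(fasta), lambda h: h.rsplit("_", 1)[0])
print(sorted(genes))  # ['gene1']
```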
 02_RemoveHeader.py
@@ -294,7 +327,7 @@ Remember that the order of the arguments does matter for these scripts.
  FASTA files and remove gene IDs from the fasta headers. It will replace
  build names with each species' common name if "--changeNames" is specified.

- python 02_RemoveHeaderOnDir.py <path to input and output directories> \
+ python 02_RemoveHeaderOnDir.py <path to input and output directories>
  --changeNames(optional)

 03_CheckFrame.py
@@ -306,7 +339,7 @@ Remember that the order of the arguments does matter for these scripts.
  frame. It will then replace codons with missing nucleotides with gaps
  to remove unknown amino acids from the sequence.

- python 03_CheckFrameOnDir.py <path to input and output directories> \
+ python 03_CheckFrameOnDir.py <path to input and output directories>
  <reference_species>

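A minimal sketch of the codon masking described for 03_CheckFrame.py. It assumes the sequence is already in frame and that N, n, ?, and - count as missing nucleotides; the README does not say which characters the script itself treats as missing, and mask_incomplete_codons is a hypothetical name.

```python
MISSING = set("Nn?-")  # assumption: characters treated as missing nucleotides


def mask_incomplete_codons(seq: str) -> str:
    """Replace every codon that contains a missing nucleotide with '---'.

    Assumes seq is already in frame, so its length is a multiple of three.
    """
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return "".join("---" if MISSING & set(codon) else codon for codon in codons)


print(mask_incomplete_codons("ATGNCGA-TGCA"))  # ATG------GCA
```

Masking whole codons keeps the alignment length and reading frame unchanged while removing codons that would translate to unknown amino acids.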
 04_CountBases.py
@@ -319,7 +352,7 @@ Remember that the order of the arguments does matter for these scripts.
  but the script itself does not, so you MUST specify one if you invoke
  it on its own.

- python 05_CountBasesOnDir.py <threshold percentage as a decimal> \
+ python 05_CountBasesOnDir.py <threshold percentage as a decimal>
  <path to input and output directories>

 05_ReplaceStopCodons.py
@@ -335,7 +368,7 @@ Remember that the order of the arguments does matter for these scripts.
  any gene which does not have at least two remaining sequences will not be
  written to file.

- python 05_ReplaceStopCodonsOnDir.py \
+ python 05_ReplaceStopCodonsOnDir.py
  <path to input and output directories> --retainStops(optional)

 06_FASTAtoAXT.py
@@ -349,15 +382,15 @@ Remember that the order of the arguments does matter for these scripts.
  This program will convert all files in an input directory
  from fasta format to phylip format.

- python 07_FASTAtoPhylip.py <number of species> \
+ python 07_FASTAtoPhylip.py <number of species>
  <path to input and output directories>

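A minimal sketch of sequential PHYLIP output, illustrating the ten-character name field that the -f flag depends on. This is not the 07_FASTAtoPhylip.py implementation: it assumes the sequences are already aligned, checks that names stay unique after truncation, and separates each name from its sequence with two spaces (PAML reads two consecutive spaces as the end of a sequence name).

```python
from __future__ import annotations


def to_phylip(records: list[tuple[str, str]]) -> str:
    """Format aligned (name, sequence) pairs as sequential PHYLIP text.

    Names are cut and padded to a 10-character field, so only the first ten
    characters of each species name survive.
    """
    if not records:
        raise ValueError("no sequences to write")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences must all be the same (aligned) length")
    short_names = [name[:10] for name, _ in records]
    if len(set(short_names)) != len(short_names):
        raise ValueError("names collide after truncation to ten characters")
    lines = [f"{len(records)} {length}"]
    lines += [f"{short:<10}  {seq}" for short, (_, seq) in zip(short_names, records)]
    return "\n".join(lines) + "\n"


print(to_phylip([("Anolis_carolinensis", "ATGGCA"), ("galGal4", "ATG---")]), end="")
# 2 6
# Anolis_car  ATGGCA
# galGal4     ATG---
```

The truncated name ("Anolis_car" here) is what ends up in the PhyML tree, which is why -f must be given the first ten characters rather than the full name.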
 07_KaKsonDir.py

  This program executes KaKs_Calculator on every file in a directory.

- python 07_KaKsonDirectory.py <path to input and output directories> \
- <name of reference species >
+ python 07_KaKsonDirectory.py <path to input and output directories>
+ <method>

 07_CodeMLonDir.py

@@ -371,7 +404,7 @@ Remember that the order of the arguments does matter for these scripts.
  Note: Since this script has greater utility as a stand-alone program, it
  uses flags so that the order of the arguments does not matter.

- python 07_CodeMLonDir.py -t <# of threads> -f <name of forward branch> \
+ python 07_CodeMLonDir.py -t <# of threads> -f <name of forward branch>
  -i <path to input and output directories>


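A sketch of how several CodeML runs can share a fixed number of workers, in the spirit of the -t option. It assumes each control file sits in its own working directory, because codeml writes auxiliary files such as rst and rub into its current directory and parallel runs in one directory would overwrite each other. run_codeml, run_all, and the tmp/geneN layout are hypothetical; threads are enough here because each worker only waits on an external process.

```python
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def run_codeml(ctl_file: Path) -> int:
    """Run codeml on one control file inside that file's directory; return its exit code."""
    completed = subprocess.run(["codeml", ctl_file.name], cwd=ctl_file.parent)
    return completed.returncode


def run_all(ctl_files: list[Path], threads: int = 1,
            runner: Callable[[Path], int] = run_codeml) -> dict[str, int]:
    """Run at most `threads` jobs at a time; map each control file path to its exit code."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return {str(p): code for p, code in zip(ctl_files, pool.map(runner, ctl_files))}


# Dry run with a stand-in runner instead of the real codeml binary.
jobs = [Path("tmp/gene1/gene1.ctl"), Path("tmp/gene2/gene2.ctl")]
print(run_all(jobs, threads=2, runner=lambda ctl: 0))
```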
@@ -421,7 +454,7 @@ memory.
 Change directory into the AlignmentProcessor folder. Paste the following into
 a terminal:

- python AlignmentProcessor.py --ucsc --axt --kaks -r anoCar2 \
+ python AlignmentProcessor.py --ucsc --axt --kaks -r anoCar2
 -i test/kaksTest.fa -o test/

 This will return a tsv file with 11 lines.
@@ -431,7 +464,7 @@ The test directory already contains a sample CodeML control file, so
 all you need to do is change into the AlignmentProcessor directory and paste
 the following:

- python AlignmentProcessor.py --ucsc --phylip --codeml -t 2 -r anoCar2 \
+ python AlignmentProcessor.py --ucsc --phylip --codeml -t 2 -r anoCar2
 -f anoCar2 -i test/codemlTest.fa -o test/

 There should be 8 .mlc files in the 07_codeml directory.