Commit 2add209

Update README.md
1 parent f426e49 commit 2add209

1 file changed

Lines changed: 23 additions & 33 deletions

README.md

@@ -12,27 +12,26 @@
1212

1313

1414
###############################################################
15-
# AlignmentProcessor0.11 Package
16-
#
17-
# Dependencies: Python 3
18-
# Python 3 version of Biopython
19-
# Perl (if using KaKs_Calculator)
20-
# PAML (if using CodeML)
21-
# R (if using CodeML)
22-
# ape R package (if using CodeML)
15+
AlignmentProcessor0.11 Package
16+
17+
Dependencies: Python 3
18+
Python 3 version of Biopython
19+
Perl (if using KaKs_Calculator)
20+
PAML (if using CodeML)
21+
R (if using CodeML)
22+
ape R package (if using CodeML)
2323
###############################################################
2424

25-
### Contents ###
2625
0. Introduction
2726
1. Obtaining a fasta alignment
2827
2. Running AlignmentProcessor
2928
3. Individual Scripts
3029
4. Outputs
3130
5. Test
3231

33-
#-------------------------------
32+
3433
# 0. Introduction
35-
#-------------------------------
34+
3635

3736
AlignmentProcessor is a pipeline meant to quickly convert a multi-fasta
3837
alignment file into a format that can be read by KaKs_Calculator or PAML and
@@ -94,9 +93,8 @@ To install ape, enter:
9493
biocLite("ape")
9594

9695

97-
#-------------------------------
9896
# 1. Obtaining a fasta alignment
99-
#-------------------------------
97+
10098
# UCSC Fasta Alignment
10199
It is possible to download CDS fasta alignments from the UCSC Table browser.
102100
This does, unfortunately, limit you to currently available alignments.
@@ -144,9 +142,7 @@ Since this method offer the most flexibility for working with alignments,
144142
AlignmentProcessor was written with this output format in mind and no further
145143
formatting is required.
146144

147-
#-------------------------------
148145
# 2. Running AlignmentProcessor
149-
#-------------------------------
150146

151147
AlignmentProcessor is designed to convert the file into a usable format and
152148
run the substitutions quickly, so everything can be run with one command. Each
@@ -286,9 +282,7 @@ invoke KaKs_Calculator on the whole directory again.
286282
python AlignmentProcessor0.11.py --phylip --codeml -% 0.6 \
287283
-r anoCar2 -i anolis_gallus.fa -o codemlOutput/
288284

289-
#-------------------------------
290285
# 3. Individual Scripts
291-
#-------------------------------
292286

293287
Each script performs one or two functions on the input file or files, and
294288
saves the output to a new subdirectory. The location of the working directory
@@ -299,14 +293,14 @@ not.
299293

300294
Remember that the order of the arguments does matter for these scripts.
301295

302-
# 00_ConvertHeader.py
296+
00_ConvertHeader.py
303297

304298
This script will convert headers for CDS fasta alignments from
305299
UCSC to be in the format: >"build_name"."gene_ID"
306300

307301
python convertHeader.py <path to input file>
308302

309-
# 01_SplitFastaFiles.py
303+
01_SplitFastaFiles.py
310304

311305
This script will split the input multi-fasta alignment into one file
312306
per gene. It will produce an output file for a gene if it has at least
@@ -315,15 +309,15 @@ Remember that the order of the arguments does matter for these scripts.
315309
python 01_splitFastaFiles.py <input fasta alignment> \
316310
<path to output directory>
317311

318-
# 02_RemoveHeader.py
312+
02_RemoveHeader.py
319313

320314
This program will read through a directory that contains
321315
aligned multiple FASTA files and replace FASTA headers with each
322316
species' common name.
323317

324318
python 02_RemoveHeaderOnDir.py <path to input and output directories>
325319

326-
# 03_CheckFrame.py
320+
03_CheckFrame.py
327321

328322
This script removes gaps introduced in the reference sequence by the
329323
alignment and removes corresponding sites in other species. It assumes
@@ -335,7 +329,7 @@ Remember that the order of the arguments does matter for these scripts.
335329
python 03_CheckFrameOnDir.py <path to input and output directories> \
336330
<reference_species>
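
The gap-removal step described above can be sketched as follows. This is a minimal illustration, not the code in 03_CheckFrame.py: `drop_reference_gap_columns` is a hypothetical helper, the alignment is assumed to be a dict of equal-length aligned sequences, and `galGal4` is only an illustrative second species name.

```python
def drop_reference_gap_columns(alignment, reference):
    """Remove every alignment column where the reference sequence has a gap.

    alignment: dict mapping species name -> aligned sequence (equal lengths).
    reference: name of the reference species (assumed to be a key of alignment).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    ref_seq = alignment[reference]
    # Indices of the columns to keep: those where the reference is not a gap.
    keep = [i for i, base in enumerate(ref_seq) if base != "-"]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


aligned = {"anoCar2": "ATG-CC", "galGal4": "ATGTCC"}
trimmed = drop_reference_gap_columns(aligned, "anoCar2")
```

Dropping whole columns keeps the remaining sites of every species aligned to the reference, so the reference reading frame is preserved.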
337331

338-
# 04_CountBases.py
332+
04_CountBases.py
339333

340334
This program will check a multiple FASTA file to see that each species
341335
retains a certain percentage of its nucleotide sequence. If not, it
@@ -348,7 +342,7 @@ Remember that the order of the arguments does matter for these scripts.
348342
python 05_CountBasesOnDir.py <threshold percentage as a decimal> \
349343
<path to input and output directories>
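
As a rough sketch of the threshold check, the fraction of retained sequence can be computed per species and compared with the decimal threshold. The function names are hypothetical, and the choice of denominator (the full aligned length) and of what counts as a retained base (A, C, G, T only) are assumptions made for illustration; the script may count differently.

```python
def retained_fraction(seq):
    """Fraction of aligned positions holding a called nucleotide (A, C, G, T)."""
    if not seq:
        return 0.0
    called = sum(1 for base in seq.upper() if base in "ACGT")
    return called / len(seq)


def species_passing(alignment, threshold):
    """Map each species name to True if it meets the threshold, else False."""
    return {
        name: retained_fraction(seq) >= threshold
        for name, seq in alignment.items()
    }


result = species_passing({"a": "ATGCCC", "b": "AT----"}, 0.6)
```

The sketch only reports which species pass; what to do with those that fail is left to the caller.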
350344

351-
# 05_ReplaceStopCodons.py
345+
05_ReplaceStopCodons.py
352346

353347
This program will remove the internal stop codons (TAA, TAG, TGA)
354348
and replace with gaps (---) from the nucleotide alignment. Some
@@ -365,7 +359,7 @@ Remember that the order of the arguments does matter for these scripts.
365359
python 05_ReplaceStopCodonsOnDir.py \
366360
<path to input and output directories> --retainStops(optional)
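
The masking of internal stop codons can be illustrated with a short sketch. It assumes the sequence is already in frame and treats only the final codon as terminal; `mask_internal_stops` is a hypothetical helper, not a function from 05_ReplaceStopCodons.py, and the behaviour of `--retainStops` is not modelled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def mask_internal_stops(seq):
    """Replace in-frame internal stop codons with '---'.

    The last codon is left untouched, on the assumption that a stop
    there is the legitimate terminal stop.
    """
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    last = len(codons) - 1
    masked = []
    for idx, codon in enumerate(codons):
        if idx != last and codon.upper() in STOP_CODONS:
            masked.append("---")
        else:
            masked.append(codon)
    return "".join(masked)


masked = mask_internal_stops("ATGTAAGGCTGA")
```

Replacing a stop with a gap triplet rather than deleting it keeps the sequence length and reading frame unchanged.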
367361

368-
# 06_FASTAtoAXT.py
362+
06_FASTAtoAXT.py
369363

370364
This program executes 06_parseFastaIntoAXT.pl on an entire directory,
371365
allowing all of the contents of the directory to be converted to
@@ -376,22 +370,22 @@ Remember that the order of the arguments does matter for these scripts.
376370

377371
python FASTAtoAXTonDirectory.py <path to input and output directories>
378372

379-
# 06_FASTAtoPhylip.py
373+
06_FASTAtoPhylip.py
380374

381375
This program will convert all files in an input directory
382376
from fasta format to a phylip format.
383377

384378
python 07_FASTAtoPhylip.py <number of species> \
385379
<path to input and output directories>
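
For orientation, a FASTA-to-PHYLIP conversion can be sketched in plain Python. This is an assumption-laden illustration rather than the script's own logic: `fasta_to_phylip` is a hypothetical helper, it writes sequential PHYLIP with two spaces between name and sequence, and it counts the species itself instead of taking the number as an argument.

```python
def fasta_to_phylip(fasta_text):
    """Convert aligned FASTA text to sequential PHYLIP text."""
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Use the first whitespace-delimited token of the header as the name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))

    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")

    # Header line: number of sequences, then alignment length.
    lines = [f" {len(records)} {lengths.pop()}"]
    lines += [f"{seq_name}  {seq}" for seq_name, seq in records]
    return "\n".join(lines) + "\n"


phylip = fasta_to_phylip(">anolis\nATG\nCCC\n>chicken\nATGCCA\n")
```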
386380

387-
# 07_KaKsonDir.py
381+
07_KaKsonDir.py
388382

389383
This program executes KaKs_Calculator on every file in a directory.
390384

391385
python 07_KaKsonDirectory.py <path to input and output directories> \
392386
<name of reference species>
393387

394-
# 07_CodeMLonDir.py
388+
07_CodeMLonDir.py
395389

396390
This script will run codeml on every file in a directory. It requires
397391
the codeml.ctl file, and likely a tree file which it will supply to
@@ -405,7 +399,7 @@ Remember that the order of the arguments does matter for these scripts.
405399
python 07_CodeMLonDir.py <path to codeml control file> \
406400
<path to input and output directories>
407401

408-
# 07_pruneTree.py
402+
07_pruneTree.py
409403

410404
This script will dynamically trim input trees for CodeML if any sequences
411405
have been removed. Species whose sequences were removed in steps 4 or 5
@@ -415,17 +409,15 @@ Remember that the order of the arguments does matter for these scripts.
415409
python 07_pruneTree.py <path to input directory> \
416410
<list of species remaining in alignment> <path to temp output directory>
417411

418-
# 08_compileKaKs.py
412+
08_compileKaKs.py
419413

420414
This script concatenates the output from KaKs_Calculator into a text
421415
file. It adds a column for gene (or sequence) IDs, and prints the gene
422416
ID from the filename.
423417

424418
python compileCSV.py <path to input and output directories>
425419

426-
#-------------------------------
427420
# 4. Outputs
428-
#-------------------------------
429421

430422
A directory is created for each step, and, prior to the file conversion step,
431423
each directory will contain a series of single-gene fasta alignments. These
@@ -452,9 +444,7 @@ the files in a separate step. AlignmentProcessor will not, however, run
452444
KaKs_Calculator and CodeML simultaneously, as this could require too much
453445
memory.
454446

455-
#-------------------------------
456447
# 5. Run AlignmentProcessor on test data
457-
#-------------------------------
458448

459449
# To test KaKs_Calculator:
460450
Change directory into the AlignmentProcessor folder. Paste the following into
