Skip to content

Commit 68291c9

Browse files
committed
Added scripts for analyses in manuscript.
1 parent ca3f970 commit 68291c9

14 files changed

Lines changed: 1312 additions & 0 deletions

manuscript/AvesMultiReadMe.sh

Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
###############################################################
2+
# This is a readme for analyzing fasta alignmnets
3+
# downloaded from UCSC.
4+
#
5+
# Required programs: AlignmentProcessor1.2
6+
# CodeML4.9
7+
###############################################################
8+
9+
#-------------------------------
10+
# 1. Alignments
11+
#-------------------------------
12+
13+
# From the UCSC table browser page:
14+
Select the human hg19 genome, genes and gene predictions, and ensembl genes.
15+
Select CDS FASTA in ouput format, enter a file name, and select gzip compressed before hitting get output. On the next page,
16+
Select the multiz100way alignment under MAF table. Select every species under birds, and deselect everything else.
17+
18+
# From Ensembl BioMart:
19+
# http://dec2013.archive.ensembl.org/biomart/martview/00c5ddbaf1b0544dc54db6cd39534529
20+
Download a list of hg19/GRCh37 gene IDs with corresponding transcript IDs and chromosome. Save as a csv.
21+
22+
#-------------------------------
23+
# 2. Run AlignmentProcessor to obtain pyhilp output
24+
#-------------------------------
25+
26+
# Make sure to clone AlignmentProcessor from GitHub: https://github.com/WilsonSayresLab/AlignmentProcessor
27+
28+
Upload fasta alignment and avesCodeml.sh to cluster and submit job.
29+
30+
#-------------------------------
31+
# 4. Add gene IDs and locus data
32+
#-------------------------------
33+
34+
# First download concatenated CodeML results from the cluster.
35+
# Sort ascending by transcript ID (I did it in Excel since I inspected the files as well), then add gene IDs and locus data
36+
37+
join -t "," --header --check-order -1 2 -2 1 "h19GeneTranscriptIDs.csv" "branchSpecific/avesCodeMLNull.csv" > "branchSpecific/avesNullOutput.csv"
38+
join -t "," --header --check-order -1 2 -2 1 "h19GeneTranscriptIDs.csv" "branchSpecific/avesCodeMLAlt.csv" > "branchSpecific/avesAltOutput.csv"
39+
40+
#-------------------------------
41+
# 5. Permute result files to obtain a p-value for mean and median tree lengths
42+
#-------------------------------
43+
44+
# To compile the Cython scripts, change into the PermutationScripts directory and type: python setup.py build-ext --inplace
45+
46+
cd PermutationScripts/
47+
48+
# dN
49+
python permutation.py --c1 3 --c2 3 --i1 branchSpecific/avesNullOutput.csv --i2 branchSpecific/avesAltOutput.csv
50+
51+
# dS
52+
python permutation.py --c1 4 --c2 4 --i1 branchSpecific/avesNullOutput.csv --i2 branchSpecific/avesAltOutput.csv
53+
54+
# dN/dS
55+
python permutation.py --c1 5 --c2 5 --i1 branchSpecific/avesNullOutput.csv --i2 branchSpecific/avesAltOutput.csv
56+
57+
# Tree length
58+
python permutation.py --c1 6 --c2 6 --i1 branchSpecific/avesNullOutput.csv --i2 branchSpecific/avesAltOutput.csv

manuscript/AvesPairwiseReadMe.sh

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
###############################################################
2+
# This is a readme for analyzing pairwise fasta alignments
3+
# output from lastz.
4+
#
5+
# Required programs: Lastz
6+
# AlignmentProcessor1.2
7+
# PhyML
8+
# paml
9+
###############################################################
10+
11+
#-------------------------------
12+
# 1. Run Lastz for Alignment
13+
#-------------------------------
14+
15+
# Download scripts from the Avian Genome Project and upload to server.
16+
# Submit falcon_chicken.sh script on Saguaro. See comments at the end for changing file, target, and query names.
17+
18+
#-------------------------------
19+
# 2. Run Stich Gene Blocks on Galaxy
20+
#-------------------------------
21+
22+
# Upload the Lastz output maf file to Galaxy, and import the GalGal4 bed12 file from UCSC (select Genes and Gene Predictions,
23+
# Ensembl genes, genome, and BED format before getting output. On the next page select whole gene). Remove the "chr_UN" and "chr"
24+
# chromosome prefixes from the file, and resubmit the file to Galaxy. Also submit the chicken genome used in the alignment as a
25+
# custom build and set the builds of all three files to the custom build. (Alternatively, just use the UCSC chicken genome in the alignment.)
26+
27+
#-------------------------------
28+
# 3. Run AlignmentProcessor
29+
#-------------------------------
30+
31+
# KaKs_Calculator
32+
cd AlignmentProcessor/
33+
python AlignmentProcessor.py -t 4 --axt --kaks -r galGal4 -i Pairwise/galgal_falper.fa -o KaKs/
34+
35+
# CodeML
36+
cd AlignmentProcessor/
37+
cp controlFiles/pairwiseAlt.ctl PairwiseCodeML1.2
38+
python AlignmentProcessor.py -t 4 --phylip --codeml -r galGal4 -i Pairwise/galgal_falper.fa -o PairwiseCodeML/
39+
40+
#-------------------------------
41+
# 4. Prepare Output for analysis in R
42+
#-------------------------------
43+
44+
# Download a list of gene and transcript IDs and their chromosome for galGal4 from Ensembl BioMart (v83).
45+
# Join this list with the Ka/Ks results to add gene, scaffold, and chromosome information to the results.
46+
# Prior to joining, open each list in Excel and sort both files either ascending or descending by their transcript IDs.
47+
# It does not matter which one you use, as long as you use the same one for each (join will not work properly if they are sorted differently).
48+
49+
# KaKs
50+
join -t "," --header --check-order -1 2 -2 1 "Pairwise/galGal4GeneTranscriptIDs.txt" "KaKs1.2/KaKs.csv" > "KaKs1.2/galGal4KaKs.csv"
51+
52+
# CodeML
53+
python bin/ConcatenateCodeML.py --pairwise -i PairwiseCodeML1.2/04_CodemlOutput -o PairwiseCodeML1.2/pairwiseAlt.csv
54+
join -t "," --header --check-order -1 2 -2 1 "Pairwise/galGal4GeneTranscriptIDs.txt" "PairwiseCodeML1.2/pairwiseAlt.csv" > "PairwiseCodeML1.2/galGal4Pairwise.csv"
55+
56+
#-------------------------------
57+
# 5. Permutation test for Z chromosome vs. Autosomes
58+
#-------------------------------
59+
60+
# Make two csv files from the joined file in step 4, one for only autosomal genes and one for only Z-linked genes.
61+
# Manually remove any genes with NAs for Ka, Ks, or Ka/Ks.
62+
63+
cd PermutationScripts/
64+
65+
python permutation.py --c1 7 --c2 7 -i1 KaKsOut/galGal_falPerAutosomalKaKs.csv -i2 KaKsOut/galGal_falPerZKaKs.csv
66+
67+

0 commit comments

Comments
 (0)