
Commit d913017

add quickstart to README.md
The commit expands the README with a quickstart on the landing page that lists the dependencies to download and explains how to set up the databases for default yeast and human runs.
1 parent 0de893b commit d913017

1 file changed

Lines changed: 256 additions & 3 deletions


README.md

# GenoPipe

## Toolkit for characterizing the genotype of NGS datasets

There are 3 primary modules for genotype identification:

### EpitopeID

- Identify and determine the genomic location of epitopes relative to genomic loci

- The default tag database includes the following epitope sequences:

| sacCer3 (yeast) | hg19 (human) |
| --------------- | ------------ |
| AID             | LAP-tag      |
| CBP             |              |
| Extended-Tap    |              |
| FLAG-3x         |              |
| FRB             |              |
| GFP             |              |
| HA_v1           |              |
| HA_v2           |              |
| HA_v3           |              |
| HaloTag         |              |
| MNase_v2        |              |
| Myc-3x          |              |
| ProteinA        |              |

### DeletionID

- Identify significant depletion of aligned NGS tags in the genome relative to a background model. This module is useful for confirming gene knockouts.

- The default database includes reference files that direct the search for depleted reads to gene annotation intervals from the sacCer3 (yeast) genome build.

### StrainID

- Compare a database of VCF files against an aligned BAM file to check for the presence of SNPs and so determine the likely cell line or strain used in the experiment.

- The default database includes reference VCF files for the following strains:

| sacCer3 (yeast)     | hg19 (human) |
| ------------------- | ------------ |
| BY4741              | A549         |
| BY4742              | HCT116       |
| CEN.PK2-1Ca         | HEKTE        |
| D273-10B            | HELA         |
| FL100               | HepG2        |
| JK9-3d              | K562         |
| RM11-1A             | LnCap        |
| RedStar             | MCF7         |
| SEY6210             | SKnSH        |
| Sigma1278b-10560-6B |              |
| W303                |              |
| X2180-1A            |              |
| Y55                 |              |

[Figure 1]

## Quickstart

This guide explains how to run each of the three GenoPipe modules on yeast (sacCer3) and human (hg19) samples. See the full documentation for how to modify and generate reference files for other genome builds.

### Dependencies

You will need the following software to run all GenoPipe modules:

- [Samtools v1.5+](http://www.htslib.org/)
- [Bedtools v2.27+](https://bedtools.readthedocs.io/en/latest/)
- [BWA v0.7.15+](http://bio-bwa.sourceforge.net/bwa.shtml)
- [Python v3.6.8+](https://www.python.org/)
  - [scipy v1.5.4+](https://www.scipy.org/)
  - [pysam v0.16.0.1+](https://pysam.readthedocs.io/en/latest/api.html)
- [Perl](https://www.perl.org/)
- [wget](https://www.gnu.org/software/wget/)

To install all of the dependencies into a new conda environment:

```
conda create -n <myenv> -c conda-forge -c bioconda python=3.6 perl=5 bwa=0.7.15 bedtools=2.27 samtools=1.5 pysam=0.16 scipy=1.5.4 wget=1.14
```
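
If you reuse tools that are already installed instead of creating the conda environment, you can check them against the minimum versions listed above. The sketch below is not part of GenoPipe: `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils `sort` does. Only the samtools check is shown, because each tool reports its version in a different format.

```bash
# Hypothetical helper (not shipped with GenoPipe).
# version_ge INSTALLED MINIMUM: exit 0 if INSTALLED >= MINIMUM.
# Assumes `sort -V` is available (GNU coreutils sort supports it).
version_ge() {
  local installed="$1" minimum="$2"
  # After a version sort the smaller value comes first; if that value is
  # MINIMUM (or the two are equal), INSTALLED meets the requirement.
  [ "$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n 1)" = "$minimum" ]
}

# Example: compare the samtools on PATH against the v1.5 floor listed above.
# `samtools --version` prints "samtools X.Y" on its first line.
if command -v samtools >/dev/null 2>&1; then
  installed=$(samtools --version | head -n 1 | awk '{print $2}')
  if version_ge "$installed" 1.5; then
    echo "samtools $installed: OK"
  else
    echo "samtools $installed: older than 1.5" >&2
  fi
else
  echo "samtools: not found on PATH" >&2
fi
```

The same `version_ge` call works for the other tools once their version string has been extracted, for example `version_ge 0.7.17 0.7.15` for BWA.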

### Download

To download GenoPipe, clone the repository. No build step is needed.

```
git clone GenoPipe
cd GenoPipe
```
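
Because there is no build step, a quick sanity check after cloning is to confirm that each module's entry-point script is present. This is a hypothetical helper, not part of GenoPipe; the script names are the ones used by the run commands in this README.

```bash
# Hypothetical helper (not part of GenoPipe): check_genopipe_layout [DIR]
# Exit 0 if DIR (default: current directory) contains each module's entry-point script.
check_genopipe_layout() {
  local root="${1:-.}" missing=0 script
  for script in EpitopeID/identify-Epitope.sh \
                DeletionID/identify-Deletion.sh \
                StrainID/identify-Strain.sh; do
    if [ ! -f "$root/$script" ]; then
      echo "missing: $root/$script" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (from the directory you cloned into):
#   check_genopipe_layout GenoPipe && echo "GenoPipe checkout looks complete"
```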

### EpitopeID

Genomes and epitope sequences

- yeast epitope tags

Saccer33xMyc

1. Check FASTQ filenames

EpitopeID takes gzipped FASTQ files as input. Each file name should end with `_R1` or `_R2` followed by the extension `.fastq.gz` (the standard naming convention of Illumina libraries).

Example:

The following are valid EpitopeID input file names, where `SampleA` is single-end data and `SampleB` is paired-end data.

```
SampleA_R1.fastq.gz
SampleB_R1.fastq.gz
SampleB_R2.fastq.gz
```
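
The naming rule above is easy to check before a run. The helpers below are a hypothetical sketch, not part of GenoPipe: `classify_fastq` applies the `_R1`/`_R2` + `.fastq.gz` rule to one file, and `check_fastq_dir` scans a directory. The pairing check assumes that mates share everything before the `_R1`/`_R2` suffix, as `SampleB` does in the example.

```bash
# Hypothetical helpers (not part of GenoPipe) for the EpitopeID naming rule.

# classify_fastq FILE: print R1 or R2 if the name follows the rule, else fail.
classify_fastq() {
  local name="${1##*/}"                  # strip any leading directories
  case "$name" in
    *_R1.fastq.gz) echo R1 ;;
    *_R2.fastq.gz) echo R2 ;;
    *) echo "not a valid EpitopeID input name: $name" >&2; return 1 ;;
  esac
}

# check_fastq_dir DIR: flag badly named files and R2 files without a matching R1.
# Assumes mates share the same prefix (e.g. SampleB_R1 / SampleB_R2).
check_fastq_dir() {
  local dir="$1" status=0 f kind
  for f in "$dir"/*; do
    [ -e "$f" ] || continue              # empty directory: the glob did not match
    kind=$(classify_fastq "$f") || { status=1; continue; }
    if [ "$kind" = R2 ] && [ ! -e "${f%_R2.fastq.gz}_R1.fastq.gz" ]; then
      echo "no R1 mate for: ${f##*/}" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_fastq_dir /User/joeschmoe/Desktop/myfastq` returns success for the three example files and reports any file that EpitopeID would not recognize.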

2. Set up the database

The following instructions set up the database of reference files that EpitopeID uses, based on the provided genome builds and epitope tag sequences. To customize your database, see the full documentation.

To download the yeast (sacCer3) genome:

```
cd utility_scripts/
bash download_sacCer3_Genome.sh
```

To download the human (hg19) genome:

```
cd utility_scripts/
bash download_hg19_Genome.sh
```

After the download finishes, move the `genome.fa*` files into the `FASTA_genome/` directory of the matching database (for example `sacCer3_EpiID/FASTA_genome/`), as in the Joe Schmoe example below.
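
Once the genome has been downloaded and moved, you can confirm that the database directory is usable before starting a run. This sketch assumes the layout from the Joe Schmoe example (`genome.fa` inside `<build>_EpiID/FASTA_genome/`); `check_epiid_db` is a hypothetical name, and it also enforces the absolute-path rule described in step 3.

```bash
# Hypothetical helper (not part of GenoPipe): check_epiid_db DB_DIR
# Assumes the database layout used in the Joe Schmoe example:
#   DB_DIR/FASTA_genome/genome.fa   (e.g. .../EpitopeID/sacCer3_EpiID)
check_epiid_db() {
  local db="$1"
  case "$db" in
    /*) ;;                                   # absolute path: OK
    *) echo "use an absolute path: $db" >&2; return 1 ;;
  esac
  if [ ! -d "$db" ]; then
    echo "database directory not found: $db" >&2
    return 1
  fi
  # -s: the file exists and is not empty (a zero-byte FASTA also fails).
  if [ ! -s "$db/FASTA_genome/genome.fa" ]; then
    echo "no genome FASTA at $db/FASTA_genome/genome.fa" >&2
    return 1
  fi
}

# Usage:
#   check_epiid_db /User/joeschmoe/Desktop/GenoPipe/EpitopeID/sacCer3_EpiID
```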

3. Run EpitopeID

When providing path locations, it is important to use **absolute paths** (i.e. paths that start with `/` or `~/`).

For yeast (sacCer3) samples:

```
cd GenoPipe/EpitopeID
bash identify-Epitope.sh -i /path/to/FASTQ -o /path/to/output -d /path/to/GenoPipe/EpitopeID/sacCer3_EpiID
```

For human (hg19) samples:

```
cd GenoPipe/EpitopeID
bash identify-Epitope.sh -i /path/to/FASTQ -o /path/to/output -d /path/to/GenoPipe/EpitopeID/hg19_EpiID
```
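
If your paths are relative, one option is to convert them before calling `identify-Epitope.sh`. The sketch below is a hypothetical helper, not part of GenoPipe. It only prefixes `$PWD`, so it does not require the path to exist and does not collapse `.` or `..` components. An unquoted `~/...` is already expanded by the shell before the helper sees it.

```bash
# Hypothetical helper (not part of GenoPipe): to_abs PATH
# Print PATH as an absolute path by prefixing $PWD when needed.
to_abs() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *)  printf '%s\n' "$PWD/$1" ;;
  esac
}

# Usage sketch: resolve relative paths *before* changing into GenoPipe/EpitopeID,
# otherwise they would be interpreted relative to the new directory.
#   FASTQ_DIR=$(to_abs myfastq)
#   OUT_DIR=$(to_abs myreports_EID)
#   DB_DIR=$(to_abs GenoPipe/EpitopeID/sacCer3_EpiID)
#   cd GenoPipe/EpitopeID
#   bash identify-Epitope.sh -i "$FASTQ_DIR" -o "$OUT_DIR" -d "$DB_DIR"
```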

Joe Schmoe example:

In the following example, GenoPipe, the directory containing all of the input yeast FASTQ files, and the new directory for EpitopeID reports are all on Joe Schmoe's Desktop. Change the file paths to match your own directory structure.

```
# Download GenoPipe
cd /User/joeschmoe/Desktop/
git clone GenoPipe
# Download the genomic FASTA and move it to the appropriate directory
cd /User/joeschmoe/Desktop/GenoPipe/EpitopeID/utility_scripts/genome_data/
bash download_sacCer3_Genome.sh
mv genome.fa* ../../sacCer3_EpiID/FASTA_genome/
cd ../../
# Run EpitopeID
bash identify-Epitope.sh -i /User/joeschmoe/Desktop/myfastq -o /User/joeschmoe/Desktop/myreports_EID -d /User/joeschmoe/Desktop/GenoPipe/EpitopeID/sacCer3_EpiID
```

### DeletionID

1. Align FASTQ input files

DeletionID takes BAM files as input. If you are using the default interval database, make sure the reads are aligned to sacCer3. Any aligner that outputs standard BAM format can be used to generate the input; DeletionID was tested with [BWA-MEM](http://bio-bwa.sourceforge.net/bwa.shtml).
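
A BWA-MEM alignment step can be wrapped as below. This is a minimal sketch, not GenoPipe's own pipeline: `align_to_bam` and the `THREADS` variable are hypothetical, the reference must already be indexed with `bwa index`, and the final `samtools index` step is included because many BAM tools expect a `.bai` file, not because DeletionID is known to require it. The input checks run before `bwa` is invoked, so missing files fail fast.

```bash
# Hypothetical helper (not part of GenoPipe):
#   align_to_bam REF.fa OUT.bam R1.fastq.gz [R2.fastq.gz]
# Aligns single- or paired-end reads with BWA-MEM and writes a
# coordinate-sorted, indexed BAM.
align_to_bam() {
  local ref="$1" out="$2" r1="$3" r2="${4:-}"
  local threads="${THREADS:-4}"          # THREADS is a hypothetical override
  # `bwa index REF.fa` writes REF.fa.bwt (plus .amb, .ann, .pac, .sa).
  if [ ! -f "$ref.bwt" ]; then
    echo "missing BWA index for $ref (run: bwa index $ref)" >&2
    return 1
  fi
  if [ ! -f "$r1" ] || { [ -n "$r2" ] && [ ! -f "$r2" ]; }; then
    echo "missing FASTQ input" >&2
    return 1
  fi
  # pipefail (scoped to the subshell) makes a bwa failure fail the pipeline.
  # ${r2:+"$r2"} passes the R2 file only for paired-end data.
  ( set -o pipefail
    bwa mem -t "$threads" "$ref" "$r1" ${r2:+"$r2"} \
      | samtools sort -@ "$threads" -o "$out" - ) \
    && samtools index "$out"
}

# Example for the paired-end SampleB files (paths are illustrative):
#   align_to_bam /path/to/sacCer3.fa /User/joeschmoe/Desktop/mybam/SampleB.bam \
#     /User/joeschmoe/Desktop/myfastq/SampleB_R1.fastq.gz \
#     /User/joeschmoe/Desktop/myfastq/SampleB_R2.fastq.gz
```

The same BAM files can then be used as input for both DeletionID and StrainID.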

2. Run DeletionID

For yeast (sacCer3) samples:

```
cd GenoPipe/DeletionID
bash identify-Deletion.sh -i /path/to/BAM -o /path/to/output -d /path/to/GenoPipe/DeletionID/sacCer3_Del
```

Joe Schmoe example:

In the following example, GenoPipe, the directory containing all of the input yeast BAM files, and the new directory for DeletionID reports are all on Joe Schmoe's Desktop. Change the file paths to match your own directory structure.

```
cd /User/joeschmoe/Desktop/GenoPipe/DeletionID
# Run DeletionID
bash identify-Deletion.sh -i /User/joeschmoe/Desktop/mybam -o /User/joeschmoe/Desktop/myreports_DID -d /User/joeschmoe/Desktop/GenoPipe/DeletionID/sacCer3_Del
```

### StrainID

1. Align FASTQ input files

StrainID takes BAM files as input. If you are using the default VCF database, make sure the reads are aligned to the matching sacCer3 or hg19 genome build. Any aligner that outputs standard BAM format can be used to generate the input; StrainID was tested with [BWA-MEM](http://bio-bwa.sourceforge.net/bwa.shtml).
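
Since the default databases assume a specific genome build, a cheap pre-flight check is to look at the `@SQ` lines in the BAM header. The heuristic below is an assumption, not something GenoPipe does: UCSC-style sacCer3 names chromosomes with Roman numerals (`chrI` to `chrXVI`), and hg19 has `chr1` with a length of 249,250,621 bp. Other naming schemes (for example Ensembl-style `1` or `XVI`) are reported as `unknown`.

```bash
# Hypothetical helper (not part of GenoPipe): guess_build < header.sam
# Reads a SAM header on stdin and prints sacCer3, hg19, or unknown.
guess_build() {
  awk -F'\t' '
    $1 == "@SQ" {
      name = ""; len = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        else if ($i ~ /^LN:/) len = substr($i, 4)
      }
      if (name == "chrXVI") yeast = 1                     # UCSC sacCer3 naming
      if (name == "chr1" && len == "249250621") human = 1  # hg19 chr1 length
    }
    END {
      if (yeast) print "sacCer3"
      else if (human) print "hg19"
      else print "unknown"
    }'
}

# Usage (samtools view -H prints only the header of a BAM):
#   samtools view -H /User/joeschmoe/Desktop/mybam/SampleA.bam | guess_build
```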

2. Run StrainID

For yeast (sacCer3) samples:

```
cd GenoPipe/StrainID
bash identify-Strain.sh -i /path/to/BAM -o /path/to/output -g /path/to/sacCer3.fa -v /path/to/GenoPipe/StrainID/sacCer3_VCF
```

For human (hg19) samples:

```
cd GenoPipe/StrainID
bash identify-Strain.sh -i /path/to/BAM -o /path/to/output -g /path/to/hg19.fa -v /path/to/GenoPipe/StrainID/hg19_VCF
```

Joe Schmoe example:

In the following example, GenoPipe, the directory containing all of the input yeast BAM files, and the new directory for StrainID reports are all on Joe Schmoe's Desktop. Change the file paths to match your own directory structure.

```
cd /User/joeschmoe/Desktop/GenoPipe/
# Download the sacCer3 genomic FASTA
cd EpitopeID/utility_scripts/genome_data
bash download_sacCer3_Genome.sh
mv genome.fa /User/joeschmoe/Desktop/GenoPipe/sacCer3.fa
# Run StrainID
cd ../../../StrainID
bash identify-Strain.sh -i /User/joeschmoe/Desktop/mybam -o /User/joeschmoe/Desktop/myreports_SID -g /User/joeschmoe/Desktop/GenoPipe/sacCer3.fa -v /User/joeschmoe/Desktop/GenoPipe/StrainID/sacCer3_VCF
```

Directory structure for the full Joe Schmoe example:

```
/User/joeschmoe/Desktop
|--GenoPipe
|  |--EpitopeID
|  |--DeletionID
|  |--StrainID
|--myfastq
|  |--SampleA_R1.fastq.gz
|  |--SampleB_R1.fastq.gz
|  |--SampleB_R2.fastq.gz
|--mybam
|  |--SampleA.bam
|  |--SampleB.bam
|--myreports_EID
|  |--SampleA_R1-ID.tab
|  |--SampleB_R1-ID.tab
|--myreports_DID
|  |--SampleA_deletion.tab
|  |--SampleB_deletion.tab
|--myreports_SID
   |--SampleA_strain.tab
   |--SampleB_strain.tab
```
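
The tree above can double as a checklist. The sketch below creates the input and report directories and lists BAM samples that still lack a DeletionID or StrainID report. The function names are hypothetical, and the report file names (`<sample>_deletion.tab`, `<sample>_strain.tab`) are taken from the example tree rather than from a documented GenoPipe contract.

```bash
# Hypothetical helpers modeled on the Joe Schmoe tree (not part of GenoPipe).

# make_layout BASE: create the input and report directories under BASE.
make_layout() {
  local base="$1" d
  for d in myfastq mybam myreports_EID myreports_DID myreports_SID; do
    mkdir -p "$base/$d" || return 1
  done
}

# missing_reports BASE: list BAM samples with no DeletionID or StrainID report,
# assuming the <sample>_deletion.tab / <sample>_strain.tab names shown above.
missing_reports() {
  local base="$1" bam sample
  for bam in "$base"/mybam/*.bam; do
    [ -e "$bam" ] || continue            # no BAM files: the glob did not match
    sample=${bam##*/}
    sample=${sample%.bam}
    [ -e "$base/myreports_DID/${sample}_deletion.tab" ] || echo "$sample: no DeletionID report"
    [ -e "$base/myreports_SID/${sample}_strain.tab" ] || echo "$sample: no StrainID report"
  done
}

# Usage:
#   make_layout /User/joeschmoe/Desktop
#   missing_reports /User/joeschmoe/Desktop
```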
