Commit fb5799b

update docs eid section order
Move sections around for a more intuitive organization of the sections
1 parent 9713fa7 commit fb5799b

1 file changed

Lines changed: 30 additions & 32 deletions

docusaurus/docs/epitopeid.md

@@ -50,14 +50,6 @@ The provided database files are missing the genomic reference file for storage r
5050
For EpitopeID, this is the "database" or directory with four types of reference files used by `identify-Epitope.sh`. You will notice that EpitopeID provides reference files for both yeast (`sacCer3_EpiID`) and human (`hg19_EpiID`) so you can quickly get started without building up the database from scratch. However, you are free to customize and build your own set of files (e.g. add different epitope tags to check for, use a different genome build).
5151

5252
### Database structure
53-
Below is a list of the hardcoded filenames that EpitopeID looks for during execution and some information on the provided yeast and human defaults.
54-
55-
* The `FASTA_tag/ALL_TAG.fa` is the FASTA formatted collection of all epitope sequences to search for. The yeast tag database includes the [AID, Extended-Tap, FRB, HA_v1, HA_v3, MNase, ProtA, CBP, FLAG-3x, GFP, HA_v2, HaloTag, and Myc(3x)][tag-ref] sequences. The human tag database only includes the [LAP][lap-ref] tag but it is easy to customize the list to include other epitopes for EpitopeID to look for.
56-
* The `FASTA_genome/genome.fa` is the reference genome used for the genomic alignments that the other annotations are based on. Even if you use the provided databases from GitHub, the genomic reference file still needs to be downloaded and moved to `FASTA_genome/genome.fa`. (The genome was not included for data storage reasons.)
57-
* The `annotation/genome_annotation.gff.gz` file defines the bin coordinates to use when localizing the epitope insertion in PE datasets. The yeast default uses SGD gene annotation coordinates to define one bin for the length of each gene, 250bp bins flanking each set of gene coordinates, and 250bp bins breaking up the remaining intergenic regions. The human default similarly bins out the genome using 1000bp windows on the NCBI RefSeq annotations.
58-
* The `blacklist_filter/blacklist.bed`
59-
60-
6153
Whether you use the provided reference files or create your own, the database should use the following directory structure both to ensure that EpitopeID can find the correct reference files and for organization, clarity, and consistency.
6254
```
6355
/name/of/epiDB
@@ -81,6 +73,13 @@ Whether you use the provided reference files or create your own, the database sh
8173
| |--blacklist.bed
8274
```
8375

76+
Below is a list of the hardcoded filenames that EpitopeID looks for during execution and some information on the provided yeast and human defaults.
77+
78+
* The `FASTA_tag/ALL_TAG.fa` is the FASTA formatted collection of all epitope sequences to search for. The yeast tag database includes the [AID, Extended-Tap, FRB, HA_v1, HA_v3, MNase, ProtA, CBP, FLAG-3x, GFP, HA_v2, HaloTag, and Myc(3x)][tag-ref] sequences. The human tag database only includes the [LAP][lap-ref] tag but it is easy to customize the list to include other epitopes for EpitopeID to look for.
79+
* The `FASTA_genome/genome.fa` is the reference genome used for the genomic alignments that the other annotations are based on. Even if you use the provided databases from GitHub, the genomic reference file still needs to be downloaded and moved to `FASTA_genome/genome.fa`. (The genome was not included for data storage reasons.)
80+
* The `annotation/genome_annotation.gff.gz` file defines the bin coordinates to use when localizing the epitope insertion in PE datasets. The yeast default uses SGD gene annotation coordinates to define one bin for the length of each gene, 250bp bins flanking each set of gene coordinates, and 250bp bins breaking up the remaining intergenic regions. The human default similarly bins out the genome using 1000bp windows on the NCBI RefSeq annotations.
81+
* The `blacklist_filter/blacklist.bed`
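
Because these filenames are hardcoded, it can help to confirm that a custom database directory contains all four files before starting a run. The following is a minimal sketch, not part of EpitopeID: the function name `check_epidb` is hypothetical, and it only tests that each of the files listed above exists, not that its contents are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with EpitopeID): print every hardcoded
# reference file that is missing from a database directory.
# Returns 0 when all four files are present, 1 otherwise.
check_epidb() {
    local db="$1"
    local missing=0
    local rel
    for rel in \
        FASTA_tag/ALL_TAG.fa \
        FASTA_genome/genome.fa \
        annotation/genome_annotation.gff.gz \
        blacklist_filter/blacklist.bed
    do
        if [ ! -f "$db/$rel" ]; then
            echo "missing: $rel"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_epidb /name/of/epiDB && echo "database looks complete"` prints the confirmation only when nothing is missing. With the provided databases, the expected result before downloading the genome is a single `missing: FASTA_genome/genome.fa` line.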
82+
8483
Below is more information on how to use the utility scripts to download and customize your reference files.
8584

8685

@@ -220,6 +219,29 @@ cd $EPITOPEID/utility_scripts
220219
221220
``` -->
222221

222+
223+
## Output Report (`-o`)
224+
225+
The output report is saved to the user-provided output directory in a file named based on the input FASTQ files (`/path/to/output/XXXXX_R1-ID.tab`). Below is a sample report based on the results from running EpitopeID on the ENCODE ENCFF415CJF sample.
226+
227+
```
228+
EpitopeID EpitopeCount
229+
LAP-tag 435
230+
231+
GeneID EpitopeID EpitopeLocation EpitopeCount pVal
232+
NR4A1|chr12:52416616-52453291 LAP-tag C-term 9 3.580493355965414e-24
233+
```
234+
235+
The first part of the report shows which epitopes in `Tag_DB` were identified in the sample (**EpitopeID column**) and how many reads mapped to this epitope (**EpitopeCount**) to help quantify the coverage of the epitopes which relates to the confidence of the call.
236+
237+
The second part of the report shows which epitopes localized significantly to which regions/tiles of the genome (sorted by p-value if there are multiple hits). The columns specify the coordinate interval (**GeneID**), which epitope maps to this locus (**EpitopeID**), whether this occurs on the N- or C-terminus (**EpitopeLocation**), the number of reads mapping to this tile (**EpitopeCount**), and the associated Poisson-calculated p-value to indicate confidence of the site (**pVal**).
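
To pull the localized hits out of a report for downstream use, the second table can be filtered on its own. The sketch below is an illustration only: the function name `significant_hits` and the default cutoff of 0.05 are assumptions, and it assumes the `.tab` report is tab-delimited with the second table starting at the `GeneID` header row.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with EpitopeID): print rows of the second
# report table whose pVal is at or below a cutoff.
# Assumes a tab-delimited report; the 0.05 default cutoff is illustrative.
significant_hits() {
    local report="$1"
    local cutoff="${2:-0.05}"
    awk -F'\t' -v cutoff="$cutoff" '
        # The GeneID header row marks the start of the second table.
        $1 == "GeneID" { in_hits = 1; next }
        # Keep complete rows whose fifth column (pVal) passes the cutoff.
        in_hits && NF >= 5 && ($5 + 0) <= (cutoff + 0) {
            print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5
        }
    ' "$report"
}
```

Calling `significant_hits /path/to/output/XXXXX_R1-ID.tab 1e-10` on the sample report above would keep the NR4A1 row, since its p-value is far below that cutoff.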
238+
239+
240+
## Threading (`-t`)
241+
242+
This optional input specifies the number of threads to use for the BWA alignment commands. Defaults to 1.
243+
244+
223245
## Example: Set-up EpitopeID and run on yeast example
224246
```bash
225247
git clone https://github.com/CEGRcode/GenoPipe
@@ -244,30 +266,6 @@ bash identify_Epitope.sh -i ../samples/ -o ../output/ -d sacCer3_EpiID -t 4
244266
```
245267

246268

247-
## Threading (`-t`)
248-
249-
This optional input specifies the number of threads to use for the BWA alignment commands. Defaults to 1.
250-
251-
252-
253-
## Output Report (`-o`)
254-
255-
The output report is saved to the user-provided output directory in a file named based on the input FASTQ files (`/path/to/output/XXXXX_R1-ID.tab`). Below is a sample report based on the results from running EpitopeID on the ENCODE ENCFF415CJF sample.
256-
257-
```
258-
EpitopeID EpitopeCount
259-
LAP-tag 435
260-
261-
GeneID EpitopeID EpitopeLocation EpitopeCount pVal
262-
NR4A1|chr12:52416616-52453291 LAP-tag C-term 9 3.580493355965414e-24
263-
```
264-
265-
The first part of the report shows which epitopes in `Tag_DB` were identified in the sample (**EpitopeID column**) and how many reads mapped to this epitope (**EpitopeCount**) to help quantify the coverage of the epitopes which relates to the confidence of the call.
266-
267-
The second part of the report shows which epitopes localized significantly to which regions/tiles of the genome (sorted by p-value if there are multiple hits). The columns specify the coordinate interval (**GeneID**), which epitope maps to this locus (**EpitopeID**), whether this occurs on the N- or C-terminus (**EpitopeLocation**), the number of reads mapping to this tile (**EpitopeCount**), and the associated Poisson-calculated p-value to indicate confidence of the site (**pVal**).
268-
269-
270-
271269
## FAQs
272270

273271
* Q: My epitope sequence isn't part of the sequences in the default provided reference files (either `sacCer3_EpiID` or `hg19_EpiID`). Can I still use EpitopeID for checking my samples?
