Commit c40d6b7

Update README.md
1 parent cf57f82 commit c40d6b7

1 file changed

Lines changed: 43 additions & 13 deletions

README.md
Original file line number | Diff line number | Diff line change
@@ -1,9 +1,12 @@
11
# Allo
22

3-
A multi-mapped read rescue strategy for peak-based gene regulatory analyses.
3+
A multi-mapped read rescue strategy for gene regulatory analyses.
44

5-
## Installation
5+
### Releases
6+
7+
As of **v1.1.0**, Allo includes neural networks trained for ATAC-seq under the MACS2 parameters "--nomodel --shift -100 --extsize 200" and for DNase-seq under the MACS2 default parameters. Allo can now also remove introns, identified from splice junction information in the CIGAR string of an aligned read; this affects the window used to sum uniquely mapped reads. See below for information on using Allo for RNA-seq data processing.
68

9+
## Installation
710
### Package managers
811

912
* Bioconda: [![Anaconda-Server Badge](https://anaconda.org/bioconda/allo/badges/version.svg)](https://anaconda.org/bioconda/allo)
@@ -22,23 +25,24 @@ pip install -e .
2225
```
2326

2427
## Usage
25-
### Pre-processing
26-
Using Allo requires a few pre-processing steps. In most ChIP pipelines, the default behavior of aligners is to assign multi-mapped reads to random locations within their mappings without retaining information on the other locations. Both Bowtie1/2 and BWA can be used for single-end reads. Unfortunately, BWA cannot be used for paired-end reads prior to Allo due to constraints in how it outputs multi-mapped reads. The following arguments should be used:
28+
### Peak-based applications (ChIP-seq, ATAC-seq, DNase-seq, etc.)
29+
#### Pre-processing
30+
Using Allo requires a few pre-processing steps. In most ChIP-seq, ATAC-seq, and DNase-seq pipelines, the default behavior of aligners is to assign multi-mapped reads to random locations within their mappings without retaining information on the other locations. Both Bowtie1/2 and BWA can be used for single-end reads. Unfortunately, BWA cannot be used for paired-end reads prior to Allo due to constraints in how it outputs multi-mapped reads. The following arguments should be used:
2731

2832
*Bowtie1*
2933

3034
```
3135
#Single-end
32-
bowtie -x INDEX -q FASTQ -S SAMOUT --best --strata -m 50 -k 50 -p THREADS
36+
bowtie -x INDEX -q FASTQ -S SAMOUT --best --strata -m 25 -k 25 -p THREADS
3337
#Paired-end
34-
bowtie -x INDEX -1 READ1 -2 READ2 -S SAMOUT --best --strata -m 50 -k 50 -p THREADS
38+
bowtie -x INDEX -1 READ1 -2 READ2 -S SAMOUT --best --strata -m 25 -k 25 -p THREADS
3539
```
3640
*Bowtie2*
3741
```
3842
#Single-end
39-
bowtie2 -x INDEX -q FASTQ -S SAMOUT -k 50 -p THREADS
43+
bowtie2 -x INDEX -q FASTQ -S SAMOUT -k 25 -p THREADS
4044
#Paired-end
41-
bowtie2 -x INDEX -1 READ1 -2 READ2 -S SAMOUT -k 50 --no-mixed --no-discordant -p THREADS
45+
bowtie2 -x INDEX -1 READ1 -2 READ2 -S SAMOUT -k 25 --no-mixed --no-discordant -p THREADS
4246
```
4347
*BWA*
4448
```
@@ -53,7 +57,7 @@ Finally, the output of the aligners must be sorted by read name in order to use
5357
samtools collate -o ALIGNEROUTPUT_SORT.SAM ALIGNEROUTPUT_FILTER.SAM
5458
```
5559

56-
### Running Allo
60+
#### Running Allo
5761
The basic command for Allo:
5862
```
5963
allo ALIGNEROUTPUT_SORT.SAM -seq PAIRED_OR_SINGLE -o OUTPUTNAME -m MIXED_OR_NARROW_PEAKS
@@ -68,18 +72,44 @@ Very short test files are supplied to make sure Allo runs to completion on your
6872
allo testRunPE.sam -seq pe
6973
```
7074

71-
### Post-processing and tips
72-
Allo adds a ZA tag to every MMR that is allocated. For reads that are allocated to regions that all contain 0 UMRs (random assignment), a ZZ tag is used instead. This allows users to remove reads that only map to zero UMR regions if they wish. The value within either tag corresponds to the number of places a read/pair mapped to. In order to get only uniquely mapped reads, grep could be used with the -v option to exclude lines with ZA or ZZ tags. On the same note, awk can be used to filter reads with a specific number of mapping locations (can also be done with the -max option within Allo). Outside of adding these tags, Allo does not change anything within the read alignment columns for allocated reads.
75+
#### Additional tips
76+
It is recommended to run Allo on both the control and target sequencing files in order to balance out background in the samples. We recommend running Allo using the --random argument on the control file. This generally results in higher confidence peaks.
7377

74-
Tip: It is recommended to run Allo on both the control and target sequencing files in order to balance out background in the samples. We recommend running Allo using the --random argument on the control file. This generally results in higher confidence peaks.
78+
### Pre-processing for RNA-seq
79+
Allo is compatible with STAR alignments. If you choose to use the "--splice" function in Allo, we recommend passing the "--outFilterType BySJout" argument to STAR so that only high-quality junctions are considered. An example of a paired-end STAR alignment keeping up to 25 locations per read is shown below:
80+
```
81+
STAR --genomeDir GENOMEDIR --readFilesIn FASTQ_1 FASTQ_2 --outSAMtype BAM Unsorted --outSAMmultNmax 25 --outFilterType BySJout --outFileNamePrefix ALIGNEROUTPUT
82+
```
83+
84+
To use Allo, first sort your file:
85+
```
86+
samtools collate -o ALIGNEROUTPUT_SORT.BAM ALIGNEROUTPUT_FILTER.BAM
87+
```
7588

89+
Following this, we recommend running Allo in read-count-only mode (--readcount), as the available neural networks are not trained on RNA-seq profiles. Additionally, the --splice argument can be used if the user would like Allo to splice introns out when summing uniquely mapped reads.
90+
```
91+
allo ALIGNEROUTPUT_SORT.BAM -seq PAIRED_OR_SINGLE -o OUTPUTNAME --readcount --splice
92+
```
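To illustrate what "splicing introns out" refers to: in the SAM format, a spliced alignment records each intron as an `N` operation in the CIGAR string. The sketch below is not Allo's implementation; it only shows, with a hypothetical helper named `intron_len`, how the intron length can be read off a CIGAR string with awk.

```shell
# Hypothetical helper (not part of Allo): sum the lengths of all N
# (skipped reference region, i.e. intron) operations in one CIGAR string.
intron_len() {
  printf '%s\n' "$1" | awk '{
    total = 0
    cigar = $0
    # Consume one <length><operation> pair at a time from the front.
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
      len = substr(cigar, 1, RLENGTH - 1) + 0
      op  = substr(cigar, RLENGTH, 1)
      if (op == "N") total += len
      cigar = substr(cigar, RLENGTH + 1)
    }
    print total
  }'
}

# A read spanning one 1200 bp intron between two 50 bp aligned blocks.
intron_len 50M1200N50M
```

A read with the CIGAR `50M1200N50M` covers 1300 bp on the reference, but only 100 bp of that is aligned sequence; the remaining 1200 bp is the intron that a splice-aware window would skip.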
93+
94+
#### Downstream analysis
95+
Following the use of Allo, users can run featureCounts with the "-M" argument, which counts multi-mapped reads.
96+
```
97+
featureCounts -a GTF_FILE -o COUNTS.out *.bam -M
98+
```
99+
100+
101+
## Output information
102+
Allo adds a ZA tag to every multi-mapped read (MMR) that is allocated. For reads that are allocated to regions that all contain 0 uniquely mapped reads (UMRs), which amounts to random assignment, a ZZ tag is used instead. This allows users to remove reads that only map to zero-UMR regions if they wish. The value within either tag corresponds to the number of places a read/pair mapped to. To get only uniquely mapped reads, grep can be used with the -v option to exclude lines with ZA or ZZ tags. Similarly, awk can be used to filter reads with a specific number of mapping locations (this can also be done with the -max option within Allo). Outside of adding these tags, Allo does not change anything within the read alignment columns for allocated reads.
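The grep and awk filters described above could look roughly like the following sketch. It runs on three toy SAM records rather than real Allo output, and the tag type letter (`i`) written in the toy data is an assumption; the patterns accept any single-letter type. The file and variable names are placeholders.

```shell
# Toy SAM records (no header): r1 is uniquely mapped, r2 and r3 are allocated MMRs.
# The "i" type on the ZA/ZZ tags is an assumption made for this example.
tmp=$(mktemp)
{
  printf 'r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tZA:i:3\n'
  printf 'r3\t0\tchr2\t300\t1\t4M\t*\t0\t0\tACGT\tIIII\tZZ:i:12\n'
} > "$tmp"

# Uniquely mapped reads only: drop every line carrying a ZA or ZZ tag.
unique=$(grep -v -E '[[:space:]]Z[AZ]:[A-Za-z]:' "$tmp" | cut -f1)

# Allocated reads with at most $max mapping locations.
# Optional tags start at column 12 of a SAM record.
max=5
few=$(awk -F'\t' -v max="$max" '{
  for (i = 12; i <= NF; i++) {
    if ($i ~ /^Z[AZ]:[A-Za-z]:/) {
      n = $i
      sub(/^Z[AZ]:[A-Za-z]:/, "", n)   # keep only the tag value
      if (n + 0 <= max) print $1
      next
    }
  }
}' "$tmp")

echo "unique: $unique"
echo "at most $max locations: $few"
rm -f "$tmp"
```

Note that the grep filter keeps SAM header lines (they carry no ZA/ZZ tag), while the awk filter as written prints read names only; to emit full records, print `$0` instead of `$1`.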
76103

77104
### Options
78105
| Argument | Options | Explanation |
79106
| ------------- | ------------- | ------------- |
80107
| -o | any string | Output file name |
81108
| -seq | "se" "pe" | Single-end or paired-end sequencing mode, REQUIRED |
82-
| -m | "mixed" "narrow" | Use CNN trained on either a narrow peak dataset or a dataset with mixed peaks, narrow by default |
109+
| --mixed | | Use CNN trained on histone ChIP-seq datasets with mixed peaks, narrow by default |
110+
| --dnase | | Use CNN trained on DNase-seq datasets; the narrow-peak CNN is used by default |
111+
| --atac | | Use CNN trained on ATAC-seq datasets; the narrow-peak CNN is used by default |
112+
| --splice | | Remove introns as identified by splice junctions when summing the uniquely-mapped read counts |
83113
| -p | any int | Number of processes, 1 by default |
84114
| --keep-unmap | | Keep unmapped reads and reads that include N in their sequence |
85115
| --remove-zeros | | Do not report multi-mapped reads that map to regions with 0 uniquely mapped reads (random assignment) |
