This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Please contact Dr. Hani Z. Girgis (hzgirgis@buffalo.edu) if you need more information.
EnhancerDetector is a deep learning-based classification tool for predicting the presence of enhancers in DNA sequences. It uses species-specific convolutional neural networks (CNNs) trained on experimentally validated datasets for human, mouse, and Drosophila melanogaster (fly). Class activation mapping (CAM) can optionally be used to visualize the regions in each sequence that most influenced the model’s decision.
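The optional CAM step can be illustrated with the standard class activation mapping computation: each position's score is the sum of the last convolutional layer's filter activations, weighted by the output-layer weights for the enhancer class. The NumPy sketch below assumes the CNN ends in global average pooling followed by a dense output layer (the setting in which CAM is defined). The function name and array shapes are hypothetical and not taken from EnhancerDetector's code, which may compute, upsample, or smooth its maps differently.

```python
import numpy as np


def class_activation_map(feature_maps: np.ndarray, class_weights: np.ndarray, seq_len: int) -> np.ndarray:
    """Compute a 1-D class activation map (hypothetical helper).

    feature_maps: (positions, filters) output of the last convolutional layer.
    class_weights: (filters,) dense-layer weights for the enhancer class,
        assuming global average pooling sits between the two layers.
    seq_len: length of the input DNA sequence in base pairs.
    """
    # Weighted sum of filter activations at each convolutional position.
    cam = feature_maps @ class_weights
    # Stretch the map back to the input length by linear interpolation.
    src = np.linspace(0, seq_len - 1, num=cam.shape[0])
    cam = np.interp(np.arange(seq_len), src, cam)
    # Rescale to [0, 1] so heatmap colors are comparable across sequences.
    cam -= cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: two positions from one filter, stretched over a 3-bp sequence.
heat = class_activation_map(np.array([[1.0], [3.0]]), np.array([1.0]), seq_len=3)
print(heat.tolist())  # [0.0, 0.5, 1.0]
```

High values in the returned array correspond to the dark red regions of the heatmap.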
## Files:
Models: Contains the trained CNN models for each supported species (human, mouse, and fly) used by EnhancerDetector. The fly model is an ensemble of three classifiers. The folder also includes the indexers that convert DNA sequences into numerical format.
Output: This folder stores the outputs generated by EnhancerDetector, including enhancer predictions and class activation maps.
Test_Input: This folder contains test input files for EnhancerDetector. These files demonstrate the required input format and can be used to check that the tool is working correctly. input_human.fasta contains human sequences (hg38), input_mouse.fasta contains mouse sequences (mm10), and input_fly.fasta contains D. melanogaster sequences (dm6), all in FASTA format.
EnhancerDetector.ipynb: This Jupyter notebook contains the code to run EnhancerDetector. If you are comfortable with Jupyter notebooks, you can modify the parameters in the first few cells and run the notebook to generate EnhancerDetector's predictions.
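The test inputs are plain FASTA files, and the indexers in Models turn DNA into numbers before it reaches the CNN. The sketch below shows one common way to do both steps: reading FASTA records and one-hot encoding each base. It is an illustration under assumptions: the function names are hypothetical, and EnhancerDetector's own indexers may use a different encoding or treat ambiguous bases such as N differently.

```python
import numpy as np

BASES = "ACGT"


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def read_fasta(path):
    """Read every record from a FASTA file on disk."""
    with open(path) as handle:
        return list(parse_fasta(handle))


def one_hot(seq):
    """Encode DNA as a (len(seq), 4) array over A, C, G, T; N and other symbols stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded
```

For example, `read_fasta("Test_Input/input_mouse.fasta")` would return a list of (header, sequence) pairs, each of which `one_hot` can turn into a matrix.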
## Tool:
EnhancerDetector.py: This Python script runs EnhancerDetector from the terminal. It takes an input FASTA file, the species model to use, an output directory, and an optional flag that controls whether class activation maps are generated.
NOTE: These are the library versions the program was developed with; future versions of these libraries may or may not work.
## Parameters:
EnhancerDetector uses four parameters:
--species: selects which species-specific model evaluates the sequences; choose human, mouse, or fly.
--input: the path to the input FASTA file.
--cam: if this flag is given, class activation maps are generated for the input sequences.
--outdir: the output directory where both the predictions and the CAMs are written.
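These four parameters map naturally onto a standard argparse interface. The sketch below is hypothetical, not the actual code of EnhancerDetector.py: it assumes --cam is a boolean switch (the example commands pass it without a value) and that the other three options are required, since every documented command supplies them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the four documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Predict enhancers in DNA sequences.")
    parser.add_argument("--species", required=True, choices=["human", "mouse", "fly"],
                        help="Which species-specific model to use.")
    parser.add_argument("--input", required=True,
                        help="Path to the input FASTA file.")
    parser.add_argument("--cam", action="store_true",
                        help="Also generate class activation maps.")
    parser.add_argument("--outdir", required=True,
                        help="Directory for the predictions and CAM PDFs.")
    return parser


# Parse an explicit argument list mirroring the CAM example command.
args = build_parser().parse_args(
    ["--species", "human", "--input", "human_sequences.fa", "--cam", "--outdir", "Output"]
)
print(args.species, args.cam)
```

With `choices`, argparse rejects an unsupported species before any model is loaded.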
## To Run Tool:
1. Clone EnhancerDetector and navigate to the cloned directory.
2. Make sure EnhancerDetector is unzipped.
3. Run EnhancerDetector.py:
> python EnhancerDetector.py --species human --input human_sequences.fa --outdir Output
4. Results will be saved in the Output/ folder as Model_Output.txt, which lists each input sequence and its predicted enhancer probability.
5. If you want class activation maps generated for the input sequences, run:
> python EnhancerDetector.py --species human --input human_sequences.fa --cam --outdir Output
6. In the output folder, the generated CAM is named sequence_CAM.pdf.
Opening it shows a heatmap for the given sequence; the dark red regions mark the areas that most influenced EnhancerDetector's final decision. Please read the main paper for more details.
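To exercise all three models at once, for example to check an installation against the bundled test files, the terminal command can be scripted. The sketch below is a convenience wrapper that is not part of EnhancerDetector: it uses only the four documented flags, calls the script with the current Python interpreter, and writes each species into a hypothetical per-species subfolder of Output so that runs do not overwrite one another's Model_Output.txt.

```python
import subprocess
import sys
from pathlib import Path

# Test inputs shipped in Test_Input/ (file names taken from this README).
TEST_INPUTS = {
    "human": "Test_Input/input_human.fasta",
    "mouse": "Test_Input/input_mouse.fasta",
    "fly": "Test_Input/input_fly.fasta",
}


def build_command(species, fasta, outdir, cam=False):
    """Assemble the EnhancerDetector.py command line for one run."""
    cmd = [sys.executable, "EnhancerDetector.py",
           "--species", species, "--input", str(fasta)]
    if cam:
        cmd.append("--cam")
    cmd += ["--outdir", str(outdir)]
    return cmd


def run_all(base_outdir="Output", cam=False):
    """Run every species' test input, each into its own subfolder."""
    for species, fasta in TEST_INPUTS.items():
        outdir = Path(base_outdir) / species  # hypothetical per-species layout
        outdir.mkdir(parents=True, exist_ok=True)
        # check=True stops at the first run that exits with an error.
        subprocess.run(build_command(species, fasta, outdir, cam), check=True)
```

Calling `run_all(cam=True)` from the repository root would run all three models with CAMs enabled.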
If you wish to run EnhancerDetector via the Jupyter notebook:
1. Open EnhancerDetector.ipynb
2. Locate the third cell, then find and edit the following parameters:
5. The Output folder will contain the results and CAMs for each given sequence.
6. If you want to use the Jupyter notebook, open EnhancerDetector.ipynb.
7. By default, similar_sequences_file should already point to the human test cases. To evaluate mouse or fly sequences, change the parameters accordingly, and set use_fly to True if using the fly model.
8. Set output_cam_pdf to True if you want to generate the CAMs.
9. Run the entire notebook; the outputs will be generated in the Output folder.
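This README describes Model_Output.txt only as listing each input sequence with its predicted enhancer probability, so its exact layout is not specified here. The sketch below assumes, hypothetically, one sequence per line with the identifier first and the probability as the last whitespace-separated field; the 0.5 cutoff is likewise an assumption, not a threshold prescribed by EnhancerDetector. Adjust the parsing to match the file you actually get.

```python
from pathlib import Path


def parse_prediction_lines(lines):
    """Return (sequence_id, probability) pairs, skipping lines that do not end in a number."""
    results = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            prob = float(fields[-1])
        except ValueError:
            continue  # e.g. a header row
        results.append((" ".join(fields[:-1]), prob))
    return results


def load_predictions(path="Output/Model_Output.txt"):
    """Parse a prediction file whose layout is assumed as described above."""
    return parse_prediction_lines(Path(path).read_text().splitlines())


def predicted_enhancers(predictions, threshold=0.5):
    """Keep sequences whose probability is at least the (assumed) cutoff."""
    return [(name, p) for name, p in predictions if p >= threshold]
```

Keeping the parsing separate from file access makes it easy to adapt `parse_prediction_lines` if the real file uses tabs, commas, or extra columns.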