This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Please contact Dr. Hani Z. Girgis (hzgirgis@buffalo.edu) if you need more information.

EnhancerDetector is a deep learning-based classification tool for predicting the presence of enhancers in DNA sequences. It uses species-specific convolutional neural networks (CNNs) trained on experimentally validated datasets for human, mouse, and *Drosophila melanogaster* (fly). Class activation mapping (CAM) can optionally be used to visualize the regions in each sequence that most influenced the model’s decision.
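A CNN works on numbers rather than letters, so each DNA sequence has to be encoded numerically before classification (the Models folder ships indexers for this step). The exact scheme used by EnhancerDetector's indexers is not described in this README. The sketch below shows one common alternative, one-hot encoding, purely as an illustration; the function name `one_hot` and the all-zero row for `N` are assumptions, not the tool's code.

```python
import numpy as np

# Illustrative only: one common way to turn DNA into CNN input.
# EnhancerDetector's own indexers (indexer.pkl) may use a different scheme.
BASES = "ACGT"


def one_hot(seq):
    """Return a (len(seq), 4) float32 array with one column per base.

    Lowercase (soft-masked) bases are treated like uppercase ones;
    any other character, such as N, is left as an all-zero row.
    """
    seq = seq.upper()
    encoded = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for i, base in enumerate(seq):
        j = BASES.find(base)
        if j >= 0:
            encoded[i, j] = 1.0
    return encoded
```

For example, `one_hot("ACGN")` yields the rows `[1, 0, 0, 0]`, `[0, 1, 0, 0]`, `[0, 0, 1, 0]` and `[0, 0, 0, 0]`.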
## Files:

Models: Contains the trained CNN models used by EnhancerDetector for each supported species (human, mouse, and fly). The fly model is an ensemble of three classifiers. The folder also includes the indexers that convert DNA sequences to numerical format.

Output: Stores the outputs generated by EnhancerDetector, including enhancer predictions and class activation maps.

Test_Input: Contains test input files for EnhancerDetector. These files demonstrate the required input format and can be used to check that the tool is working correctly. input_human.fasta contains hg38 human sequences, input_mouse.fasta contains mm10 mouse sequences, and input_fly.fasta contains dm6 *Drosophila melanogaster* sequences, all in FASTA format.

EnhancerDetector.ipynb: A Jupyter notebook that runs EnhancerDetector. If you are comfortable with Jupyter notebooks, you can edit the parameters in the first few cells and execute the notebook to generate EnhancerDetector's output.
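Before running the tool on your own data, it can help to check that a file follows the same FASTA layout as the files in Test_Input. The following is a minimal, dependency-free sketch and is not part of EnhancerDetector; the names `parse_fasta` and `read_fasta` are hypothetical, and Biopython's `SeqIO.parse(path, "fasta")` would serve the same purpose.

```python
from pathlib import Path


def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    The header is the text after '>'; the sequence lines that follow it are
    concatenated. Blank lines, and any lines before the first header, are ignored.
    """
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fasta(path):
    """Read a whole FASTA file into a list of (header, sequence) records."""
    return list(parse_fasta(Path(path).read_text().splitlines()))
```

For example, `len(read_fasta("Test_Input/input_human.fasta"))` should report the ten sequences described in the test instructions.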
## Tool:

EnhancerDetector.py: A Python script that runs EnhancerDetector from the terminal. It takes an input FASTA file, the species model to use, an output directory, and an optional flag that controls whether class activation maps are generated.
## Requirements:

EnhancerDetector was developed with the following versions of Python and its libraries:

- Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
- TensorFlow: 2.13.0
- Biopython: 1.83
- NumPy: 1.24.3
- Matplotlib: 3.8.3

NOTE: These are the versions the program was created with; newer versions of these libraries may or may not work.
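Because newer library versions may or may not work, a quick way to see how an environment differs from the versions listed above is to query the installed package metadata. This sketch uses only the standard library; the function name `check_versions` is hypothetical, and it assumes the PyPI distribution names `tensorflow`, `biopython`, `numpy` and `matplotlib` (TensorFlow installed as a variant such as `tensorflow-cpu` would be reported as not installed).

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the Requirements section, keyed by PyPI distribution name.
TESTED_VERSIONS = {
    "tensorflow": "2.13.0",
    "biopython": "1.83",
    "numpy": "1.24.3",
    "matplotlib": "3.8.3",
}


def check_versions(tested=TESTED_VERSIONS):
    """Return {package: (tested_version, installed_version)} for every package
    whose installed version differs; installed_version is None if missing."""
    mismatches = {}
    for pkg, wanted in tested.items():
        try:
            found = version(pkg)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[pkg] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    py = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {py} (developed with 3.10.13)")
    for pkg, (wanted, found) in check_versions().items():
        print(f"{pkg}: developed with {wanted}, installed {found or 'not installed'}")
```

A mismatch is not necessarily a problem; it only points to where to look first if the tool fails to load its models.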
## Parameters:

EnhancerDetector uses four parameters:

- `--species`: the species model to use, one of human, mouse, or fly. Choose the species your input sequences come from.
- `--input`: the path to the input FASTA file.
- `--cam`: when this flag is given, class activation maps (CAMs) are generated for the input sequences.
- `--outdir`: the output directory where both the predictions and the CAMs are written.
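The four parameters above map naturally onto a standard command-line parser. The following is a hypothetical sketch using Python's `argparse` that mirrors the documented interface; it is not the code inside EnhancerDetector.py, and the help strings and the choice to make `--species`, `--input` and `--outdir` required are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring EnhancerDetector's four documented parameters."""
    parser = argparse.ArgumentParser(
        description="Predict enhancers in DNA sequences (illustrative sketch)."
    )
    parser.add_argument("--species", required=True, choices=["human", "mouse", "fly"],
                        help="Which species-specific model to use.")
    parser.add_argument("--input", required=True,
                        help="Path to the input FASTA file.")
    # --cam takes no value: its presence alone turns CAM generation on.
    parser.add_argument("--cam", action="store_true",
                        help="Also generate class activation maps.")
    parser.add_argument("--outdir", required=True,
                        help="Directory for the predictions and CAMs.")
    return parser
```

Modeling `--cam` as a `store_true` flag matches how it is used in the example commands, where it appears without a value.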
## To Run Tool:

1. Clone the EnhancerDetector repository and change into the cloned directory.

2. Make sure EnhancerDetector is unzipped.

3. Run EnhancerDetector.py:

   > python EnhancerDetector.py --species human --input human_sequences.fa --outdir Output

4. Results are saved in the Output/ folder as Model_Output.txt, which lists each input sequence and its predicted enhancer probability.

5. To also generate class activation maps (CAMs) for the input sequences, add the `--cam` flag:

   > python EnhancerDetector.py --species human --input human_sequences.fa --cam --outdir Output

6. The generated CAM is saved in the output folder as sequence_CAM.pdf.

Opening it shows a heatmap for the given sequence; the dark red regions mark the areas that most influenced EnhancerDetector's final decision. Please read the main paper for more details.
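Model_Output.txt lists each input sequence with its predicted enhancer probability, which makes it easy to post-process. The exact line format is not specified here, so this sketch assumes one sequence per line with the probability as the last whitespace-separated field, and skips any line (such as a header) whose last field is not a number. The function names and the 0.5 cutoff are hypothetical choices, not part of EnhancerDetector.

```python
from pathlib import Path


def parse_predictions(lines):
    """Return (sequence_id, probability) pairs.

    Assumes the probability is the last whitespace-separated field of each line;
    lines without a numeric last field (e.g. a header) are skipped.
    """
    results = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            prob = float(fields[-1])
        except ValueError:
            continue
        results.append((" ".join(fields[:-1]), prob))
    return results


def likely_enhancers(predictions, threshold=0.5):
    """Keep predictions at or above a cutoff (0.5 is an assumed default)."""
    return [(seq_id, p) for seq_id, p in predictions if p >= threshold]


def read_predictions(path="Output/Model_Output.txt"):
    """Parse a prediction file written by EnhancerDetector (format assumed)."""
    return parse_predictions(Path(path).read_text().splitlines())
```

If the real file turns out to be comma-separated or ordered differently, only `parse_predictions` needs to change.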
If you wish to run EnhancerDetector via the Jupyter notebook:

1. Open EnhancerDetector.ipynb.

2. Locate the third cell and edit the following parameters, replacing `Species_Folder` with the folder of the species you want:

   - `similar_sequences_file`: the path to your input FASTA file, e.g. `directory_of_input/sequences.fa`
   - `network = f'{model_folder}/{Species_Folder}/model.keras'`: the species model
   - `indexer_dir = f'{model_folder}/{Species_Folder}/indexer.pkl'`: the species indexer

3. If you want class activation maps generated, locate `output_cam_pdf` in the second cell and set it to `output_cam_pdf = True`.

4. To change the output directory, locate `output_dir` and set it to your output directory.

5. For the fly model, set the `use_fly` parameter in the third cell to `True`; the notebook will then automatically use the fly models.

6. Once you have edited the parameters, run the entire notebook; the outputs will be generated in the output directory.
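The notebook parameters described above can be derived from just a species name and an input file. The helper below is a hypothetical sketch for keeping those settings consistent; the species folder names in `SPECIES_FOLDERS` and the default `model_folder="Models"` are assumptions, so check the Models folder for the actual names. For the fly, `network` and `indexer_dir` are presumably ignored once `use_fly` is `True`, since the notebook then selects the fly ensemble itself.

```python
# Hypothetical mapping: verify these folder names against the Models folder.
SPECIES_FOLDERS = {"human": "human", "mouse": "mouse", "fly": "fly"}


def notebook_params(species, input_fasta, model_folder="Models",
                    output_dir="Output", cam=False):
    """Return values for the notebook parameters described in this README."""
    if species not in SPECIES_FOLDERS:
        raise ValueError(f"unknown species: {species!r}")
    species_folder = SPECIES_FOLDERS[species]
    return {
        "similar_sequences_file": input_fasta,
        "network": f"{model_folder}/{species_folder}/model.keras",
        "indexer_dir": f"{model_folder}/{species_folder}/indexer.pkl",
        "output_cam_pdf": cam,
        "output_dir": output_dir,
        "use_fly": species == "fly",
    }
```

The returned values can be copied into the corresponding notebook cells.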
## To Run our Tests:

1. Look inside the Test_Input folder, which contains three FASTA files:

   - input_human.fasta: ten human sequences; the first five are likely enhancers and the last five are non-enhancers.
   - input_mouse.fasta: ten mouse sequences; the first five are likely enhancers and the last five are non-enhancers.
   - input_fly.fasta: ten fly sequences; the first five are likely enhancers and the last five are non-enhancers.

2. From the main directory, run EnhancerDetector.py:

   For Human

   > python EnhancerDetector.py --species human --input Test_Input/input_human.fasta --outdir Output

   For Mouse

   > python EnhancerDetector.py --species mouse --input Test_Input/input_mouse.fasta --outdir Output

   For Fly

   > python EnhancerDetector.py --species fly --input Test_Input/input_fly.fasta --outdir Output

3. To also generate a CAM for each sequence, add the `--cam` flag:

   For Human

   > python EnhancerDetector.py --species human --input Test_Input/input_human.fasta --cam --outdir Output

   For Mouse

   > python EnhancerDetector.py --species mouse --input Test_Input/input_mouse.fasta --cam --outdir Output

   For Fly

   > python EnhancerDetector.py --species fly --input Test_Input/input_fly.fasta --cam --outdir Output

4. The Output folder will contain the results and the CAMs for each given sequence.

5. To use the Jupyter notebook instead, open EnhancerDetector.ipynb.

6. By default, `similar_sequences_file` should already be set to the human test cases. To test mouse or fly, change the parameters to the mouse or fly files, and set `use_fly` to `True` when using the fly model.

7. Set `output_cam_pdf` to `True` if you want to generate the CAMs.

8. Run the entire notebook; the outputs will be generated in the Output folder.
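The per-species test commands above differ only in the species name and input file, so they can be scripted. The sketch below is a hypothetical convenience wrapper, not part of the repository: it builds exactly the documented command line, runs it with `sys.executable` so the current Python interpreter is used, and writes each species to its own subfolder of the output directory. That last choice is an assumption made so that one run's Model_Output.txt cannot overwrite another's; whether the tool itself avoids this is not stated here.

```python
import subprocess
import sys
from pathlib import Path

# Test files described in the Files section (run from the main directory).
TEST_FILES = {
    "human": "Test_Input/input_human.fasta",
    "mouse": "Test_Input/input_mouse.fasta",
    "fly": "Test_Input/input_fly.fasta",
}


def build_command(species, fasta, outdir="Output", cam=False):
    """Assemble the documented EnhancerDetector.py command line."""
    cmd = [sys.executable, "EnhancerDetector.py",
           "--species", species, "--input", fasta]
    if cam:
        cmd.append("--cam")
    cmd += ["--outdir", outdir]
    return cmd


def run_all_tests(cam=False, outdir="Output"):
    """Run the tool once per species; raises CalledProcessError if a run fails."""
    for species, fasta in TEST_FILES.items():
        species_dir = Path(outdir) / species
        species_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_command(species, fasta, str(species_dir), cam), check=True)
```

Calling `run_all_tests(cam=True)` from the main directory would reproduce the three CAM commands in one go.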
