Pipeline for predicting ChIP-seq peaks in novel cell types using chromatin accessibility.

Epitome leverages chromatin accessibility (either DNase-seq or ATAC-seq) to predict epigenetic events in a novel cell type of interest. Such epigenetic events include transcription factor binding sites and histone modifications. Epitome computes chromatin accessibility similarity between ENCODE cell types and the novel cell type, and uses this information to transfer known epigenetic signal to the novel cell type of interest.
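
The similarity-based transfer described above can be illustrated with a toy sketch. The function name, the Jaccard weighting, and the array shapes below are illustrative assumptions for this example only, not Epitome's actual implementation:

.. code:: python

   import numpy as np

   def transfer_signal(novel_acc, ref_acc, ref_labels):
       """Weight each reference cell type's known binding labels by its
       chromatin-accessibility similarity (Jaccard) to the novel cell type.

       novel_acc:  (regions,)            binary accessibility, novel cell type
       ref_acc:    (cell_types, regions) binary accessibility, reference cell types
       ref_labels: (cell_types, regions) known binding labels (0/1)
       """
       inter = (ref_acc * novel_acc).sum(axis=1)
       union = np.clip(ref_acc + novel_acc, 0, 1).sum(axis=1)
       sim = inter / np.maximum(union, 1)   # Jaccard similarity per reference
       weights = sim / sim.sum()            # normalize across references
       return weights @ ref_labels          # similarity-weighted vote per region

References whose accessibility profile matches the novel cell type dominate the transferred signal, which is the intuition behind using accessibility similarity.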

Documentation
-------------

Epitome documentation is hosted at `readthedocs <https://epitome.readthedocs.io/en/latest/>`_. It includes tutorials for creating Epitome datasets and for training, testing, and evaluating models.

.. code:: python

   model = EpitomeModel(dataset, test_celltypes=["K562"])  # cell line reserved for testing

Next, we train the model for 5000 batches:

.. code:: python

   model.train(5000)

Train a Model that Stops Early
------------------------------

If you are not sure how many batches your model should train for, or are concerned
about your model overfitting, you can specify the ``max_valid_batches`` parameter
when initializing the model. This creates a train-validation dataset of size
``max_valid_batches``; the model then validates on this dataset, computing the
train-validation loss every 200 training batches. The model may stop training early
(before ``max_train_batches``) if its train-validation losses stop improving during
training. Otherwise, the model will continue to train until ``max_train_batches``.

First, we create a model with a train-validation set size of 1000:

.. code:: python

   model = EpitomeModel(dataset,
                        test_celltypes=["K562"],  # cell line reserved for testing
                        max_valid_batches=1000)   # train-validation set size reserved while training

Next, we train the model for a maximum of 5000 batches. If the train-validation
loss stops improving, the model will stop training early:
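
As in the earlier example, the training call here would presumably be
``model.train(5000)``. The early-stopping rule described above can be sketched in
plain Python; ``valid_loss`` and ``patience`` below are hypothetical names for
illustration, not part of Epitome's API:

.. code:: python

   def train_with_early_stopping(valid_loss, max_train_batches=5000,
                                 valid_every=200, patience=1):
       """Sketch: compute the train-validation loss every ``valid_every``
       batches and stop once it stops improving for ``patience`` checks."""
       best = float("inf")
       stalls = 0
       for batch in range(valid_every, max_train_batches + 1, valid_every):
           loss = valid_loss(batch)  # hypothetical callback returning the loss
           if loss < best:
               best, stalls = loss, 0
           else:
               stalls += 1
               if stalls >= patience:
                   return batch  # stopped early, before max_train_batches
       return max_train_batches

With a loss that improves twice and then plateaus, this sketch stops at batch 600 rather than running all 5000 batches.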

Finally, you can evaluate model performance on the held-out test cell lines
specified in the model declaration. In this case, we will evaluate on K562 on the
first 10,000 points.

.. code:: python

   results = model.test(10000,
                        mode=Dataset.TEST,
                        calculate_metrics=True)

The output of ``results`` will contain the predictions and truth values, a
dictionary of assay-specific performance metrics, and the average auROC and auPRC
across all evaluated assays.
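
As a toy illustration of that summary step, per-assay metrics might be averaged as
follows; the keys and values here are hypothetical, and the exact structure of
``results`` is defined by Epitome:

.. code:: python

   # Hypothetical per-assay metrics; the real keys in ``results`` may differ.
   assay_metrics = {
       "CTCF":    {"auROC": 0.91, "auPRC": 0.62},
       "H3K27ac": {"auROC": 0.85, "auPRC": 0.48},
   }

   def average_metric(metrics, name):
       """Average one metric across all evaluated assays."""
       return sum(m[name] for m in metrics.values()) / len(metrics)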