	&& DEBIAN_FRONTEND=noninteractive apt-get install -y curl gawk man pkg-config python3 python3-pip git wget zip unzip xzip \
		awscli gzip samtools tabix \
	&& rm -Rf /var/lib/apt/lists/*

# Install dictys
# Name of conda environment to create
ARG CONDAENV_NAME=dictys
# Commit version to install. If empty, uses local version (./local).
ARG COMMIT_VERSION=master
# Python version
ARG PYTHONVERSION_CONDA=3.9
# CUDA version. When empty, uses CPU instead
ARG CUDAVERSION_CONDA=
COPY local /dictys/local
RUN cd /dictys \
	&& if [ "a${COMMIT_VERSION}" != "a" ]; then wget -O install.sh https://raw.githubusercontent.com/pinellolab/dictys/"${COMMIT_VERSION}"/doc/scripts/install.sh; localpath=""; else cp local/doc/scripts/install.sh ./; localpath="/dictys/local"; fi \
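The `install.sh` selection step relies on a small shell idiom: prefixing both sides of the comparison with `a` keeps the `[ ... ]` test well-formed even when `COMMIT_VERSION` expands to nothing. A minimal sketch of that branch logic, as a standalone function (the function name and echoed labels are ours, for illustration only):

```shell
# Mirror of the Dockerfile branch: a non-empty COMMIT_VERSION fetches the
# install script from GitHub; an empty one falls back to the local copy.
choose_install_source() {
  COMMIT_VERSION="$1"
  # "a${VAR}" != "a" is true only when VAR is non-empty, and never
  # degenerates into a malformed test when VAR is unset.
  if [ "a${COMMIT_VERSION}" != "a" ]; then
    echo "remote:https://raw.githubusercontent.com/pinellolab/dictys/${COMMIT_VERSION}/doc/scripts/install.sh"
  else
    echo "local:/dictys/local"
  fi
}

choose_install_source master
choose_install_source ""
```

With `COMMIT_VERSION=master` (the ARG default) the remote URL wins; building with `--build-arg COMMIT_VERSION=` selects the copy from `COPY local /dictys/local`.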
docs/source/dataset.rst (7 additions, 26 deletions)

@@ -1,25 +1,16 @@
Datasets
========
The list of datasets integrated into geneRNIB is provided below with their perturbation signatures as well as the type of perturbation used in each dataset.

.. image:: images/datasets.png
   :width: 80%
   :align: center

----

All datasets provide RNA data, while the `OPSCA` and `IBD` datasets also include scATAC data.
This command downloads the data to `resources_test/`. The content of this folder is needed for testing component integration.

pip install awscli

Downloading the main datasets
---------------------------------------------

@@ -30,7 +21,8 @@ Downloading the main datasets
This command downloads the data to `resources/grn_benchmark/`, which is the default directory geneRNIB uses for further GRN inference and evaluation.

Additionally, you will find the `resources/grn_benchmark/prior/` folder, which contains supplementary files such as the list of known transcription factors (TFs).
Files containing `consensus` tags are used in the evaluation metrics to standardize comparisons.

Downloading the extended datasets
---------------------------------

@@ -46,6 +38,7 @@ To download the extended datasets, use:
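After downloading, geneRNIB reads its inputs from fixed locations. A quick sanity check that the expected folders exist, assuming only the directory names stated in the docs (`resources/grn_benchmark/` and its `prior/` subfolder); the helper itself is ours, not part of geneRNIB:

```shell
# Report which of the geneRNIB data folders are still missing under a root.
check_benchmark_dirs() {
  root="${1:-.}"
  for d in resources/grn_benchmark resources/grn_benchmark/prior; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $d"
    fi
  done
}

# No output means both folders are in place.
check_benchmark_dirs .
```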
docs/source/evaluation.rst (6 additions, 20 deletions)

@@ -3,45 +3,31 @@ GRN evaluation
GRN evaluation
=================
The evaluation metrics used in geneRNIB are summarized below. For a detailed description of each metric, refer to the geneRNIB paper.

.. image:: images/metrics.png
   :width: 90%
   :align: center

----

The evaluation metrics expect the inferred network to be in the form of an AnnData object with a specific format, as explained here.
It should be noted that the metrics currently evaluate only the **top TF-gene pairs**, limited to **50,000 edges**, ranked by their assigned weight.

The inferred network should have a tabular format with the following columns:
See `resources/grn_benchmark/prior/collectri.h5ad` for an example of the expected format.

To run the evaluation for a given GRN and dataset, use the following command:

```bash
bash scripts/run_grn_evaluation.sh --prediction=<inferred GRN, e.g. collectri.h5ad> --save_dir=<e.g. output/> --dataset=<e.g. replogle> --build_images=<true or false; true for the first run>
```
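Because only the top 50,000 edges by weight are scored, it can be worth trimming an inferred network before saving it. A sketch of that cap, assuming a tab-separated `tf<TAB>target<TAB>weight` layout (an illustrative table format, not the exact AnnData schema):

```shell
# Keep the k strongest TF-gene edges, ranked by absolute weight.
# Reads "tf<TAB>target<TAB>weight" lines on stdin, writes the survivors.
top_edges() {
  k="$1"
  # Prepend |weight| as a sort key, order descending, keep k, drop the key.
  awk -F'\t' '{w = ($3 < 0 ? -$3 : $3); printf "%s\t%s\n", w, $0}' \
    | sort -k1,1gr \
    | head -n "$k" \
    | cut -f2-
}

# Example: of three edges, keep the two with the largest |weight|.
printf 'TF1\tG1\t0.9\nTF2\tG2\t-1.5\nTF1\tG3\t0.2\n' | top_edges 2
```

For the real 50,000-edge cap, replace `2` with `50000`; negative weights (repression) are ranked by magnitude, matching a ranking "by assigned weight" in absolute terms.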
docs/source/index.rst (1 addition, 3 deletions)

@@ -28,6 +28,7 @@ To see the comparative performance of the integrated GRN inference methods
   :align: center

----

Please see the GitHub page for the list of currently integrated methods. The methods are implemented in Python and R, and they can be used to infer GRNs from the datasets provided by geneRNIB.

In addition, three baseline methods are integrated into geneRNIB. These methods are used to evaluate the performance of new methods. The baseline methods are:

@@ -53,9 +54,6 @@ In addition, three baseline methods are integrated into geneRNIB. These methods
.. - author