Commit 4ae3959

all metrics is drafted
1 parent 20df013 commit 4ae3959

12 files changed

Lines changed: 98 additions & 55 deletions

dockers/dictys_0/Dockerfile

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+# Lingfei Wang, 2022-2023. All rights reserved.
+FROM continuumio/miniconda3
+USER root
+SHELL ["/bin/bash", "-c"]
+
+#System update
+RUN DEBIAN_FRONTEND=noninteractive apt-get update \
+&& DEBIAN_FRONTEND=noninteractive apt-get upgrade -y \
+&& DEBIAN_FRONTEND=noninteractive apt-get install -y \
+curl gawk man pkg-config python3 python3-pip git wget zip unzip xzip \
+awscli gzip samtools tabix \
+&& rm -Rf /var/lib/apt/lists/*
+
+# Install dictys
+# Name of conda environment to create
+ARG CONDAENV_NAME=dictys
+#Commit version to install. If empty, uses local version (./local).
+ARG COMMIT_VERSION=master
+# Python version
+ARG PYTHONVERSION_CONDA=3.9
+# CUDA version. When empty, uses CPU instead
+ARG CUDAVERSION_CONDA=
+COPY local /dictys/local
+RUN cd /dictys \
+&& if [ "a${COMMIT_VERSION}" != "a" ]; then wget -O install.sh https://raw.githubusercontent.com/pinellolab/dictys/"${COMMIT_VERSION}"/doc/scripts/install.sh; localpath=""; else cp local/doc/scripts/install.sh ./; localpath="/dictys/local"; fi \
+&& chmod u+x install.sh \
+&& COMMIT_VERSION="${COMMIT_VERSION}" CONDAENV_NAME="${CONDAENV_NAME}" PYTHONVERSION_CONDA="${PYTHONVERSION_CONDA}" CUDAVERSION_CONDA="${CUDAVERSION_CONDA}" LOCAL_VERSION="$localpath" ./install.sh \
+&& cd / \
+&& rm -Rf /dictys
+
+#Create entry point
+RUN echo '#!/bin/bash' > /usr/bin/run_dictys \
+&& echo "source activate ${CONDAENV_NAME}" >> /usr/bin/run_dictys \
+&& echo 'dictys "$@"' >> /usr/bin/run_dictys \
+&& chmod u+x /usr/bin/run_dictys
+
+ENTRYPOINT ["/usr/bin/run_dictys"]
+
+
+CMD ["/bin/bash"]

dockers/dictys_0/Dockerfile_0

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Lingfei Wang, 2022-2023. All rights reserved.
+FROM continuumio/miniconda3
+USER root
+SHELL ["/bin/bash", "-c"]
+
+#System update
+RUN DEBIAN_FRONTEND=noninteractive apt-get update \
+&& DEBIAN_FRONTEND=noninteractive apt-get upgrade -y \
+&& DEBIAN_FRONTEND=noninteractive apt-get install -y curl gawk man pkg-config python3 python3-pip git wget zip unzip xzip \
+&& rm -Rf /var/lib/apt/lists/*
+
+# Install dictys
+# Name of conda environment to create
+ARG CONDAENV_NAME=dictys
+#Commit version to install. If empty, uses local version (./local).
+ARG COMMIT_VERSION=master
+# Python version
+ARG PYTHONVERSION_CONDA=3.9
+# CUDA version. When empty, uses CPU instead
+ARG CUDAVERSION_CONDA=
+COPY local /dictys/local
+RUN cd /dictys \
+&& if [ "a${COMMIT_VERSION}" != "a" ]; then wget -O install.sh https://raw.githubusercontent.com/pinellolab/dictys/"${COMMIT_VERSION}"/doc/scripts/install.sh; localpath=""; else cp local/doc/scripts/install.sh ./; localpath="/dictys/local"; fi \
+&& chmod u+x install.sh \
+&& COMMIT_VERSION="${COMMIT_VERSION}" CONDAENV_NAME="${CONDAENV_NAME}" PYTHONVERSION_CONDA="${PYTHONVERSION_CONDA}" CUDAVERSION_CONDA="${CUDAVERSION_CONDA}" LOCAL_VERSION="$localpath" ./install.sh \
+&& cd / \
+&& rm -Rf /dictys
+
+#Create entry point
+RUN echo '#!/bin/bash' > /usr/bin/run_dictys \
+&& echo ". activate ${CONDAENV_NAME}" >> /usr/bin/run_dictys \
+&& echo 'dictys "$@"' >> /usr/bin/run_dictys \
+&& chmod u+x /usr/bin/run_dictys
+
+#ENTRYPOINT ["/usr/bin/run_dictys"]
+
+
+CMD ["/bin/bash"]

docs/source/dataset.rst

Lines changed: 7 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,25 +1,16 @@
11
Datasets
22
========
3-
Here, we explain how to access datasets without installing geneRNIB. The available datasets include **OPSCA, Nakatake, Replogle, Adamson, Norman, Xaira_HCT116, Xaira_HEK293T** and **ParseBioscience**.
4-
It should be noted that three datasets of **Xaira_HCT116, Xaira_HEK293T** and **ParseBioscience** are not added to the initial manuscript yet.
5-
All datasets provide RNA data, while the `OPSCA` dataset also includes ATAC data.
6-
The perturbation signature of these datasets are given below.
7-
You need `awscli` to download the datasets. If you don't have it installed, you can download it from [here](https://aws.amazon.com/cli/). You do not need to sign in to download the datasets.
8-
3+
The list of datasets integrated into geneRNIB is provided below with their perturbation signatures as well as the type of perturbation used in each dataset.
94
.. image:: images/datasets.png
105
:width: 80%
116
:align: center
127
----
138

14-
Downloading the test datasets
15-
---------------------------------------------
9+
All datasets provide RNA data, while the `OPSCA` and `IBD` datasets also include scATAC data.
1610

11+
You need `awscli` to download the datasets.
1712
.. code-block:: bash
18-
19-
aws s3 sync s3://openproblems-data/resources_test/grn resources_test/ --no-sign-request
20-
21-
This command downloads the data to `resources_test/`. The content of this folder is needed for testing component integration.
22-
13+
pip install awscli
2314
2415
Downloading the main datasets
2516
---------------------------------------------
@@ -30,7 +21,8 @@ Downloading the main datasets
3021
3122
This command downloads the data to `resources/grn_benchmark/`, which is the default directory for geneRNIB for further GRN inference and evaluation.
3223

33-
Additionally, you will find the `resources/grn_benchmark/prior/` folder, which contains supplementary files such as the list of known transcription factors (TFs). This list is used for GRN inference (causal TF-gene masking) and in the evaluation metrics to include only edges where the source gene is among these TFs. Additional files in this folder, such as those with `consensus` tags, are used in the evaluation metrics to standardize permitted edges per different metric.
24+
Additionally, you will find the `resources/grn_benchmark/prior/` folder, which contains supplementary files such as the list of known transcription factors (TFs).
25+
Files containing `consensus` tags are used in the evaluation metrics to standardize comparisons.
3426

3527
Downloading the extended datasets
3628
-----------------------------
@@ -46,6 +38,7 @@ To download the extended datasets, use:
4638
4739
aws s3 sync s3://openproblems-data/resources/grn/extended_data/ resources/extended_data/ --no-sign-request
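
When the download has to be triggered from a Python workflow rather than typed into a shell, the same `aws s3 sync` call can be assembled as an argument list. The following is only a minimal sketch: the helper name `build_sync_command` is hypothetical and not part of geneRNIB, and it assumes that `awscli` is installed and on the `PATH`. The bucket path and the `--no-sign-request` flag are taken from the command above.

```python
import shlex
from typing import List


def build_sync_command(s3_uri: str, local_dir: str) -> List[str]:
    """Assemble an anonymous `aws s3 sync` call (hypothetical helper)."""
    # --no-sign-request lets the public bucket be read without AWS credentials.
    return ["aws", "s3", "sync", s3_uri, local_dir, "--no-sign-request"]


cmd = build_sync_command(
    "s3://openproblems-data/resources/grn/extended_data/",
    "resources/extended_data/",
)
# Print the shell-equivalent command for inspection before running it.
print(shlex.join(cmd))

# To actually start the download (requires awscli and network access):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues if a local directory name ever contains spaces.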
4840
41+
4942
Downloading the raw/unprocessed data
5043
--------------------------------
5144

@@ -57,18 +50,6 @@ All previously mentioned datasets are processed versions. To access the raw, unp
5750
5851
We have not provided raw data for a few recent datasets due to very large file sizes. Please contact us if you need the raw data for these datasets.
5952

60-
Downloading the GRN models
61-
---------------------------------------------
62-
To download the GRN models used in geneRNIB, run:
63-
64-
.. code-block:: bash
65-
66-
aws s3 sync s3://openproblems-data/resources/grn/grn_models resources/grn_models/ --no-sign-request
67-
68-
These models are not necessarily the updated models as we are currently making changes to the results. To obtain a specific model,
69-
you should run the inference method or reach out to us for the latest model.
70-
71-
7253
Downloading the results
7354
---------------------------------------------
7455
To download the results of geneRNIB (needed for the leaderboard and the paper):

docs/source/evaluation.rst

Lines changed: 6 additions & 20 deletions
@@ -3,45 +3,31 @@ GRN evaluation
33
=================
44
The evaluation metrics used in geneRNIB are summarized below. For a detailed description of each metric, refer to the geneRNIB paper.
55

6-
We originally defined **eight evaluation metrics**, grouped into three categories: **Regression 1, Regression 2, and Wasserstein Distance**.
7-
However, we recently removed **Regression 1** as it did not prove to be effective for perturbational settings.
8-
9-
- The **regression-based metrics** assess the predictive power of an inferred GRN by using regression models to predict perturbation data (evaluation data) based on the feature space constructed from the inferred network.
10-
- The **Wasserstein distance-based metric** evaluates GRN edges by measuring the distributional shift in target gene expression between observations and perturbation data for a given transcription factor (TF).
11-
12-
Wasserstein distance-based metrics are only applicable for datasets that are gene perturbations and are in single cell format. Thus, currently the following datasets are supported:
13-
- Replogle
14-
- Xaira:HEK293T
15-
- Xaira:HCT116
16-
- Norman
17-
- Adamson
6+
187

198
.. image:: images/metrics.png
209
:width: 90%
2110
:align: center
2211
----
2312

24-
The evaluation metrics expect the inferred network to be in the form of an AnnData object with specific format as explained here. It should be noted that the metric currently evaluate only the **top TF-gene pairs**, currently limited to **50,000 edges**, ranked by their assigned weight.
13+
The evaluation metrics expect the inferred network to be an AnnData object with a specific format, as explained here.
14+
Note that the metrics currently evaluate only the **top TF-gene pairs**, limited to **50,000 edges**, ranked by their assigned weight.
2515

2616
The inferred network should have a tabular format with the following columns:
2717

2818
- `source`: TF gene name
2919
- `target`: Target gene name
3020
- `weight`: Regulatory importance/likelihood score/etc.
3121

32-
See `resources_test/grn_models/op/collectri.h5ad` for an example of the expected format.
33-
34-
For the regression based approaches, we used the pseudobulk version of the perturbation data while for the Wasserstein distance, the single cell data are used.
22+
See `resources/grn_benchmark/prior/collectri.h5ad` for an example of the expected format.
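
As a rough illustration of the table layout and the edge limit described above, the sketch below checks for the three columns and keeps only the highest-ranked edges. It uses plain Python dictionaries instead of an AnnData object, since how the table is stored inside the `.h5ad` file is not shown here. The helper name `top_edges` and the gene names are hypothetical, and ranking by the absolute value of `weight` is an assumption: the text only says that edges are ranked by their assigned weight.

```python
from typing import Dict, List, Union

Edge = Dict[str, Union[str, float]]

REQUIRED_COLUMNS = ("source", "target", "weight")
MAX_EDGES = 50_000  # edge limit stated in the documentation


def top_edges(edges: List[Edge], max_edges: int = MAX_EDGES) -> List[Edge]:
    """Validate the columns and keep the top-ranked TF-gene pairs (hypothetical helper)."""
    for edge in edges:
        missing = [col for col in REQUIRED_COLUMNS if col not in edge]
        if missing:
            raise ValueError(f"edge {edge!r} is missing columns: {missing}")
    # Assumption: rank by absolute weight, strongest first.
    ranked = sorted(edges, key=lambda e: abs(float(e["weight"])), reverse=True)
    return ranked[:max_edges]


# Toy network with made-up TF and gene names.
edges = [
    {"source": "TF1", "target": "GENE_A", "weight": 0.2},
    {"source": "TF2", "target": "GENE_B", "weight": -0.9},
    {"source": "TF1", "target": "GENE_C", "weight": 0.5},
]
top2 = top_edges(edges, max_edges=2)
print([(e["source"], e["target"]) for e in top2])
```

Pre-filtering a large network this way is optional, but it makes it easier to see which edges the metrics would actually score.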
3523

36-
It should be noted that for Wasserstein distance, we have already computed all possible combination of TF-gene pairs and stored it in the `resources/grn_benchmark/prior/` folder.
37-
This substantially reduces the computation time during evaluation.
3824

3925
To run the evaluation for a given GRN and dataset, use the following command:
4026
```bash
41-
bash scripts/run_grn_evaluation.sh --prediction=<inferred GRN (e.g.collectri.h5ad)> --save_dir=<e.g.output/> --dataset=<e.g. replogle> --build_images=<true or false. true for the first time running> --run_test=<true or false. true to run on test data>
27+
bash scripts/run_grn_evaluation.sh --prediction=<inferred GRN (e.g. collectri.h5ad)> --save_dir=<e.g. output/> --dataset=<e.g. replogle> --build_images=<true or false; use true on the first run>
4228
```
4329

4430
Example command:
4531
```bash
46-
bash scripts/run_grn_evaluation.sh --prediction=resources/grn_models/op/collectri.h5ad --save_dir=output/ --dataset=op --build_images=true --test_run=false
32+
bash scripts/run_grn_evaluation.sh --prediction=resources/grn_models/op/collectri.h5ad --save_dir=output/ --dataset=op --build_images=true
4733
```

docs/source/images/datasets.png

-992 KB

docs/source/images/metrics.png

651 KB

docs/source/index.rst

Lines changed: 1 addition & 3 deletions
@@ -28,6 +28,7 @@ To see the comparitive performance of the integrated GRN inference methods, refe
2828
:align: center
2929
----
3030

31+
3132
Please see the GitHub page for the list of currently integrated methods. The methods are implemented in Python and R, and they can be used to infer GRNs from the datasets provided by geneRNIB.
3233

3334
In addition, three baseline methods are integrated into geneRNIB. These methods are used to evaluate the performance of new methods. The baseline methods are:
@@ -53,9 +54,6 @@ In addition, three baseline methods are integrated into geneRNIB. These methods
5354
.. - author
5455
5556
56-
.. note::
57-
58-
This project is under active development and this documentation is still a draft.
5957
6058
Contents
6159
--------

src/metrics/all_metrics/config.vsh.yaml

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ engines:
 - type: python
 packages: [ lightgbm==4.3.0, numpy==1.26.4 , tqdm_joblib==0.0.5]
 
+
 runners:
 - type: executable
 - type: nextflow

src/metrics/all_metrics/script.py

Lines changed: 2 additions & 2 deletions
@@ -36,8 +36,8 @@
 sys.path.append(meta["util_dir"])
 sys.path.append(meta["resources_dir"])
 print(meta["resources_dir"])
-from helper_ws_distance import main as main_reg2
-aaa
+from helper_ws_distance import main as main_ws_distance
+
 from helper import main_all
 
 from util import parse_args, format_save_score
