Commit 66b19dd

raw scores added to the docs
1 parent 65e6781 commit 66b19dd

13 files changed

Lines changed: 274 additions & 234 deletions

README.md

Lines changed: 90 additions & 137 deletions
@@ -6,34 +6,29 @@ This file is automatically generated from the tasks's api/*.yaml files.
 Do not edit this file directly.
 -->

-Benchmarking GRN inference methods
+Article: [geneRNIB: a living benchmark for gene regulatory network
+inference](https://www.biorxiv.org/content/10.1101/2025.02.25.640181v1)

-<!-- Leaderboard:
-[Performance comparision](https://add-grn--openproblems.netlify.app/results/grn_inference/) -->
-
-Check for [performance comparision](https://github.com/janursa/grn_benchmark/blob/main/notebooks/process_results.ipynb) of integrated GRN inference methods.
-
-Article: [geneRNIB: a living benchmark for gene regulatory network inference](https://www.biorxiv.org/content/10.1101/2025.02.25.640181v1)
-
-Documentation:
+Documentation:
 [geneRNBI-doc](https://genernib-documentation.readthedocs.io/en/latest/)

-
 Repository:
 [openproblems-bio/task_grn_inference](https://github.com/openproblems-bio/task_grn_inference)

 If you use this framework, please cite

-```
-@article{nourisa2025genernib,
-title={geneRNIB: a living benchmark for gene regulatory network inference},
-author={Nourisa, Jalil and Passemiers, Antoine and Stock, Marco and Zeller-Plumhoff, Berit and Cannoodt, Robrecht and Arnold, Christian and Tong, Alexander and Hartford, Jason and Scialdone, Antonio and Moreau, Yves and others},
-journal={bioRxiv},
-pages={2025--02},
-year={2025},
-publisher={Cold Spring Harbor Laboratory}
-}
-```
+@article{nourisa2025genernib,
+title={geneRNIB: a living benchmark for gene regulatory network inference},
+author={Nourisa, Jalil and Passemiers, Antoine and Stock, Marco and Zeller-Plumhoff, Berit and Cannoodt, Robrecht and Arnold, Christian and Tong, Alexander and Hartford, Jason and Scialdone, Antonio and Moreau, Yves and others},
+journal={bioRxiv},
+pages={2025--02},
+year={2025},
+publisher={Cold Spring Harbor Laboratory}
+}
+
+Repository:
+[openproblems-bio/task_grn_inference](https://github.com/openproblems-bio/task_grn_inference)
+
 ## Description

 geneRNIB is a living benchmark platform for GRN inference. This platform
@@ -49,96 +44,41 @@ are re-assessed, and the leaderboard is updated accordingly. The aim is
 to evaluate both the accuracy and completeness of inferred GRNs. It is
 designed for both single-modality and multi-omics GRN inference.

-## Installation
-
-Install Docker, Java, and Viash using
-[these instructions](https://openproblems.bio/documentation/fundamentals/requirements).
-
-## Download resources
-```bash
-git clone --recursive git@github.com:openproblems-bio/task_grn_inference.git
-
-cd task_grn_inference
-```
-To interact with the framework,download the resources containing necessary inferene and evaluation datasets.
-
-```bash
-pip install awscli
-aws s3 sync s3://openproblems-data/resources/grn/grn_benchmark resources/grn_benchmark --no-sign-request
-
-```
-
-
-## Run a GRN inference method
-
-To infer a GRN for a given dataset (e.g. `op`) using simple Pearson correlation:
-
-```bash
-viash run src/methods/pearson_corr/config.vsh.yaml -- \
-    --rna resources/grn_benchmark/inference_data/op_rna.h5ad \
-    --tf_all resources/grn_benchmark/prior/tf_all.csv \
-    --prediction output/net.h5ad
-```
-
-## Evaluate a GRN model
-
-```bash
-bash scripts/run_grn_evaluation.sh \
-    --prediction=output/net.h5ad \
-    --dataset=op \
-    --build_images=true \
-    --save_dir=output
-```
-`build_images` only needed for the first run.
-
-This outputs the scores into `output/score_uns.yaml`.
-
-
-## Add a GRN inference method, evaluation metric, or dataset
-
-To add a new component to the repository, follow the [Documentation](https://genernib-documentation.readthedocs.io/en/latest/).
-
-## Run the entire pipline
-
-Run `scripts/run_all.sh` for the entire pipeline. Due to resource intensive nature of the task, we have splitted the pipeline into two steps of GRN inference and evaluation.
-
 ## Authors & contributors

 | name | roles |
 |:------------------|:------------|
 | Jalil Nourisa | author |
 | Robrecht Cannoodt | author |
-| Antoine Passimier | contributor |
 | Jérémie Kalfon | contributor |
+| Antoine Passimier | contributor |
 | Marco Stock | contributor |
 | Christian Arnold | contributor |

-
 ## API

 ``` mermaid
 flowchart TB
 file_atac_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-chromatin-accessibility-data'>chromatin accessibility data</a>")
 comp_method[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-method'>method</a>"/]
 file_prediction_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-grn-prediction'>GRN prediction</a>")
-comp_metric_regression[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-feature-based-metrics'>feature-based metrics</a>"/]
-comp_metric_ws[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-wasserstein-distance-metrics'>Wasserstein distance metrics</a>"/]
 comp_metric[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-metrics'>metrics</a>"/]
 file_score_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-score'>score</a>")
 file_evaluation_bulk_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--pseudo-bulk'>perturbation data (pseudo)bulk</a>")
+file_evaluation_de_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data-differential-expression'>perturbation data differential expression</a>")
 file_evaluation_sc_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--sc-'>perturbation data (sc)</a>")
 file_rna_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-gene-expression-data'>gene expression data</a>")
+comp_control_method[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-control-method'>Control Method</a>"/]
 file_atac_h5ad-.-comp_method
 comp_method-.->file_prediction_h5ad
-file_prediction_h5ad---comp_metric_regression
-file_prediction_h5ad---comp_metric_ws
 file_prediction_h5ad---comp_metric
-comp_metric_regression-->file_score_h5ad
-comp_metric_ws-->file_score_h5ad
 comp_metric-->file_score_h5ad
-file_evaluation_bulk_h5ad---comp_metric_regression
-file_evaluation_sc_h5ad-.-comp_metric_ws
+file_evaluation_bulk_h5ad-.-comp_metric
+file_evaluation_de_h5ad-.-comp_metric
+file_evaluation_sc_h5ad-.-comp_metric
 file_rna_h5ad---comp_method
+file_rna_h5ad---comp_control_method
+comp_control_method-.->file_prediction_h5ad
 ```

 ## File format: chromatin accessibility data
@@ -189,12 +129,12 @@ Arguments:
 | `--prediction` | `file` | (*Optional, Output*) File indicating the inferred GRN. |
 | `--tf_all` | `file` | NA. Default: `resources_test/grn_benchmark/prior/tf_all.csv`. |
 | `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |
-| `--num_workers` | `integer` | (*Optional*) NA. Default: `20`. |
+| `--num_workers` | `integer` | (*Optional*) NA. Default: `2`. |
 | `--temp_dir` | `string` | (*Optional*) NA. Default: `output/temdir`. |
-| `--layer` | `string` | (*Optional*) NA. Default: `X_norm`. |
+| `--layer` | `string` | (*Optional*) NA. Default: `lognorm`. |
 | `--seed` | `integer` | (*Optional*) NA. Default: `32`. |
 | `--dataset_id` | `string` | (*Optional*) NA. Default: `op`. |
-| `--is_test` | `boolean` | (*Optional*) NA. Default: `FALSE`. |
+| `--apply_tf_methods` | `boolean` | (*Optional*) NA. Default: `TRUE`. |

 </div>

@@ -225,9 +165,9 @@ Data structure:

 </div>

-## Component type: feature-based metrics
+## Component type: metrics

-A regression metric to evaluate the performance of the inferred GRN
+A metric to evaluate the performance of the inferred GRN

 Arguments:

@@ -236,78 +176,63 @@ Arguments:
 | Name | Type | Description |
 |:---|:---|:---|
 | `--prediction` | `file` | File indicating the inferred GRN. |
+| `--evaluation_data` | `file` | (*Optional*) Perturbation dataset for benchmarking. |
+| `--evaluation_data_sc` | `file` | (*Optional*) Perturbation dataset for benchmarking (sinlge cell). |
+| `--evaluation_data_de` | `file` | (*Optional*) Perturbation dataset for benchmarking (differential expression). |
 | `--score` | `file` | (*Output*) File indicating the score of a metric. |
-| `--layer` | `string` | (*Optional*) NA. Default: `X_norm`. |
+| `--layer` | `string` | (*Optional*) NA. Default: `lognorm`. |
 | `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |
-| `--verbose` | `integer` | (*Optional*) NA. Default: `2`. |
+| `--tf_all` | `file` | (*Optional*) NA. |
 | `--num_workers` | `integer` | (*Optional*) NA. Default: `20`. |
 | `--apply_tf` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
-| `--apply_skeleton` | `boolean` | (*Optional*) NA. Default: `FALSE`. |
-| `--skeleton` | `file` | (*Optional*) NA. |
-| `--evaluation_data` | `file` | Perturbation dataset for benchmarking. |
-| `--tf_all` | `file` | NA. |
+| `--regulators_consensus` | `file` | (*Optional*) NA. |
 | `--reg_type` | `string` | (*Optional*) NA. Default: `ridge`. |

 </div>

-## Component type: Wasserstein distance metrics
+## File format: score

-A Wasserstein distance based metric to evaluate the performance of the
-inferred GRN
+File indicating the score of a metric.

-Arguments:
+Example file: `resources_test/scores/score.h5ad`
+
+Format:

 <div class="small">

-| Name | Type | Description |
-|:---|:---|:---|
-| `--prediction` | `file` | File indicating the inferred GRN. |
-| `--score` | `file` | (*Output*) File indicating the score of a metric. |
-| `--layer` | `string` | (*Optional*) NA. Default: `X_norm`. |
-| `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |
-| `--verbose` | `integer` | (*Optional*) NA. Default: `2`. |
-| `--num_workers` | `integer` | (*Optional*) NA. Default: `20`. |
-| `--apply_tf` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
-| `--apply_skeleton` | `boolean` | (*Optional*) NA. Default: `FALSE`. |
-| `--skeleton` | `file` | (*Optional*) NA. |
-| `--evaluation_data_sc` | `file` | (*Optional*) Perturbation dataset for benchmarking (sinlge cell). |
+AnnData object
+uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'

 </div>

-## Component type: metrics
-
-A metric to evaluate the performance of the inferred GRN
-
-Arguments:
+Data structure:

 <div class="small">

-| Name | Type | Description |
+| Slot | Type | Description |
 |:---|:---|:---|
-| `--prediction` | `file` | File indicating the inferred GRN. |
-| `--score` | `file` | (*Output*) File indicating the score of a metric. |
-| `--layer` | `string` | (*Optional*) NA. Default: `X_norm`. |
-| `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |
-| `--verbose` | `integer` | (*Optional*) NA. Default: `2`. |
-| `--num_workers` | `integer` | (*Optional*) NA. Default: `20`. |
-| `--apply_tf` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
-| `--apply_skeleton` | `boolean` | (*Optional*) NA. Default: `FALSE`. |
-| `--skeleton` | `file` | (*Optional*) NA. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["method_id"]` | `string` | A unique identifier for the method. |
+| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
+| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |

 </div>

-## File format: score
+## File format: perturbation data (pseudo)bulk

-File indicating the score of a metric.
+Perturbation dataset for benchmarking

-Example file: `resources_test/scores/score.h5ad`
+Example file:
+`resources_test/grn_benchmark/evaluation_data/op_bulk.h5ad`

 Format:

 <div class="small">

 AnnData object
-uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'
+obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
+layers: 'X_norm'
+uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'

 </div>

@@ -317,27 +242,32 @@ Data structure:

 | Slot | Type | Description |
 |:---|:---|:---|
+| `obs["cell_type"]` | `string` | The annotated cell type of each cell based on RNA expression. |
+| `obs["perturbation"]` | `string` | Name of the column containing perturbation names. |
+| `obs["donor_id"]` | `string` | (*Optional*) Donor id. |
+| `obs["perturbation_type"]` | `string` | (*Optional*) Name of the column indicating perturbation type. |
+| `layers["X_norm"]` | `double` | Normalized values. |
 | `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["method_id"]` | `string` | A unique identifier for the method. |
-| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
-| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+| `uns["normalization_id"]` | `string` | Which normalization was used. |

 </div>

-## File format: perturbation data (pseudo)bulk
+## File format: perturbation data differential expression

-Perturbation dataset for benchmarking
+Perturbation dataset for benchmarking (differential expression)

 Example file:
-`resources_test/grn_benchmark/evaluation_data/op_bulk.h5ad`
+`resources_test/grn_benchmark/evaluation_data/replogle_de.h5ad`

 Format:

 <div class="small">

 AnnData object
 obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
-layers: 'X_norm'
 uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'

 </div>
@@ -352,7 +282,6 @@ Data structure:
 | `obs["perturbation"]` | `string` | Name of the column containing perturbation names. |
 | `obs["donor_id"]` | `string` | (*Optional*) Donor id. |
 | `obs["perturbation_type"]` | `string` | (*Optional*) Name of the column indicating perturbation type. |
-| `layers["X_norm"]` | `double` | Normalized values. |
 | `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
 | `uns["dataset_name"]` | `string` | Nicely formatted name. |
 | `uns["dataset_summary"]` | `string` | Short description of the dataset. |
@@ -433,3 +362,27 @@ Data structure:

 </div>

+## Component type: Control Method
+
+Quality control methods for verifying the pipeline.
+
+Arguments:
+
+<div class="small">
+
+| Name | Type | Description |
+|:---|:---|:---|
+| `--rna` | `file` | RNA expression data. |
+| `--rna_all` | `file` | (*Optional*) RNA expression data that contains all variability. Only used for positive control. |
+| `--prediction` | `file` | (*Optional, Output*) File indicating the inferred GRN. |
+| `--tf_all` | `file` | NA. Default: `resources_test/grn_benchmark/prior/tf_all.csv`. |
+| `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |
+| `--num_workers` | `integer` | (*Optional*) NA. Default: `20`. |
+| `--temp_dir` | `string` | (*Optional*) NA. Default: `output/temdir`. |
+| `--layer` | `string` | (*Optional*) NA. Default: `lognorm`. |
+| `--seed` | `integer` | (*Optional*) NA. Default: `32`. |
+| `--dataset_id` | `string` | (*Optional*) NA. Default: `op`. |
+| `--apply_tf_methods` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
+
+</div>
+
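
Note on the `score` file format documented in this diff: the `uns` slot carries `dataset_id`, `method_id`, `metric_ids`, and `metric_values`, and `metric_values` must have the same length as `metric_ids`. The following is a minimal sketch of that consistency check, assuming a plain dict as a stand-in for an AnnData `.uns` mapping. The function `check_score_uns`, the helper `_as_list`, and the metric names and values in the example are hypothetical and not part of the repository.

```python
from typing import Any, Dict, List, Mapping

# Keys the README lists for the score file's `uns` slot.
REQUIRED_KEYS = ("dataset_id", "method_id", "metric_ids", "metric_values")


def _as_list(value: Any) -> List[Any]:
    """Treat a single string or number as a one-element list."""
    if isinstance(value, (str, bytes)) or not hasattr(value, "__iter__"):
        return [value]
    return list(value)


def check_score_uns(uns: Mapping[str, Any]) -> Dict[str, float]:
    """Validate a score `uns` mapping and return {metric_id: metric_value}.

    Hypothetical helper: it only enforces the constraints stated in the
    README table (required keys, equal lengths of ids and values).
    """
    missing = [key for key in REQUIRED_KEYS if key not in uns]
    if missing:
        raise KeyError(f"score uns is missing keys: {missing}")

    ids = _as_list(uns["metric_ids"])
    values = [float(v) for v in _as_list(uns["metric_values"])]
    if len(ids) != len(values):
        raise ValueError(
            f"metric_ids has {len(ids)} entries but metric_values has {len(values)}"
        )
    return dict(zip(ids, values))


# Illustrative input only; metric names and values are made up.
example = {
    "dataset_id": "op",
    "method_id": "pearson_corr",
    "metric_ids": ["metric_a", "metric_b"],
    "metric_values": [0.25, 0.5],
}
scores = check_score_uns(example)
```

With a real score file, the same function could be applied to the `uns` attribute of the loaded AnnData object, provided its entries convert cleanly to Python lists.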
