@@ -6,34 +6,29 @@ This file is automatically generated from the tasks's api/*.yaml files.
66Do not edit this file directly.
77-->
88
9- Benchmarking GRN inference methods
9+ Article: [ geneRNIB: a living benchmark for gene regulatory network
10+ inference] ( https://www.biorxiv.org/content/10.1101/2025.02.25.640181v1 )
1011
11- <!-- Leaderboard:
12- [Performance comparision](https://add-grn--openproblems.netlify.app/results/grn_inference/) -->
13-
14- Check for [ performance comparision] ( https://github.com/janursa/grn_benchmark/blob/main/notebooks/process_results.ipynb ) of integrated GRN inference methods.
15-
16- Article: [ geneRNIB: a living benchmark for gene regulatory network inference] ( https://www.biorxiv.org/content/10.1101/2025.02.25.640181v1 )
17-
18- Documentation:
12+ Documentation:
1913[ geneRNIB-doc] ( https://genernib-documentation.readthedocs.io/en/latest/ )
2014
21-
2215Repository:
2316[ openproblems-bio/task_grn_inference] ( https://github.com/openproblems-bio/task_grn_inference )
2417
2518If you use this framework, please cite
2619
27- ```
28- @article{nourisa2025genernib,
29- title={geneRNIB: a living benchmark for gene regulatory network inference},
30- author={Nourisa, Jalil and Passemiers, Antoine and Stock, Marco and Zeller-Plumhoff, Berit and Cannoodt, Robrecht and Arnold, Christian and Tong, Alexander and Hartford, Jason and Scialdone, Antonio and Moreau, Yves and others},
31- journal={bioRxiv},
32- pages={2025--02},
33- year={2025},
34- publisher={Cold Spring Harbor Laboratory}
35- }
36- ```
20+ @article{nourisa2025genernib,
21+ title={geneRNIB: a living benchmark for gene regulatory network inference},
22+ author={Nourisa, Jalil and Passemiers, Antoine and Stock, Marco and Zeller-Plumhoff, Berit and Cannoodt, Robrecht and Arnold, Christian and Tong, Alexander and Hartford, Jason and Scialdone, Antonio and Moreau, Yves and others},
23+ journal={bioRxiv},
24+ pages={2025--02},
25+ year={2025},
26+ publisher={Cold Spring Harbor Laboratory}
27+ }
28+
29+ Repository:
30+ [ openproblems-bio/task_grn_inference] ( https://github.com/openproblems-bio/task_grn_inference )
31+
3732## Description
3833
3934geneRNIB is a living benchmark platform for GRN inference. This platform
@@ -49,96 +44,41 @@ are re-assessed, and the leaderboard is updated accordingly. The aim is
4944to evaluate both the accuracy and completeness of inferred GRNs. It is
5045designed for both single-modality and multi-omics GRN inference.
5146
52- ## Installation
53-
54- Install Docker, Java, and Viash using
55- [ these instructions] ( https://openproblems.bio/documentation/fundamentals/requirements ) .
56-
57- ## Download resources
58- ``` bash
59- git clone --recursive git@github.com:openproblems-bio/task_grn_inference.git
60-
61- cd task_grn_inference
62- ```
63- To interact with the framework,download the resources containing necessary inferene and evaluation datasets.
64-
65- ``` bash
66- pip install awscli
67- aws s3 sync s3://openproblems-data/resources/grn/grn_benchmark resources/grn_benchmark --no-sign-request
68-
69- ```
70-
71-
72- ## Run a GRN inference method
73-
74- To infer a GRN for a given dataset (e.g. ` op ` ) using simple Pearson correlation:
75-
76- ``` bash
77- viash run src/methods/pearson_corr/config.vsh.yaml -- \
78- --rna resources/grn_benchmark/inference_data/op_rna.h5ad \
79- --tf_all resources/grn_benchmark/prior/tf_all.csv \
80- --prediction output/net.h5ad
81- ```
82-
83- ## Evaluate a GRN model
84-
85- ``` bash
86- bash scripts/run_grn_evaluation.sh \
87- --prediction=output/net.h5ad \
88- --dataset=op \
89- --build_images=true \
90- --save_dir=output
91- ```
92- ` build_images ` only needed for the first run.
93-
94- This outputs the scores into ` output/score_uns.yaml ` .
95-
96-
97- ## Add a GRN inference method, evaluation metric, or dataset
98-
99- To add a new component to the repository, follow the [ Documentation] ( https://genernib-documentation.readthedocs.io/en/latest/ ) .
100-
101- ## Run the entire pipline
102-
103- Run ` scripts/run_all.sh ` for the entire pipeline. Due to resource intensive nature of the task, we have splitted the pipeline into two steps of GRN inference and evaluation.
104-
10547## Authors & contributors
10648
10749| name | roles |
10850| :------------------| :------------|
10951| Jalil Nourisa | author |
11052| Robrecht Cannoodt | author |
111- | Antoine Passimier | contributor |
11253| Jérémie Kalfon | contributor |
54+ | Antoine Passimier | contributor |
11355| Marco Stock | contributor |
11456| Christian Arnold | contributor |
11557
116-
11758## API
11859
11960``` mermaid
12061flowchart TB
12162 file_atac_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-chromatin-accessibility-data'>chromatin accessibility data</a>")
12263 comp_method[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-method'>method</a>"/]
12364 file_prediction_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-grn-prediction'>GRN prediction</a>")
124- comp_metric_regression[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-feature-based-metrics'>feature-based metrics</a>"/]
125- comp_metric_ws[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-wasserstein-distance-metrics'>Wasserstein distance metrics</a>"/]
12665 comp_metric[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-metrics'>metrics</a>"/]
12766 file_score_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-score'>score</a>")
12867 file_evaluation_bulk_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--pseudo-bulk'>perturbation data (pseudo)bulk</a>")
68+ file_evaluation_de_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data-differential-expression'>perturbation data differential expression</a>")
12969 file_evaluation_sc_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--sc-'>perturbation data (sc)</a>")
13070 file_rna_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-gene-expression-data'>gene expression data</a>")
71+ comp_control_method[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-control-method'>Control Method</a>"/]
13172 file_atac_h5ad-.-comp_method
13273 comp_method-.->file_prediction_h5ad
133- file_prediction_h5ad---comp_metric_regression
134- file_prediction_h5ad---comp_metric_ws
13574 file_prediction_h5ad---comp_metric
136- comp_metric_regression-->file_score_h5ad
137- comp_metric_ws-->file_score_h5ad
13875 comp_metric-->file_score_h5ad
139- file_evaluation_bulk_h5ad---comp_metric_regression
140- file_evaluation_sc_h5ad-.-comp_metric_ws
76+ file_evaluation_bulk_h5ad-.-comp_metric
77+ file_evaluation_de_h5ad-.-comp_metric
78+ file_evaluation_sc_h5ad-.-comp_metric
14179 file_rna_h5ad---comp_method
80+ file_rna_h5ad---comp_control_method
81+ comp_control_method-.->file_prediction_h5ad
14282```
14383
14484## File format: chromatin accessibility data
@@ -189,12 +129,12 @@ Arguments:
189129| ` --prediction ` | ` file ` | (* Optional, Output* ) File indicating the inferred GRN. |
190130| ` --tf_all ` | ` file ` | NA. Default: ` resources_test/grn_benchmark/prior/tf_all.csv ` . |
191131| ` --max_n_links ` | ` integer ` | (* Optional* ) NA. Default: ` 50000 ` . |
192- | ` --num_workers ` | ` integer ` | (* Optional* ) NA. Default: ` 20 ` . |
132+ | ` --num_workers ` | ` integer ` | (* Optional* ) NA. Default: ` 2 ` . |
193133| ` --temp_dir ` | ` string ` | (* Optional* ) NA. Default: ` output/temdir ` . |
194- | ` --layer ` | ` string ` | (* Optional* ) NA. Default: ` X_norm ` . |
134+ | ` --layer ` | ` string ` | (* Optional* ) NA. Default: ` lognorm ` . |
195135| ` --seed ` | ` integer ` | (* Optional* ) NA. Default: ` 32 ` . |
196136| ` --dataset_id ` | ` string ` | (* Optional* ) NA. Default: ` op ` . |
197- | ` --is_test ` | ` boolean ` | (* Optional* ) NA. Default: ` FALSE ` . |
137+ | ` --apply_tf_methods ` | ` boolean ` | (* Optional* ) NA. Default: ` TRUE ` . |
198138
199139</div >
200140
@@ -225,9 +165,9 @@ Data structure:
225165
226166</div >
227167
228- ## Component type: feature-based metrics
168+ ## Component type: metrics
229169
230- A regression metric to evaluate the performance of the inferred GRN
170+ A metric to evaluate the performance of the inferred GRN
231171
232172Arguments:
233173
@@ -236,78 +176,63 @@ Arguments:
236176| Name | Type | Description |
237177| :---| :---| :---|
238178| ` --prediction ` | ` file ` | File indicating the inferred GRN. |
179+ | ` --evaluation_data ` | ` file ` | (* Optional* ) Perturbation dataset for benchmarking. |
180+ | ` --evaluation_data_sc ` | ` file ` | (* Optional* ) Perturbation dataset for benchmarking (single cell). |
181+ | ` --evaluation_data_de ` | ` file ` | (* Optional* ) Perturbation dataset for benchmarking (differential expression). |
239182| ` --score ` | ` file ` | (* Output* ) File indicating the score of a metric. |
240- | ` --layer ` | ` string ` | (* Optional* ) NA. Default: ` X_norm ` . |
183+ | ` --layer ` | ` string ` | (* Optional* ) NA. Default: ` lognorm ` . |
241184| ` --max_n_links ` | ` integer ` | (* Optional* ) NA. Default: ` 50000 ` . |
242- | ` --verbose ` | ` integer ` | (* Optional* ) NA. Default: ` 2 ` . |
185+ | ` --tf_all ` | ` file ` | (* Optional* ) NA. |
243186| ` --num_workers ` | ` integer ` | (* Optional* ) NA. Default: ` 20 ` . |
244187| ` --apply_tf ` | ` boolean ` | (* Optional* ) NA. Default: ` TRUE ` . |
245- | ` --apply_skeleton ` | ` boolean ` | (* Optional* ) NA. Default: ` FALSE ` . |
246- | ` --skeleton ` | ` file ` | (* Optional* ) NA. |
247- | ` --evaluation_data ` | ` file ` | Perturbation dataset for benchmarking. |
248- | ` --tf_all ` | ` file ` | NA. |
188+ | ` --regulators_consensus ` | ` file ` | (* Optional* ) NA. |
249189| ` --reg_type ` | ` string ` | (* Optional* ) NA. Default: ` ridge ` . |
250190
251191</div >
252192
253- ## Component type: Wasserstein distance metrics
193+ ## File format: score
254194
255- A Wasserstein distance based metric to evaluate the performance of the
256- inferred GRN
195+ File indicating the score of a metric.
257196
258- Arguments:
197+ Example file: ` resources_test/scores/score.h5ad `
198+
199+ Format:
259200
260201<div class =" small " >
261202
262- | Name | Type | Description |
263- | :---| :---| :---|
264- | ` --prediction ` | ` file ` | File indicating the inferred GRN. |
265- | ` --score ` | ` file ` | (* Output* ) File indicating the score of a metric. |
266- | ` --layer ` | ` string ` | (* Optional* ) NA. Default: ` X_norm ` . |
267- | ` --max_n_links ` | ` integer ` | (* Optional* ) NA. Default: ` 50000 ` . |
268- | ` --verbose ` | ` integer ` | (* Optional* ) NA. Default: ` 2 ` . |
269- | ` --num_workers ` | ` integer ` | (* Optional* ) NA. Default: ` 20 ` . |
270- | ` --apply_tf ` | ` boolean ` | (* Optional* ) NA. Default: ` TRUE ` . |
271- | ` --apply_skeleton ` | ` boolean ` | (* Optional* ) NA. Default: ` FALSE ` . |
272- | ` --skeleton ` | ` file ` | (* Optional* ) NA. |
273- | ` --evaluation_data_sc ` | ` file ` | (* Optional* ) Perturbation dataset for benchmarking (sinlge cell). |
203+ AnnData object
204+ uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'
274205
275206</div >
276207
277- ## Component type: metrics
278-
279- A metric to evaluate the performance of the inferred GRN
280-
281- Arguments:
208+ Data structure:
282209
283210<div class =" small " >
284211
285- | Name | Type | Description |
212+ | Slot | Type | Description |
286213| :---| :---| :---|
287- | ` --prediction ` | ` file ` | File indicating the inferred GRN. |
288- | ` --score ` | ` file ` | (* Output* ) File indicating the score of a metric. |
289- | ` --layer ` | ` string ` | (* Optional* ) NA. Default: ` X_norm ` . |
290- | ` --max_n_links ` | ` integer ` | (* Optional* ) NA. Default: ` 50000 ` . |
291- | ` --verbose ` | ` integer ` | (* Optional* ) NA. Default: ` 2 ` . |
292- | ` --num_workers ` | ` integer ` | (* Optional* ) NA. Default: ` 20 ` . |
293- | ` --apply_tf ` | ` boolean ` | (* Optional* ) NA. Default: ` TRUE ` . |
294- | ` --apply_skeleton ` | ` boolean ` | (* Optional* ) NA. Default: ` FALSE ` . |
295- | ` --skeleton ` | ` file ` | (* Optional* ) NA. |
214+ | ` uns["dataset_id"] ` | ` string ` | A unique identifier for the dataset. |
215+ | ` uns["method_id"] ` | ` string ` | A unique identifier for the method. |
216+ | ` uns["metric_ids"] ` | ` string ` | One or more unique metric identifiers. |
217+ | ` uns["metric_values"] ` | ` double ` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |
296218
297219</div >
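
The score slots listed above can be checked before a file is written. The following is a minimal sketch, not part of the repository: the helper name `check_score_uns` and the metric names in the usage note are hypothetical, and a plain `dict` stands in for the `uns` mapping of an AnnData object. It enforces the one constraint the table states explicitly, namely that `metric_values` must have the same length as `metric_ids`.

```python
def check_score_uns(uns):
    """Validate a score-style `uns` mapping and return {metric_id: value}.

    `uns` is a plain dict standing in for AnnData's `.uns` (an assumption
    made to keep the sketch dependency-free).
    """
    required = ("dataset_id", "method_id", "metric_ids", "metric_values")
    missing = [key for key in required if key not in uns]
    if missing:
        raise ValueError(f"missing uns keys: {missing}")

    ids = uns["metric_ids"]
    values = uns["metric_values"]
    # The table allows "one or more" metric identifiers, so accept scalars too.
    if isinstance(ids, str):
        ids = [ids]
    if isinstance(values, (int, float)):
        values = [values]

    if len(ids) != len(values):
        raise ValueError(
            f"metric_ids has {len(ids)} entries but metric_values has {len(values)}"
        )
    return {metric_id: float(value) for metric_id, value in zip(ids, values)}
```

For example, `check_score_uns({"dataset_id": "op", "method_id": "pearson_corr", "metric_ids": ["metric_a", "metric_b"], "metric_values": [0.5, 0.25]})` returns a dictionary pairing each identifier with its value, while a length mismatch raises `ValueError`. The metric names here are placeholders, not metrics defined by the benchmark.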
298220
299- ## File format: score
221+ ## File format: perturbation data (pseudo)bulk
300222
301- File indicating the score of a metric.
223+ Perturbation dataset for benchmarking
302224
303- Example file: ` resources_test/scores/score.h5ad `
225+ Example file:
226+ ` resources_test/grn_benchmark/evaluation_data/op_bulk.h5ad `
304227
305228Format:
306229
307230<div class =" small " >
308231
309232 AnnData object
310- uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'
233+ obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
234+ layers: 'X_norm'
235+ uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'
311236
312237</div >
313238
@@ -317,27 +242,32 @@ Data structure:
317242
318243| Slot | Type | Description |
319244| :---| :---| :---|
245+ | ` obs["cell_type"] ` | ` string ` | The annotated cell type of each cell based on RNA expression. |
246+ | ` obs["perturbation"] ` | ` string ` | Name of the column containing perturbation names. |
247+ | ` obs["donor_id"] ` | ` string ` | (* Optional* ) Donor id. |
248+ | ` obs["perturbation_type"] ` | ` string ` | (* Optional* ) Name of the column indicating perturbation type. |
249+ | ` layers["X_norm"] ` | ` double ` | Normalized values. |
320250| ` uns["dataset_id"] ` | ` string ` | A unique identifier for the dataset. |
321- | ` uns["method_id"] ` | ` string ` | A unique identifier for the method. |
322- | ` uns["metric_ids"] ` | ` string ` | One or more unique metric identifiers. |
323- | ` uns["metric_values"] ` | ` double ` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |
251+ | ` uns["dataset_name"] ` | ` string ` | Nicely formatted name. |
252+ | ` uns["dataset_summary"] ` | ` string ` | Short description of the dataset. |
253+ | ` uns["dataset_organism"] ` | ` string ` | (* Optional* ) The organism of the sample in the dataset. |
254+ | ` uns["normalization_id"] ` | ` string ` | Which normalization was used. |
324255
325256</div >
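
As an illustration of how the required and optional slots in the table above differ, the sketch below reports which required slots are absent from a (pseudo)bulk perturbation file. It is an assumption-laden example rather than repository code: `REQUIRED_SLOTS` and `missing_slots` are hypothetical names, and the input is a plain mapping from section name to the slot names present. Slots marked optional in the table (`donor_id`, `perturbation_type`, `dataset_organism`) are deliberately not checked.

```python
# Required slots only, copied from the table above; optional slots are omitted.
REQUIRED_SLOTS = {
    "obs": {"cell_type", "perturbation"},
    "layers": {"X_norm"},
    "uns": {"dataset_id", "dataset_name", "dataset_summary", "normalization_id"},
}


def missing_slots(present):
    """Return {section: sorted missing required names}; empty dict if complete.

    `present` maps a section name ("obs", "layers", "uns") to an iterable of
    the slot names found in the file.
    """
    missing = {}
    for section, names in REQUIRED_SLOTS.items():
        absent = names - set(present.get(section, ()))
        if absent:
            missing[section] = sorted(absent)
    return missing
```

With an AnnData object `adata` loaded, the input could be built as `{"obs": adata.obs.columns, "layers": adata.layers.keys(), "uns": adata.uns.keys()}`. An empty result means every required slot is present.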
326257
327- ## File format: perturbation data (pseudo)bulk
258+ ## File format: perturbation data differential expression
328259
329- Perturbation dataset for benchmarking
260+ Perturbation dataset for benchmarking (differential expression)
330261
331262Example file:
332- ` resources_test/grn_benchmark/evaluation_data/op_bulk .h5ad `
263+ ` resources_test/grn_benchmark/evaluation_data/replogle_de .h5ad `
333264
334265Format:
335266
336267<div class =" small " >
337268
338269 AnnData object
339270 obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
340- layers: 'X_norm'
341271 uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'
342272
343273</div >
@@ -352,7 +282,6 @@ Data structure:
352282| ` obs["perturbation"] ` | ` string ` | Name of the column containing perturbation names. |
353283| ` obs["donor_id"] ` | ` string ` | (* Optional* ) Donor id. |
354284| ` obs["perturbation_type"] ` | ` string ` | (* Optional* ) Name of the column indicating perturbation type. |
355- | ` layers["X_norm"] ` | ` double ` | Normalized values. |
356285| ` uns["dataset_id"] ` | ` string ` | A unique identifier for the dataset. |
357286| ` uns["dataset_name"] ` | ` string ` | Nicely formatted name. |
358287| ` uns["dataset_summary"] ` | ` string ` | Short description of the dataset. |
@@ -433,3 +362,27 @@ Data structure:
433362
434363</div >
435364
365+ ## Component type: Control Method
366+
367+ Quality control methods for verifying the pipeline.
368+
369+ Arguments:
370+
371+ <div class =" small " >
372+
373+ | Name | Type | Description |
374+ | :---| :---| :---|
375+ | ` --rna ` | ` file ` | RNA expression data. |
376+ | ` --rna_all ` | ` file ` | (* Optional* ) RNA expression data that contains all variability. Only used for positive control. |
377+ | ` --prediction ` | ` file ` | (* Optional, Output* ) File indicating the inferred GRN. |
378+ | ` --tf_all ` | ` file ` | NA. Default: ` resources_test/grn_benchmark/prior/tf_all.csv ` . |
379+ | ` --max_n_links ` | ` integer ` | (* Optional* ) NA. Default: ` 50000 ` . |
380+ | ` --num_workers ` | ` integer ` | (* Optional* ) NA. Default: ` 20 ` . |
381+ | ` --temp_dir ` | ` string ` | (* Optional* ) NA. Default: ` output/temdir ` . |
382+ | ` --layer ` | ` string ` | (* Optional* ) NA. Default: ` lognorm ` . |
383+ | ` --seed ` | ` integer ` | (* Optional* ) NA. Default: ` 32 ` . |
384+ | ` --dataset_id ` | ` string ` | (* Optional* ) NA. Default: ` op ` . |
385+ | ` --apply_tf_methods ` | ` boolean ` | (* Optional* ) NA. Default: ` TRUE ` . |
386+
387+ </div >
388+