openproblems-bio
diff --git a/README.md b/README.md
Lines changed: 90 additions & 137 deletions
@@ -6,34 +6,29 @@ This file is automatically generated from the tasks's api/*.yaml files.
 Do not edit this file directly.
 -->
 
-Benchmarking GRN inference methods
+Article: [geneRNIB: a living benchmark for gene regulatory network
+inference](https://www.biorxiv.org/content/10.1101/2025.02.25.640181v1)
 
-<!-- Leaderboard: 
-  [Performance comparision](https://add-grn--openproblems.netlify.app/results/grn_inference/) -->
-
-Check for [performance comparision](https://github.com/janursa/grn_benchmark/blob/main/notebooks/process_results.ipynb) of integrated GRN inference methods.
-
-Article: [geneRNIB: a living benchmark for gene regulatory network inference](https://www.biorxiv.org/content/10.1101/2025.02.25.640181v1)
-
-Documentation: 
+Documentation:
 [geneRNIB-doc](https://genernib-documentation.readthedocs.io/en/latest/)
 
-
 Repository:
 [openproblems-bio/task_grn_inference](https://github.com/openproblems-bio/task_grn_inference)
 
 If you use this framework, please cite
 
-```
-  @article{nourisa2025genernib,
-    title={geneRNIB: a living benchmark for gene regulatory network inference},
-    author={Nourisa, Jalil and Passemiers, Antoine and Stock, Marco and Zeller-Plumhoff, Berit and Cannoodt, Robrecht and Arnold, Christian and Tong, Alexander and Hartford, Jason and Scialdone, Antonio and Moreau, Yves and others},
-    journal={bioRxiv},
-    pages={2025--02},
-    year={2025},
-    publisher={Cold Spring Harbor Laboratory}
-  }
-```
+      @article{nourisa2025genernib,
+        title={geneRNIB: a living benchmark for gene regulatory network inference},
+        author={Nourisa, Jalil and Passemiers, Antoine and Stock, Marco and Zeller-Plumhoff, Berit and Cannoodt, Robrecht and Arnold, Christian and Tong, Alexander and Hartford, Jason and Scialdone, Antonio and Moreau, Yves and others},
+        journal={bioRxiv},
+        pages={2025--02},
+        year={2025},
+        publisher={Cold Spring Harbor Laboratory}
+      }
+
+Repository:
+[openproblems-bio/task_grn_inference](https://github.com/openproblems-bio/task_grn_inference)
+
 ## Description
 
 geneRNIB is a living benchmark platform for GRN inference. This platform
@@ -49,96 +44,41 @@ are re-assessed, and the leaderboard is updated accordingly. The aim is
 to evaluate both the accuracy and completeness of inferred GRNs. It is
 designed for both single-modality and multi-omics GRN inference.
 
-## Installation
-
-Install Docker, Java, and Viash using
-[these instructions](https://openproblems.bio/documentation/fundamentals/requirements).
-
-## Download resources
-```bash
-git clone --recursive git@github.com:openproblems-bio/task_grn_inference.git
-
-cd task_grn_inference
-```
-To interact with the framework,download the resources containing necessary inferene and evaluation datasets. 
-
-```bash
-pip install awscli
-aws s3 sync  s3://openproblems-data/resources/grn/grn_benchmark resources/grn_benchmark  --no-sign-request
-
-```
-
-
-## Run a GRN inference method 
-
-To infer a GRN for a given dataset (e.g. `op`) using simple Pearson correlation:
-
-```bash
-viash run src/methods/pearson_corr/config.vsh.yaml -- \
-            --rna resources/grn_benchmark/inference_data/op_rna.h5ad \
-            --tf_all resources/grn_benchmark/prior/tf_all.csv \ 
-            --prediction output/net.h5ad
-```
-
-## Evaluate a GRN model
-
-```bash
-bash scripts/run_grn_evaluation.sh \
-             --prediction=output/net.h5ad \
-             --dataset=op \ 
-             --build_images=true \ 
-             --save_dir=output 
-```
-`build_images` only needed for the first run.
-
-This outputs the scores into `output/score_uns.yaml`. 
-
-
-## Add a GRN inference method, evaluation metric, or dataset
-
-To add a new component to the repository, follow the [Documentation](https://genernib-documentation.readthedocs.io/en/latest/).
-
-## Run the entire pipline
-
-Run `scripts/run_all.sh` for the entire pipeline. Due to resource intensive nature of the task, we have splitted the pipeline into two steps of GRN inference and evaluation.
-
 ## Authors & contributors
 
 | name              | roles       |
 |:------------------|:------------|
 | Jalil Nourisa     | author      |
 | Robrecht Cannoodt | author      |
-| Antoine Passimier | contributor |
 | Jérémie Kalfon    | contributor |
+| Antoine Passimier | contributor |
 | Marco Stock       | contributor |
 | Christian Arnold  | contributor |
 
-
 ## API
 
 ``` mermaid
 flowchart TB
   file_atac_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-chromatin-accessibility-data'>chromatin accessibility data</a>")
   comp_method[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-method'>method</a>"/]
   file_prediction_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-grn-prediction'>GRN prediction</a>")
-  comp_metric_regression[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-feature-based-metrics'>feature-based metrics</a>"/]
-  comp_metric_ws[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-wasserstein-distance-metrics'>Wasserstein distance metrics</a>"/]
   comp_metric[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-metrics'>metrics</a>"/]
   file_score_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-score'>score</a>")
   file_evaluation_bulk_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--pseudo-bulk'>perturbation data (pseudo)bulk</a>")
+  file_evaluation_de_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data-differential-expression'>perturbation data differential expression</a>")
   file_evaluation_sc_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--sc-'>perturbation data (sc)</a>")
   file_rna_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-gene-expression-data'>gene expression data</a>")
+  comp_control_method[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-control-method'>Control Method</a>"/]
   file_atac_h5ad-.-comp_method
   comp_method-.->file_prediction_h5ad
-  file_prediction_h5ad---comp_metric_regression
-  file_prediction_h5ad---comp_metric_ws
   file_prediction_h5ad---comp_metric
-  comp_metric_regression-->file_score_h5ad
-  comp_metric_ws-->file_score_h5ad
   comp_metric-->file_score_h5ad
-  file_evaluation_bulk_h5ad---comp_metric_regression
-  file_evaluation_sc_h5ad-.-comp_metric_ws
+  file_evaluation_bulk_h5ad-.-comp_metric
+  file_evaluation_de_h5ad-.-comp_metric
+  file_evaluation_sc_h5ad-.-comp_metric
   file_rna_h5ad---comp_method
+  file_rna_h5ad---comp_control_method
+  comp_control_method-.->file_prediction_h5ad
 ```
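
The updated flowchart can also be read as a small input/output table per component. The sketch below is only a transcription of the edges shown above (after this change) into a Python dictionary; the names `COMPONENT_IO`, `producers_of`, and `consumers_of` are hypothetical helpers and are not part of the repository.

```python
# Edges transcribed from the flowchart above (state after this diff).
# Node ids are the ones used in the mermaid source; the helper names are
# illustrative only and do not exist in the repository.
COMPONENT_IO = {
    "comp_method": {
        "inputs": ["file_rna_h5ad", "file_atac_h5ad"],
        "outputs": ["file_prediction_h5ad"],
    },
    "comp_control_method": {
        "inputs": ["file_rna_h5ad"],
        "outputs": ["file_prediction_h5ad"],
    },
    "comp_metric": {
        "inputs": [
            "file_prediction_h5ad",
            "file_evaluation_bulk_h5ad",
            "file_evaluation_de_h5ad",
            "file_evaluation_sc_h5ad",
        ],
        "outputs": ["file_score_h5ad"],
    },
}


def producers_of(file_id):
    """Return the components that write the given file node."""
    return sorted(c for c, io in COMPONENT_IO.items() if file_id in io["outputs"])


def consumers_of(file_id):
    """Return the components that read the given file node."""
    return sorted(c for c, io in COMPONENT_IO.items() if file_id in io["inputs"])


print(producers_of("file_prediction_h5ad"))
print(consumers_of("file_rna_h5ad"))
```

Read this way, the change is easy to see: the single `comp_metric` node now takes all three evaluation files, and `comp_control_method` is a second producer of the GRN prediction.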
 
 ## File format: chromatin accessibility data
@@ -189,12 +129,12 @@ Arguments:
 | `--prediction` | `file` | (*Optional, Output*) File indicating the inferred GRN. |
 | `--tf_all` | `file` | NA. Default: `resources_test/grn_benchmark/prior/tf_all.csv`. |
 | `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |
-| `--num_workers` | `integer` | (*Optional*) NA. Default: `20`. |
+| `--num_workers` | `integer` | (*Optional*) NA. Default: `2`. |
 | `--temp_dir` | `string` | (*Optional*) NA. Default: `output/temdir`. |
-| `--layer` | `string` | (*Optional*) NA. Default: `X_norm`. |
+| `--layer` | `string` | (*Optional*) NA. Default: `lognorm`. |
 | `--seed` | `integer` | (*Optional*) NA. Default: `32`. |
 | `--dataset_id` | `string` | (*Optional*) NA. Default: `op`. |
-| `--is_test` | `boolean` | (*Optional*) NA. Default: `FALSE`. |
+| `--apply_tf_methods` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
 
 </div>
 
@@ -225,9 +165,9 @@ Data structure:
 
 </div>
 
-## Component type: feature-based metrics
+## Component type: metrics
 
-A regression metric to evaluate the performance of the inferred GRN
+A metric to evaluate the performance of the inferred GRN
 
 Arguments:
 
@@ -236,78 +176,63 @@ Arguments:
 | Name | Type | Description |
 |:---|:---|:---|
 | `--prediction` | `file` | File indicating the inferred GRN. |
+| `--evaluation_data` | `file` | (*Optional*) Perturbation dataset for benchmarking. |
+| `--evaluation_data_sc` | `file` | (*Optional*) Perturbation dataset for benchmarking (single cell). |
+| `--evaluation_data_de` | `file` | (*Optional*) Perturbation dataset for benchmarking (differential expression). |
 | `--score` | `file` | (*Output*) File indicating the score of a metric. |
-| `--layer` | `string` | (*Optional*) NA. Default: `X_norm`. |
+| `--layer` | `string` | (*Optional*) NA. Default: `lognorm`. |
 | `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |
-| `--verbose` | `integer` | (*Optional*) NA. Default: `2`. |
+| `--tf_all` | `file` | (*Optional*) NA. |
 | `--num_workers` | `integer` | (*Optional*) NA. Default: `20`. |
 | `--apply_tf` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
-| `--apply_skeleton` | `boolean` | (*Optional*) NA. Default: `FALSE`. |
-| `--skeleton` | `file` | (*Optional*) NA. |
-| `--evaluation_data` | `file` | Perturbation dataset for benchmarking. |
-| `--tf_all` | `file` | NA. |
+| `--regulators_consensus` | `file` | (*Optional*) NA. |
 | `--reg_type` | `string` | (*Optional*) NA. Default: `ridge`. |
 
 </div>
 
-## Component type: Wasserstein distance metrics
+## File format: score
 
-A Wasserstein distance based metric to evaluate the performance of the
-inferred GRN
+File indicating the score of a metric.
 
-Arguments:
+Example file: `resources_test/scores/score.h5ad`
+
+Format:
 
 <div class="small">
 
-| Name | Type | Description |
-|:---|:---|:---|
-| `--prediction` | `file` | File indicating the inferred GRN. |
-| `--score` | `file` | (*Output*) File indicating the score of a metric. |
-| `--layer` | `string` | (*Optional*) NA. Default: `X_norm`. |
-| `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |
-| `--verbose` | `integer` | (*Optional*) NA. Default: `2`. |
-| `--num_workers` | `integer` | (*Optional*) NA. Default: `20`. |
-| `--apply_tf` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
-| `--apply_skeleton` | `boolean` | (*Optional*) NA. Default: `FALSE`. |
-| `--skeleton` | `file` | (*Optional*) NA. |
-| `--evaluation_data_sc` | `file` | (*Optional*) Perturbation dataset for benchmarking (sinlge cell). |
+    AnnData object
+     uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'
 
 </div>
 
-## Component type: metrics
-
-A metric to evaluate the performance of the inferred GRN
-
-Arguments:
+Data structure:
 
 <div class="small">
 
-| Name | Type | Description |
+| Slot | Type | Description |
 |:---|:---|:---|
-| `--prediction` | `file` | File indicating the inferred GRN. |
-| `--score` | `file` | (*Output*) File indicating the score of a metric. |
-| `--layer` | `string` | (*Optional*) NA. Default: `X_norm`. |
-| `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |
-| `--verbose` | `integer` | (*Optional*) NA. Default: `2`. |
-| `--num_workers` | `integer` | (*Optional*) NA. Default: `20`. |
-| `--apply_tf` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
-| `--apply_skeleton` | `boolean` | (*Optional*) NA. Default: `FALSE`. |
-| `--skeleton` | `file` | (*Optional*) NA. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["method_id"]` | `string` | A unique identifier for the method. |
+| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
+| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |
 
 </div>
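
The length constraint between `metric_ids` and `metric_values` can be checked before a score file is written. The following is a minimal sketch, assuming the `uns` slot is available as a plain dictionary (for example, copied from an AnnData object's `.uns`); `validate_score_uns` is a hypothetical helper, and the metric names in the example are placeholders rather than real metric identifiers.

```python
def validate_score_uns(uns):
    """Check a score `uns` mapping against the slots in the table above.

    Returns a list of human-readable problems; an empty list means the
    mapping satisfies the documented structure.
    """
    problems = []
    for key in ("dataset_id", "method_id", "metric_ids", "metric_values"):
        if key not in uns:
            problems.append(f"missing uns['{key}']")
    if problems:
        return problems

    ids, values = uns["metric_ids"], uns["metric_values"]
    # Assumption: a single metric may be stored as a scalar, so normalise
    # both fields to lists before comparing their lengths.
    if isinstance(ids, str):
        ids = [ids]
    if isinstance(values, (int, float)):
        values = [values]
    if len(ids) != len(values):
        problems.append(
            f"metric_ids has {len(ids)} entries but metric_values has {len(values)}"
        )
    return problems


example = {
    "dataset_id": "op",
    "method_id": "pearson_corr",
    "metric_ids": ["metric_a", "metric_b"],  # placeholder metric names
    "metric_values": [0.1, 0.2],
}
print(validate_score_uns(example))
```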
 
-## File format: score
+## File format: perturbation data (pseudo)bulk
 
-File indicating the score of a metric.
+Perturbation dataset for benchmarking
 
-Example file: `resources_test/scores/score.h5ad`
+Example file:
+`resources_test/grn_benchmark/evaluation_data/op_bulk.h5ad`
 
 Format:
 
 <div class="small">
 
     AnnData object
-     uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'
+     obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
+     layers: 'X_norm'
+     uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'
 
 </div>
 
@@ -317,27 +242,32 @@ Data structure:
 
 | Slot | Type | Description |
 |:---|:---|:---|
+| `obs["cell_type"]` | `string` | The annotated cell type of each cell based on RNA expression. |
+| `obs["perturbation"]` | `string` | Name of the column containing perturbation names. |
+| `obs["donor_id"]` | `string` | (*Optional*) Donor id. |
+| `obs["perturbation_type"]` | `string` | (*Optional*) Name of the column indicating perturbation type. |
+| `layers["X_norm"]` | `double` | Normalized values. |
 | `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["method_id"]` | `string` | A unique identifier for the method. |
-| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
-| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+| `uns["normalization_id"]` | `string` | Which normalization was used. |
 
 </div>
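
The table above separates required slots from those marked (*Optional*). A rough way to check a (pseudo)bulk file against it is sketched below, assuming the slot names have already been collected into a dictionary (for an AnnData object `adata`, from `adata.obs.columns`, `adata.layers.keys()`, and `adata.uns.keys()`). `REQUIRED`, `OPTIONAL`, and `missing_slots` are hypothetical names used for illustration only.

```python
# Slots transcribed from the (pseudo)bulk table above.
REQUIRED = {
    "obs": ["cell_type", "perturbation"],
    "layers": ["X_norm"],
    "uns": ["dataset_id", "dataset_name", "dataset_summary", "normalization_id"],
}
OPTIONAL = {
    "obs": ["donor_id", "perturbation_type"],
    "uns": ["dataset_organism"],
}


def missing_slots(present, spec):
    """Return "slot['key']" strings for spec entries absent from `present`.

    `present` maps a slot name ("obs", "layers", "uns") to the key names
    found in the file.
    """
    return [
        f"{slot}[{key!r}]"
        for slot, keys in spec.items()
        for key in keys
        if key not in present.get(slot, ())
    ]


present = {
    "obs": ["cell_type", "perturbation"],
    "layers": ["X_norm"],
    "uns": ["dataset_id", "dataset_name", "dataset_summary", "normalization_id"],
}
print("missing required:", missing_slots(present, REQUIRED))
print("missing optional:", missing_slots(present, OPTIONAL))
```

Reporting the two groups separately lets a missing required slot fail a check while a missing optional slot only produces a warning.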
 
-## File format: perturbation data (pseudo)bulk
+## File format: perturbation data differential expression
 
-Perturbation dataset for benchmarking
+Perturbation dataset for benchmarking (differential expression)
 
 Example file:
-`resources_test/grn_benchmark/evaluation_data/op_bulk.h5ad`
+`resources_test/grn_benchmark/evaluation_data/replogle_de.h5ad`
 
 Format:
 
 <div class="small">
 
     AnnData object
      obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
-     layers: 'X_norm'
      uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'
 
 </div>
@@ -352,7 +282,6 @@ Data structure:
 | `obs["perturbation"]` | `string` | Name of the column containing perturbation names. |
 | `obs["donor_id"]` | `string` | (*Optional*) Donor id. |
 | `obs["perturbation_type"]` | `string` | (*Optional*) Name of the column indicating perturbation type. |
-| `layers["X_norm"]` | `double` | Normalized values. |
 | `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
 | `uns["dataset_name"]` | `string` | Nicely formatted name. |
 | `uns["dataset_summary"]` | `string` | Short description of the dataset. |
@@ -433,3 +362,27 @@ Data structure:
 
 </div>
 
+## Component type: Control Method
+
+Quality control methods for verifying the pipeline.
+
+Arguments:
+
+<div class="small">
+
+| Name | Type | Description |
+|:---|:---|:---|
+| `--rna` | `file` | RNA expression data. |
+| `--rna_all` | `file` | (*Optional*) RNA expression data that contains all variability. Only used for positive control. |
+| `--prediction` | `file` | (*Optional, Output*) File indicating the inferred GRN. |
+| `--tf_all` | `file` | NA. Default: `resources_test/grn_benchmark/prior/tf_all.csv`. |
+| `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |
+| `--num_workers` | `integer` | (*Optional*) NA. Default: `20`. |
+| `--temp_dir` | `string` | (*Optional*) NA. Default: `output/temdir`. |
+| `--layer` | `string` | (*Optional*) NA. Default: `lognorm`. |
+| `--seed` | `integer` | (*Optional*) NA. Default: `32`. |
+| `--dataset_id` | `string` | (*Optional*) NA. Default: `op`. |
+| `--apply_tf_methods` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
+
+</div>
+
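
For readers who want to see the control-method arguments and their defaults in executable form, the table above can be mirrored with Python's standard `argparse`. This is only an illustration of the documented names and defaults: the components themselves are defined through the task's `api/*.yaml` files, and `build_parser` and `str2bool` are hypothetical helpers that do not exist in the repository. Treating `--rna` as required is an assumption based on it not being marked optional in the table.

```python
import argparse


def str2bool(text):
    """Parse TRUE/FALSE-style strings, as booleans are written in the table."""
    value = text.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    """Build a parser mirroring the control-method argument table."""
    p = argparse.ArgumentParser(description="Illustrative control-method arguments")
    p.add_argument("--rna", required=True, help="RNA expression data.")
    p.add_argument("--rna_all", default=None,
                   help="RNA data with all variability (positive control only).")
    p.add_argument("--prediction", default=None, help="Output file for the inferred GRN.")
    p.add_argument("--tf_all", default="resources_test/grn_benchmark/prior/tf_all.csv")
    p.add_argument("--max_n_links", type=int, default=50000)
    p.add_argument("--num_workers", type=int, default=20)
    p.add_argument("--temp_dir", default="output/temdir")
    p.add_argument("--layer", default="lognorm")
    p.add_argument("--seed", type=int, default=32)
    p.add_argument("--dataset_id", default="op")
    p.add_argument("--apply_tf_methods", type=str2bool, default=True)
    return p


# Only --rna is supplied here, so every other value falls back to its default.
args = build_parser().parse_args(
    ["--rna", "resources/grn_benchmark/inference_data/op_rna.h5ad"]
)
print(args.layer, args.num_workers, args.apply_tf_methods)
```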