Commit 77a36a2

Merge branch 'main' into all-contributors

2 parents: c05227a + 3ff151a

89 files changed: 4,010 additions and 733 deletions


.github/workflows/ci.yaml

Lines changed: 17 additions & 6 deletions
```
@@ -9,6 +9,10 @@ on:
     branches:
       - main
       - develop
+    types:
+      - opened
+      - synchronize
+      - reopened

 permissions:
   contents: write
@@ -113,8 +117,13 @@ jobs:
           fail_ci_if_error: true # Optional, ensures the CI fails if Codecov upload fails

   docs:
-    if: github.ref == 'refs/heads/main' || (github.event_name == 'pull_request' && github.event.pull_request.base.ref == 'main')
+    name: Docs
     runs-on: ubuntu-latest
+    permissions:
+      contents: write # needed to push to gh-pages
+
+    # Only build+deploy docs when main is updated (i.e., after PR merge)
+    if: github.event_name == 'push' && github.ref == 'refs/heads/main'

     strategy:
       matrix:
@@ -124,7 +133,7 @@ jobs:
       - name: Check out the repository
         uses: actions/checkout@v4
         with:
-          fetch-depth: 0 # Fetch all history if necessary
+          fetch-depth: 0

       - name: Set up Python
         uses: actions/setup-python@v5
@@ -146,18 +155,20 @@ jobs:
       - name: Generate API Documentation with Sphinx
         run: |
           source .venv/bin/activate
-          sphinx-apidoc -o docs/ codes
+          mkdir -p docs/source/api
+          sphinx-apidoc -o docs/source/api codes

       - name: Build HTML with Sphinx
         run: |
           source .venv/bin/activate
-          sphinx-build -b html docs/ docs/_build
+          sphinx-build -b html docs/source docs/_build/html

-      - name: Deploy Sphinx API docs to gh-pages
+      - name: Deploy docs to gh-pages
         uses: peaceiris/actions-gh-pages@v4
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          publish_dir: docs/_build
+          publish_dir: docs/_build/html
           publish_branch: gh-pages
           user_name: "GitHub Actions"
           user_email: "actions@github.com"
+          force_orphan: false
```
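For context, the new `types` filter above makes the workflow react to the three standard pull-request activity types. The resulting trigger section plausibly reads as follows (a sketch — the enclosing `pull_request:` key is an assumption, since the hunk only shows the nested lines):

```yaml
on:
  pull_request:
    branches:
      - main
      - develop
    types:
      - opened
      - synchronize
      - reopened
```

Note that `opened`, `synchronize`, and `reopened` are also GitHub's default activity types for `pull_request`, so listing them mainly documents the intent explicitly.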

.pre-commit-config.yaml

Lines changed: 9 additions & 0 deletions
```
@@ -0,0 +1,9 @@
+repos:
+  - repo: https://github.com/psf/black
+    rev: 26.1.0
+    hooks:
+      - id: black
+  - repo: https://github.com/pycqa/isort
+    rev: 7.0.0
+    hooks:
+      - id: isort
```
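The new `.pre-commit-config.yaml` pins black and isort as git hooks. Once the file is in the repository, enabling the hooks locally typically looks like this (a sketch — it assumes the `pre-commit` tool is already installed, e.g. via `pip install pre-commit`):

```shell
# Register the hooks from .pre-commit-config.yaml into .git/hooks/pre-commit
pre-commit install

# Run black and isort once across the entire repository
pre-commit run --all-files
```

After `pre-commit install`, both formatters run automatically on every `git commit` and block the commit if they modify files.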

README.md

Lines changed: 53 additions & 156 deletions
```
@@ -1,181 +1,78 @@
 # CODES Benchmark

-[![codecov](https://codecov.io/github/robin-janssen/CODES-Benchmark/branch/main/graph/badge.svg?token=TNF9ISCAJK)](https://codecov.io/github/robin-janssen/CODES-Benchmark)
-![Static Badge](https://img.shields.io/badge/license-GPLv3-blue)
-![Static Badge](https://img.shields.io/badge/NeurIPS-2024-green)
+[![codecov](https://codecov.io/github/robin-janssen/CODES-Benchmark/branch/main/graph/badge.svg?token=TNF9ISCAJK)](https://codecov.io/github/robin-janssen/CODES-Benchmark) ![Static Badge](https://img.shields.io/badge/license-GPLv3-blue) ![Static Badge](https://img.shields.io/badge/NeurIPS-2024-green)
 [![All Contributors](https://img.shields.io/github/all-contributors/AstroAI-Lab/CODES-Benchmark?color=ee8449&style=flat-square)](#contributors)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)

-🎉 CODES was accepted to the ML4PS workshop @ NeurIPS2024 🎉
+🎉 Accepted to the ML4PS workshop @ NeurIPS 2024

-## Benchmarking Coupled ODE Surrogates
+Benchmark coupled ODE surrogate models on curated datasets with reproducible training, evaluation, and visualization pipelines. CODES helps you answer: _Which surrogate architecture fits my data, accuracy target, and runtime budget?_

-CODES is a benchmark for coupled ODE surrogate models.
+## What you get

-<picture>
-  <!-- Dark mode SVG -->
-  <source media="(prefers-color-scheme: dark)" srcset="docs/_static/file-alt-solid-white.svg">
-  <!-- Light mode SVG -->
-  <source media="(prefers-color-scheme: light)" srcset="docs/_static/file-alt-solid.svg">
-  <!-- Fallback image (light mode by default) -->
-  <img width="14" alt="Paper on arXiv" src="docs/_static/book-solid.svg">
-</picture> CODES paper on <a href="https://arxiv.org/abs/2410.20886">arXiv</a>. <p></p>
+- Baseline surrogates (MultiONet, FullyConnected, LatentNeuralODE, LatentPoly) with configurable hyperparameters
+- Rich datasets spanning chemistry, astrophysics, and dynamical systems
+- Optional studies for interpolation/extrapolation, sparse data regimes, uncertainty estimation, and batch scaling
+- Automated reporting: accuracy tables, resource usage, gradient analyses, and dozens of diagnostic plots

-<picture>
-  <source srcset="docs/_static/favicon-96x96.png">
-  <img width="15" alt="CODES Logo" src="docs/_static/favicon-96x96.png">
-</picture> The main documentation can be found on the <a href="https://codes-docs.web.app/index.html">CODES website</a>. <p></p>
+## Two-minute quickstart

-<picture>
-  <!-- Dark mode SVG -->
-  <source media="(prefers-color-scheme: dark)" srcset="docs/_static/book-solid-white.svg">
-  <!-- Light mode SVG -->
-  <source media="(prefers-color-scheme: light)" srcset="docs/_static/book-solid.svg">
-  <!-- Fallback image (light mode by default) -->
-  <img width="14" alt="CODES API Docs" src="docs/_static/book-solid.svg">
-</picture> The technical API documentation is hosted on this <a href="https://robin-janssen.github.io/CODES-Benchmark/">GitHub Page</a>.
+**uv (recommended)**

-## Motivation
-
-There are many efforts to use machine learning models ("surrogates") to replace the costly numerics involved in solving coupled ODEs. But for the end user, it is not obvious how to choose the right surrogate for a given task. Usually, the best choice depends on both the dataset and the target application.
-
-Dataset specifics - how "complex" is the dataset?
-
-- How many samples are there?
-- Are the trajectories very dynamic or are the developments rather slow?
-- How dense is the distribution of initial conditions?
-- Is the data domain of interest well-covered by the domain of the training set?
-
-Task requirements:
-
-- What is the required accuracy?
-- How important is inference time? Is the training time limited?
-- Are there computational constraints (memory or processing power)?
-- Is uncertainty estimation required (e.g. to replace uncertain predictions by numerics)?
-- How much predictive flexibility is required? Do we need to interpolate or extrapolate across time?
-
-Besides these practical considerations, one overarching question is always: Does the model only learn the data, or does it "understand" something about the underlying dynamics?
-
-## Goals
-
-This benchmark aims to aid in choosing the best surrogate model for the task at hand and additionally to shed some light on the above questions.
-
-To achieve this, a selection of surrogate models are implemented in this repository. They can be trained on one of the included datasets or a custom dataset and then benchmarked on the corresponding test dataset.
-
-Some **metrics** included in the benchmark (but there is much more!):
-
-- Absolute and relative error of the models.
-- Inference time.
-- Number of trainable parameters.
-- Memory requirements (**WIP**).
-
-Besides this, there are plenty of **plots and visualisations** providing insights into the models behaviour:
-
-- Error distributions - per model, across time or per quantity.
-- Insights into interpolation and extrapolation across time.
-- Behaviour when training with sparse data or varying batch size.
-- Predictions with uncertainty and predictive uncertainty across time.
-- Correlations between either the predictive uncertainty or dynamics (gradients) of the data and the prediction error
-
-Some prime **use-cases** of the benchmark are:
-
-- Finding the best-performing surrogate on a dataset. Here, best-performing could mean high accuracy, low inference times or any other metric of interest (e.g. most accurate uncertainty estimates, ...).
-- Comparing performance of a novel surrogate architecture against the implemented baseline models.
-- Gaining insights into a dataset or comparing datasets using the built-in dataset insights.
-
-## Key Features
-
-<details>
-<summary><b>Baseline Surrogates</b></summary>
-
-The following surrogate models are currently implemented to be benchmarked:
-
-- Fully Connected Neural Network:
-  The vanilla neural network a.k.a. multilayer perceptron.
-- DeepONet:
-  Two fully connected networks whose outputs are combined using a scalar product. In the current implementation, the surrogate comprises only one DeepONet with multiple outputs (hence the name MultiONet).
-- Latent NeuralODE:
-  NeuralODE combined with an autoencoder that reduces the dimensionality of the dataset before solving the dynamics in the resulting latent space.
-- Latent Polynomial:
-  Uses an autoencoder similar to Latent NeuralODE, but fits a polynomial to the trajectories in the resulting latent space.
-
-</details>
-
-<details>
-<summary><b>Baseline Datasets</b></summary>
-
-The following datasets are currently included in the benchmark:
-
-</details>
-
-<details>
-<summary><b>Uncertainty Quantification (UQ)</b></summary>
-
-To give an uncertainty estimate that does not rely too much on the specifics of the surrogate architecture, we use DeepEnsemble for UQ.
-
-</details>
-
-<details>
-<summary><b>Parallel Training</b></summary>
-
-To gain insights into the surrogates' behaviour, many models must be trained on varying subsets of the training data. This task is trivially parallelisable. In addition to utilising all specified devices, the benchmark features some nice progress bars to give insights into the current status of the training.
-
-</details>
-
-<details>
-<summary><b>Plots, Plots, Plots</b></summary>
-
-While hard metrics are crucial to compare the surrogates, performance cannot always be broken down to a set of numbers. Running the benchmark creates many plots that serve to compare the performance of surrogates or provide insights into the performance of each surrogate.
-
-</details>
-
-<details>
-<summary><b>Dataset Insights (WIP)</b></summary>
-
-"Know your data" is one of the most important rules in machine learning. To aid in this, the benchmark provides plots and visualisations that should help to understand the dataset better.
-
-</details>
-
-<details>
-<summary><b>Tabular Benchmark Results</b></summary>
-
-At the end of the benchmark, the most important metrics are displayed in a table; additionally, all metrics generated during the benchmark are provided as a CSV file.
-
-</details>
-
-<details>
-<summary><b>Reproducibility</b></summary>
+```bash
+git clone https://github.com/robin-janssen/CODES-Benchmark.git
+cd CODES-Benchmark
+uv sync # creates .venv from pyproject/uv.lock
+source .venv/bin/activate
+uv run python run_training.py --config configs/train_eval/config_minimal.yaml
+uv run python run_eval.py --config configs/train_eval/config_minimal.yaml
+```

-Randomness is an important part of machine learning and even required in the context of UQ with DeepEnsemble, but reproducibility is key in benchmarking enterprises. The benchmark uses a custom seed that can be set by the user to ensure full reproducibility.
+**pip alternative**

-</details>
+```bash
+git clone https://github.com/robin-janssen/CODES-Benchmark.git
+cd CODES-Benchmark
+python -m venv .venv && source .venv/bin/activate
+pip install -e .
+pip install -r requirements.txt
+python run_training.py --config configs/train_eval/config_minimal.yaml
+python run_eval.py --config configs/train_eval/config_minimal.yaml
+```

-<details>
-<summary><b>Custom Datasets and Own Models</b></summary>
+Outputs land in `trained/<training_id>`, `results/<training_id>`, and `plots/<training_id>`. The `configs/` folder contains ready-to-use templates (`train_eval/config_minimal.yaml`, `config_full.yaml`, etc.). Copy a file there and adjust datasets/surrogates/modalities before running the CLIs.

-To cover a wide variety of use-cases, the benchmark is designed such that adding own datasets and models is explicitly supported.
+## Documentation

-</details>
+- [Main docs & tutorials](https://robin-janssen.github.io/CODES-Benchmark/)
+- [API reference (Sphinx)](https://robin-janssen.github.io/CODES-Benchmark/modules.html)
+- [Paper on arXiv](https://arxiv.org/abs/2410.20886)

-## Quickstart
+The GitHub Pages site now hosts the narrative guides, configuration reference, and interactive notebooks alongside the generated API docs.

-First, clone the [GitHub Repository](https://github.com/robin-janssen/CODES-Benchmark) with
+## Repository map

-```
-git clone ssh://git@github.com/robin-janssen/CODES-Benchmark
-```
+| Path | Purpose |
+| ------------------------------------------------- | -------------------------------------------------------------------------------- |
+| `configs/` | Ready-to-edit benchmark configs (`train_eval/`, `tuning/`, etc.) |
+| `datasets/` | Bundled datasets + download helper (`data_sources.yaml`) |
+| `codes/` | Python package with surrogates, training, tuning, and benchmarking utilities |
+| `run_training.py`, `run_eval.py`, `run_tuning.py` | CLI entry points for the main workflows |
+| `docs/` | Sphinx project powering the GitHub Pages site (guides, tutorials, API reference) |
+| `scripts/` | Convenience tooling (dataset downloads, analysis utilities) |

-Optionally, you can set up a [virtual environment](https://docs.python.org/3/library/venv.html) (recommended).
+## Contributing

-Then, install the required packages with
+Pull requests are welcome! Please include documentation updates, add or update tests when you touch executable code, and run:

+```bash
+uv pip install --group dev
+pytest
+sphinx-build -b html docs/source/ docs/_build/html
 ```
-pip install -r requirements.txt
-```
-
-The installation is now complete. To be able to run and evaluate the benchmark, you need to first set up a configuration YAML file. There is one provided, but it should be configured. For more information, check the [configuration page](https://robin-janssen.github.io/CODES-Benchmark/documentation.html#config). There, we also offer an interactive Config-Generator tool with some explanations to help you set up your benchmark.
-
-You can also add your own datasets and models to the benchmark to evaluate them against each other or some of our baseline models. For more information on how to do this, please refer to the [documentation](https://robin-janssen.github.io/CODES-Benchmark/documentation.html).

+If you publish a new surrogate or dataset, document it under `docs/guides` / `docs/reference` so users can adopt it quickly. For questions, open an issue on GitHub.

 ## Contributors

@@ -186,4 +83,4 @@ You can also add your own datasets and models to the benchmark to evaluate them
 <!-- markdownlint-restore -->
 <!-- prettier-ignore-end -->

-<!-- ALL-CONTRIBUTORS-LIST:END -->
+<!-- ALL-CONTRIBUTORS-LIST:END -->
```
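Per the updated README, each run writes its metrics under `results/<training_id>` as a CSV file. A minimal standard-library sketch of ranking surrogates from such a file (the column names and values here are hypothetical, chosen purely for illustration):

```python
import csv
import io

# Hypothetical excerpt of a benchmark metrics CSV; the real column names
# under results/<training_id>/ may differ.
sample = """surrogate,mean_relative_error,inference_time_ms
MultiONet,0.012,3.4
FullyConnected,0.021,1.1
LatentNeuralODE,0.015,2.7
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Rank by accuracy; swap the key to inference_time_ms to rank by speed instead.
ranked = sorted(rows, key=lambda r: float(r["mean_relative_error"]))
best = ranked[0]["surrogate"]
print(best)  # → MultiONet
```

The same pattern applies to any of the metrics the benchmark emits: parse once with `csv.DictReader`, then sort or filter on whichever column matches your accuracy or runtime budget.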
