
Commit 3ff151a

Merge pull request #54 from AstroAI-Lab/update-docs
Update docs
2 parents: 18388f9 + 7b37f42

89 files changed: 4007 additions & 731 deletions

Note: this is a large commit; only three of the 89 changed files are shown below.

.github/workflows/ci.yaml

Lines changed: 17 additions & 6 deletions
@@ -9,6 +9,10 @@ on:
     branches:
       - main
       - develop
+    types:
+      - opened
+      - synchronize
+      - reopened

 permissions:
   contents: write
@@ -113,8 +117,13 @@ jobs:
         fail_ci_if_error: true # Optional, ensures the CI fails if Codecov upload fails

   docs:
-    if: github.ref == 'refs/heads/main' || (github.event_name == 'pull_request' && github.event.pull_request.base.ref == 'main')
+    name: Docs
     runs-on: ubuntu-latest
+    permissions:
+      contents: write # needed to push to gh-pages
+
+    # Only build+deploy docs when main is updated (i.e., after PR merge)
+    if: github.event_name == 'push' && github.ref == 'refs/heads/main'

     strategy:
       matrix:
@@ -124,7 +133,7 @@ jobs:
       - name: Check out the repository
         uses: actions/checkout@v4
         with:
-          fetch-depth: 0 # Fetch all history if necessary
+          fetch-depth: 0

       - name: Set up Python
         uses: actions/setup-python@v5
@@ -146,18 +155,20 @@ jobs:
       - name: Generate API Documentation with Sphinx
         run: |
           source .venv/bin/activate
-          sphinx-apidoc -o docs/ codes
+          mkdir -p docs/source/api
+          sphinx-apidoc -o docs/source/api codes

       - name: Build HTML with Sphinx
         run: |
           source .venv/bin/activate
-          sphinx-build -b html docs/ docs/_build
+          sphinx-build -b html docs/source docs/_build/html

-      - name: Deploy Sphinx API docs to gh-pages
+      - name: Deploy docs to gh-pages
         uses: peaceiris/actions-gh-pages@v4
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          publish_dir: docs/_build
+          publish_dir: docs/_build/html
           publish_branch: gh-pages
           user_name: "GitHub Actions"
           user_email: "actions@github.com"
+          force_orphan: false
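The net effect: API stubs are now generated into `docs/source/api`, the site is built from `docs/source` into `docs/_build/html`, and deployment happens only on pushes to `main`. A minimal sketch for reproducing the docs build locally, assuming the project's dependencies (including Sphinx) are already installed in `.venv`:

```bash
# Reproduce the workflow's docs build locally (sketch; assumes Sphinx is
# available in .venv, e.g. after `uv sync`).
source .venv/bin/activate
mkdir -p docs/source/api               # API stubs now live under docs/source/api
sphinx-apidoc -o docs/source/api codes # generate .rst stubs for the codes package
sphinx-build -b html docs/source docs/_build/html
# The workflow then publishes docs/_build/html to the gh-pages branch.
```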

.pre-commit-config.yaml

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+repos:
+  - repo: https://github.com/psf/black
+    rev: 26.1.0
+    hooks:
+      - id: black
+  - repo: https://github.com/pycqa/isort
+    rev: 7.0.0
+    hooks:
+      - id: isort
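This new config formats Python code with black and sorts imports with isort on each commit. A typical way to enable the hooks locally (a sketch; the `pre-commit` tool itself is not pinned anywhere in this commit, so the install step is an assumption about the developer setup):

```bash
# Install the pre-commit tool, register the git hook, and run all hooks once.
pip install pre-commit        # assumption: not declared in this commit
pre-commit install            # writes .git/hooks/pre-commit
pre-commit run --all-files    # format and sort the whole repo in one pass
```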

README.md

Lines changed: 50 additions & 154 deletions
@@ -1,176 +1,72 @@
 # CODES Benchmark

-[![codecov](https://codecov.io/github/robin-janssen/CODES-Benchmark/branch/main/graph/badge.svg?token=TNF9ISCAJK)](https://codecov.io/github/robin-janssen/CODES-Benchmark)
-![Static Badge](https://img.shields.io/badge/license-GPLv3-blue)
-![Static Badge](https://img.shields.io/badge/NeurIPS-2024-green)
+[![codecov](https://codecov.io/github/robin-janssen/CODES-Benchmark/branch/main/graph/badge.svg?token=TNF9ISCAJK)](https://codecov.io/github/robin-janssen/CODES-Benchmark) ![Static Badge](https://img.shields.io/badge/license-GPLv3-blue) ![Static Badge](https://img.shields.io/badge/NeurIPS-2024-green)

-🎉 CODES was accepted to the ML4PS workshop @ NeurIPS2024 🎉
+🎉 Accepted to the ML4PS workshop @ NeurIPS 2024

-## Benchmarking Coupled ODE Surrogates
+Benchmark coupled ODE surrogate models on curated datasets with reproducible training, evaluation, and visualization pipelines. CODES helps you answer: *Which surrogate architecture fits my data, accuracy target, and runtime budget?*

-CODES is a benchmark for coupled ODE surrogate models.
+## What you get

-<picture>
-<!-- Dark mode SVG -->
-<source media="(prefers-color-scheme: dark)" srcset="docs/_static/file-alt-solid-white.svg">
-<!-- Light mode SVG -->
-<source media="(prefers-color-scheme: light)" srcset="docs/_static/file-alt-solid.svg">
-<!-- Fallback image (light mode by default) -->
-<img width="14" alt="Paper on arXiv" src="docs/_static/book-solid.svg">
-</picture> CODES paper on <a href="https://arxiv.org/abs/2410.20886">arXiV</a>. <p></p>
+- Baseline surrogates (MultiONet, FullyConnected, LatentNeuralODE, LatentPoly) with configurable hyperparameters
+- Rich datasets spanning chemistry, astrophysics, and dynamical systems
+- Optional studies for interpolation/extrapolation, sparse data regimes, uncertainty estimation, and batch scaling
+- Automated reporting: accuracy tables, resource usage, gradient analyses, and dozens of diagnostic plots

-<picture>
-<source srcset="docs/_static/favicon-96x96.png">
-<img width="15" alt="CODES Logo" src="docs/_static/favicon-96x96.png">
-</picture> The main documentation can be found on the <a href="https://codes-docs.web.app/index.html">CODES website</a>. <p></p>
+## Two-minute quickstart

-<picture>
-<!-- Dark mode SVG -->
-<source media="(prefers-color-scheme: dark)" srcset="docs/_static/book-solid-white.svg">
-<!-- Light mode SVG -->
-<source media="(prefers-color-scheme: light)" srcset="docs/_static/book-solid.svg">
-<!-- Fallback image (light mode by default) -->
-<img width="14" alt="CODES API Docs" src="docs/_static/book-solid.svg">
-</picture> The technical API documentation is hosted on this <a href="https://robin-janssen.github.io/CODES-Benchmark/">GitHub Page</a>.
+**uv (recommended)**

-## Motivation
-
-There are many efforts to use machine learning models ("surrogates") to replace the costly numerics involved in solving coupled ODEs. But for the end user, it is not obvious how to choose the right surrogate for a given task. Usually, the best choice depends on both the dataset and the target application.
-
-Dataset specifics - how "complex" is the dataset?
-
-- How many samples are there?
-- Are the trajectories very dynamic or are the developments rather slow?
-- How dense is the distribution of initial conditions?
-- Is the data domain of interest well-covered by the domain of the training set?
-
-Task requirements:
-
-- What is the required accuracy?
-- How important is inference time? Is the training time limited?
-- Are there computational constraints (memory or processing power)?
-- Is uncertainty estimation required (e.g. to replace uncertain predictions by numerics)?
-- How much predictive flexibility is required? Do we need to interpolate or extrapolate across time?
-
-Besides these practical considerations, one overarching question is always: Does the model only learn the data, or does it "understand" something about the underlying dynamics?
-
-## Goals
-
-This benchmark aims to aid in choosing the best surrogate model for the task at hand and additionally to shed some light on the above questions.
-
-To achieve this, a selection of surrogate models are implemented in this repository. They can be trained on one of the included datasets or a custom dataset and then benchmarked on the corresponding test dataset.
-
-Some **metrics** included in the benchmark (but there is much more!):
-
-- Absolute and relative error of the models.
-- Inference time.
-- Number of trainable parameters.
-- Memory requirements (**WIP**).
-
-Besides this, there are plenty of **plots and visualisations** providing insights into the models behaviour:
-
-- Error distributions - per model, across time or per quantity.
-- Insights into interpolation and extrapolation across time.
-- Behaviour when training with sparse data or varying batch size.
-- Predictions with uncertainty and predictive uncertainty across time.
-- Correlations between the either predictive uncertainty or dynamics (gradients) of the data and the prediction error
-
-Some prime **use-cases** of the benchmark are:
-
-- Finding the best-performing surrogate on a dataset. Here, best-performing could mean high accuracy, low inference times or any other metric of interest (e.g. most accurate uncertainty estimates, ...).
-- Comparing performance of a novel surrogate architecture against the implemented baseline models.
-- Gaining insights into a dataset or comparing datasets using the built-in dataset insights.
-
-## Key Features
-
-<details>
-<summary><b>Baseline Surrogates</b></summary>
-
-The following surrogate models are currently implemented to be benchmarked:
-
-- Fully Connected Neural Network:
-  The vanilla neural network a.k.a. multilayer perceptron.
-- DeepONet:
-  Two fully connected networks whose outputs are combined using a scalar product. In the current implementation, the surrogate comprises of only one DeepONet with multiple outputs (hence the name MultiONet).
-- Latent NeuralODE:
-  NeuralODE combined with an autoencoder that reduces the dimensionality of the dataset before solving the dynamics in the resulting latent space.
-- Latent Polynomial:
-  Uses an autoencoder similar to Latent NeuralODE, but fits a polynomial to the trajectories in the resulting latent space.
-
-</details>
-
-<details>
-<summary><b>Baseline Datasets</b></summary>
-
-The following datasets are currently included in the benchmark:
-
-</details>
-
-<details>
-<summary><b>Uncertainty Quantification (UQ)</b></summary>
-
-To give an uncertainty estimate that does not rely too much on the specifics of the surrogate architecture, we use DeepEnsemble for UQ.
-
-</details>
-
-<details>
-<summary><b>Parallel Training</b></summary>
-
-To gain insights into the surrogates behaviour, many models must be trained on varying subsets of the training data. This task is trivially parallelisable. In addition to utilising all specified devices, the benchmark features some nice progress bars to gain insights into the current status of the training.
-
-</details>
-
-<details>
-<summary><b>Plots, Plots, Plots</b></summary>
-
-While hard metrics are crucial to compare the surrogates, performance cannot always be broken down to a set of numbers. Running the benchmark creates many plots that serve to compare performance of surrogates or provide insights into the performance of each surrogate.
-
-</details>
-
-<details>
-<summary><b>Dataset Insights (WIP)</b></summary>
-
-"Know your data" is one of the most important rules in machine learning. To aid in this, the benchmark provides plots and visualisations that should help to understand the dataset better.
-
-</details>
-
-<details>
-<summary><b>Tabular Benchmark Results</b></summary>
+```bash
+git clone https://github.com/robin-janssen/CODES-Benchmark.git
+cd CODES-Benchmark
+uv sync # creates .venv from pyproject/uv.lock
+source .venv/bin/activate
+uv run python run_training.py --config configs/train_eval/config_minimal.yaml
+uv run python run_eval.py --config configs/train_eval/config_minimal.yaml
+```

-At the end of the benchmark, the most important metrics are displayed in a table, additionally, all metrics generated during the benchmark are provided as a csv file.
+**pip alternative**

-</details>
+```bash
+git clone https://github.com/robin-janssen/CODES-Benchmark.git
+cd CODES-Benchmark
+python -m venv .venv && source .venv/bin/activate
+pip install -e .
+pip install -r requirements.txt
+python run_training.py --config configs/train_eval/config_minimal.yaml
+python run_eval.py --config configs/train_eval/config_minimal.yaml
+```

-<details>
-<summary><b>Reproducibility</b></summary>
+Outputs land in `trained/<training_id>`, `results/<training_id>`, and `plots/<training_id>`. The `configs/` folder contains ready-to-use templates (`train_eval/config_minimal.yaml`, `config_full.yaml`, etc.). Copy a file there and adjust datasets/surrogates/modalities before running the CLIs.

-Randomness is an important part of machine learning and even required in the context of UQ with DeepEnsemble, but reproducibility is key in benchmarking enterprises. The benchmark uses a custom seed that can be set by the user to ensure full reproducibility.
+## Documentation

-</details>
+- [Main docs & tutorials](https://robin-janssen.github.io/CODES-Benchmark/)
+- [API reference (Sphinx)](https://robin-janssen.github.io/CODES-Benchmark/modules.html)
+- [Paper on arXiv](https://arxiv.org/abs/2410.20886)

-<details>
-<summary><b>Custom Datasets and Own Models</b></summary>
+The GitHub Pages site now hosts the narrative guides, configuration reference, and interactive notebooks alongside the generated API docs.

-To cover a wide variety of use-cases, the benchmark is designed such that adding own datasets and models is explicitly supported.
+## Repository map

-</details>
+| Path | Purpose |
+| --- | --- |
+| `configs/` | Ready-to-edit benchmark configs (`train_eval/`, `tuning/`, etc.) |
+| `datasets/` | Bundled datasets + download helper (`data_sources.yaml`) |
+| `codes/` | Python package with surrogates, training, tuning, and benchmarking utilities |
+| `run_training.py`, `run_eval.py`, `run_tuning.py` | CLI entry points for the main workflows |
+| `docs/` | Sphinx project powering the GitHub Pages site (guides, tutorials, API reference) |
+| `scripts/` | Convenience tooling (dataset downloads, analysis utilities) |

-## Quickstart
+## Contributing

-First, clone the [GitHub Repository](https://github.com/robin-janssen/CODES-Benchmark) with
+Pull requests are welcome! Please include documentation updates, add or update tests when you touch executable code, and run:

+```bash
+uv pip install --group dev
+pytest
+sphinx-build -b html docs/source/ docs/_build/html
 ```
-git clone ssh://git@github.com/robin-janssen/CODES-Benchmark
-```
-
-Optionally, you can set up a [virtual environment](https://docs.python.org/3/library/venv.html) (recommended).
-
-Then, install the required packages with
-
-```
-pip install -r requirements.txt
-```
-
-The installation is now complete. To be able to run and evaluate the benchmark, you need to first set up a configuration YAML file. There is one provided, but it should be configured. For more information, check the [configuration page](https://robin-janssen.github.io/CODES-Benchmark/documentation.html#config). There, we also offer an interactive Config-Generator tool with some explanations to help you set up your benchmark.

-You can also add your own datasets and models to the benchmark to evaluate them against each other or some of our baseline models. For more information on how to do this, please refer to the [documentation](https://robin-janssen.github.io/CODES-Benchmark/documentation.html).
+If you publish a new surrogate or dataset, document it under `docs/guides` / `docs/reference` so users can adopt it quickly. For questions, open an issue on GitHub.
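The rewritten README centers on the quickstart; a quick way to sanity-check a finished run is to inspect the output folders it names (a sketch; the concrete `<training_id>` is whatever your config sets):

```bash
# After run_training.py / run_eval.py complete, each run leaves three folders:
ls trained/    # model checkpoints, one subfolder per training_id
ls results/    # metric tables and CSV exports per training_id
ls plots/      # diagnostic figures per training_id
```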
