bench: Add internal performance benchmark suite + CodSpeed CI #771
base: master
`@@ -0,0 +1,43 @@` (new file)

```yaml
name: Benchmark smoke

# Builds every spec and fires every phase once under --quick
# --benchmark-disable: a "did a refactor break a spec?" check, not timing.
```

> **Member:** How long does this run?

> **Collaborator (Author):** 2 minutes. Can be optimized, I think. https://github.com/fluxopt/linopy/actions/runs/27085593476/job/79939326221

```yaml
on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ '*' ]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  smoke:
    name: Benchmark smoke (quick)
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0 # setuptools_scm

      - name: Set up Python 3.12
        uses: actions/setup-python@v6
        with:
          python-version: "3.12"
```

> **Member:** Switch to 3.13?

```yaml
      - name: Install package and benchmark dependencies
        run: |
          python -m pip install uv
          uv pip install --system -e ".[dev,benchmarks]"

      - name: Run benchmark smoke
        run: |
          python -m benchmarks smoke

      - name: Execute walkthrough notebook
        # Catches doc rot: the walkthrough must stay runnable end-to-end.
        run: |
          python -m benchmarks notebook
```
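The smoke job's header comment describes the check as building every spec and firing every phase once under `--quick`, with timing disabled. A minimal sketch of that loop follows, assuming a hypothetical registry (`SPECS`) and hypothetical phase drivers (`PHASES`); the suite's real names, signatures and quick-size rules are not shown in this diff. Plain dicts stand in for `linopy.Model` objects so the sketch has no dependencies.

```python
from typing import Callable

# Hypothetical registry: spec name -> (builder, sweep values).
# Real builders would return a linopy.Model; plain dicts keep this sketch dependency-free.
SPECS: dict[str, tuple[Callable[[int], dict], list[int]]] = {
    "basic": (lambda n: {"n_vars": 2 * n * n}, [10, 100, 1600]),
    "knapsack": (lambda n: {"n_vars": n}, [100, 10_000]),
}

# Hypothetical phase drivers: each exercises one phase on an already-built model.
PHASES: dict[str, Callable[[dict], object]] = {
    "lp_write": lambda model: f"vars {model['n_vars']}\n",
    "matrices": lambda model: [0.0] * model["n_vars"],
}


def smoke(specs=SPECS, phases=PHASES) -> list[str]:
    """Build every spec once at its smallest sweep value and fire every phase once.

    Nothing is timed: the result lists "spec/phase: error" strings,
    and an empty list means no spec or phase broke.
    """
    failures: list[str] = []
    for name, (builder, values) in specs.items():
        try:
            model = builder(min(values))  # --quick analogue (assumed): smallest value only
        except Exception as exc:
            failures.append(f"{name}/build: {exc!r}")
            continue  # no model, so this spec's phases cannot run
        for phase_name, phase in phases.items():
            try:
                phase(model)
            except Exception as exc:
                failures.append(f"{name}/{phase_name}: {exc!r}")
    return failures
```

Collecting failures instead of stopping at the first one means a single smoke run reports every broken spec/phase pair, which suits a "did a refactor break a spec?" check.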
`@@ -0,0 +1,62 @@` (new file)

```yaml
name: CodSpeed (walltime macro)

# Wall-clock benchmarks on CodSpeed's dedicated bare-metal macro runners — the
# mode that reflects the real cost of dense-vs-sparse work (cache, allocation,
# native numpy/scipy), which instruction counting under-weights.
#
# Master push (updates the walltime baseline) + manual dispatch + opt-in per-PR
# via the ``trigger:benchmark`` label. Off every *unlabelled* PR: macro-runner
# minutes are metered (600/month free), and self-hosted bare-metal shouldn't run
# arbitrary PR code — the label is a maintainer-controlled gate, so only apply it
# to trusted (same-repo) PRs.
#
# Requires the repo under a GitHub org (macro runners are org-only) with the
```

> **Member:** Please remove the noise.

```yaml
# CodSpeed app connected to the repo (OIDC auth — no token secret needed).

on:
```

> **Member:** Can we merge both CodSpeed files?

```yaml
  push:
    branches: [ master ]
  pull_request:
    types: [ labeled, synchronize ]
    branches: [ master ]
  workflow_dispatch:
```

> **Member:** I guess push to master + workflow_dispatch is enough.

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  macro:
    name: CodSpeed walltime (macro runner)
    # Always on master push / dispatch; on PRs only when explicitly labelled.
    if: >-
      ${{ github.event_name != 'pull_request' ||
      contains(github.event.pull_request.labels.*.name, 'trigger:benchmark') }}
    runs-on: codspeed-macro
```

> **Member:** What are the costs here? Running memory on GitHub runners and clock time on CodSpeed runners looks great; can you point me to more resources? Especially if we want to use this across multiple repos. How long does this run, and how much is free?

> **Collaborator (Author):** Walltime (macro) is free up to 600 min/month. The cost after that is €0.013/min, but I'm not sure whether we can go over the 600 minutes without a paid plan. Everything else can run on a GitHub runner and is free.

```yaml
    # Non-gating until the CodSpeed app is connected to the repo (OIDC auth).
    continue-on-error: true
    permissions:
      contents: read # actions/checkout
      id-token: write # OIDC auth with CodSpeed — no token secret
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0 # setuptools_scm

      - name: Set up Python 3.12
        uses: actions/setup-python@v6
        with:
          python-version: "3.12"

      - name: Install pinned benchmark environment
        # Pinned ``[benchmarks]`` extra so Dependabot bumps → one CodSpeed delta each.
        run: |
          python -m pip install uv
          uv pip install --system -e ".[dev,benchmarks]"

      - name: Run benchmarks under CodSpeed (walltime)
        uses: CodSpeedHQ/action@v4
        with:
          mode: walltime
          run: |
            pytest benchmarks/ --quick --codspeed
```
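The job-level `if:` is what keeps the metered macro runner off unlabelled PRs. As a sketch, here is the same boolean written in Python; the function name is hypothetical and nothing in the workflow calls it.

```python
from typing import Optional, Sequence


def should_run_macro(event_name: str, pr_labels: Optional[Sequence[str]] = None) -> bool:
    """Python mirror of the macro job's ``if:`` expression (illustration only)."""
    # push to master and workflow_dispatch are not PR events: always run.
    if event_name != "pull_request":
        return True
    # PR events run only when the maintainer-applied label is present.
    return "trigger:benchmark" in (pr_labels or ())
```

One consequence, as far as the expression itself goes: `labeled` fires for any label, so adding an unrelated label to a PR that already carries `trigger:benchmark` also starts a macro run, and `synchronize` re-runs it on every push to that PR until the label is removed.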
`@@ -0,0 +1,48 @@` (new file)

```yaml
name: CodSpeed (memory)

# Heap-allocation tracking — the always-on signal for this sparsity/memory fork.
# Fast (~2 min) and free on a GitHub runner, so it runs on master (baseline) and
# every PR. A solo instrument on ubuntu: its one upload per (commit, env) never
# clashes with the walltime run, which is a separate bare-metal environment.

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  memory:
    name: CodSpeed memory
    runs-on: ubuntu-latest
    # Non-gating: informational, never blocks a merge.
    continue-on-error: true
    permissions:
      contents: read # actions/checkout
      id-token: write # OIDC auth with CodSpeed — no token secret
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0 # setuptools_scm

      - name: Set up Python 3.12
        uses: actions/setup-python@v6
        with:
          python-version: "3.12"

      - name: Install pinned benchmark environment
        run: |
          python -m pip install uv
          uv pip install --system -e ".[dev,benchmarks]"

      - name: Run benchmarks under CodSpeed (memory)
        uses: CodSpeedHQ/action@v4
        with:
          mode: memory
          run: |
            pytest benchmarks/ --quick --codspeed
```
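The memory workflow tracks heap allocations rather than time. As a dependency-free illustration of "peak allocation during one call", the sketch below uses the standard library's `tracemalloc`. It is only a stand-in for the instrumentation that CodSpeed's memory mode and memray provide: `tracemalloc` counts allocations made through Python's tracked allocators, whereas memray also sees native (C/numpy) allocations. `peak_bytes` is a hypothetical helper.

```python
import tracemalloc
from typing import Any, Callable


def peak_bytes(fn: Callable[[], Any]) -> int:
    """Peak traced heap usage, in bytes, reached while calling ``fn``.

    Starts and stops tracing itself, so don't nest it inside other
    tracemalloc-based measurements.
    """
    tracemalloc.start()
    try:
        fn()  # the return value is dropped; the peak has already been recorded
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```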
````diff
@@ -1,94 +1,69 @@
 # Internal Performance Benchmarks
 
-Measures linopy's own performance (build time, LP write speed, memory usage) across problem sizes using [pytest-benchmark](https://pytest-benchmark.readthedocs.io/) and [pytest-memray](https://pytest-memray.readthedocs.io/). Use these to check whether a code change introduces a regression or improvement.
+End-to-end performance tracking for `linopy` — build → solver handoff
+→ netCDF (de)serialization → fixed PyPSA model. Solver algorithm
+runtime is out of scope.
 
-> **Note:** The `benchmark/` directory (singular) contains *external* benchmarks comparing linopy against other modeling frameworks. This directory (`benchmarks/`) is for *internal* performance tracking only.
+**The walkthrough is load-bearing.** Phase coverage, CLI introspection,
+the two-snapshot regression workflow with inline Plotly views, and
+how to extend the suite live in [`walkthrough.md`](walkthrough.md).
+This README only covers install and how to open the walkthrough.
 
-## Setup
+> `benchmark/` (singular) is the legacy external-framework suite.
+> `benchmarks/` (plural) is this internal suite.
 
-```bash
-pip install -e ".[benchmarks]"
-```
+## Models vs patterns
 
-## Running benchmarks
+Two kinds of benchmark spec, same harness (time *or* peak memory — a
+`run`/`sweep` `--metric` flag, same phases), distinguished by their sweep axis:
 
-```bash
-# Quick smoke test (small sizes only)
-pytest benchmarks/ --quick
+- **Models** (`models/`, `REGISTRY`) — whole `linopy.Model`s swept over
+  `size` (axis `n`): "how does cost scale with the problem?"
+- **Patterns** (`patterns/`, `PATTERNS`) — fragments of realistic modelling
+  code (a balance constraint, a KVL contraction) swept over `severity`
+  (0–100, axis `severity`): "how does cost respond as one data shape goes
+  from benign to pathological?" Each `PatternSpec.description` documents what
+  its dial means (`"0: …, 100: …"`).
 
-# Full timing benchmarks
-pytest benchmarks/test_build.py benchmarks/test_lp_write.py benchmarks/test_matrices.py
+Both kinds build a complete `linopy.Model`, so both run the **same phases** and
+share the phase drivers (`test_build.py`, `test_matrices.py`, …) and `memory`
+grid — they're just more `(spec, value)` rows, tagged by `axis`. There is no
+separate pattern driver. Running a pattern through `build` *and* `lp_write`
+shows whether a dense-`_term` blow-up propagates to export or collapses.
 
-# Run a specific model
-pytest benchmarks/test_build.py -k basic
-```
+Patterns target the operations where the dense-`_term` representation forces
+materialisation — `groupby().sum()` padding, sparse `@` densification — so a
+`severity` sweep draws the cost cliff, and a cross-version `compare` shows a
+kernel change bending it. Adding either is one file: drop it in `models/` or
+`patterns/`, call `register(...)` / `register_pattern(...)`.
````
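The README describes models and patterns as "just more `(spec, value)` rows, tagged by `axis`", added with a `register(...)` / `register_pattern(...)` call. The diff does not show those functions, so the sketch below is a hypothetical reconstruction: the names echo the README, but the `Spec` dataclass, the signatures and the 0 to 100 range check are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass(frozen=True)
class Spec:
    """Hypothetical benchmark spec: one builder swept over one axis."""
    name: str
    build: Callable[[int], object]  # returns a model for one sweep value
    values: tuple                   # sizes for models, severities for patterns
    axis: str                       # "n" for models, "severity" for patterns
    description: str = ""           # patterns document their dial, e.g. "0: ..., 100: ..."


REGISTRY: dict[str, Spec] = {}  # model specs, swept over size
PATTERNS: dict[str, Spec] = {}  # pattern specs, swept over severity


def register(name: str, build: Callable[[int], object], sizes) -> None:
    REGISTRY[name] = Spec(name, build, tuple(sizes), axis="n")


def register_pattern(name: str, build: Callable[[int], object], severities, description: str) -> None:
    if not all(0 <= s <= 100 for s in severities):
        raise ValueError(f"{name}: severity values must lie in 0..100")
    PATTERNS[name] = Spec(name, build, tuple(severities), axis="severity", description=description)


def rows() -> Iterator[tuple]:
    """Yield (spec, value) rows from both registries for the shared phase drivers."""
    for spec in (*REGISTRY.values(), *PATTERNS.values()):
        for value in spec.values:
            yield spec, value
```

Because both registries feed one row generator, a single set of phase drivers can consume them; the `axis` tag only matters when grouping or plotting results.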
````diff
 
-## Comparing timing between branches
+## Install
 
 ```bash
-# Save baseline results on master
-git checkout master
-pytest benchmarks/test_build.py --benchmark-save=master
-
-# Switch to feature branch and compare
-git checkout my-feature
-pytest benchmarks/test_build.py --benchmark-save=my-feature --benchmark-compare=0001_master
-
-# Compare saved results without re-running
-pytest-benchmark compare 0001_master 0002_my-feature --columns=median,iqr
+uv sync --extra dev --extra benchmarks
+source .venv/bin/activate
 ```
 
-Results are stored in `.benchmarks/` (gitignored).
-
-## Memory benchmarks
+`pypsa` is optional — `pypsa_scigrid` and
+`test_pypsa_carbon_management.py` skip gracefully without it. Install
+when you need them: `uv pip install pypsa`.
 
-`memory.py` runs each test in a separate process with pytest-memray to get accurate per-test peak memory (including C/numpy allocations). Results are saved as JSON and can be compared across branches.
+The `[benchmarks]` extra in `pyproject.toml` pins every direct dep that
+affects measurement (`numpy`, `scipy`, `xarray`, `pandas`, `polars`,
+`dask`, etc.). `sweep` installs these into each per-version venv, so
+"same deps, only linopy varies" comes for free without a separate
+lockfile — bump the pins in pyproject and the next sweep picks them up.
 
-By default, only the build phase (`test_build.py`) is measured. Unlike timing benchmarks where `benchmark()` isolates the measured function, memray tracks all allocations within a test — including model construction in setup. This means LP write and matrix tests would report build + phase memory combined, making the phase-specific contribution impossible to isolate. Since model construction dominates memory usage, measuring build alone gives the most actionable numbers.
````
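The removed paragraph explains why memory was measured for the build phase only: a per-test tracker attributes every allocation in the test to that test, so an LP-write test reports build plus write. The sketch below shows the difference with the standard library's `tracemalloc` (`reset_peak` needs Python 3.9+), using hypothetical `build_model` and `write_phase` stand-ins. Like any `tracemalloc` example, it ignores the native allocations that memray would count.

```python
import tracemalloc


def build_model(n: int) -> list:
    """Hypothetical stand-in for model construction: retains about n KiB."""
    return [bytes(1024) for _ in range(n)]


def write_phase(model: list) -> int:
    """Hypothetical stand-in for an export phase: one transient buffer, then freed."""
    return len(b"".join(model))


def measure(n: int) -> dict:
    tracemalloc.start()
    try:
        model = build_model(n)
        after_build, build_peak = tracemalloc.get_traced_memory()
        tracemalloc.reset_peak()  # forget the build peak but keep tracing
        write_phase(model)
        _, phase_peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {
        # What a per-test tracker reports: build and phase together.
        "whole_test_peak": max(build_peak, phase_peak),
        # What the phase adds on top of the already-built model.
        "phase_increment": phase_peak - after_build,
    }
```

For a phase that allocates less than the build retains, the whole-test peak is dominated by the build, which is the argument the paragraph makes for measuring build alone.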
````diff
+## Open the walkthrough
 
 ```bash
-# Save baseline on master
-git checkout master
-python benchmarks/memory.py save master
-
-# Save feature branch
-git checkout my-feature
-python benchmarks/memory.py save my-feature
-
-# Compare
-python benchmarks/memory.py compare master my-feature
-
-# Quick mode (smaller sizes, faster)
-python benchmarks/memory.py save master --quick
-
-# Measure a specific phase (includes build overhead)
-python benchmarks/memory.py save master --test-path benchmarks/test_lp_write.py
+python -m benchmarks notebook --build # (re)generate walkthrough.ipynb
+jupyter lab benchmarks/walkthrough.ipynb # ...or PyCharm / VSCode
 ```
 
-Results are stored in `.benchmarks/memory/` (gitignored). Requires Linux or macOS (memray is not available on Windows).
-
-> **Note:** Small tests (~5 MiB) are near the import-overhead floor and may show noise of ~1 MiB between runs. Focus on larger tests for meaningful memory comparisons. Do not combine `--memray` with timing benchmarks — memray adds ~2x overhead that invalidates timing results.
-
-## Models
-
-| Model | Description | Sizes |
-|-------|-------------|-------|
-| `basic` | Dense N*N model, 2*N^2 vars/cons | 10 — 1600 |
-| `knapsack` | N binary variables, 1 constraint | 100 — 1M |
-| `expression_arithmetic` | Broadcasting, scaling, summation across dims | 10 — 1000 |
-| `sparse_network` | Ring network with mismatched bus/line coords | 10 — 1000 |
-| `pypsa_scigrid` | Real power system (requires `pypsa`) | 10 — 200 snapshots |
-
-## Phases
-
-| Phase | File | What it measures |
-|-------|------|------------------|
-| Build | `test_build.py` | Model construction (add_variables, add_constraints, add_objective) |
-| LP write | `test_lp_write.py` | Writing the model to an LP file |
-| Matrices | `test_matrices.py` | Generating sparse matrices (A, b, c, bounds) from the model |
-
-## Adding a new model
+The `.md` is the source of truth; the `.ipynb` is a disposable,
+gitignored build artifact. Edit the `.md`, re-run `--build`, re-open.
+Same workflow in any editor.
 
-1. Create `benchmarks/models/my_model.py` with a `build_my_model(n)` function and a `SIZES` list
-2. Add parametrized tests in the relevant `test_*.py` files
-3. Add a quick threshold in `conftest.py`
+CI executes the walkthrough end-to-end on every PR
+(`python -m benchmarks notebook`) so the examples can't silently rot.
````
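Both versions of the README revolve around a two-snapshot workflow: save results on `master`, save them on a feature branch, then compare. A minimal sketch of the compare step follows. It assumes each snapshot is a JSON object mapping a benchmark id to one number (a median time or a peak byte count); that file layout and the 10% default threshold are assumptions, not the suite's actual format.

```python
import json
from pathlib import Path


def load_snapshot(path) -> dict:
    """Read a snapshot assumed to be {"benchmark id": number, ...} JSON."""
    return json.loads(Path(path).read_text())


def compare(baseline: dict, candidate: dict, threshold: float = 0.10) -> dict:
    """Relative change for benchmarks present in both snapshots.

    Only changes larger than ``threshold`` in either direction are returned.
    Positive means slower or more memory; negative means an improvement.
    """
    flagged = {}
    for key in sorted(baseline.keys() & candidate.keys()):
        base = baseline[key]
        if base == 0:
            continue  # relative change is undefined for a zero baseline
        change = (candidate[key] - base) / base
        if abs(change) > threshold:
            flagged[key] = change
    return flagged
```

Benchmarks present in only one snapshot (added or removed specs) are skipped here; a fuller version would report them separately.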
> Pinning like you did in `pyproject.toml` doesn't give us full pins, because the deps of deps aren't pinned. Also, I'm not sure we need these per-dependency PRs just to get CodSpeed stats on them. The user will use the newest versions anyway, and it's very unlikely that we'd pin deps for the user just because of CodSpeed. So I'd just fully remove this and the benchmark pins in `pyproject.toml`. For PRs, we probably want to just commit `uv.lock` and use that one in all PRs (for tests and benchmarks), and on master we always resolve from scratch. But all CI runs would need to be updated then. I guess this can rather be done in another PR, and here we just ignore dependency pinning.

> I decided against a uv lock file to be able to sweep across linopy versions. As this is arguably not needed for CI and regression tracking, I'm fine with removing this and maybe even committing a lock file instead. But lockfile usage needs to stay optional to make sweeping possible.
>
> Another reason to add those here was to catch indirect improvements in upstream repos (xarray etc.), as a Dependabot PR would also run the benchmarks when trying to bump the deps.
>
> But we could probably simplify this.
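The first review comment's point, that pinning the direct dependencies in the `[benchmarks]` extra leaves their own dependencies floating, can be made concrete with a transitive-closure check. The graph in the usage example uses made-up package names rather than real metadata; a real check would read the resolved set from an installed environment or from a lockfile such as `uv.lock`.

```python
from collections import deque


def transitive_deps(graph: dict, roots) -> set:
    """All packages reachable from ``roots`` in a {package: [dependencies]} graph."""
    seen = set()
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        queue.extend(graph.get(pkg, ()))
    return seen


def floating_deps(graph: dict, pinned: set) -> set:
    """Packages that get installed but are not covered by the direct pins."""
    return transitive_deps(graph, pinned) - set(pinned)
```

For example, with the hypothetical graph `{"frame_lib": ["array_lib", "date_lib"], "date_lib": ["tz_lib"]}` and pins on `array_lib` and `frame_lib`, `floating_deps` returns `{"date_lib", "tz_lib"}`: versions a lockfile would fix but the extra does not.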