refactor(benchmarks): decouple CI baseline from analytics + unify measurements() (split prep)#34

Draft
FBumann wants to merge 2 commits into master from refactor/benchmarks-decouple-core

@FBumann FBumann commented Jun 8, 2026

Collaborator

TODO (human): one line on why.

Note

Generated by AI (Claude).

Foundation for splitting the local analytics (plotting / sweep / memray engine / CLI) out of linopy into a standalone package. This PR makes the two structural changes that keep that split clean, with no behaviour change.

Changes

1. Decouple the CI baseline from the analytics layer.
The CodSpeed baseline (registry + models/patterns + phases + test_*.py + conftest) now imports none of {memory, sweep, plotting, snapshot, bench, cli}. Two things were in the way:

  • spec_param_id (the <name>-<axis>=<value> convention) moved from snapshot.py into registry.py — it's a core concept used by the drivers.
  • benchmarks/__init__ no longer eagerly imports bench (from benchmarks import bench still works on demand).
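To make the `<name>-<axis>=<value>` convention concrete, here is a minimal sketch. The signature is an assumption for illustration and may differ from the real `registry.spec_param_id`; the solver-prefixed helper is hypothetical and only mirrors the ids visible in the CodSpeed report (for example `highs-knapsack-n=10000`).

```python
def spec_param_id(name: str, axis: str, value: object) -> str:
    """Build a param id following the <name>-<axis>=<value> convention.

    Hypothetical signature: the real function may take a spec object instead.
    """
    return f"{name}-{axis}={value}"


def solver_param_id(solver: str, name: str, axis: str, value: object) -> str:
    """Hypothetical helper: prefix the spec id with the solver name."""
    return f"{solver}-{spec_param_id(name, axis, value)}"


if __name__ == "__main__":
    print(spec_param_id("kvl_cycles", "severity", 100))
    print(solver_param_id("highs", "knapsack", "n", 10000))
```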

Result: importing the package + the three CodSpeed test modules pulls in zero analytics; the dependency arrow only points analytics → core.
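One way to check the "zero analytics" claim locally is to import the targets in a fresh interpreter and inspect `sys.modules`. The sketch below is not part of this PR; it assumes the analytics modules live at `benchmarks.<name>`, and both helper names are made up for illustration.

```python
import subprocess
import sys
from typing import Iterable

# Analytics module names taken from the PR description.
ANALYTICS = ("memory", "sweep", "plotting", "snapshot", "bench", "cli")


def loaded_after_import(*targets: str) -> list[str]:
    """Import `targets` in a fresh interpreter and return its sys.modules keys."""
    code = (
        "import importlib, sys\n"
        "for target in sys.argv[1:]:\n"
        "    importlib.import_module(target)\n"
        "print('\\n'.join(sorted(sys.modules)))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code, *targets],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


def analytics_leaks(
    loaded: Iterable[str],
    package: str = "benchmarks",  # assumption: analytics live at <package>.<name>
    analytics: Iterable[str] = ANALYTICS,
) -> list[str]:
    """Return loaded modules that are an analytics module or one of its submodules."""
    forbidden = tuple(f"{package}.{name}" for name in analytics)
    return sorted(
        mod
        for mod in loaded
        if any(mod == f or mod.startswith(f + ".") for f in forbidden)
    )
```

Called as `analytics_leaks(loaded_after_import("benchmarks", ...))` with the CodSpeed test modules appended, an empty list would confirm that the dependency arrow points only from analytics to core.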

2. One measurements() contract for timing + memory.
The pytest drivers and the memray engine each re-derived "what runs + its node id" per phase, a duplication that test_memory_id_alignment.py only papered over. Now phases own phase_cases(phase) -> Iterable[PhaseCase], where each case is a setup→action→teardown context manager; the drivers parametrize over it and the memray engine loops over it.

  • Every benchmark node id is byte-identical (verified by an empty pytest --co diff) → CodSpeed baselines untouched.
  • _measurements / _phase_tag gone; the memray engine now measures every available solver (was highs-only); the alignment test is deleted (ids are the same object now).
  • pipeline becomes a first-class opt-in phase (--phase pipeline, both metrics; out of the default run + CI).
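The contract can be pictured with a toy version. Everything below is an illustrative sketch under assumptions: the `PhaseCase` fields, the `enter` attribute, the toy build/export functions and the `run_case` signature are invented here and are not the code in `phases.py` or `conftest.py`. What it does show is the shape described above: setup runs before the yielded action, only the action is handed to the measuring function, and both consumers read ids from the same objects.

```python
from contextlib import AbstractContextManager, contextmanager
from dataclasses import dataclass
from functools import partial
from typing import Callable, Iterable, Iterator

Action = Callable[[], object]
EVENTS: list[str] = []  # records ordering, for demonstration only


@dataclass(frozen=True)
class PhaseCase:
    id: str  # node-id suffix shared by every consumer
    enter: Callable[[], AbstractContextManager[Action]]


def _toy_build(n: int) -> list[int]:
    EVENTS.append(f"build n={n}")
    return list(range(n))


@contextmanager
def _build_case(n: int) -> Iterator[Action]:
    # Build phase: no setup, the build itself is the measured action.
    yield lambda: _toy_build(n)


@contextmanager
def _export_case(n: int) -> Iterator[Action]:
    model = _toy_build(n)  # setup: runs before, and outside, the measured action
    try:
        yield lambda: sum(model)  # stand-in for an export such as to_lp
    finally:
        EVENTS.append("teardown")


_CASES = {"build": _build_case, "to_lp": _export_case}


def phase_cases(phase: str, sizes: Iterable[int] = (10, 100)) -> Iterator[PhaseCase]:
    make = _CASES[phase]
    for n in sizes:
        yield PhaseCase(id=f"toy-n={n}", enter=partial(make, n))


def run_case(case: PhaseCase, measure: Callable[[Action], object]) -> object:
    # `measure` is whatever wraps the action: a timer, a memory tracker, ...
    with case.enter() as action:
        return measure(action)
```

A timing driver would parametrize over `phase_cases(phase)` using `case.id` as the test id, while a memory engine would loop over the same iterable, so the two sets of ids cannot drift apart.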

Verify

  • ruff + mypy clean; harness tests pass; collected-node-id diff empty for every existing benchmark.

FBumann and others added 2 commits June 8, 2026 18:17
…cs layer

Prep for splitting the local benchmarking (memory, sweep, plotting, …) into a
standalone package — makes the CodSpeed CI baseline import-clean of the
analytics layer:

- Move `spec_param_id` (the `<name>-<axis>=<value>` param-id convention) from
  `snapshot.py` into `registry.py`. It's used by the core (`registry.param_ids`,
  `test_to_solver`) and was the only core → analytics import.
- Stop eagerly importing `bench` in `benchmarks/__init__`; `models`/`patterns`
  stay (they register specs). `from benchmarks import bench` still works on
  demand (submodule import).

Result: importing the package + the three CodSpeed test modules pulls zero of
{memory, sweep, plotting, snapshot, bench, cli}; the dependency arrow now only
points analytics → core. No behaviour change — collection, harness tests (incl.
the memory↔pytest id alignment), and the memory CLI are all unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pytest drivers and the memray engine each re-derived "what runs + its node
id" for every phase — a duplication that test_memory_id_alignment.py existed to
band-aid. Collapse it into a single source of truth: phases own a
phase_cases(phase) -> Iterable[PhaseCase], where each case is a context manager
(setup → measured action → teardown).

- phases.py: phase_cases + the per-phase case CMs (build/matrices/to_lp/netcdf/
  to_solver/pipeline) + PHASE_NODE. Setup (build, scratch files) runs untimed/
  untracked before the yielded action; the build is excluded for export phases
  and included for build/pipeline — uniformly, by construction.
- drivers: each test_<phase> is now a thin parametrize-over-phase_cases + the
  shared conftest.run_case. Node ids are byte-identical (verified by an empty
  `pytest --co` id diff), so CodSpeed baselines are untouched.
- memory.run_phase consumes the same phase_cases; _measurements and _phase_tag
  are gone, and the memray engine now measures every available solver (was
  highs-only) with ids matching pytest's.
- test_memory_id_alignment.py deleted — the ids are now the same object, not two
  strings hoped to match.

Also makes `pipeline` a first-class phase (build → matrices → lp in one region,
build included): selectable via `--phase pipeline` for both metrics, with a new
test_pipeline.py for timing. It stays opt-in (deselected unless `--pipeline`)
and out of CI (not in CODSPEED_MODULES).
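
For readers unfamiliar with opt-in test selection in pytest, the following is a rough sketch of how a `--pipeline` flag could deselect the pipeline tests by default. It is not the conftest from this commit: matching on the `test_pipeline.py` file name and the helper `is_pipeline_item` are assumptions made for illustration. The hooks themselves (`pytest_addoption`, `pytest_collection_modifyitems`, `pytest_deselected`) are standard pytest.

```python
def pytest_addoption(parser):
    # Register the opt-in flag; pipeline benchmarks stay off without it.
    parser.addoption(
        "--pipeline",
        action="store_true",
        default=False,
        help="also run the opt-in pipeline benchmarks",
    )


def is_pipeline_item(nodeid: str) -> bool:
    """Hypothetical rule: an item is a pipeline test if it lives in test_pipeline.py."""
    return nodeid.split("::", 1)[0].endswith("test_pipeline.py")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--pipeline"):
        return  # flag given: keep everything that was collected
    kept = [item for item in items if not is_pipeline_item(item.nodeid)]
    dropped = [item for item in items if is_pipeline_item(item.nodeid)]
    if dropped:
        # Report the items as deselected so they show up in the summary line.
        config.hook.pytest_deselected(items=dropped)
        items[:] = kept
```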

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
codspeed-hq Bot commented Jun 8, 2026


Merging this PR will not alter performance

✅ 79 untouched benchmarks
🆕 59 new benchmarks
⏩ 593 skipped benchmarks [1]

Performance Changes

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 🆕 | Memory | `test_to_lp[kvl_cycles-severity=100]` | N/A | 38.5 MB | N/A |
| 🆕 | Memory | `test_to_solver[highs-knapsack-n=10000]` | N/A | 2.5 MB | N/A |
| 🆕 | Memory | `test_to_solver[gurobi-knapsack-n=10000]` | N/A | 2.8 MB | N/A |
| 🆕 | Memory | `test_to_solver[gurobi-sos-n=1000]` | N/A | 3.1 MB | N/A |
| 🆕 | Memory | `test_to_solver[highs-expression_arithmetic-n=250]` | N/A | 34.9 MB | N/A |
| 🆕 | Memory | `test_to_solver[gurobi-piecewise-n=1000]` | N/A | 5.1 MB | N/A |
| 🆕 | Memory | `test_to_solver[gurobi-qp-n=1000]` | N/A | 831.1 KB | N/A |
| 🆕 | Memory | `test_to_solver[gurobi-kvl_cycles-severity=100]` | N/A | 198.8 MB | N/A |
| 🆕 | Memory | `test_to_lp[nodal_balance-severity=100]` | N/A | 17.9 MB | N/A |
| 🆕 | Memory | `test_to_lp[merge_balance-severity=100]` | N/A | 17.6 MB | N/A |
| 🆕 | Memory | `test_build[milp-n=50]` | N/A | 283.7 KB | N/A |
| 🆕 | Memory | `test_to_solver[highs-kvl_cycles-severity=100]` | N/A | 217 MB | N/A |
| 🆕 | Memory | `test_build[masked-n=100]` | N/A | 735.4 KB | N/A |
| 🆕 | Memory | `test_to_solver[highs-sparse_network-n=250]` | N/A | 64.8 MB | N/A |
| 🆕 | Memory | `test_to_solver[highs-cumsum-severity=100]` | N/A | 70.7 MB | N/A |
| 🆕 | Memory | `test_to_lp[sparse_network-n=250]` | N/A | 34.5 MB | N/A |
| 🆕 | Memory | `test_to_solver[highs-masked-n=100]` | N/A | 659 KB | N/A |
| 🆕 | Memory | `test_build[sos-n=1000]` | N/A | 444.1 KB | N/A |
| 🆕 | Memory | `test_to_solver[highs-piecewise-n=1000]` | N/A | 4.2 MB | N/A |
| 🆕 | Memory | `test_to_lp[milp-n=50]` | N/A | 63.3 KB | N/A |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing refactor/benchmarks-decouple-core (d2a5d54) with master (85c103c)

Footnotes

  1. 593 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.
