
Commit 2ba2966

chore: add radar benchmark suite (leanprover-community#33762)
This PR adds the benchmark suite to mathlib that radar has already been using for a while. It can also be run standalone following the instructions in `scripts/bench/README.md`. Radar will automatically switch to this suite once merged.

Compared to the radar bench suite, there have been some smaller changes:

1. All measurement results are now collected in the `measurements.jsonl` file. Previously, some results were instead emitted via stdout/stderr, but that makes it more difficult to run the bench suite standalone.
2. The `lint` benchmark no longer includes the build of `runLinter` in its measurements, only the linting itself.

Co-authored-by: Joscha <joscha@plugh.de>
1 parent 9baa9e8 commit 2ba2966

19 files changed

Lines changed: 704 additions & 0 deletions


scripts/README.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ to learn about it as well!
 If these web pages are deprecated or removed, we should remove these scripts.

 **Repository analysis and reporting**
+- `bench` is mathlib's benchmark suite. View its [README.md](bench/README.md) for more details.
 - `user_activity_report.py`
   Generates a comprehensive report of all users with repository access and their last commit activity.
   Shows username, age of last commit, and access level, sorted by commit recency (most recent first).

scripts/bench/README.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
# Mathlib4 benchmark suite

This directory contains the mathlib4 benchmark suite.
It is built around [radar](https://github.com/leanprover/radar)
and benchmark results can be viewed
on the [Lean FRO radar instance](https://radar.lean-lang.org/repos/mathlib4).

To execute the entire suite, run `scripts/bench/run` in the repo root.
To execute an individual benchmark, run `scripts/bench/<benchmark>/run` in the repo root.
All scripts output their measurements into the file `measurements.jsonl`.

Radar sums any duplicated measurements with matching metrics.
To post-process the `measurements.jsonl` file this way in-place,
run `scripts/bench/combine.py` in the repo root after executing the benchmark suite.
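For illustration, a `measurements.jsonl` file might contain entries like the following (the metric names appear elsewhere in this suite; the values are made up):

```jsonl
{"metric": "lint//wall-clock", "value": 412.5, "unit": "s"}
{"metric": "build/module/Mathlib.Logic.Basic//instructions", "value": 98765432100}
```

Each line is a standalone JSON object with a `metric` name, a numeric `value`, and an optional `unit`.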
The `*.py` symlinks exist only so that the Python files are a bit nicer to edit
in text editors that rely on the file extension.

## Adding a benchmark

To add a benchmark to the suite, follow these steps (a minimal example script follows the list):

1. Create a new folder containing a `run` script and a `README.md` file describing the benchmark,
   as well as any other files required for the benchmark.
2. Edit `scripts/bench/run` to call the `run` script of your new benchmark.
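As a minimal sketch, a new benchmark's `run` script could look like this. The benchmark name `mybench` and the benchmarked command are hypothetical; `measure.py`'s `-t`/`-m` flags are used the same way as in the existing benchmarks:

```bash
#!/usr/bin/env bash
set -euxo pipefail

# Hypothetical benchmark: measure a single command and append the
# results to measurements.jsonl via the shared measure.py wrapper.
scripts/bench/measure.py -t mybench \
  -m instructions -m maxrss -m task-clock -m wall-clock -- \
  lake exe cache get
```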
## How radar executes the benchmark suite

Radar requires a _bench repo_ to be configured for each repo.
The bench repo contains scripts that execute benchmarks and present the results to radar,
following the
[bench repo specification](https://github.com/leanprover/radar/blob/62bffab39025a1c2039499ae7a85b1ad446286d9/README.md#bench-repo-specification).

The bench repo for mathlib4 is
[leanprover/radar-bench-mathlib4](https://github.com/leanprover/radar-bench-mathlib4).
It calls the bench suite inside the mathlib repository and passes the results on to radar.
It expects all measurements to be presented through the `measurements.jsonl` file,
_not_ through stdout/stderr (even though this would be allowed by the bench script specification).

scripts/bench/build/README.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# The `build` benchmark

This benchmark executes a complete build of mathlib4 and collects global and per-module metrics.

The following metrics are collected by a wrapper around the entire build process:

- `build//instructions`
- `build//maxrss`
- `build//task-clock`
- `build//wall-clock`

The following metrics are collected from `lean --profile` and summed across all modules:

- `build/profile/<name>//wall-clock`

The following metrics are collected from `lakeprof report`:

- `build/lakeprof/longest build path//wall-clock`
- `build/lakeprof/longest rebuild path//wall-clock`

The following metrics are collected individually for each module:

- `build/module/<name>//lines`
- `build/module/<name>//instructions`
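For example, a single module contributes entries like the following to `measurements.jsonl` (the module name and values are illustrative):

```jsonl
{"metric": "build/module/Mathlib.Logic.Basic//lines", "value": 1542}
{"metric": "build/module/Mathlib.Logic.Basic//instructions", "value": 213000000000}
```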
If the file `build_upload_lakeprof_report` is present in the repo root,
the lakeprof report will be uploaded once the benchmark run concludes.
scripts/bench/build/fake-root/bin/lean

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
#!/usr/bin/env python3

import argparse
import json
import re
import subprocess
import sys
from pathlib import Path

NAME = "build"
REPO = Path()
BENCH = REPO / "scripts" / "bench"
OUTFILE = REPO / "measurements.jsonl"


def save_result(metric: str, value: float, unit: str | None = None) -> None:
    data = {"metric": metric, "value": value}
    if unit is not None:
        data["unit"] = unit
    with open(OUTFILE, "a+") as f:
        f.write(f"{json.dumps(data)}\n")


def run(*command: str) -> None:
    result = subprocess.run(command)
    if result.returncode != 0:
        sys.exit(result.returncode)


def run_stderr(*command: str) -> str:
    result = subprocess.run(command, capture_output=True, encoding="utf-8")
    if result.returncode != 0:
        print(result.stdout, end="", file=sys.stdout)
        print(result.stderr, end="", file=sys.stderr)
        sys.exit(result.returncode)
    return result.stderr


def get_module(setup: Path) -> str:
    with open(setup) as f:
        return json.load(f)["name"]


def count_lines(module: str, path: Path) -> None:
    with open(path) as f:
        lines = sum(1 for _ in f)
    save_result(f"{NAME}/module/{module}//lines", lines)


def run_lean(module: str) -> None:
    stderr = run_stderr(
        f"{BENCH}/measure.py",
        *("-t", f"{NAME}/module/{module}"),
        *("-m", "instructions"),
        "--",
        *("lean", "--profile", "-Dprofiler.threshold=9999999"),
        *sys.argv[1:],
    )

    for line in stderr.splitlines():
        # Output of `lean --profile`
        # See timeit.cpp for the time format
        if match := re.fullmatch(r"\t(.*) ([\d.]+)(m?s)", line):
            name = match.group(1)
            seconds = float(match.group(2))
            if match.group(3) == "ms":
                seconds = seconds / 1000
            save_result(f"{NAME}/profile/{name}//wall-clock", seconds, "s")


def main() -> None:
    if sys.argv[1:] == ["--print-prefix"]:
        print(Path(__file__).resolve().parent.parent)
        return

    if sys.argv[1:] == ["--githash"]:
        run("lean", "--githash")
        return

    parser = argparse.ArgumentParser()
    parser.add_argument("lean", type=Path)
    parser.add_argument("--setup", type=Path)
    args, _ = parser.parse_known_args()

    lean: Path = args.lean
    setup: Path = args.setup

    module = get_module(setup)
    count_lines(module, lean)
    run_lean(module)


if __name__ == "__main__":
    main()
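The profile parsing above hinges on the `re.fullmatch` pattern; a quick illustration of how a hypothetical `lean --profile` stderr line is decoded ("elaboration" is an illustrative category name):

```python
import re

line = "\telaboration 123.4ms"
match = re.fullmatch(r"\t(.*) ([\d.]+)(m?s)", line)
assert match is not None
print(match.groups())  # ('elaboration', '123.4', 'ms') -> 0.1234 s after conversion
```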
scripts/bench/build/fake-root/bin/lean.py (symlink; the diff shows its target)

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
lean
scripts/bench/build/lakeprof_report_template.html

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <title>Lakeprof Report</title>
  </head>
  <body>
    <h1>Lakeprof Report</h1>
    <button type="button" id="btn_fetch">View build trace in Perfetto</button>
    <pre><code>__LAKEPROF_REPORT__</code></pre>

    <script type="text/javascript">
      // https://perfetto.dev/docs/visualization/deep-linking-to-perfetto-ui
      // https://gist.github.com/chromy/170c11ce30d9084957d7f3aa065e89f8

      const BASE_URL = __BASE_URL__;
      const ORIGIN = "https://ui.perfetto.dev";

      const btnFetch = document.getElementById("btn_fetch");

      async function fetchAndOpen(traceUrl) {
        const resp = await fetch(traceUrl);
        // Error checking is left as an exercise to the reader.
        const blob = await resp.blob();
        const arrayBuffer = await blob.arrayBuffer();
        openTrace(arrayBuffer, traceUrl);
      }

      function openTrace(arrayBuffer, traceUrl) {
        const win = window.open(ORIGIN);
        if (!win) {
          btnFetch.style.background = "#f3ca63";
          btnFetch.onclick = () => openTrace(arrayBuffer, traceUrl);
          btnFetch.innerText =
            "Popups blocked, click here to open the trace file";
          return;
        }

        const timer = setInterval(() => win.postMessage("PING", ORIGIN), 50);

        const onMessageHandler = (evt) => {
          if (evt.data !== "PONG") return;

          // We got a PONG, the UI is ready.
          window.clearInterval(timer);
          window.removeEventListener("message", onMessageHandler);

          const reopenUrl = new URL(location.href);
          reopenUrl.hash = `#reopen=${traceUrl}`;
          win.postMessage(
            {
              perfetto: {
                buffer: arrayBuffer,
                title: "Lake Build Trace",
                url: reopenUrl.toString(),
              },
            },
            ORIGIN
          );
        };

        window.addEventListener("message", onMessageHandler);
      }

      // This is triggered when following the link from the Perfetto UI's sidebar.
      if (location.hash.startsWith("#reopen=")) {
        const traceUrl = location.hash.substr(8);
        fetchAndOpen(traceUrl);
      }

      btnFetch.onclick = () => fetchAndOpen(`${BASE_URL}/lakeprof.trace_event`);
    </script>
  </body>
</html>
scripts/bench/build/lakeprof_report_upload.py

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
#!/usr/bin/env python3

import json
import subprocess
import sys
from pathlib import Path


def run(*args: str) -> None:
    subprocess.run(args, check=True)


def run_stdout(*command: str, cwd: str | None = None) -> str:
    result = subprocess.run(command, capture_output=True, encoding="utf-8", cwd=cwd)
    if result.returncode != 0:
        print(result.stdout, end="", file=sys.stdout)
        print(result.stderr, end="", file=sys.stderr)
        sys.exit(result.returncode)
    return result.stdout


def main() -> None:
    script_file = Path(__file__)
    template_file = script_file.parent / "lakeprof_report_template.html"

    sha = run_stdout("git", "rev-parse", "@").strip()
    base_url = f"https://speed.lean-lang.org/mathlib4-out/{sha}"
    report = run_stdout("lakeprof", "report", "-prc")
    with open(template_file) as f:
        template = f.read()

    template = template.replace("__BASE_URL__", json.dumps(base_url))
    template = template.replace("__LAKEPROF_REPORT__", report)

    with open("index.html", "w") as f:
        f.write(template)

    run("curl", "-T", "index.html", f"{base_url}/index.html")
    run("curl", "-T", "lakeprof.log", f"{base_url}/lakeprof.log")
    run("curl", "-T", "lakeprof.trace_event", f"{base_url}/lakeprof.trace_event")


if __name__ == "__main__":
    main()
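One detail worth noting: `__BASE_URL__` is replaced with `json.dumps(base_url)` rather than the raw string, so the substitution yields a valid, quoted JavaScript string literal for the template's `const BASE_URL = __BASE_URL__;` line. A quick sketch (with a hypothetical sha):

```python
import json

base_url = "https://speed.lean-lang.org/mathlib4-out/abc123"
print(f"const BASE_URL = {json.dumps(base_url)};")
# -> const BASE_URL = "https://speed.lean-lang.org/mathlib4-out/abc123";
```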

scripts/bench/build/run

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
#!/usr/bin/env bash
set -euxo pipefail

BENCH="scripts/bench"

# Prepare build
# Use build cache for proofwidgets, but not for anything else.
lake clean
LEAN_PATH=$(lean --print-libdir) lake build proofwidgets
rm -f .lake/packages/batteries/.lake/build/bin/runLinter

# Run build
LAKE_OVERRIDE_LEAN=true LEAN=$(realpath "$BENCH/build/fake-root/bin/lean") \
    "$BENCH/measure.py" -t build \
    -m instructions -m maxrss -m task-clock -m wall-clock -- \
    lakeprof record lake build --no-cache

# Analyze lakeprof data
lakeprof report -pj | jq -c '{metric: "build/lakeprof/longest build path//wall-clock", value: .[-1][2], unit: "s"}' >> measurements.jsonl
lakeprof report -rj | jq -c '{metric: "build/lakeprof/longest rebuild path//wall-clock", value: .[-1][2], unit: "s"}' >> measurements.jsonl

# Upload lakeprof report
# Guarded to prevent accidental uploads (which wouldn't work anyway) during local runs.
if [ -f build_upload_lakeprof_report ]; then
    python3 "$BENCH/build/lakeprof_report_upload.py"
fi
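The two `jq` invocations above assume that `lakeprof report -pj`/`-rj` emit a JSON array of path entries whose last element is the longest path and whose third field is its duration in seconds; that shape is inferred here from the filter itself, not from lakeprof's documentation. A toy invocation with made-up input shows what the filter does:

```bash
echo '[["Mathlib.Init", 3, 1.0], ["Mathlib.Logic.Basic", 5, 2.5]]' \
  | jq -c '{metric: "build/lakeprof/longest build path//wall-clock", value: .[-1][2], unit: "s"}'
# -> {"metric":"build/lakeprof/longest build path//wall-clock","value":2.5,"unit":"s"}
```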

scripts/bench/combine.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
#!/usr/bin/env python3

import argparse
import json
from pathlib import Path

OUTFILE = Path() / "measurements.jsonl"

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=f"Combine duplicated measurements in {OUTFILE.name} the way radar does, by summing their values."
    )
    args = parser.parse_args()

    values: dict[str, float] = {}
    units: dict[str, str | None] = {}

    with open(OUTFILE, "r") as f:
        for line in f:
            data = json.loads(line)
            metric = data["metric"]
            values[metric] = values.get(metric, 0) + data["value"]
            units[metric] = data.get("unit")

    with open(OUTFILE, "w") as f:
        for metric, value in values.items():
            unit = units.get(metric)
            data = {"metric": metric, "value": value}
            if unit is not None:
                data["unit"] = unit
            f.write(f"{json.dumps(data)}\n")
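To make the summing behavior concrete, two duplicated entries (illustrative metric name and values) such as:

```jsonl
{"metric": "build/profile/import//wall-clock", "value": 0.25, "unit": "s"}
{"metric": "build/profile/import//wall-clock", "value": 0.75, "unit": "s"}
```

collapse, after running `scripts/bench/combine.py`, into:

```jsonl
{"metric": "build/profile/import//wall-clock", "value": 1.0, "unit": "s"}
```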

scripts/bench/lint/README.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
## The `lint` benchmark

This benchmark runs `lake exe runLinter Mathlib`.
It measures the following metrics:

- `lint//instructions`
- `lint//maxrss`
- `lint//task-clock`
- `lint//wall-clock`
