name: Benchmark

on:
  schedule:
    - cron: "0 6 * * 1" # Every Monday at 06:00 UTC
  workflow_dispatch:

permissions:
  contents: write

jobs:
  benchmark:
    name: macOS Benchmarks
    runs-on: macos-14
    env:
      # Wipe the malt prefix between cold runs so the published numbers
      # reflect a real network download, not a store-cache hit. Fail loudly
      # if any install returns non-zero, so a broken install path can never
      # silently publish fake timings.
      BENCH_TRUE_COLD: "1"
      BENCH_FAIL_FAST: "1"
    steps:
      - uses: actions/checkout@v6

      - name: Install Zig
        uses: mlugg/setup-zig@v2
        with:
          version: 0.16.0

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable

      # ---------------------------------------------------------------
      # Each step below delegates to scripts/bench.sh, which is also the
      # supported way to reproduce these numbers locally:
      #
      #   BENCH_TRUE_COLD=1 ./scripts/bench.sh
      #
      # The first step builds malt + the other tools and benches `tree`;
      # the rest reuse those builds via SKIP_BUILD=1.
      # ---------------------------------------------------------------
      - name: Benchmark tree (also builds malt + other tools)
        id: tree
        run: ./scripts/bench.sh tree

      - name: Benchmark wget
        id: wget
        run: SKIP_BUILD=1 ./scripts/bench.sh wget

      - name: Benchmark ffmpeg
        id: ffmpeg
        run: SKIP_BUILD=1 ./scripts/bench.sh ffmpeg
      # --- Race detection ---------------------------------------------
      #
      # Single-sample benchmarks miss low-rate races in the parallel
      # install pipeline. The P3 `fetchFormulaWorker` allocator race
      # passed ~90% of cold ffmpeg runs and slipped through five
      # subsequent PRs before the post-P7 head-to-head bench finally
      # caught it.
      #
      # `BENCH_STRESS=20` runs the malt cold install 20 times and exits
      # non-zero on any single failure. With a 10% race, the detection
      # rate is ~88%; a 20% race is detected ~99% of the time. ffmpeg has
      # the most deps (11) and exercises the parallel paths hardest, so
      # it's the target here — it costs ~80 seconds of CI time (20 runs
      # × ~4 s each) in exchange for almost certain detection of the kind
      # of bug `BENCH_FAIL_FAST=1` cannot catch.
      #
      # This step runs *before* the README update, so a race failure
      # prevents the workflow from publishing misleading numbers to `main`.
      - name: Stress-test malt cold install (ffmpeg ×20)
        run: BENCH_STRESS=20 SKIP_BUILD=1 SKIP_OTHERS=1 SKIP_BREW=1 ./scripts/bench.sh ffmpeg
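      # Sanity check on the detection rates quoted above (a sketch, assuming
      # each of the N stress runs fails independently with probability p):
      #
      #   P(detect) = 1 - (1 - p)^N
      #   N=20, p=0.10  ->  1 - 0.90^20 ≈ 0.88
      #   N=20, p=0.20  ->  1 - 0.80^20 ≈ 0.99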
      # --- Summary ---
      - name: Summary
        run: |
          cat >> "$GITHUB_STEP_SUMMARY" << 'EOF'
          ## Benchmark Results — macOS 14 (Apple Silicon)

          ### Binary Size

          | Tool | Size |
          |------|------|
          | **malt** | ${{ steps.tree.outputs.mt_size }} |
          | nanobrew | ${{ steps.tree.outputs.nb_size }} |
          | zerobrew | ${{ steps.tree.outputs.zb_size }} |

          ### Cold Install (median ±σ)

          | Package | malt | nanobrew | zerobrew | Homebrew |
          |---------|------|----------|----------|----------|
          | **tree** (0 deps) | ${{ steps.tree.outputs.mt_cold_disp }} | ${{ steps.tree.outputs.nb_cold_disp }} | ${{ steps.tree.outputs.zb_cold_disp }} | ${{ steps.tree.outputs.brew_cold_disp }} |
          | **wget** (6 deps) | ${{ steps.wget.outputs.mt_cold_disp }} | ${{ steps.wget.outputs.nb_cold_disp }} | ${{ steps.wget.outputs.zb_cold_disp }} | ${{ steps.wget.outputs.brew_cold_disp }} |
          | **ffmpeg** (11 deps) | ${{ steps.ffmpeg.outputs.mt_cold_disp }} | ${{ steps.ffmpeg.outputs.nb_cold_disp }} | ${{ steps.ffmpeg.outputs.zb_cold_disp }} | ${{ steps.ffmpeg.outputs.brew_cold_disp }} |

          ### Warm Install

          | Package | malt | nanobrew | zerobrew |
          |---------|------|----------|----------|
          | **tree** (0 deps) | ${{ steps.tree.outputs.mt_warm }} | ${{ steps.tree.outputs.nb_warm }} | ${{ steps.tree.outputs.zb_warm }} |
          | **wget** (6 deps) | ${{ steps.wget.outputs.mt_warm }} | ${{ steps.wget.outputs.nb_warm }} | ${{ steps.wget.outputs.zb_warm }} |
          | **ffmpeg** (11 deps) | ${{ steps.ffmpeg.outputs.mt_warm }} | ${{ steps.ffmpeg.outputs.nb_warm }} | ${{ steps.ffmpeg.outputs.zb_warm }} |
          EOF
      - name: Update README
        run: |
          DATE=$(date -u +"%Y-%m-%d")

          # Generate new size table
          cat > /tmp/size_table.md << TABLE
          <!-- BENCH:SIZE:START -->
          ### Binary Size
          | Tool | Size |
          | ---- | ---- |
          | **malt** | ${{ steps.tree.outputs.mt_size }} |
          | nanobrew | ${{ steps.tree.outputs.nb_size }} |
          | zerobrew | ${{ steps.tree.outputs.zb_size }} |
          <!-- BENCH:SIZE:END -->
          TABLE

          # Generate cold install table. Cold cells use `_cold_disp`
          # (pre-formatted "median±stddev s") so run-to-run noise is
          # visible directly in the README — cold is where network and
          # launchd jitter actually live. Warm stays as the bare median
          # because warm-install variance is in the low milliseconds.
          cat > /tmp/cold_table.md << TABLE
          <!-- BENCH:COLD:START -->
          ### Cold Install (median ±σ)
          | Package | malt | nanobrew | zerobrew | Homebrew |
          | ------- | ---- | -------- | -------- | -------- |
          | **tree** (0 deps) | ${{ steps.tree.outputs.mt_cold_disp }} | ${{ steps.tree.outputs.nb_cold_disp }} | ${{ steps.tree.outputs.zb_cold_disp }} | ${{ steps.tree.outputs.brew_cold_disp }} |
          | **wget** (6 deps) | ${{ steps.wget.outputs.mt_cold_disp }} | ${{ steps.wget.outputs.nb_cold_disp }} | ${{ steps.wget.outputs.zb_cold_disp }} | ${{ steps.wget.outputs.brew_cold_disp }} |
          | **ffmpeg** (11 deps) | ${{ steps.ffmpeg.outputs.mt_cold_disp }} | ${{ steps.ffmpeg.outputs.nb_cold_disp }} | ${{ steps.ffmpeg.outputs.zb_cold_disp }} | ${{ steps.ffmpeg.outputs.brew_cold_disp }} |
          <!-- BENCH:COLD:END -->
          TABLE

          # Generate warm install table
          cat > /tmp/warm_table.md << TABLE
          <!-- BENCH:WARM:START -->
          ### Warm Install
          | Package | malt | nanobrew | zerobrew |
          | ------- | ---- | -------- | -------- |
          | **tree** (0 deps) | ${{ steps.tree.outputs.mt_warm }} | ${{ steps.tree.outputs.nb_warm }} | ${{ steps.tree.outputs.zb_warm }} |
          | **wget** (6 deps) | ${{ steps.wget.outputs.mt_warm }} | ${{ steps.wget.outputs.nb_warm }} | ${{ steps.wget.outputs.zb_warm }} |
          | **ffmpeg** (11 deps) | ${{ steps.ffmpeg.outputs.mt_warm }} | ${{ steps.ffmpeg.outputs.nb_warm }} | ${{ steps.ffmpeg.outputs.zb_warm }} |
          <!-- BENCH:WARM:END -->
          TABLE

          # Strip any leading whitespace left over from heredoc indentation
          sed -i '' 's/^ //' /tmp/size_table.md /tmp/cold_table.md /tmp/warm_table.md

          # Replace the sections between markers. In the sed script below,
          # `r` queues the fresh table (markers included) for output at the
          # end of the cycle, and the `d` commands drop the old
          # marker-to-marker lines.
          for marker in SIZE COLD WARM; do
            file="/tmp/$(echo "$marker" | tr '[:upper:]' '[:lower:]')_table.md"
            sed -i '' "/<!-- BENCH:${marker}:START -->/,/<!-- BENCH:${marker}:END -->/{
              /<!-- BENCH:${marker}:START -->/{
                r ${file}
                d
              }
              /<!-- BENCH:${marker}:END -->/d
              d
            }" README.md
          done

          # Update the date line
          sed -i '' "s/> Benchmarks on Apple Silicon.*/> Benchmarks on Apple Silicon (GitHub Actions macos-14), $DATE. Auto-updated weekly via [benchmark workflow](.github\/workflows\/benchmark.yml)./" README.md
      - name: Commit results
        # Only publish the README refresh when the workflow runs against
        # the canonical branch. A manual `workflow_dispatch` against any
        # PR or feature branch should still produce numbers in the job
        # log, but must not overwrite README.md or push — which would
        # otherwise create the format-vs-numbers conflict we saw on
        # `refactor/zig-hardening`.
        if: github.ref == 'refs/heads/main'
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add README.md
          git diff --cached --quiet || git commit -m "bench: update benchmark results $(date -u +%Y-%m-%d)"
          git push
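
# To exercise the dispatch-only path (numbers in the job log, no README
# commit) from a feature branch, one option is the GitHub CLI:
#
#   gh workflow run benchmark.yml --ref my-feature-branch
#
# (`my-feature-branch` is a placeholder; the `if:` guard on the commit
# step keeps the push from running anywhere but main.)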