build(bench): add quick mode + multi-n inputs to benchmark harness #230

Merged: davydog187 merged 1 commit into main from bench/quick-mode-multi-n on May 22, 2026
@davydog187
Contributor

Quick mode + multi-n inputs for the benchmark harness

Drops mix lua.bench runtime from ~17 minutes to ~80 seconds for
iterative use, while keeping the long-form numbers available behind an
env var. Independent of the B-series perf plans; this is harness work
that we'll lean on for every subsequent perf change.

Motivation

While iterating on B7 (table array+hash split), measurement noise made
it hard to tell whether a code change had moved the headline workload.
The 10 s benchee window is the right tradeoff for publishing a number
(low deviation, memory tracking, multiple input sizes) but it's the
wrong tradeoff for "did my last commit help?". Each mix lua.bench
cycle was ~17 minutes, which is slow enough that we'd run the bench
once and trust the result — which turned out to be a measurement
mistake exactly once.

A separate but related question: n=500 is one specific point on the
table workload curve. At that size, exponential-growth tuples are
already at 1024 capacity and most array work is amortized. n=10
exercises pre-doubling. n=100 hits the middle. Useful to see the
curve, not just one point.

Design

benchmarks/helpers.exs exposes a shared Bench module with one knob —
the LUA_BENCH_MODE env var:

  • default ("quick") — 1 s warmup, 3 s measurement, memory_time off.
    ~4 s per Benchee.run. Full mix lua.bench ≈ 80 s.
  • "full" — 2 s warmup, 10 s measurement, memory_time on, plus a
    sweep of n ∈ {10, 100, 1000} for the table workloads.
    ~1-2 min per Benchee.run. Full mix lua.bench ≈ 15+ min.

Each script does Code.require_file("helpers.exs", __DIR__) and calls
Bench.opts() in place of an inline keyword list. table_ops.exs is
restructured to use Benchee's inputs: from Bench.table_inputs/0 so
all sizes share warmup state per workload.
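
As a rough illustration of the shape described above, a helper module along these lines would do the job. This is a hypothetical sketch, not the merged benchmarks/helpers.exs: the `mode/0` helper, the arity-1 clauses, the fallback for unknown mode values, and the "small"/"large" input labels are assumptions (only "medium (n=100)" appears in the sample output), and quick mode is assumed to run n=100 alone.

```elixir
defmodule Bench do
  # Hypothetical sketch of benchmarks/helpers.exs; the merged file may differ.

  # Anything other than "full" falls back to quick mode (assumed behaviour).
  def mode do
    case System.get_env("LUA_BENCH_MODE") do
      "full" -> :full
      _ -> :quick
    end
  end

  # Benchee timing options for the active mode (all values in seconds).
  def opts, do: opts(mode())
  def opts(:quick), do: [warmup: 1, time: 3, memory_time: 0]
  def opts(:full), do: [warmup: 2, time: 10, memory_time: 1]

  # Input sizes for the table workloads, as {label, n} pairs.
  # A list of tuples keeps the inputs in this order.
  def table_inputs, do: table_inputs(mode())
  def table_inputs(:quick), do: [{"medium (n=100)", 100}]

  def table_inputs(:full) do
    [{"small (n=10)", 10}, {"medium (n=100)", 100}, {"large (n=1000)", 1000}]
  end
end

IO.inspect(Bench.opts(:quick))
IO.inspect(Bench.table_inputs(:full))
```

A script could then call something like `Benchee.run(jobs, [inputs: Bench.table_inputs()] ++ Bench.opts())`, where `jobs` stands in for the script's own map of benchmark functions, each taking the input size as its argument.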

The mix lua.bench task forwards the parent process env, so
LUA_BENCH_MODE set in the user's shell propagates to the child
mix run automatically.
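
The forwarding relies on ordinary process-environment inheritance. The snippet below is a stand-alone demonstration of that behaviour, not the task's actual code: it assumes a Unix `sh` is available and uses `System.cmd/3`, whereas the real task may launch `mix run` differently. The child sees variables set in the parent, and the `:env` option only adds or overrides entries.

```elixir
# Simulate a user exporting the mode in their shell before running the task.
System.put_env("LUA_BENCH_MODE", "full")

# The child inherits the parent's environment; :env adds MIX_ENV on top of it.
{out, 0} =
  System.cmd("sh", ["-c", ~s(printf '%s:%s' "$LUA_BENCH_MODE" "$MIX_ENV")],
    env: [{"MIX_ENV", "benchmark"}]
  )

IO.puts(out)
# prints "full:benchmark"
```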

Usage

# Fast iteration loop
mix lua.bench

# Publishable numbers (for PR descriptions, ROADMAP.md, etc.)
LUA_BENCH_MODE=full mix lua.bench

# Single workload, fast loop
mix lua.bench --workload table_ops

Tradeoffs

Quick mode trades measurement precision for iteration speed. Deviation
bands grow from ±0.5% (10 s window) to ±15-25% (3 s window) on most
workloads. That's fine for "did my change move the needle by 10%+", but
it's not fine for headline numbers. The LUA_BENCH_MODE=full path
exists to close that gap: any number we publish should come from
a full run, ideally with the machine in a known-cold state.
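
For readers unfamiliar with the ± figures, the sketch below shows one way to compute such a band, assuming Benchee's deviation column is the sample standard deviation expressed as a percentage of the average. The module name and the run times are made up for illustration; they are not measurements from this harness.

```elixir
defmodule DeviationSketch do
  # Relative deviation in percent: sample standard deviation divided by the mean.
  # Needs at least two samples.
  def relative_deviation(samples) do
    n = length(samples)
    mean = Enum.sum(samples) / n

    variance =
      Enum.reduce(samples, 0.0, fn x, acc -> acc + (x - mean) * (x - mean) end) / (n - 1)

    :math.sqrt(variance) / mean * 100
  end
end

# Made-up run times in microseconds, for illustration only.
samples = [16.0, 18.0, 17.0, 15.0, 19.0]
IO.puts(Float.round(DeviationSketch.relative_deviation(samples), 1))
# prints 9.3
```

With a band of this size, a 10% shift in the average sits right at the edge of the noise, which is why quick-mode results are best treated as a direction check rather than a number to publish.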

Changes

 benchmarks/closures.exs   |  10 ++--
 benchmarks/fibonacci.exs  |   6 +--
 benchmarks/helpers.exs    |  64 ++++++++++++++++++++++++++++++++ (new)
 benchmarks/oop.exs        |   6 +--
 benchmarks/string_ops.exs |  14 +++--
 benchmarks/table_ops.exs  | 134 ++++++++++++++------------------
 tasks/lua.bench.ex        |  17 +++++-
 7 files changed, 159 insertions(+), 95 deletions(-)

Verification

mix format
mix compile --warnings-as-errors
mix test    # 1692 tests, 0 failures (this PR doesn't touch lib/)

# Quick mode (default)
MIX_ENV=benchmark mix run benchmarks/fibonacci.exs
#   warmup: 1 s, time: 3 s, memory time: 0 ns, total ~16 s

# Full mode
LUA_BENCH_MODE=full MIX_ENV=benchmark mix run benchmarks/table_ops.exs
#   warmup: 2 s, time: 10 s, memory time: 1 s, n ∈ {10, 100, 1000}

# Via the task wrapper
mix lua.bench --workload table_ops          # quick, single workload
LUA_BENCH_MODE=full mix lua.bench           # full, all workloads

Sample output

Quick mode table_ops, on main:

=== Table Build (mode: quick) ===
##### With input medium (n=100) #####
Name                ips     average   deviation  median
C Lua (luaport)   107 K     9.35 µs   ±22.6%    9.08 µs
lua (chunk)        60 K    16.68 µs   ±11.7%   16.54 µs
luerl              57 K    17.42 µs   ±10.8%   17.38 µs
lua (eval)         53 K    18.84 µs   ±13.1%   18.71 µs

The deviation bands are wider than full mode (~±0.5%) but the
ips ordering is still clear and intermediate runs converge to the
full-mode answer at n=100 (luerl ≈ 18 µs, our chunk path ≈ 17 µs).

Out of scope (intentional)

  • Changing what the benchmarks measure. Only the harness shape moves.
  • Tuning the choice of n values further. The {10, 100, 1000} sweep
    in full mode is a starting point; if specific workloads need more
    granular curves we can extend Bench.table_inputs/0.
  • Capturing benchmark output to a file for tracking-over-time. That's
    a separate plan if we want it.

Adds benchmarks/helpers.exs exposing a shared Bench module with two
pre-canned Benchee profiles selected by the LUA_BENCH_MODE env var:

  * default ("quick") - 1 s warmup, 3 s measurement, memory_time off.
    Each Benchee.run takes ~4 s; the full mix lua.bench suite is ~80 s
    instead of ~17 min. For "did my change move the needle?" loops.
  * "full" - 2 s warmup, 10 s measurement, memory_time on, plus a
    sweep of input sizes (n=10, 100, 1000) for the table workloads.
    For any numbers we publish.

Each script loads helpers.exs via Code.require_file/2 and calls Bench.opts() in
place of an inline keyword list. table_ops.exs is restructured to use
Benchee inputs: from Bench.table_inputs/0 so all sizes share warmup
and measurement state per workload.

Quick mode trades measurement precision (higher deviation bands) for
iteration speed. Full mode is the source of truth for published
numbers.

@davydog187 davydog187 merged commit f3919f2 into main May 22, 2026
4 checks passed
@davydog187 davydog187 deleted the bench/quick-mode-multi-n branch May 22, 2026 00:30