From 853e51b643504449a638d97b3c6c36ea75f56e88 Mon Sep 17 00:00:00 2001
From: Dave Lucia
Date: Thu, 21 May 2026 14:24:27 -0700
Subject: [PATCH 1/3] chore(B8): start plan

---
 .agents/plans/B8-inline-numeric-narrowing.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.agents/plans/B8-inline-numeric-narrowing.md b/.agents/plans/B8-inline-numeric-narrowing.md
index ced56f6..5372ce9 100644
--- a/.agents/plans/B8-inline-numeric-narrowing.md
+++ b/.agents/plans/B8-inline-numeric-narrowing.md
@@ -5,7 +5,7 @@ issue: null
 pr: null
 branch: perf/inline-numeric-narrowing
 base: main
-status: ready
+status: in-progress
 direction: B
 unlocks:
   - small but free win on all integer-arithmetic workloads

From ba2f3a25e8832c753cf6c33b27b510106064d9a4 Mon Sep 17 00:00:00 2001
From: Dave Lucia
Date: Thu, 21 May 2026 14:28:11 -0700
Subject: [PATCH 2/3] perf(vm): fast-path Numeric.to_signed_int64 for in-range integers

The Lua 5.3 wrap-around mask runs on every integer arithmetic result,
but the overwhelmingly common case is an input already in
[-2^63, 2^63 - 1], which passes through unchanged. Adding a guarded
clause that returns the input as-is short-circuits the masking on that
branch.

`@compile {:inline, ...}` lets the compiler inline both clauses at
intra-module call sites; cross-module callers still trip a function
boundary but the guarded clause's match cost is lower than the
band+compare body.

On fib(22), Numeric.to_signed_int64 self-time drops 3.82% -> 3.38%
under tprof. On fib(30) wall clock, lua (chunk) improves 873.4ms ->
844.8ms (-3.3%), comfortably outside the run-to-run deviation band.
Luerl (the control) does not move.

Overflow tests (max_int + 1, min_int - 1, 0xFFFF...) still wrap
correctly.

Plan: .agents/plans/B8-inline-numeric-narrowing.md
---
 lib/lua/vm/numeric.ex | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/lua/vm/numeric.ex b/lib/lua/vm/numeric.ex
index d465668..c728cd3 100644
--- a/lib/lua/vm/numeric.ex
+++ b/lib/lua/vm/numeric.ex
@@ -39,6 +39,8 @@ defmodule Lua.VM.Numeric do
   @max_int 0x7FFFFFFFFFFFFFFF
   @min_int -0x8000000000000000
 
+  @compile {:inline, signed?: 1, to_signed_int64: 1}
+
   @doc "Maximum signed 64-bit integer (`2^63 - 1`)."
   @spec max_int() :: integer()
   def max_int, do: @max_int
@@ -68,6 +70,10 @@ defmodule Lua.VM.Numeric do
       -1
   """
   @spec to_signed_int64(integer()) :: integer()
+  def to_signed_int64(n) when is_integer(n) and n >= @min_int and n <= @max_int do
+    n
+  end
+
   def to_signed_int64(n) when is_integer(n) do
     masked = band(n, @uint64_mask)
     if masked >= @sign_bit, do: masked - @uint64_modulus, else: masked

From 74f1d237a576687d23acc1ef34d8d79e47874f46 Mon Sep 17 00:00:00 2001
From: Dave Lucia
Date: Thu, 21 May 2026 14:29:03 -0700
Subject: [PATCH 3/3] chore(B8): mark plan as review

Records PR #227, captures the discovery that @compile {:inline, ...}
does not cross module boundaries (so the fast path's win comes from the
guard short-circuit only, not from call-site inlining), and the
wall-clock fib(30) delta of -3.3%.
---
 .agents/plans/B8-inline-numeric-narrowing.md | 36 ++++++++++++++++++--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/.agents/plans/B8-inline-numeric-narrowing.md b/.agents/plans/B8-inline-numeric-narrowing.md
index 5372ce9..afdbe10 100644
--- a/.agents/plans/B8-inline-numeric-narrowing.md
+++ b/.agents/plans/B8-inline-numeric-narrowing.md
@@ -2,10 +2,10 @@
 id: B8
 title: Inline `to_signed_int64/1` for the in-range fast path
 issue: null
-pr: null
+pr: 227
 branch: perf/inline-numeric-narrowing
 base: main
-status: in-progress
+status: review
 direction: B
 unlocks:
   - small but free win on all integer-arithmetic workloads
@@ -155,4 +155,34 @@ Lua.eval!(lua, chunk)
 
 ## Discoveries
 
-(populated during implementation)
+- `@compile {:inline, ...}` only inlines within the same module. Cross-module
+  callers in `Lua.VM.Executor` and `Lua.VM.Value` still trip a function
+  boundary on every call. tprof call count stayed at 85,968 before/after,
+  confirming no inlining happened at the dispatch sites. This caps the
+  realized win below the plan's stretch target — the gain comes entirely
+  from the guard short-circuit, not from inlining at call sites.
+- Profile self-time on fib(22) moved 3.82% → 3.38%, a 12% relative drop
+  on the function itself. Plan's stretch target of < 1.5% was not hit
+  because it implicitly required cross-module inlining.
+- Wall-clock win on fib(30) is real: lua (chunk) 873.4ms → 844.8ms
+  (**-3.3%**), well outside the ±0.5% deviation band. luerl (control)
+  did not move. The plan's 3% stretch floor on fib was met.
+
+## What changed
+
+- `lib/lua/vm/numeric.ex` — added in-range guard clause to
+  `to_signed_int64/1`; added `@compile {:inline, signed?: 1,
+  to_signed_int64: 1}`.
+
+PR: #227
+
+Suite delta: 1692 tests passing → 1692 tests passing (no regression).
+lua53 suite: 29 tests, 0 failures (matches main).
+
+Benchmarks (fib(30), 10s benchee, 2s warmup):
+
+| benchmark    | baseline    | after        | delta  |
+|--------------|-------------|--------------|--------|
+| lua (chunk)  | 873.36 ms   | 844.76 ms    | -3.3%  |
+| lua (eval)   | 876.74 ms   | 852.21 ms    | -2.8%  |
+| luerl (ctl)  | 730.87 ms   | 731.78 ms    | noise  |
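
Reviewer note: the two-clause shape that PATCH 2/3 gives `to_signed_int64/1` can be tried in isolation with the sketch below. It is a minimal standalone approximation, not the project's module. The module name `NarrowSketch` is hypothetical, and the values of `@uint64_mask`, `@sign_bit`, and `@uint64_modulus` are assumptions, since the diff references those attributes without showing their definitions.

```elixir
# Hypothetical standalone module for illustration; not Lua.VM.Numeric itself.
defmodule NarrowSketch do
  import Bitwise, only: [band: 2]

  @max_int 0x7FFFFFFFFFFFFFFF
  @min_int -0x8000000000000000

  # Assumed values: the diff uses these attributes but does not define them.
  @uint64_mask 0xFFFFFFFFFFFFFFFF
  @sign_bit 0x8000000000000000
  @uint64_modulus 0x10000000000000000

  # Fast path: an integer already inside the signed 64-bit range is returned
  # unchanged, so the mask and compare below are skipped.
  def to_signed_int64(n) when is_integer(n) and n >= @min_int and n <= @max_int do
    n
  end

  # Slow path: reduce modulo 2^64, then reinterpret the top bit as the sign.
  def to_signed_int64(n) when is_integer(n) do
    masked = band(n, @uint64_mask)
    if masked >= @sign_bit, do: masked - @uint64_modulus, else: masked
  end
end

IO.inspect(NarrowSketch.to_signed_int64(42))
# prints 42 (fast path)
IO.inspect(NarrowSketch.to_signed_int64(0x7FFFFFFFFFFFFFFF + 1))
# prints -9223372036854775808 (max_int + 1 wraps to min_int)
IO.inspect(NarrowSketch.to_signed_int64(0xFFFFFFFFFFFFFFFF))
# prints -1
```

Clause order matters here: Elixir tries clauses top to bottom, so the in-range guard has to come first for the pass-through to take effect. Both clauses return the same value for in-range input, which is why the change is behavior-preserving.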