perf(vm): compile Lua prototypes to BEAM modules by davydog187 · Pull Request #235 · tv-labs/lua

davydog187 · 2026-05-22T15:49:27Z

Plan: B5a — Erlang codegen foundation

Plan: .agents/plans/B5a-erlang-codegen-foundation.md
Parent strategic plan: .agents/plans/B5-compile-prototypes-to-erlang.md

Goal

Land the foundation for compiling Lua prototypes to BEAM modules
via :compile.forms/2. A compiled prototype's call goes through a
new {:compiled_closure, mod, fun, upvalues, proto} value type,
bypassing the interpreter's register-tuple construction and per-opcode
dispatch loop entirely. This first PR covers arithmetic, comparison,
logical ops, conditional :test, single-result :call,
single-value :return, and the common _ENV.name lookup path.
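
As a rough illustration of the mechanism described above, the sketch below hand-builds Erlang abstract forms for a tiny module, compiles them with :compile.forms/2, and loads the result with :code.load_binary/3. The names FormsSketch, compile_add/1, and add/2 are hypothetical and not taken from this PR; the real codegen derives its forms from a %Prototype{}.

```elixir
defmodule FormsSketch do
  # Hypothetical helper: builds abstract forms for a module exporting add/2,
  # compiles them in memory, and loads the resulting BEAM binary.
  def compile_add(mod) when is_atom(mod) do
    forms = [
      {:attribute, 1, :module, mod},
      {:attribute, 2, :export, [add: 2]},
      {:function, 3, :add, 2,
       [
         # add(A, B) -> A + B.
         {:clause, 3, [{:var, 3, :A}, {:var, 3, :B}], [],
          [{:op, 3, :+, {:var, 3, :A}, {:var, 3, :B}}]}
       ]}
    ]

    # With only :return_errors set, success is {:ok, module, binary}.
    {:ok, ^mod, binary} = :compile.forms(forms, [:return_errors])
    {:module, ^mod} = :code.load_binary(mod, ~c"nofile", binary)
    mod
  end
end
```

Once loaded, the module is called like any other, for example `FormsSketch.compile_add(:forms_sketch_demo).add(2, 3)`.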

Scope

Supported in this PR:

- Constants and moves: :load_constant, :load_boolean, :load_nil,
  :move, :source_line, :scope
- Upvalues + globals: :get_upvalue, :get_open_upvalue,
  :load_env, :get_global
- _ENV.name field access: :get_field with binary literal name
  (inlines the no-metatable fast path; metatable case delegates to
  Executor.index_value/6)
- Arithmetic with integer fast path: :add, :subtract, :multiply;
  slow-path-only for :divide, :floor_divide, :modulo, :power,
  :negate
- Comparison with number fast path: :less_than, :less_equal,
  :greater_than, :greater_equal; slow-path-only for :equal,
  :not_equal
- Logical :not
- Conditional :test and :test_true, restricted to branches that
  terminate via :return (no SSA-merging in B5a)
- :call with single-result returns; routes through
  call_function_with_position which bridges native-callback position
  tracking but no-ops for pure Lua-to-Lua calls.
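
The "integer fast path" shape listed above can be sketched as a guarded clause that handles the common case inline and hands everything else to a slow path. This is only an illustration: FastPathSketch and slow_path/3 are hypothetical names, the slow path here is a stand-in for Executor.apply_arith_op/6 (whose arguments are not shown in this PR description), and the sketch ignores details such as 64-bit integer wraparound and metamethods.

```elixir
defmodule FastPathSketch do
  # Fast path: both operands are integers, so the addition is done inline.
  def add(a, b) when is_integer(a) and is_integer(b), do: a + b

  # Everything else goes to the slow path.
  def add(a, b), do: slow_path(:add, a, b)

  # Hypothetical stand-in for the VM's generic arithmetic helper.
  # It only handles plain numbers and raises for anything else.
  defp slow_path(:add, a, b) when is_number(a) and is_number(b), do: a + b

  defp slow_path(op, a, b) do
    raise ArgumentError,
          "cannot apply #{op} to #{inspect(a)} and #{inspect(b)}"
  end
end
```

For example, `FastPathSketch.add(1, 2)` stays on the guarded clause, while `FastPathSketch.add(1, 2.5)` falls through to the stand-in slow path.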

Out of scope (deliberately falling back to interpreter):

Tables → B5c
Closures, varargs, multi-return → B5d
Error position fidelity inside compiled raises → B5e
:goto/:label, loops (:numeric_for, :while_loop, etc.)

All-or-nothing per prototype: a prototype containing any unsupported
opcode falls back to interpretation in its entirety. Sub-prototypes
compile independently.
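
A minimal sketch of the all-or-nothing rule, assuming a simplified prototype shape. The map keys :ops and :children, the module name FallbackSketch, and the abbreviated opcode set are all hypothetical; the real %Prototype{} struct and coverage list live in the PR's code.

```elixir
defmodule FallbackSketch do
  # Abbreviated, hypothetical subset of the opcodes covered in B5a.
  @supported MapSet.new([
               :load_constant,
               :move,
               :add,
               :less_than,
               :test,
               :call,
               :return
             ])

  # A prototype is modelled as %{ops: [atom], children: [prototype]}.
  # Each prototype is decided on its own opcodes only; children are
  # planned independently of their parent.
  def plan(%{ops: ops, children: children}) do
    mode =
      if Enum.all?(ops, &MapSet.member?(@supported, &1)),
        do: :compiled,
        else: :interpreted

    {mode, Enum.map(children, &plan/1)}
  end
end
```

In this model a parent containing :numeric_for is planned as :interpreted while a child with only covered opcodes is still planned as :compiled.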

Success criteria

- Lua.Compiler.Erlang.compile/1 exists and returns
  {:ok, proto_with_compiled_module_set} for covered prototypes
- Lua.VM.CompiledModule value type wired through
  Executor.call_function/3 and the :call opcode dispatch
- Every covered opcode lowered in Lua.Compiler.Erlang.Opcodes
- Uncovered opcodes trigger fallback, never a crash
- Closure construction (:closure) emits :compiled_closure
  when the nested prototype compiled, else :lua_closure
- mix test: 1705 tests + 51 properties + 55 doctests, 0
  failures
- mix test --only lua53: 29 tests, 0 failures
- fib(25) beats Luerl by ≥5x: not met (achieves ~1.1x).
  The throw/catch overhead on non-tail returns and the
  register-tuple setelement/3 churn dominate; B5b/B5c/B5d will
  close the gap as more opcodes inline.
- No workload regresses
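
The closure criterion above can be pictured as a two-way choice at closure-construction time plus a matching two-way dispatch at call time. The sketch below is an assumption-heavy illustration: only the {:compiled_closure, mod, fun, upvalues, proto} tuple shape comes from this PR; the {:lua_closure, proto, upvalues} shape, the {mod, fun} form of the compiled_module field, the (args, upvalues) calling convention, and all module names are hypothetical.

```elixir
defmodule ClosureSketch.Demo do
  # Hypothetical stand-in for a generated module's entry point.
  def run(args, _upvalues), do: Enum.sum(args)
end

defmodule ClosureSketch do
  # Nested prototype compiled: emit the compiled value type.
  def make_closure(%{compiled_module: {mod, fun}} = proto, upvalues),
    do: {:compiled_closure, mod, fun, upvalues, proto}

  # Otherwise fall back to the interpreted closure (assumed shape).
  def make_closure(proto, upvalues), do: {:lua_closure, proto, upvalues}

  # Compiled closures call straight into the loaded module.
  def call({:compiled_closure, mod, fun, upvalues, _proto}, args),
    do: apply(mod, fun, [args, upvalues])

  # Interpreted closures would enter the dispatch loop; stubbed here.
  def call({:lua_closure, _proto, _upvalues}, _args), do: :interpreter
end
```

The point of the shape is that callers pattern-match on the tag and never need to know whether a given prototype compiled.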

Perf

fib(30), full mode:

| Implementation | Mean | vs main | vs Luerl |
| --- | --- | --- | --- |
| main | ~970 ms | 1.00x | 0.74x (1.35x slower) |
| B5a (this PR) | ~670 ms | 1.45x faster | 1.07x faster |
| Luerl | ~720 ms | 1.35x faster | baseline |
| C Lua (luaport) | ~27 ms | 36x faster | 27x faster |

The compiled path beats Luerl modestly today. The 5x stretch target
is held back primarily by:

- throw/catch for non-tail returns (~8% of CPU). This PR
  optimises the function-tail :return to natural-return; returns
  inside :test branches still throw. B5e (error fidelity) will
  revisit the throw/catch shape.
- setelement/3 per opcode write (~22% of CPU). Equivalent to
  the interpreter's register-tuple cost. Register promotion to SSA
  Erlang variables (deferred follow-up) eliminates this.
- apply_arith_op / index_value calls when the inline fast path
  doesn't fire. B5c adds table-opcode coverage which inlines more
  paths.
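
To make the first two costs concrete, here is a small sketch contrasting the shapes involved. It is not the generated code: HotPathSketch and the {:lua_return, value} throw tag are hypothetical, and the functions only mimic the pattern of a register tuple updated with put_elem/3 (which maps to setelement/3), the SSA-variable alternative, and a throw used for a return inside a branch.

```elixir
defmodule HotPathSketch do
  # Register-tuple style: r2 = r0 + r1; r3 = r2 * r0; return r3.
  # Every opcode result is written back into a fresh tuple copy.
  def via_registers(a, b) do
    regs = {a, b, nil, nil}
    regs = put_elem(regs, 2, elem(regs, 0) + elem(regs, 1))
    regs = put_elem(regs, 3, elem(regs, 2) * elem(regs, 0))
    elem(regs, 3)
  end

  # SSA-variable style: the same computation with plain bindings,
  # so no tuple is rebuilt per opcode.
  def via_variables(a, b) do
    r2 = a + b
    r3 = r2 * a
    r3
  end

  # A return inside a branch modelled as a throw caught at the
  # function boundary; the tail expression returns naturally.
  def with_throw(n) do
    try do
      if n < 2, do: throw({:lua_return, n})
      n * 2
    catch
      {:lua_return, value} -> value
    end
  end
end
```

Both arithmetic variants compute the same result; the difference is purely in how many intermediate tuples are allocated along the way.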

Changes

- lib/lua/compiler/erlang.ex: top-level compile/load orchestration
- lib/lua/compiler/erlang/codegen.ex: abstract-forms generation
- lib/lua/compiler/erlang/opcodes.ex: per-opcode lowering
- lib/lua/compiler/erlang/runtime.ex: generated-code runtime helpers
- lib/lua/compiler/prototype.ex: compiled_module field
- lib/lua/compiler.ex: wire codegen into Lua.Compiler.compile/2
- lib/lua/vm.ex: top-level execute dispatches to compiled module
- lib/lua/vm/executor.ex: :compiled_closure clauses in
  call_function/3 and the :call opcode; apply_arith_op/6,
  apply_unary_op/5, apply_compare_op/6,
  call_function_with_position/5 public helpers; index_value/6
  promoted to public
- lib/lua/vm/value.ex, lib/lua/util.ex, lib/lua/api.ex,
  lib/lua/vm/display.ex, lib/lua/vm/stdlib*.ex, lib/lua.ex:
  add :compiled_closure clauses everywhere :lua_closure was
  pattern-matched

Verification

mix format
mix compile --warnings-as-errors
mix test                       # 1705 tests, 0 failures
mix test --only lua53          # 29 tests, 0 failures
MIX_ENV=benchmark mix run benchmarks/fibonacci.exs

Known limitations (followed up in B5b–B5e)

- Every prototype gets a fresh module name; loaded modules persist
  until BEAM exit. B5b introduces the content-addressable
  ref-counted cache.
- :get_field with a non-binary name, all other table opcodes, and
  closures fall back. B5c and B5d cover them.
- Errors raised from compiled code carry the codegen-time :source_line
  but not full position fidelity. B5e adds try/catch with
  pc_to_line tables.
- One :erl_lint :unsafe_var warning has been observed in the logs
  (not a failure) for prototypes with a specific shape: a register
  write inside a :test branch that then continues. The prototype
  safely falls back in that case.
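
The first limitation can be illustrated with a small sketch of per-prototype module naming. Everything here is an assumption for illustration: the lua_sketch_ prefix, ModuleLifecycleSketch, and the unload/1 helper are hypothetical and do not describe how the PR names or manages its modules. The sketch only shows that each load creates a new, distinct module that stays loaded until something explicitly removes it.

```elixir
defmodule ModuleLifecycleSketch do
  # Compiles and loads a module with a fresh, unique name whose run/0
  # returns the given integer. Each call creates a new atom and a new
  # loaded module.
  def load_fresh(value) when is_integer(value) do
    mod = :"lua_sketch_#{System.unique_integer([:positive])}"

    forms = [
      {:attribute, 1, :module, mod},
      {:attribute, 2, :export, [run: 0]},
      {:function, 3, :run, 0,
       [{:clause, 3, [], [], [{:integer, 3, value}]}]}
    ]

    {:ok, ^mod, binary} = :compile.forms(forms, [:return_errors])
    {:module, ^mod} = :code.load_binary(mod, ~c"nofile", binary)
    mod
  end

  # Hypothetical explicit unload; without a step like this the module
  # remains loaded for the lifetime of the VM.
  def unload(mod) do
    :code.purge(mod)
    :code.delete(mod)
  end
end
```

Because module names are atoms, and atoms are never garbage collected, a scheme that mints a new name per prototype grows without bound, which is the motivation a cache such as the one planned for B5b would address.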

Splits B5 into five sequential plans (B5a foundation, B5b lifecycle, B5c tables, B5d closures, B5e error fidelity) after three pre-flight spikes confirmed the dispatch-loop hypothesis:

- Stripped fib(25): 278x faster than interpreter (BEAMASM ceiling)
- Faithful fib(25): 12.4x faster than interpreter, 10.4x vs Luerl
- Faithful table_sum: 2.1x faster than interpreter (modest by design)

Spike benchmarks land permanently under benchmarks/b5_spike*.exs so each follow-on plan can re-measure against the same baseline.

Plan: B5a (foundation)

Introduces Lua.Compiler.Erlang, a codegen that translates supported %Prototype{} values into Erlang abstract forms via :compile.forms/2, loaded as fresh BEAM modules at runtime. The dispatch path through {:compiled_closure, mod, fun, upvalues, proto} bypasses the interpreter's register-tuple construction and per-opcode dispatch loop entirely.

Coverage in this PR (B5a, foundation):

- arithmetic, comparison, logical ops (with integer fast paths)
- control flow: :test (terminating branches), :test_true, early return
- upvalues: :get_upvalue, :get_open_upvalue, :load_env, :get_global
- :get_field on _ENV (inline no-metatable fast path; metatable case delegates to Executor.index_value/6)
- :call with single-result returns; routes through call_function_with_position which bridges native-callback position tracking but no-ops for Lua-to-Lua calls.
- :scope (transparent block inlining)
- :move, :load_constant, :load_nil, :load_boolean, :source_line

Out of scope (B5c/B5d/B5e):

- table opcodes (:new_table, :get_table, :set_table, :set_list, :set_field, non-env :get_field)
- closure construction (:closure), upvalue mutation (:set_upvalue, :set_open_upvalue), varargs, multi-value returns
- error position fidelity for raises inside compiled code
- :goto/:label, loops (:numeric_for, :while_loop, :repeat_loop, :generic_for, :break)

The all-or-nothing rule applies per prototype: if any opcode in a prototype is unsupported, that prototype falls back to interpretation. Sub-prototypes compile or fall back independently, and the :closure opcode emits the appropriate value type per child.

Suite: 1705 tests + 51 properties + 55 doctests, 0 failures. 29 lua53 tests, 0 failures.

Perf (fib(30)):

- main: ~970 ms
- with B5a: ~670 ms (1.4x faster than main, 1.07x vs Luerl)

The 5x-vs-Luerl stretch target from the plan is not met by this PR alone; most of the remaining gap is throw/catch overhead on the non-tail :return forms, register-tuple setelement churn, and the Process.put bridge on calls. Each closes incrementally as B5b through B5e land.

Plan: B5a

davydog187 added 3 commits May 22, 2026 08:47

chore(B5a): mark plan as review and record discoveries

4a7cfac

davydog187 wants to merge 3 commits into main from perf/erlang-codegen-foundation
