perf(vm): compile Lua prototypes to BEAM modules #235
Open
davydog187 wants to merge 3 commits into
Splits B5 into five sequential plans (B5a foundation, B5b lifecycle,
B5c tables, B5d closures, B5e error fidelity) after three pre-flight
spikes confirmed the dispatch-loop hypothesis:
- Stripped fib(25): 278x faster than interpreter (BEAMASM ceiling)
- Faithful fib(25): 12.4x faster than interpreter, 10.4x vs Luerl
- Faithful table_sum: 2.1x faster than interpreter (modest by design)
Spike benchmarks land permanently under benchmarks/b5_spike*.exs so
each follow-on plan can re-measure against the same baseline.
Plan: B5a (foundation)
Introduces Lua.Compiler.Erlang — a codegen that translates supported
%Prototype{} values into Erlang abstract forms via :compile.forms/2,
loaded as fresh BEAM modules at runtime. The dispatch path through
{:compiled_closure, mod, fun, upvalues, proto} bypasses the interpreter's
register-tuple construction and per-opcode dispatch loop entirely.
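As a rough illustration of the abstract-forms route described above, the sketch below hand-builds the forms for a one-function module, compiles them in memory with `:compile.forms/2`, and loads the binary with `:code.load_binary/3`. The module name `:lua_proto_sketch_1`, the `add/2` function, and the `CodegenSketch` wrapper are all made up for this example; the real codegen in this PR emits forms derived from a `%Prototype{}` and is not shown here.

```elixir
defmodule CodegenSketch do
  @moduledoc false

  # Builds Erlang abstract forms for `module_name:add/2`, compiles them in
  # memory, and loads the resulting BEAM binary into the running VM.
  # (Hypothetical helper; not part of the PR.)
  def compile_and_load(module_name) when is_atom(module_name) do
    line = 1

    forms = [
      {:attribute, line, :module, module_name},
      {:attribute, line, :export, [add: 2]},
      # add(A, B) -> A + B.
      {:function, line, :add, 2,
       [
         {:clause, line, [{:var, line, :A}, {:var, line, :B}], [],
          [{:op, line, :+, {:var, line, :A}, {:var, line, :B}}]}
       ]}
    ]

    with {:ok, ^module_name, binary} <- :compile.forms(forms, [:return_errors]),
         {:module, ^module_name} <- :code.load_binary(module_name, ~c"nofile", binary) do
      {:ok, module_name}
    end
  end
end

{:ok, mod} = CodegenSketch.compile_and_load(:lua_proto_sketch_1)
IO.inspect(mod.add(2, 3))
# prints 5
```

No file is written to disk; the module exists only in the running VM, which is why module lifecycle needs its own follow-up plan.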
Coverage in this PR (B5a — foundation):
- arithmetic, comparison, logical ops (with integer fast paths)
- control flow: :test (terminating branches), :test_true, early return
- upvalues: :get_upvalue, :get_open_upvalue, :load_env, :get_global
- :get_field on _ENV (inline no-metatable fast path; metatable case
delegates to Executor.index_value/6)
- :call with single-result returns; routes through
call_function_with_position which bridges native-callback position
tracking but no-ops for Lua-to-Lua calls.
- :scope (transparent block inlining)
- :move, :load_constant, :load_nil, :load_boolean, :source_line
Out of scope (B5c/B5d/B5e):
- table opcodes (:new_table, :get_table, :set_table, :set_list,
:set_field, non-env :get_field)
- closure construction (:closure), upvalue mutation
(:set_upvalue, :set_open_upvalue), varargs, multi-value returns
- error position fidelity for raises inside compiled code
- :goto/:label, loops (:numeric_for, :while_loop, :repeat_loop,
:generic_for, :break)
The all-or-nothing rule applies per prototype: if any opcode in a
prototype is unsupported, that prototype falls back to interpretation.
Sub-prototypes compile or fall back independently, and the :closure
opcode emits the appropriate value type per child.
Suite: 1705 tests + 51 properties + 55 doctests, 0 failures.
29 lua53 tests, 0 failures.
Perf (fib(30)):
- main: ~970 ms
- with B5a: ~670 ms (1.4x faster than main, 1.07x vs Luerl)
The 5x-vs-Luerl stretch target from the plan is not met by this PR
alone — most of the remaining gap is throw/catch overhead on the
non-tail :return forms, register-tuple setelement churn, and the
Process.put bridge on calls. Each closes incrementally as B5b through
B5e land.
Plan: B5a
Plan: B5a — Erlang codegen foundation
Plan: `.agents/plans/B5a-erlang-codegen-foundation.md`
Parent strategic plan: `.agents/plans/B5-compile-prototypes-to-erlang.md`
Goal
Land the foundation for compiling Lua prototypes to BEAM modules via
`:compile.forms/2`. A compiled prototype's call goes through a new
`{:compiled_closure, mod, fun, upvalues, proto}` value type, bypassing
the interpreter's register-tuple construction and per-opcode dispatch
loop entirely. This first PR covers arithmetic, comparison, logical
ops, conditional `:test`, single-result `:call`, single-value
`:return`, and the common `_ENV.name` lookup path.
Scope
Supported in this PR:
- `:load_constant`, `:load_boolean`, `:load_nil`, `:move`, `:source_line`, `:scope`
- `:get_upvalue`, `:get_open_upvalue`, `:load_env`, `:get_global`
- `_ENV.name` field access: `:get_field` with binary literal name (inlines the no-metatable fast path; metatable case delegates to `Executor.index_value/6`)
- `:add`, `:subtract`, `:multiply`; slow-path-only for `:divide`, `:floor_divide`, `:modulo`, `:power`, `:negate`
- `:less_than`, `:less_equal`, `:greater_than`, `:greater_equal`; slow-path-only for `:equal`, `:not_equal`
- `:not`
- `:test` and `:test_true`, restricted to branches that terminate via `:return` (no SSA-merging in B5a)
- `:call` with single-result returns; routes through `call_function_with_position`, which bridges native-callback position tracking but no-ops for pure Lua-to-Lua calls.
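The fast-path/slow-path split for arithmetic can be pictured as a guard clause in front of a general helper. This is only a sketch of the idea: `ArithSketch` and `slow_add/2` are hypothetical names, the slow path here stands in for something like `apply_arith_op`, and Lua's 64-bit integer wraparound and string coercion are deliberately ignored.

```elixir
defmodule ArithSketch do
  @moduledoc false

  # Fast path: both operands are already integers, so the addition can be
  # emitted inline with no coercion or metamethod lookup.
  def add(a, b) when is_integer(a) and is_integer(b), do: a + b

  # Everything else is delegated to the general (slower) helper.
  def add(a, b), do: slow_add(a, b)

  # Hypothetical slow path: handles mixed integer/float operands and rejects
  # anything that is not a number. A real VM would also try metamethods.
  defp slow_add(a, b) when is_number(a) and is_number(b), do: a + b

  defp slow_add(a, b) do
    raise ArgumentError,
          "attempt to perform arithmetic on #{inspect(a)} and #{inspect(b)}"
  end
end

IO.inspect(ArithSketch.add(2, 3))
IO.inspect(ArithSketch.add(1, 2.5))
```

The "slow-path-only" opcodes would correspond to skipping the first clause and always calling the helper.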
Out of scope (deliberately falling back to interpreter):
- `:goto`/`:label`, loops (`:numeric_for`, `:while_loop`, etc.)

All-or-nothing per prototype: a prototype containing any unsupported
opcode falls back to interpretation in its entirety. Sub-prototypes
compile independently.
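A minimal sketch of the all-or-nothing rule, assuming a simplified prototype shape: a map with `:instructions` (tuples tagged by opcode) and `:children`. Both the shape and the abbreviated supported-opcode set are assumptions made for illustration, not the actual `%Prototype{}` struct.

```elixir
defmodule FallbackSketch do
  @moduledoc false

  # Abbreviated, hypothetical subset of the supported opcodes.
  @supported MapSet.new([:load_constant, :move, :add, :less_than, :test, :call, :return])

  # Decides per prototype whether it compiles or falls back. Children are
  # planned independently of their parent's outcome.
  def plan(%{instructions: instructions, children: children}) do
    mode = if Enum.all?(instructions, &supported?/1), do: :compiled, else: :interpreted
    %{mode: mode, children: Enum.map(children, &plan/1)}
  end

  # An instruction is supported when its opcode tag is in the set.
  defp supported?(instruction), do: MapSet.member?(@supported, elem(instruction, 0))
end

parent = %{
  instructions: [{:load_constant, 0, 1}, {:return, 0}],
  children: [%{instructions: [{:while_loop, 0}], children: []}]
}

IO.inspect(FallbackSketch.plan(parent))
```

Here the parent is marked `:compiled` while its loop-containing child is marked `:interpreted`; a single unsupported opcode is enough to flip a prototype to the interpreter.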
Success criteria
- `Lua.Compiler.Erlang.compile/1` exists and returns `{:ok, proto_with_compiled_module_set}` for covered prototypes
- `Lua.VM.CompiledModule` value type wired through `Executor.call_function/3` and the `:call` opcode dispatch
- `Lua.Compiler.Erlang.Opcodes` (`:closure`) emits `:compiled_closure` when the nested prototype compiled, else `:lua_closure`
- `mix test`: 1705 tests + 51 properties + 55 doctests, 0 failures
- `mix test --only lua53`: 29 tests, 0 failures

The throw/catch overhead on non-tail returns and the register-tuple
`setelement/3` churn dominate; B5b/B5c/B5d will close the gap as more
opcodes inline.
Perf
fib(30), full mode:
The compiled path beats Luerl modestly today. The 5x stretch target
is held back primarily by:
- `throw`/`catch` for non-tail returns (~8% of CPU). This PR optimises the function-tail `:return` to natural-return; returns inside `:test` branches still throw. B5e (error fidelity) will revisit the throw/catch shape.
- `setelement/3` per opcode write (~22% of CPU). Equivalent to the interpreter's register-tuple cost. Register promotion to SSA Erlang variables (deferred follow-up) eliminates this.
- `apply_arith_op`/`index_value` calls when the inline fast path doesn't fire. B5c adds table-opcode coverage which inlines more paths.
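To make the first item concrete, the sketch below contrasts an early return implemented with `throw`/`catch` against the same logic written so that every branch is in tail position. The `{:lua_return, value}` tag and the function names are hypothetical; the shape of the code this PR actually generates is not shown in the description.

```elixir
defmodule ReturnSketch do
  @moduledoc false

  # Early return via throw/catch: the function body runs inside a try, and a
  # return from a nested branch unwinds to the catch clause.
  def with_throw(n) do
    try do
      if n < 2, do: throw({:lua_return, n})
      n * 2
    catch
      {:lua_return, value} -> value
    end
  end

  # Natural return: each branch ends the function directly, so no try frame
  # is needed and the result is just the value of the expression.
  def natural(n) do
    if n < 2, do: n, else: n * 2
  end
end

IO.inspect({ReturnSketch.with_throw(1), ReturnSketch.natural(1)})
IO.inspect({ReturnSketch.with_throw(5), ReturnSketch.natural(5)})
```

Both functions compute the same result; the difference is purely in how control leaves the function.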
Changes
- `lib/lua/compiler/erlang.ex`: top-level compile/load orchestration
- `lib/lua/compiler/erlang/codegen.ex`: abstract-forms generation
- `lib/lua/compiler/erlang/opcodes.ex`: per-opcode lowering
- `lib/lua/compiler/erlang/runtime.ex`: generated-code runtime helpers
- `lib/lua/compiler/prototype.ex`: `compiled_module` field
- `lib/lua/compiler.ex`: wire codegen into `Lua.Compiler.compile/2`
- `lib/lua/vm.ex`: top-level execute dispatches to compiled module
- `lib/lua/vm/executor.ex`: `:compiled_closure` clauses in `call_function/3` and the `:call` opcode; `apply_arith_op/6`, `apply_unary_op/5`, `apply_compare_op/6`, `call_function_with_position/5` public helpers; `index_value/6` promoted to public
- `lib/lua/vm/value.ex`, `lib/lua/util.ex`, `lib/lua/api.ex`, `lib/lua/vm/display.ex`, `lib/lua/vm/stdlib*.ex`, `lib/lua.ex`: add `:compiled_closure` clauses everywhere `:lua_closure` was pattern-matched
Verification
Known limitations (followed up in B5b–B5e)
- until BEAM exit. B5b introduces the content-addressable ref-counted cache.
- `:get_field` with non-binary name, all other table opcodes, and closures fall back. B5c and B5d cover them.
- `:source_line` but not full position fidelity. B5e adds try/catch with `pc_to_line` tables.
- `:erl_lint` `:unsafe_var` warning logs (not a failure) for prototypes with a specific shape involving register writes inside `:test` branches that then continue. The prototype safely falls back in that case.