From ab375c677bd9b1b91b7b8eb3b1311ef01ee22c47 Mon Sep 17 00:00:00 2001
From: Dave Lucia <davelucianyc@gmail.com>
Date: Fri, 22 May 2026 08:47:59 -0700
Subject: [PATCH 1/3] chore(plan): scope B5a-B5e and record spike benchmarks

Splits B5 into five sequential plans (B5a foundation, B5b lifecycle,
B5c tables, B5d closures, B5e error fidelity) after three pre-flight
spikes confirmed the dispatch-loop hypothesis:

- Stripped fib(25):  278x faster than interpreter (BEAMASM ceiling)
- Faithful fib(25):  12.4x faster than interpreter, 10.4x vs Luerl
- Faithful table_sum: 2.1x faster than interpreter (modest by design)

Spike benchmarks land permanently under benchmarks/b5_spike*.exs so
each follow-on plan can re-measure against the same baseline.

Plan: B5a (foundation)
---
 .../plans/B5-compile-prototypes-to-erlang.md  | 341 +++++++++++++++-
 .../plans/B5a-erlang-codegen-foundation.md    | 363 ++++++++++++++++++
 .agents/plans/B5b-module-lifecycle.md         | 236 ++++++++++++
 .agents/plans/B5c-table-opcodes.md            | 216 +++++++++++
 .agents/plans/B5d-closures-and-varargs.md     | 197 ++++++++++
 .agents/plans/B5e-error-position-fidelity.md  | 171 +++++++++
 benchmarks/b5_spike.exs                       | 126 ++++++
 benchmarks/b5_spike_faithful.exs              | 279 ++++++++++++++
 benchmarks/b5_spike_tables.exs                | 206 ++++++++++
 9 files changed, 2128 insertions(+), 7 deletions(-)
 create mode 100644 .agents/plans/B5a-erlang-codegen-foundation.md
 create mode 100644 .agents/plans/B5b-module-lifecycle.md
 create mode 100644 .agents/plans/B5c-table-opcodes.md
 create mode 100644 .agents/plans/B5d-closures-and-varargs.md
 create mode 100644 .agents/plans/B5e-error-position-fidelity.md
 create mode 100644 benchmarks/b5_spike.exs
 create mode 100644 benchmarks/b5_spike_faithful.exs
 create mode 100644 benchmarks/b5_spike_tables.exs

diff --git a/.agents/plans/B5-compile-prototypes-to-erlang.md b/.agents/plans/B5-compile-prototypes-to-erlang.md
index 3de7800..a93ea34 100644
--- a/.agents/plans/B5-compile-prototypes-to-erlang.md
+++ b/.agents/plans/B5-compile-prototypes-to-erlang.md
@@ -3,20 +3,39 @@ id: B5
 title: Compile Lua prototypes to Erlang functions (executor JIT)
 issue: null
 pr: null
-branch: perf/compile-to-erlang
+branch: n/a (split into B5a-B5e)
 base: main
-status: blocked
+status: split
 direction: B
 unlocks:
   - sub-Luerl latency on tight numeric/call workloads
   - "perf parity with Luerl, ±10%" 1.0 commitment headroom
 ---
 
-## Blocked on
+## Status: split into B5a–B5e
 
-- B4 — the flat instruction stream is the natural intermediate
-  representation to translate into Erlang. Trying to JIT directly from
-  the list-of-tuples shape would mix two structural changes in one PR.
+After three pre-flight spikes (recorded in `## Discoveries` below)
+the work was split into five sequential plans, each shippable as
+one PR per the `ship-a-plan` contract:
+
+- **B5a** — Erlang codegen foundation; covers fib + arithmetic +
+  control flow. Falls back on tables and closures.
+- **B5b** — Module lifecycle (cache + ref-counted purging).
+  Immediately after B5a; required before more opcodes ship.
+- **B5c** — Table opcodes.
+- **B5d** — Closures, varargs, multi-return.
+- **B5e** — Error position fidelity.
+
+This parent plan stays as the strategic record: spike data,
+architectural decisions, and what was decided out of scope. Read
+the child plans for what gets implemented.
+
+## Blocked on (historical)
+
+- B4 — the flat instruction stream was assumed to be the natural
+  intermediate representation. The B4 spike disproved this; B5
+  proceeds directly from the existing list-of-tuples shape. See
+  the B4 plan's Discoveries for why.
 
 ## Goal
 
@@ -251,4 +270,312 @@ IO.inspect(:code.all_loaded() |> length(), label: "loaded modules after 1000 eva
 
 ## Discoveries
 
-(populated during implementation)
+### Pre-flight spike (perf/b5-spike-fib, May 2026)
+
+Before committing to the multi-month build, a vertical-slice spike hand-
+wrote what `compile:forms/2` would emit for the fib prototype's hot
+path and compared it against the interpreter, native Elixir (BEAMASM
+ceiling), Luerl, and C Lua (via luaport). Spike source:
+`benchmarks/b5_spike.exs`.
+
+**fib(25), full mode:**
+
+| Implementation | Mean | Memory | vs interpreter |
+|---|---|---|---|
+| native elixir | 0.27 ms | 0 B | 325x faster |
+| compiled erlang | 0.89 ms | 0 B | **98x faster** |
+| C Lua (luaport) | 2.35 ms | 184 B | 37x faster |
+| luerl | 65.4 ms | 238 MB | 1.34x faster |
+| lua (chunk) | 87.7 ms | 705 MB | baseline |
+
+**fib(30), quick mode:**
+
+| Implementation | Mean | vs interpreter |
+|---|---|---|
+| native elixir | 3.30 ms | 294x faster |
+| compiled erlang | 9.67 ms | **100x faster** |
+| C Lua (luaport) | 26.8 ms | 36x faster |
+| luerl | 726 ms | 1.34x faster |
+| lua (chunk) | 970 ms | baseline |
+
+Ratios are stable across n; the result is not a small-n artefact.
+
+### What the spike shows
+
+- The compiled-erlang path is two orders of magnitude faster than the
+  interpreter on fib's hot path and is the only path that beats luerl
+  by more than a constant factor. The exit condition for going ahead
+  with B5 (≥30% win on fib(30)) is met by ~33x.
+- Memory is the more dramatic signal: 705 MB → 0 B on fib(25). The
+  interpreter's register-tuple churn (`setelement/3` at 25% of self-
+  time in the main-branch profile) **disappears completely** when the
+  prototype compiles to a module that uses Erlang variables instead of
+  a tuple. This validates that the `setelement/3` ceiling identified
+  in the B-series consolidation is not a wall — it is a property of
+  the interpreter's data shape, not of the BEAM.
+- The 3.3x gap between native elixir and compiled erlang is the
+  realistic ceiling for B5: cross-module inlining and constant-time
+  call resolution that runtime-loaded modules don't get. B5 should be
+  scoped against the compiled-erlang column, not the native-elixir
+  column.
+
+### Caveats the spike does not address
+
+1. **fib is the friendliest possible workload.** Pure integer math, no
+   tables, no metamethods, no strings, no upvalue mutations. The OOP
+   and table_ops benchmarks exercise costs the spike does not touch.
+   B5 may deliver smaller (still meaningful) wins on those.
+2. **The spike strips Lua semantics.** No register tuple, no `_ENV.fib`
+   lookup, no metamethod dispatch path on `<` or `+`. Each of those
+   reintroduces overhead a real B5 codegen must respect. The first PR
+   should validate that a faithful translation (register tuple +
+   `get_upvalue` + `get_field` for the recursive call) still clears
+   the original plan's success bar.
+3. **Module load cost is not amortised.** Compiled once outside the
+   benchmark. Content-addressable module cache (already in the plan)
+   handles repeated runs; one-shot scripts may be net slower.
+
+### Adjustments to the plan
+
+- Success criterion "fib(25) parity with Luerl ±5%" is too conservative
+  given the spike numbers. Update to "fib(25) beats Luerl by ≥20x" or
+  similar, set on the basis of the faithful-translation prototype, not
+  the stripped spike.
+- Option (1) (keep registers as tuple, eat `setelement/3`) is the right
+  first move. The spike showed the dispatch-loop win is overwhelming
+  even before register promotion; SSA promotion (`B5c`) can be deferred
+  without losing the bulk of the win.
+- The faithful-translation prototype (next step) should land as a
+  second spike before the full plan implementation begins. If a
+  faithful fib compiled module loses more than 5x against the stripped
+  spike, the Lua-semantics overhead is bigger than expected and the
+  plan needs another pass.
+
+### Spike artefact
+
+Branch `perf/b5-spike-fib`, file `benchmarks/b5_spike.exs`. Reproduce
+with `MIX_ENV=benchmark mix run benchmarks/b5_spike.exs`
+(or with `LUA_BENCH_MODE=full` / `FIB_N=30`).
+
+### Faithful follow-up spike (perf/b5-spike-fib, May 2026)
+
+The stripped spike answered "is there headroom?" Yes. This second
+spike answered "how much survives when we add back the Lua-VM
+machinery a real B5 codegen could not skip?"
+
+The faithful spike compiles fib via `compile:forms/2` and then, for
+its recursive call, looks up `_ENV` through the upvalue cell, fetches
+`_ENV.fib` from the globals table, and re-enters via
+`Lua.VM.Executor.call_function/3`. State threads through both calls.
+Args are boxed in a list, results unbox from a list — all the same
+protocol the interpreter uses.
+
+Source: `benchmarks/b5_spike_faithful.exs`. Required a small
+additive change in `lib/lua/vm/executor.ex` to register a
+`:compiled_closure` value type that dispatches to a BEAM module
+without building a callee register tuple (this is the win condition —
+the spike measures call cost when the dispatch shape itself is
+collapsed to a BEAM function call). The change is tagged as spike-
+only in comments; full test suite (1705 tests + 51 properties + 55
+doctests) still passes with it in place.
+
+**fib(25), full mode:**
+
+| Implementation | Mean | Memory | vs interpreter | vs Luerl |
+|---|---|---|---|---|
+| compiled-stripped | 0.28 ms | 0 MB | 278x | 232x |
+| native elixir | 0.32 ms | 0 MB | 243x | 210x |
+| C Lua (luaport) | 2.34 ms | 184 B | 33x | 28x |
+| **compiled-faithful** | **6.27 ms** | **13.0 MB** | **12.4x** | **10.4x** |
+| luerl | 64.8 ms | 227 MB | 1.2x | baseline |
+| lua (interpreter) | 77.7 ms | 673 MB | baseline | 1.2x slower |
+
+### What the faithful spike shows
+
+- B5 still clears its bar by a wide margin: **12.4x faster than the
+  interpreter, 10.4x faster than Luerl, 22x slower than the BEAMASM
+  ceiling**. The 22x gap between stripped and faithful is the real
+  cost of preserving Lua semantics during the recursive call (upvalue
+  cell lookup, get_field on `_ENV`, two `call_function/3` invocations
+  per frame, state threading, args/result list boxing).
+- Memory is the standout signal: 673 MB → 13 MB on fib(25), a 50x
+  reduction *even with* the call-protocol overhead intact. The
+  register-tuple `setelement/3` churn that consumed 25% of fib's
+  self-time on main is gone — the compiled function uses Erlang
+  variables, and the register tuple never enters the picture for
+  the compiled prototype itself.
+- Risk #5 in the original plan ("the 1.13x current gap may not be
+  reachable from here alone") is falsified. The plan assumed
+  `setelement/3` was a floor. The spike shows it is a property of
+  the interpreter's data shape, not the BEAM.
+
+### What the faithful spike unlocks for the plan
+
+1. **The biggest remaining cost is the call protocol, not dispatch.**
+   That changes B5's phasing. A v1 that just collapses dispatch (the
+   plan's headline) gets most of the win. A follow-up that adds a
+   direct-call edge for compiled-to-compiled invocations (skipping
+   list boxing on args and results) would buy another large chunk
+   — likely a B5d or B5e plan.
+
+2. **`_ENV.fib` static-resolution is a real follow-up lever.** Every
+   recursive call re-resolves `fib` through `_ENV`. The interpreter
+   pays this. B5 codegen can prove (in the common case) that the
+   binding is stable across calls and emit a direct call. This is
+   peephole/escape-analysis work — defer to a follow-up plan.
+
+3. **Register-tuple `setelement/3` is not the ceiling.** This was
+   the dominant concern in the B-series consolidation (ROADMAP.md
+   §"What we learned"). The spike shows compiling out of the
+   register-tuple representation entirely (Option 1 in the plan,
+   ironically the conservative one) eliminates the cost completely
+   on prototypes that fit in BEAM registers. SSA promotion (`B5c`)
+   was scoped as the lever for this — it can be deferred without
+   losing the bulk of the win.
+
+### Revised success criteria
+
+Replace the plan's "fib(25) parity with Luerl ±5%" with:
+
+- Floor: fib(25) beats Luerl by ≥5x.
+- Target: fib(25) beats Luerl by ≥8x.
+- Stretch: fib(25) beats Luerl by ≥10x.
+
+The faithful spike hit 10.4x; even if real-codegen overhead halves
+that margin (to ~5.2x), the result still clears the 5x floor.
+
+### What the spike did not prove
+
+- **Other workloads.** fib is pure integer math. The OOP, table_ops,
+  closures, and string_ops benchmarks exercise costs the spike does
+  not touch. A faithful spike on at least one table-heavy workload
+  should follow before B5 commits to a phasing — if (say) the
+  table_ops loop only wins 2-3x faithful, the plan's per-opcode
+  migration order may need to lead with table ops rather than
+  arithmetic.
+- **Compile-and-load amortisation.** Spike loads modules once outside
+  the loop. `Lua.VM.CodeCache` work in the plan stands.
+- **Module purging.** Spike never cleans up.
+
+### Spike artefacts
+
+- `benchmarks/b5_spike.exs` — stripped spike (no Lua semantics).
+- `benchmarks/b5_spike_faithful.exs` — faithful spike (full call
+  protocol).
+- `lib/lua/vm/executor.ex` — additive `:compiled_closure` dispatch
+  (spike-only, two clauses; see in-line comments).
+
+All on branch `perf/b5-spike-fib`. Reproduce:
+
+```
+MIX_ENV=benchmark mix run benchmarks/b5_spike_faithful.exs
+LUA_BENCH_MODE=full MIX_ENV=benchmark mix run benchmarks/b5_spike_faithful.exs
+```
+
+### Table-heavy spike (perf/b5-spike-fib, May 2026)
+
+The first two spikes measured fib — pure integer arithmetic, the
+friendliest possible workload. Open question after the faithful
+spike: does the win generalise to table-heavy code? Tables exercise
+costs B5 cannot eliminate (`Table.put/3` building a new map per
+mutation, `state.tables` updates per write).
+
+Third spike compiles `run_table_sum(n)` from
+`benchmarks/table_ops.exs` — two tight `:numeric_for` loops, one
+populating a 1..n table, one summing it. Every iteration of the
+first loop hits `:set_table`; every iteration of the second hits
+`:get_table`. Same `:compiled_closure` dispatch as the second spike.
+
+Source: `benchmarks/b5_spike_tables.exs`. The compiled function is
+written in Elixir rather than via `:compile.forms/2` — the second
+spike already proved `:compile.forms` output runs at near-native
+Elixir speed (1.13x slower in the worst case), and writing two
+recursive loop helpers as abstract forms would add ~200 lines without
+changing what's measured.
+
+**run_table_sum(n), full mode:**
+
+| n | Interpreter | Compiled | Luerl | C Lua | vs interp | vs Luerl | vs C Lua |
+|---|---|---|---|---|---|---|---|
+| 100  | 23.0 μs | 10.9 μs | 41.9 μs | 9.6 μs  | **2.1x** | **3.8x** | 1.1x slower |
+| 500  | 125 μs  | 56.4 μs | 146 μs  | 14.1 μs | **2.2x** | **2.6x** | 4.0x slower |
+| 1000 | 274 μs  | 131 μs  | 272 μs  | 20.1 μs | **2.1x** | **2.1x** | 6.6x slower |
+
+Memory at n=1000: interpreter 2.45 MB → compiled 0.59 MB (4.2x less).
+
+### What the table spike shows
+
+- **The compiled-vs-interpreter ratio is stable at ~2.1x across all
+  n.** Per-op interpreter dispatch is a constant per opcode, B5 saves
+  a constant fraction. Does not scale with n because the dominant
+  cost (table mutation allocation via `Table.put/3` + state.tables
+  update) is unchanged.
+- **The compiled-vs-C-Lua gap widens with n.** At n=100 we
+  essentially match C Lua. At n=1000 we are 6.6x slower. This is
+  allocation churn — every `t[i] = i` allocates a new `:data` map
+  and a new `state.tables` map. PUC-Lua mutates in place; we cannot
+  because tables are immutable maps. Same constraint that defeated
+  B7 (see ROADMAP.md §"What we learned").
+- **B5's win on tables is ~6x smaller than on fib.** fib's win was
+  12.4x faithful; tables is 2.1x faithful. Why: fib eliminates the
+  register-tuple `setelement/3` (25% of its self-time) entirely.
+  table_sum cannot escape the `Table.put` cost because that lives
+  in `state.tables`, not in registers — B5 saves dispatch around
+  the mutation, not the mutation itself.
+
+### What this changes about B5 phasing
+
+The plan's per-opcode phasing (arithmetic + control flow first,
+tables next, then metamethods, then native calls) is correct. What
+changes is the *expected return per phase*:
+
+- **Phase 1 (arithmetic + control flow):** the big win. Numeric
+  workloads go from 1.2x slower than Luerl (today) to ~10x faster. fib-style code
+  is the primary beneficiary. This is where most of the headline
+  performance numbers will come from.
+- **Phase 2 (table ops):** smaller win (~2x). Worth doing, but
+  table-heavy workloads will not see numbers that look like Phase 1.
+- **Phase 3+ (metamethods, native calls):** unmeasured. Each needs
+  a pre-flight spike if/when scoped.
+
+A Phase 1-only v1 is shippable on its own: fib-style workloads get the
+big bump immediately, while table workloads stay at interpreter speed
+until Phase 2 lands. The release notes must state which workloads
+benefit when.
+
+### Refined success criteria
+
+Replace single fib target with per-workload targets:
+
+- **Numeric workloads (fib, math.*):** floor 5x faster than Luerl,
+  target 8x, stretch 10x.
+- **Table workloads (table_sum, OOP, etc.):** floor 1.5x faster than
+  Luerl, target 2x. PUC-Lua parity is unreachable on BEAM for
+  table-heavy code — the third spike puts a hard number on this
+  (6.6x slower at n=1000 with the dispatch loop eliminated). The
+  remaining gap is allocation cost in immutable maps. Drop any
+  aspiration of PUC-Lua parity on table workloads.
+
+### Implication: parallel investigation worth scoping later
+
+Most of the table-workload allocation cost comes from `state.tables`
+being a map of maps — every mutation walks two levels. If a future
+plan changed table storage to something mutable from inside the BEAM
+(`:ets`, `:atomics`, or a per-state mutable structure with explicit
+GC integration), it would compose multiplicatively with B5: B5 saves
+dispatch, that change saves allocation. Together they could close
+the C-Lua gap meaningfully on table workloads.
+
+Not in scope for B5. Worth keeping in the back pocket as a B-series
+follow-up once B5 v1 has shipped and the data shape is the obvious
+remaining ceiling.
+
+### Third spike artefact
+
+`benchmarks/b5_spike_tables.exs`. Reuses the `:compiled_closure`
+dispatch from the second spike. Reproduce:
+
+```
+MIX_ENV=benchmark mix run benchmarks/b5_spike_tables.exs
+LUA_BENCH_MODE=full MIX_ENV=benchmark mix run benchmarks/b5_spike_tables.exs
+```
diff --git a/.agents/plans/B5a-erlang-codegen-foundation.md b/.agents/plans/B5a-erlang-codegen-foundation.md
new file mode 100644
index 0000000..c11e930
--- /dev/null
+++ b/.agents/plans/B5a-erlang-codegen-foundation.md
@@ -0,0 +1,363 @@
+---
+id: B5a
+title: Erlang codegen foundation — compile arithmetic + control flow prototypes to BEAM modules
+issue: null
+pr: null
+branch: perf/erlang-codegen-foundation
+base: main
+status: in-progress
+direction: B
+unlocks:
+  - B5b (lifecycle), B5c (tables), B5d (closures), B5e (errors)
+  - ~10x speedup over Luerl on numeric workloads (fib, math.*)
+  - ~2x speedup over Luerl on control-flow-heavy code
+---
+
+## Goal
+
+Land the foundation for compiling Lua `%Prototype{}` values to BEAM
+modules via `:compile.forms/2`. The compiled module gets dispatched
+through a new `:compiled_closure` value type that bypasses the
+interpreter's register-tuple construction and per-opcode dispatch
+loop entirely.
+
+This first PR covers every opcode **except tables and closures**:
+arithmetic, comparison, control flow (including loops and goto),
+bitwise ops, string concat/length, source-line tracking, calls,
+single-value returns, and upvalue reads (read-only, since closures
+ship in B5d). If a prototype contains a table or closure opcode the
+whole prototype falls back to the interpreter (all-or-nothing per
+prototype — mixed-mode interpret-from-pc is explicitly out of scope).
+
+## Why now
+
+Three pre-flight spikes (recorded under `## Discoveries` in
+`.agents/plans/B5-compile-prototypes-to-erlang.md`, branch
+`perf/b5-spike-fib`) measured the headroom against today's
+interpreter:
+
+- **Stripped fib(25):** 278x faster than interpreter (BEAMASM ceiling).
+- **Faithful fib(25):** 12.4x faster than interpreter, 10.4x faster
+  than Luerl. Memory 673 MB → 13 MB.
+- **Faithful run_table_sum(1000):** 2.1x faster than interpreter,
+  2.1x faster than Luerl.
+
+The dispatch-loop hypothesis from the parent plan is confirmed. The
+spike branch demonstrated the `:compiled_closure` dispatch shape;
+this plan productionises it.
+
+The library is pre-release and there is no flag — every prototype
+the codegen can handle goes through compilation. That's the bet.
+
+## Out of scope
+
+- Module lifecycle (cache, ref-counting, purging). Every prototype
+  gets a fresh module per compile in this PR. **Leaks. B5b fixes
+  this immediately after merge.**
+- Tables (`:new_table`, `:get_table`, `:set_table`, `:set_list`,
+  `:get_field` full path, `:set_field`). Falls back to interpreter.
+  B5c.
+- Closures (`:closure`, `:set_upvalue`, `:get_open_upvalue`,
+  `:set_open_upvalue`, `:vararg`, `:return_vararg`, `:return` count
+  > 1). Falls back to interpreter. B5d.
+- Error position fidelity for compiled code (line/source in raise
+  sites). B5e.
+- Mixed-mode (compiled prototype calls interpreter for one missing
+  opcode and resumes). All-or-nothing per prototype.
+- SSA / register promotion. Registers stay in a tuple in compiled
+  code — same shape as the interpreter. This is the conservative
+  option from the parent plan; the spike showed the dispatch win
+  alone justifies the work.
+
+## Success criteria
+
+- [ ] `Lua.Compiler.Erlang` module exists and converts a covered
+      `%Prototype{}` into Erlang abstract forms.
+- [ ] `Lua.VM.CompiledModule` value type exists and is dispatched
+      by `Executor.call_function/3` and the `:call` opcode.
+      Carries `{:compiled_closure, module_name, function_name,
+      upvalues_tuple}`.
+- [ ] `Lua.Compiler.compile/1,2` returns prototypes that have been
+      compiled to BEAM modules where the codegen accepts them.
+      Prototypes containing any uncovered opcode are returned as
+      plain interpreted prototypes (current behaviour).
+- [ ] Opcode coverage in this PR (everything except tables,
+      closures, varargs, multi-return, generic_for, tail_call,
+      self):
+      `:load_constant`, `:load_boolean`, `:load_nil`, `:move`,
+      `:source_line`, `:scope`, `:get_upvalue`, `:get_global`,
+      `:set_global`, `:load_env`, `:get_field` (env-lookup form
+      only — uses the same fast path as the interpreter's
+      `get_field` when reading from an upvalue-loaded register
+      holding `_ENV`), `:add`, `:subtract`, `:multiply`, `:divide`,
+      `:floor_divide`, `:modulo`, `:power`, `:negate`,
+      `:bitwise_and`, `:bitwise_or`, `:bitwise_xor`, `:shift_left`,
+      `:shift_right`, `:bitwise_not`, `:less_than`, `:less_equal`,
+      `:greater_than`, `:greater_equal`, `:equal`, `:not_equal`,
+      `:not`, `:length`, `:concatenate`, `:test`, `:test_true`,
+      `:test_and`, `:test_or`, `:goto`, `:label`, `:numeric_for`,
+      `:while_loop`, `:repeat_loop`, `:break`, `:call`, `:return`
+      (count = 1).
+      Out of scope and falling back:
+      `:new_table`/`:get_table`/`:set_table`/`:set_list`/
+      `:set_field`/non-env-form `:get_field` (→ B5c),
+      `:closure`/`:set_upvalue`/`:get_open_upvalue`/
+      `:set_open_upvalue`/`:vararg`/`:return_vararg`/
+      `:return` count > 1/`:generic_for`/`:self`/`:tail_call`
+      (→ B5d).
+- [ ] `mix test` passes; 1705 tests + 51 properties + 55 doctests.
+- [ ] `mix test --only lua53` does not regress.
+- [ ] fib(25) beats Luerl by ≥5x in `mix run benchmarks/fibonacci.exs`.
+      Stretch: ≥8x.
+- [ ] No workload regresses on the existing benchmark suite by more
+      than 5% (within noise).
+- [ ] Compiled-mode failures (codegen bugs) fall back gracefully to
+      interpretation — never crash. Logged via Logger.warning.
+
+## Implementation notes
+
+### Strategy
+
+`Lua.Compiler.Erlang.compile/1` takes a `%Prototype{}` and returns
+either `{:ok, compiled_prototype}` or `:fallback` if any opcode is
+uncovered. The codegen walks the instruction stream once, building
+Erlang abstract forms, then calls `:compile.forms/2` and
+`:code.load_binary/3`.
+
+Module names in this PR: `lua_proto_<unique_integer>`. Real
+content-addressable naming and lifecycle is B5b's job. Yes this
+leaks; one PR of leak is acceptable for the integration period.
+
+### Codegen shape
+
+The compiled function signature mirrors the spike's faithful path:
+
+```elixir
+@spec execute([term()], tuple(), State.t()) ::
+        {[term()], State.t()}
+def execute(args, upvalues, state) do
+  # body
+end
+```
+
+`args` is the call args as a list (matches `Executor.call_function/3`'s
+`:lua_closure` clause). `upvalues` is the upvalue cell-ref tuple
+threaded by the caller. `state` threads through.
+
+Inside the function:
+
+- A register variable for each register slot: `R0`, `R1`, …
+  Single-assignment Erlang variables. Reassigning `R3` becomes
+  `R3_1`, `R3_2`, … using a per-codegen-pass counter.
+- The parameters land in `R0..R{param_count-1}` from the args list
+  via pattern matching at the function head.
+- State is threaded as `State_0`, `State_1`, … through any opcode
+  that can mutate it. (`:call` and `:get_global` for upvalue
+  resolution can.) Most arithmetic is state-pure.
+
+### Control flow
+
+`:numeric_for`, `:while_loop`, `:repeat_loop` compile to
+**recursive Erlang helper functions** inside the generated module.
+This is the BEAM-native loop idiom and what `:compile.forms`
+produces for any Erlang `case`-based loop. Each loop gets a fresh
+helper named `loop_<counter>/N` where N covers the loop variable,
+limit, step, and any captured live variables.
+
+`:goto` + `:label` resolve at codegen time to a function call into
+a helper. The interpreter's `find_label/2` linear scan is replaced
+by a compile-time label-to-helper map.
+
+`:break` becomes an early return from the loop helper.
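+
+A minimal sketch of the loop lowering described above, written as
+Elixir source rather than abstract forms. The module `LoopSketch`,
+the helper `loop_0/4`, and the single live variable `acc` are
+hypothetical names; the sketch assumes integer bounds and a positive
+step, and models `local s = 0; for i = 1, n do s = s + i end; return s`.
+
+```elixir
+defmodule LoopSketch do
+  # Entry point: stands in for the body of the compiled prototype.
+  def sum_to(n), do: loop_0(1, n, 1, 0)
+
+  # loop_<counter>/N: loop variable, limit, step, plus one live variable.
+  # Past the limit: leave the loop and hand back the live variable.
+  defp loop_0(i, limit, _step, acc) when i > limit, do: acc
+
+  # Loop body (s = s + i), then a tail call for the next iteration.
+  defp loop_0(i, limit, step, acc), do: loop_0(i + step, limit, step, acc + i)
+end
+```
+
+In this shape a `:break` would be one more branch that returns the
+live variables without making the tail call.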
+
+### Opcode lowering
+
+Each covered opcode lowers to a fixed snippet of Erlang abstract
+forms. Strategy:
+
+- **Arithmetic/comparison** that already has integer fast paths in
+  the executor (the work from PR #223 et al.): inline a guard
+  clause for the integer-integer case, fall through to a helper
+  call (`Lua.VM.Numeric.add/2` etc.) for the slow path. This
+  preserves the exact semantics the interpreter delivers including
+  metamethod dispatch — the helper calls back through
+  `Executor.try_binary_metamethod/5`.
+- **`:test`**: compile to an Erlang `case` over `Value.truthy?/1`,
+  with the two branches inlined as instruction sequences. This is
+  why we need control flow first — `:test` is everywhere.
+- **`:call`**: dispatch to `Executor.call_function/3`. Args list is
+  materialized from the relevant register range; results unbox into
+  the right register slots. Pays the same call-protocol cost the
+  faithful (second) spike measured.
+- **`:return` count = 1**: returns `{[elem(regs, base)], state}` —
+  the standard CPS-frame-pop shape, but since this is the entry
+  function not a continuation, it just returns to whoever called
+  `Executor.call_function/3`.
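+
+A minimal sketch of the fast-path/slow-path split for `:add`, written
+as Elixir source rather than abstract forms. `AddSketch` and
+`slow_add/2` are hypothetical; `slow_add/2` only stands in for the
+`Lua.VM.Numeric.add/2` helper and does not model metamethod dispatch,
+string coercion, or 64-bit integer wraparound.
+
+```elixir
+defmodule AddSketch do
+  # Fast path: both operands are integers, so no helper call is made.
+  def add(a, b) when is_integer(a) and is_integer(b), do: a + b
+
+  # Slow path: everything else is handed to a helper.
+  def add(a, b), do: slow_add(a, b)
+
+  # Hypothetical stand-in for the real helper: handles mixed numbers
+  # and flags anything else as needing metamethod dispatch.
+  defp slow_add(a, b) when is_number(a) and is_number(b), do: a + b
+  defp slow_add(a, b), do: {:needs_metamethod, a, b}
+end
+```
+
+The point of the split is that the common integer case never leaves
+the generated function, while all other semantics stay in one shared
+helper instead of being duplicated at every call site.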
+
+### Dispatch wiring
+
+`Lua.VM.Executor.call_function/3` learns a new clause:
+
+```elixir
+def call_function({:compiled_closure, mod, fun, upvalues}, args, state) do
+  apply(mod, fun, [args, upvalues, state])
+end
+```
+
+The `:call` opcode dispatch learns the same shortcut: bypass
+register-tuple construction, materialize args list, call
+`apply(mod, fun, ...)`. This is the spike's `:compiled_closure`
+clause promoted to production. The spike already added these
+clauses to `lib/lua/vm/executor.ex` on this branch — verify they
+stay in place, are properly tested, and are no longer flagged as
+"spike-only" in comments.
+
+### Falling back
+
+`Lua.Compiler.compile/2` (the existing entry) is changed to:
+
+```elixir
+def compile(source, opts \\ []) do
+  proto = existing_compile_path(source, opts)
+  case Lua.Compiler.Erlang.compile(proto) do
+    {:ok, compiled} -> compiled
+    :fallback -> proto
+  end
+end
+```
+
+`Lua.Compiler.Erlang.compile/1` walks the instructions and returns
+`:fallback` on the first uncovered opcode. Sub-prototypes (nested
+function definitions) recurse; if any sub-prototype falls back, the
+parent does too (avoids the mixed-mode complexity of mixing call
+shapes between parent and child).
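+
+The all-or-nothing walk can be sketched as follows. This is an
+illustration under assumptions, not the real codegen: `Proto` is a
+hypothetical stand-in for `%Prototype{}`, the instruction tuple
+shapes are invented, and `@covered` is a small subset of the covered
+opcode list.
+
+```elixir
+defmodule FallbackSketch do
+  # Hypothetical stand-in for %Prototype{}: a flat instruction list plus
+  # nested prototypes for inner function definitions.
+  defmodule Proto do
+    defstruct instructions: [], prototypes: []
+  end
+
+  # Illustrative subset of the covered opcodes, not the full list.
+  @covered [:load_constant, :move, :add, :less_than, :test, :call, :return]
+
+  # Returns :ok when every opcode in the prototype and in all of its
+  # sub-prototypes is covered, :fallback otherwise.
+  def check(%Proto{instructions: instrs, prototypes: subs}) do
+    own_ok? = Enum.all?(instrs, fn instr -> elem(instr, 0) in @covered end)
+
+    if own_ok? and Enum.all?(subs, fn sub -> check(sub) == :ok end) do
+      :ok
+    else
+      :fallback
+    end
+  end
+end
+```
+
+A parent whose own opcodes are all covered still yields `:fallback`
+when any nested prototype contains an uncovered opcode, which is the
+cascade described above.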
+
+### Where prototypes live after compile
+
+`%Prototype{}` gains a new optional field `compiled_module ::
+{atom(), atom()} | nil` — module name and function name. When set,
+all execution sites that currently see `{:lua_closure, proto,
+upvalues}` use `{:compiled_closure, mod, fun, upvalues}` instead.
+The conversion happens at closure-creation time
+(`:closure` opcode, `Lua.Compiler.compile_to_closure`, and the
+top-level entry in `Lua.VM.execute/2`).
+
+### Files
+
+- `lib/lua/compiler/erlang.ex` (new) — abstract-forms generator.
+  Public API: `compile/1`. Internal: per-opcode lowering helpers.
+- `lib/lua/compiler/erlang/opcodes.ex` (new) — pure functions mapping
+  each covered opcode to its Erlang form. Kept separate so opcode
+  tables are easy to extend in later plans.
+- `lib/lua/compiler/prototype.ex` — add `compiled_module` field.
+- `lib/lua/compiler.ex` — wire the codegen into the public compile
+  path. Fallback handling.
+- `lib/lua/vm/executor.ex` — add `:compiled_closure` clauses to
+  `call_function/3` and the `:call` opcode. Update closure-creation
+  sites to emit `:compiled_closure` when `proto.compiled_module` is
+  set.
+- `lib/lua/vm.ex` — update entry point to dispatch the top-level
+  prototype through the compiled module if present.
+- `test/lua/compiler/erlang_test.exs` (new) — fixed-input prototype
+  golden tests: every covered opcode in isolation, assert compiled
+  result == interpreted result.
+- `test/lua/compiler/erlang_fallback_test.exs` (new) — every
+  uncovered opcode triggers `:fallback`. Sub-prototype fallback
+  cascades to parent.
+
+### Error fidelity (placeholder, full fix in B5e)
+
+For this PR: runtime errors raised from compiled code carry the
+line at codegen time of the originating opcode (already in the
+`:source_line` opcodes). Source filename comes from the prototype.
+This is good enough for most tests; B5e adds full position
+threading via try/catch.
+
+If a test asserts a specific stack trace shape that the compiled
+path breaks, that test moves to an explicit `compiled: false` fixture
+override **only after** confirming the assertion is about the
+interpreter's stack trace specifically, not user-facing behaviour.
+Track any such overrides in `## Discoveries`.
+
+### Benchmarks
+
+The spike benchmarks `benchmarks/b5_spike*.exs` ship as part of
+this PR. They serve a dual purpose:
+
+1. **Regression tests for the dispatch shape.** They exercise the
+   `:compiled_closure` value type with hand-built modules,
+   independent of the codegen. If a later plan breaks the
+   dispatch protocol they fail loudly.
+2. **Comparison baseline for codegen output.** The faithful spike
+   represents what a hand-tuned compile would look like. The real
+   codegen running through `Lua.Compiler.Erlang` should be within
+   ~2x of the faithful spike on fib. Diverging from that means
+   the codegen has room to optimise.
+
+The spikes are kept as `benchmarks/b5_spike{,_faithful,_tables}.exs`
+rather than renamed, to make their origin explicit.
+
+## Verification
+
+```bash
+mix format
+mix compile --warnings-as-errors
+mix test
+mix test --only lua53
+
+# fib speedup check vs Luerl (the main success criterion).
+LUA_BENCH_MODE=full mix run benchmarks/fibonacci.exs
+
+# Confirm other workloads don't regress.
+LUA_BENCH_MODE=full mix run benchmarks/closures.exs
+LUA_BENCH_MODE=full mix run benchmarks/oop.exs
+LUA_BENCH_MODE=full mix run benchmarks/table_ops.exs
+LUA_BENCH_MODE=full mix run benchmarks/string_ops.exs
+
+# Confirm fallback path: every uncovered opcode triggers fallback,
+# never a crash. (Tests cover this; this is the manual smoke.)
+mix run -e '
+{:ok, _, _} = Lua.eval(Lua.new(), "local t = {1,2,3}; return t[2]")
+IO.puts("table fallback OK")
+'
+```
+
+## Risks
+
+- **`compile:forms/2` is slow (hundreds of microseconds per
+  module).** For embedders that one-shot `Lua.eval!` of short
+  scripts, compilation could be net slower than interpretation.
+  Acceptable for this PR — B5b's content-addressable cache makes
+  repeated evals of the same source share a module. If the
+  one-shot cost is too high in real usage, B5b's cache can be
+  extended to memoise by source-hash rather than only prototype-
+  hash. Defer the call.
+- **The compiled module path differs subtly from the interpreter
+  on edge cases.** Float-to-integer coercion, NaN comparisons,
+  string-to-number coercion in arithmetic. Mitigation: opcode-by-
+  opcode golden tests in `erlang_test.exs` assert byte-for-byte
+  result equality with the interpreter on a battery of inputs
+  including the nasty corners (NaN, inf, -0.0, max_int + 1, "3" + 2).
+- **BEAM atom table pressure.** Every prototype this PR compiles
+  creates a unique module name. Run-once embedders that compile
+  unique source forever could exhaust the atom table. Concrete
+  ceiling: ~1M atoms in default BEAM config. This PR's leak is
+  bounded for the integration period because nobody runs production
+  for hours between B5a and B5b — but it's a real footgun if B5b
+  slips. Mitigation: if B5b takes longer than a week to ship, add
+  a hard cap here that disables further compilation past N modules.
+- **Module loading is not crash-safe across hot reload.** If `mix
+  test` recompiles `lib/` mid-run, compiled prototypes referencing
+  old function definitions raise. Mitigation: regenerate prototypes
+  at `Application.start/2` boot in the test env, and include the
+  application boot hash in the module name. This is the same
+  approach the parent plan (`B5`) calls for in Risks #3.
+- **Some interpreter tests will fail because they assert on internal
+  state**, e.g. tests that count instruction-list reductions or
+  inspect the shape of a `:lua_closure`. Track these in Discoveries
+  and either make the assertion representation-agnostic or add a
+  fixture override. This should be a small number.
+
+## Discoveries
+
+(populated during implementation)
diff --git a/.agents/plans/B5b-module-lifecycle.md b/.agents/plans/B5b-module-lifecycle.md
new file mode 100644
index 0000000..d89b231
--- /dev/null
+++ b/.agents/plans/B5b-module-lifecycle.md
@@ -0,0 +1,236 @@
+---
+id: B5b
+title: Module lifecycle — content-addressable cache + ref-counted purging
+issue: null
+pr: null
+branch: perf/erlang-codegen-lifecycle
+base: main
+status: ready
+direction: B
+unlocks:
+  - B5c (tables) and later phases can ship without compounding the leak
+  - Production-safe deployment of the codegen path
+---
+
+## Blocked on
+
+- B5a — there's nothing to manage the lifecycle of until the codegen
+  is producing modules.
+
+## Goal
+
+Make B5a not leak. Every compiled prototype currently allocates a
+fresh `lua_proto_<unique_integer>` module that lives forever in the
+BEAM code server. After B5a merges this would saturate the atom
+table within hours of real use.
+
+This PR introduces `Lua.VM.CodeCache`, a content-addressable
+ref-counted registry. Identical prototypes (same instruction stream,
+same upvalue descriptors) share a module. When the last reference
+to a compiled prototype drops, the module is purged.
+
+## Why now
+
+B5a ships the codegen with leak-by-design as a known limitation.
+The leak is bounded for the integration period (no production
+deployment between B5a and B5b) but compounds rapidly the moment a
+real user hits the codegen. Every PR that adds opcodes (B5c, B5d)
+makes the leak worse because more prototypes are eligible for
+compilation. Fix it now, before the surface area grows.
+
+## Out of scope
+
+- Adding more opcodes (B5c, B5d).
+- Cross-prototype optimization or whole-program compilation.
+- Persistent compilation caches (on-disk). Memory cache only.
+- Changes to the codegen output. The cache wraps codegen calls;
+  it doesn't rewrite the modules themselves.
+
+## Success criteria
+
+- [ ] `Lua.VM.CodeCache` GenServer exists. Started under
+      `Lua.Application` supervision tree.
+- [ ] Module names become `lua_proto_<short_content_hash>`. Two
+      prototypes with byte-identical instruction streams + upvalue
+      descriptors share a module.
+- [ ] Per-module ref count tracks live closures referencing it.
+      Each `{:compiled_closure, mod, fun, upvalues}` value
+      increments on creation, decrements on collection.
+- [ ] When ref count reaches zero, the cache schedules
+      `:code.purge/1` + `:code.delete/1`. Scheduled, not immediate
+      — running code may still be executing the module on another
+      scheduler.
+- [ ] Hard cap on loaded modules (default 4096, configurable via
+      `Lua.Compiler.Erlang.cache_size/0`). LRU eviction when the
+      cap is hit.
+- [ ] Build hash in module names (`lua_proto_<build>_<content>`).
+      A code-server module loaded from a previous build is rejected
+      on lookup and recompiled. Prevents stale references across
+      `mix test` hot-reload.
+- [ ] Stress test: 10,000 unique prototypes compiled and dropped in
+      sequence. The number of `lua_proto_*` entries in
+      `:code.all_loaded()` stays within cache_size + a small buffer.
+- [ ] Stress test: 10,000 *identical* prototypes compiled. Only one
+      module loaded.
+- [ ] `mix test` passes. No regression.
+- [ ] No measurable performance regression on
+      `mix run benchmarks/fibonacci.exs` — the cache hit path adds
+      one ETS lookup per call to `compile`, which should be
+      ~hundreds of nanoseconds.
+
+## Implementation notes
+
+### Architecture
+
+- `Lua.VM.CodeCache` is a GenServer holding an ETS table
+  (`:lua_code_cache`) plus an LRU access list.
+- ETS keyed by `{build_hash, content_hash}` → `{module_name,
+  function_name, ref_count, last_accessed}`.
+- `Lua.Compiler.Erlang.compile/1` consults the cache before
+  invoking `:compile.forms/2`. Cache hit returns the existing
+  module; miss compiles, loads, inserts, returns.
+- Ref-counting:
+  - Increment when a `{:compiled_closure, mod, fun, upvalues}`
+    value is created (closure construction, prototype top-level
+    compile).
+  - Decrement when… (see below — this is the hard part).
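+
+A minimal sketch of the hit/miss flow described above, assuming a
+hypothetical `CacheSketch` module and a caller-supplied
+`compile_fun`; the real `CodeCache` would also track ref counts
+and access times:
+
+```elixir
+defmodule CacheSketch do
+  # Hypothetical stand-in for Lua.VM.CodeCache; names are assumptions.
+  def new, do: :ets.new(:cache_sketch, [:set, :public])
+
+  # Returns {:hit, mod} when the key is cached. Otherwise runs
+  # compile_fun, stores its result, and returns {:miss, mod}.
+  def fetch_or_compile(tab, key, compile_fun) do
+    case :ets.lookup(tab, key) do
+      [{^key, mod}] ->
+        {:hit, mod}
+
+      [] ->
+        mod = compile_fun.()
+        true = :ets.insert(tab, {key, mod})
+        {:miss, mod}
+    end
+  end
+end
+```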
+
+### Ref-count decrement strategy
+
+Closures in this codebase are plain Elixir values. They get
+garbage-collected by the BEAM with no callback. So "decrement when
+collected" cannot be implemented with `:erlang.monitor`.
+
+Two viable approaches:
+
+1. **Periodic GC sweep.** Every N seconds, walk every live state's
+   tables, collect the set of referenced `(mod, fun)` pairs, mark
+   the cache. Anything not referenced for K sweeps is purged. This
+   is what Luerl's equivalent layer does.
+2. **Resource tracking via NIF resource.** Wrap the module
+   reference in a NIF-allocated resource whose destructor
+   decrements the count. Requires a NIF, which we currently don't
+   ship.
+
+Recommend (1) for this PR. It is simpler and needs no NIF. The cost
+is that purge timing is not tight (modules linger until a later
+sweep), but that's acceptable under the cap-and-LRU policy.
+
+Sweep cadence: every 30 seconds. Configurable.
+
+LRU eviction provides a hard upper bound regardless of sweep
+correctness — if the cap is hit, the least-recently-accessed
+module is purged immediately, ref-count be damned. This prevents
+unbounded growth if the sweep logic has a bug.
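+
+The mark-and-age step of the sweep can be sketched as a pure
+function. This assumes a hypothetical `SweepSketch` module, a map
+of `module => missed_sweeps`, and a `MapSet` of modules found
+during the state walk; none of these names exist in the codebase:
+
+```elixir
+defmodule SweepSketch do
+  # entries:    %{module => number of consecutive sweeps with no reference}
+  # referenced: MapSet of modules seen while walking live states
+  # k:          sweeps an entry may miss before it is purged
+  # Returns {modules_to_purge, remaining_entries}.
+  def sweep(entries, referenced, k) do
+    aged =
+      Map.new(entries, fn {mod, missed} ->
+        if MapSet.member?(referenced, mod) do
+          {mod, 0}
+        else
+          {mod, missed + 1}
+        end
+      end)
+
+    {purge, keep} = Enum.split_with(aged, fn {_mod, missed} -> missed >= k end)
+    {purge |> Enum.map(&elem(&1, 0)) |> Enum.sort(), Map.new(keep)}
+  end
+end
+```
+
+Keeping the ageing logic free of ETS and `:code` calls makes it
+easy to unit-test separately from the purge itself.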
+
+### Build hash
+
+`@build_hash` is computed at compile time from the app's
+`:application.get_key(:lua, :vsn)` plus a hash of the codegen
+module's source. Embedded in module names. On lookup, if the
+module's name doesn't match the current build hash, treat as a
+miss and recompile. The stale module is purged by the LRU as it
+ages out.
+
+This handles two cases:
+
+- Production: a host application doing a rolling deploy may keep
+  old compiled modules in memory referenced by older state values
+  that survived the upgrade. The new compiled prototypes use new
+  module names; the old ones age out.
+- Dev: `mix test` recompiles `lib/`. Compiled prototypes from a
+  previous test run reference old internal helpers; reject them
+  and recompile.
+
+### Content hash
+
+`:erlang.phash2/2` over `{instructions, upvalue_descriptors,
+param_count, is_vararg}`, rendered as hex. `phash2/2` yields at
+most 32 bits (8 hex chars), so collisions are possible at these
+scales; a 12-hex-char name would need a wider hash. We guard by
+storing the full pre-hash key in ETS and comparing it on lookup.
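+
+A minimal sketch of the naming and collision check, assuming a
+hypothetical `ContentHashSketch` module, a plain map in place of
+ETS, and an 8-hex-char rendering (`phash2/2` accepts a range of at
+most 2^32):
+
+```elixir
+defmodule ContentHashSketch do
+  # Hypothetical helper; not the real CodeCache implementation.
+  @range 4_294_967_296
+
+  def content_hash(key), do: :erlang.phash2(key, @range)
+
+  # Builds the module name as a string, so the sketch creates no atoms.
+  def module_name(build_hash, key) do
+    hex =
+      key
+      |> content_hash()
+      |> Integer.to_string(16)
+      |> String.downcase()
+      |> String.pad_leading(8, "0")
+
+    "lua_proto_#{build_hash}_#{hex}"
+  end
+
+  # A hit requires the stored pre-hash key to equal the lookup key.
+  def lookup(table, build_hash, key) do
+    case Map.fetch(table, {build_hash, content_hash(key)}) do
+      {:ok, %{key: ^key} = entry} -> {:hit, entry}
+      {:ok, _other} -> :collision
+      :error -> :miss
+    end
+  end
+end
+```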
+
+### Files
+
+- `lib/lua/vm/code_cache.ex` (new) — the GenServer + ETS interface.
+- `lib/lua/application.ex` — supervise the new GenServer.
+- `lib/lua/compiler/erlang.ex` (modified) — replace the
+  unique-integer module naming with `CodeCache.module_for/1`.
+- `test/lua/vm/code_cache_test.exs` (new) — the unit tests +
+  stress tests listed in Success criteria.
+
+### Edge cases
+
+- **Module name collisions with non-Lua code.** Mitigation: the
+  `lua_proto_` prefix is reserved. Document in `Lua.VM.CodeCache`'s
+  moduledoc.
+- **GenServer crash.** If the cache GenServer dies (shouldn't, but
+  defense in depth), the supervisor restarts it with an empty ETS
+  table. Every prototype recompiles. Performance penalty, not a
+  correctness failure.
+- **Cache poisoned by a compile error.** If `:compile.forms/2`
+  returns an error or raises mid-load, the ETS entry must roll back.
+  Match the result and use `try`/`rescue` in `CodeCache.handle_call/3`.
+
+## Verification
+
+```bash
+mix format
+mix compile --warnings-as-errors
+mix test
+mix test test/lua/vm/code_cache_test.exs
+
+# Stress test: 10k unique prototypes
+mix run -e '
+for i <- 1..10_000 do
+  src = "function f_#{i}(n) return n + #{i} end f_#{i}(42)"
+  {_, _} = Lua.eval!(Lua.new(), src)
+end
+:erlang.garbage_collect()
+Process.sleep(35_000)
+count = :code.all_loaded() |> Enum.count(fn {m, _} ->
+  to_string(m) |> String.starts_with?("lua_proto_")
+end)
+IO.puts("loaded after sweep: #{count}")
+# Should be ≤ cache_size (default 4096).
+'
+
+# Stress test: 10k *identical* prototypes
+mix run -e '
+src = "function f(n) return n + 1 end f(42)"
+for _ <- 1..10_000, do: Lua.eval!(Lua.new(), src)
+count = :code.all_loaded() |> Enum.count(fn {m, _} ->
+  to_string(m) |> String.starts_with?("lua_proto_")
+end)
+IO.puts("identical compiles → loaded count: #{count}")
+# Should be 1.
+'
+```
+
+## Risks
+
+- **Sweep cadence vs allocation rate.** If a host app compiles
+  faster than the sweep can clean up, the LRU evicts. If the LRU
+  evicts a module that's still in use by a long-running state,
+  next call into that closure raises (module not found).
+  Mitigation: defer LRU eviction of modules with ref_count > 0
+  until they age past a hard limit (10x cache_size, say).
+  Compromise: under extreme pressure, the cache exceeds the soft
+  cap; only when ref counts drop does it shrink. Acceptable
+  trade-off — we'd rather use 2x memory than crash.
+- **The sweep is O(states × refs).** For a deployment with tens of
+  thousands of live Lua states this could be measurable. Profile
+  during this PR; if it shows up, partition the sweep across
+  cycles or push the work into a dedicated scheduler.
+- **`:code.purge/1` kills any process still executing the module's
+  old code.** Use `:code.soft_purge/1` first; if it returns `false`,
+  defer to the next sweep rather than killing the process.
+  Document the policy.
+- **NIF resource alternative might be necessary post-launch.** If
+  the sweep approach proves too imprecise (modules sticking around
+  too long, memory pressure), the NIF-resource approach can be a
+  later plan. Don't pre-commit to it now.
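+
+The soft-purge policy from the risks above can be sketched as
+follows. `UnloadSketch` and its return atoms are hypothetical;
+`:code.delete/1` and `:code.soft_purge/1` are the standard
+code-server calls, both returning booleans:
+
+```elixir
+defmodule UnloadSketch do
+  # Hypothetical helper. Returns :purged when the module's old code
+  # is gone, or :deferred when a process still runs it.
+  def try_unload(mod) do
+    # delete/1 marks the current code as old. It returns false if old
+    # code already exists or the module is not loaded; in both cases
+    # the soft purge below is still the right next step.
+    _ = :code.delete(mod)
+
+    # soft_purge/1 removes old code only if no process lingers in it.
+    if :code.soft_purge(mod), do: :purged, else: :deferred
+  end
+end
+```
+
+A `:deferred` result would leave the cache entry in place for the
+next sweep to retry.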
+
+## Discoveries
+
+(populated during implementation)
diff --git a/.agents/plans/B5c-table-opcodes.md b/.agents/plans/B5c-table-opcodes.md
new file mode 100644
index 0000000..02c986e
--- /dev/null
+++ b/.agents/plans/B5c-table-opcodes.md
@@ -0,0 +1,216 @@
+---
+id: B5c
+title: Compile table opcodes — make table-heavy workloads bypass the interpreter
+issue: null
+pr: null
+branch: perf/erlang-codegen-tables
+base: main
+status: ready
+direction: B
+unlocks:
+  - ~2x speedup on table_ops benchmarks
+  - the full OOP benchmark workload (depends on tables + closures)
+---
+
+## Blocked on
+
+- B5a (foundation)
+- B5b (lifecycle) — required before adding more opcodes to the
+  codegen, otherwise the cache pressure scales with surface area.
+
+## Goal
+
+Extend `Lua.Compiler.Erlang` to lower the table opcode family:
+`:new_table`, `:get_table`, `:set_table`, `:set_list`, `:get_field`
+(full path, not just env lookup), `:set_field`. After this PR,
+prototypes that touch tables compile end-to-end and stay out of
+the interpreter fallback path.
+
+The third spike measured **2.1x faster than interpreter** on
+run_table_sum(1000). This PR delivers that.
+
+## Why now
+
+Once tables compile, the OOP benchmark and most real-world Lua
+code stops falling back to the interpreter. The win is smaller per
+opcode than fib's (2.1x vs 12.4x at faithful), but it removes a
+large class of fallback cases — the dominant blocker after B5a.
+
+## Out of scope
+
+- Closures (`:closure`, upvalue mutation). B5d.
+- Error position fidelity. B5e.
+- Optimising table data shape (this is a B-series follow-up that
+  was deferred: B6/B7). B5 saves dispatch around the table
+  mutation, not the mutation itself.
+
+## Success criteria
+
+- [ ] Opcodes added to the codegen: `:new_table`, `:get_table`,
+      `:set_table`, `:set_list`, `:get_field` (full path),
+      `:set_field`.
+- [ ] `mix test` passes; no regression in unit, suite, or property
+      tests.
+- [ ] `LUA_BENCH_MODE=full mix run benchmarks/table_ops.exs`:
+      `lua (chunk)` beats Luerl by ≥1.5x on `Table Iterate/Sum`
+      and `Table Map + Reduce` at n=500 and n=1000. Stretch: ≥2x.
+- [ ] `mix run benchmarks/oop.exs`: no regression now that more
+      of the OOP path is compiled. Stretch: measurable improvement
+      once `:closure` lands in B5d.
+- [ ] No regression on numeric benchmarks (fibonacci, etc.) — the
+      shared codegen pieces don't slow down what B5a already won.
+
+## Implementation notes
+
+### Lowering each opcode
+
+The interpreter's table opcodes already have fast paths (PR #223
+and follow-ups). The compiled lowering mirrors them inline rather
+than calling back into the interpreter helpers, **except** when the
+slow path is hit (metamethod dispatch, type errors). The slow
+paths delegate to `Lua.VM.Executor` helpers that already exist.
+
+#### `:new_table`
+
+```erlang
+{Tref0, State0} = 'Elixir.Lua.VM.State':alloc_table(State_in),
+R_dest = Tref0,
+State_out = State0
+```
+
+State threads through.
+
+#### `:get_table`
+
+Two cases. For an integer or binary key on a `{:tref, _}`, inline
+the fast path from `executor.ex:1300-1323`; anything else delegates
+to `index_value/6`:
+
+```erlang
+TableVal = R_table,
+Key = R_key,
+case TableVal of
+    {tref, Id} when is_integer(Key); is_binary(Key) ->
+        Table = erlang:map_get(Id, maps:get(tables, State_in)),
+        case erlang:map_get(data, Table) of
+            #{Key := Value} ->
+                R_dest = Value,
+                State_out = State_in;
+            _ ->
+                case erlang:map_get(metatable, Table) of
+                    nil ->
+                        R_dest = nil,
+                        State_out = State_in;
+                    _ ->
+                        {Value, State1} = 'Elixir.Lua.VM.Executor':index_value(
+                            TableVal, Key, State_in, Line, Source, NameHint),
+                        R_dest = Value,
+                        State_out = State1
+                end
+        end;
+    _ ->
+        {Value, State1} = 'Elixir.Lua.VM.Executor':index_value(
+            TableVal, Key, State_in, Line, Source, NameHint),
+        R_dest = Value,
+        State_out = State1
+end
+```
+
+`index_value/6` needs to be promoted from `defp` to `def` in the
+executor so the compiled module can call it. Add `@doc false` to
+keep it out of the public API surface.
+
+#### `:set_table`
+
+```erlang
+case R_table of
+    {tref, _} ->
+        State_out = 'Elixir.Lua.VM.Executor':table_newindex(
+            R_table, R_key, R_value, State_in);
+    _ ->
+        'Elixir.Lua.VM.Executor':raise_index_type_error(
+            R_table, Line, Source, NameHint)
+end
+```
+
+`table_newindex/4` is already `def` (executor.ex:1919).
+`raise_index_type_error/4` needs promoting.
+
+#### `:set_list`
+
+Iterates over a register range and calls `table_newindex` per
+entry. Compile as a recursive helper (same pattern as
+`:numeric_for` from B5a).
+
+#### `:get_field`, `:set_field`
+
+B5a already covers `:get_field` for env lookups. Generalise: the
+fast path uses the table's `:data` map with the literal binary
+key. Falls through to `index_value` / `table_newindex` for
+metatable cases.
+
+### Promoting helpers
+
+The executor's table helpers that the compiled code calls into:
+
+- `Lua.VM.Executor.table_newindex/4` — already `def`.
+- `Lua.VM.Executor.index_value/6` — currently `defp`. Promote to
+  `def` with `@doc false`.
+- `Lua.VM.Executor.raise_index_type_error/4` — currently `defp`.
+  Promote.
+
+The `@doc false` keeps these from showing up in the user-facing
+documentation but lets the compiled module call them by their
+fully-qualified `'Elixir.Lua.VM.Executor':function(...)` form.
+
+### Files
+
+- `lib/lua/compiler/erlang/opcodes.ex` — add lowering clauses for
+  the table family.
+- `lib/lua/compiler/erlang.ex` — remove table opcodes from the
+  fallback set; allow them in the codegen.
+- `lib/lua/vm/executor.ex` — promote `index_value/6` and
+  `raise_index_type_error/4` to public.
+- `test/lua/compiler/erlang_test.exs` — golden tests for each table
+  opcode (compiled vs interpreted result equality on a battery of
+  inputs including metatable cases).
+
+## Verification
+
+```bash
+mix format
+mix compile --warnings-as-errors
+mix test
+mix test --only lua53
+
+LUA_BENCH_MODE=full mix run benchmarks/table_ops.exs
+LUA_BENCH_MODE=full mix run benchmarks/oop.exs
+LUA_BENCH_MODE=full mix run benchmarks/fibonacci.exs    # no regression
+```
+
+## Risks
+
+- **Metatable semantics are subtle.** `__index` and `__newindex`
+  can recurse through long chains. The compiled fast path skips
+  metatable dispatch only when `metatable == nil` on the table.
+  Any non-nil metatable falls through to the existing
+  `index_value` / `table_newindex` helpers, which already handle
+  the chains. Risk is limited to "is the fast-path predicate
+  right" — covered by golden tests.
+- **`set_list` codegen is the most complex per-opcode lowering.**
+  It needs to compile a register-range loop into a recursive
+  helper that's careful about register aliasing. Test with both
+  short ranges (typical: `{1, 2, 3}` table constructor) and long
+  ranges.
+- **Promoting `defp` to `def` widens the executor's public API.**
+  `@doc false` mitigates discoverability. The executor's
+  `@moduledoc` should mention that these are runtime helpers used
+  by compiled modules and should not be called directly by user
+  code.
+- **The third spike's 2.1x was measured at faithful, not real
+  codegen.** Real codegen has overheads the spike skipped (full
+  opcode coverage means more dispatch within the compiled
+  function). The success-criteria floor (≥1.5x) accommodates this.
+
+## Discoveries
+
+(populated during implementation)
diff --git a/.agents/plans/B5d-closures-and-varargs.md b/.agents/plans/B5d-closures-and-varargs.md
new file mode 100644
index 0000000..0fba57f
--- /dev/null
+++ b/.agents/plans/B5d-closures-and-varargs.md
@@ -0,0 +1,197 @@
+---
+id: B5d
+title: Compile closures, varargs, and multi-return — every opcode has a compiled path
+issue: null
+pr: null
+branch: perf/erlang-codegen-closures
+base: main
+status: ready
+direction: B
+unlocks:
+  - 100% opcode coverage in the codegen (no more fallbacks except for diagnostics)
+  - the closures benchmark workload
+  - the OOP benchmark workload now fully compiled
+---
+
+## Blocked on
+
+- B5a (foundation), B5b (lifecycle), B5c (tables).
+
+## Goal
+
+Cover the remaining opcodes. After this PR, no prototype falls
+back to the interpreter for opcode-coverage reasons: every opcode
+has a lowering in the codegen.
+
+Opcodes added:
+
+- `:closure` — closure construction with upvalue capture.
+- `:set_upvalue` — mutate a captured upvalue cell.
+- `:get_open_upvalue`, `:set_open_upvalue` — open-cell access for
+  upvalues that still reference live caller registers.
+- `:vararg`, `:return_vararg` — varargs.
+- `:return` with count > 1 — multi-return.
+- `:generic_for` — the `for k, v in pairs(t)` family.
+
+## Why now
+
+After B5c, table-heavy code compiles. After this PR, closure-heavy
+code does too — which is the dominant remaining real-world Lua
+idiom. From here on, additional B5 work is about polish (error
+fidelity, B5e) and the wider B-series mutable-data follow-up that
+B5 itself does not address.
+
+## Out of scope
+
+- Mixed-mode interpret-from-pc (still all-or-nothing per prototype).
+- Cross-prototype optimisation (inlining one Lua function into
+  another).
+- Error position fidelity. B5e.
+
+## Success criteria
+
+- [ ] Opcodes added: `:closure`, `:set_upvalue`,
+      `:get_open_upvalue`, `:set_open_upvalue`, `:vararg`,
+      `:return_vararg`, `:return` (count > 1), `:generic_for`.
+- [ ] After this PR, the codegen's `:fallback` cases are only:
+      genuinely unrecognised opcode shapes (programmer error) or
+      explicit opt-outs added by future plans. No production-Lua
+      opcode falls back.
+- [ ] `mix test` passes; no regression.
+- [ ] `mix test --only lua53` does not regress.
+- [ ] `LUA_BENCH_MODE=full mix run benchmarks/closures.exs`: lua
+      (chunk) beats Luerl by ≥2x.
+- [ ] `LUA_BENCH_MODE=full mix run benchmarks/oop.exs`: lua (chunk)
+      beats Luerl by ≥1.5x. (OOP is a mix of closures + tables;
+      both contribute.)
+- [ ] No regression on numeric or table workloads.
+
+## Implementation notes
+
+### Closure construction (`:closure`)
+
+`:closure` creates a `{:lua_closure, sub_proto, captured_upvalues}`
+value in the interpreter. The compiled version creates either:
+
+- `{:compiled_closure, mod, fun, captured_upvalues}` if the
+  sub-prototype itself compiled.
+- `{:lua_closure, sub_proto, captured_upvalues}` if the
+  sub-prototype fell back to interpretation.
+
+The codegen checks `sub_proto.compiled_module` at codegen time.
+This works because sub-prototypes are compiled in a separate
+codegen pass (bottom-up) before the parent.
+
+Upvalue capture: the parent prototype's `:closure` opcode
+specifies which upvalue descriptors to populate from which parent
+registers / parent upvalues. In the compiled module this becomes
+a fresh upvalue tuple constructed inline. Open cells get a fresh
+reference (`make_ref/0`) and state.open_upvalues entry; closed
+cells inherit from the parent upvalues tuple.
+
+### Upvalue mutation (`:set_upvalue`)
+
+Mirrors the interpreter (`executor.ex:362-367`):
+
+```erlang
+CellRef = element(Index + 1, Upvalues),
+Value = R_source,
+NewUpvalueCells = maps:put(CellRef, Value,
+    maps:get(upvalue_cells, State_in)),
+%% State is an Elixir struct, i.e. a map, so use a map update.
+State_out = State_in#{upvalue_cells := NewUpvalueCells}
+```
+
+`setelement/3` would not work here: Elixir structs are maps at
+runtime, not tuples. The `:=` update requires the key to exist,
+so a misspelled field fails loudly, and no field index has to be
+derived from `%State{}` at codegen time.
+
+### Open upvalues (`:get_open_upvalue`, `:set_open_upvalue`)
+
+These read/write a cell ref but resolve to either a register (if
+the cell is still open) or `state.upvalue_cells` (if closed). The
+compiled version mirrors `executor.ex:367-401` directly, including
+the open-cell fast path that avoids touching state for the common
+case.
+
+### `:vararg`, `:return_vararg`
+
+Vararg storage is on `proto.varargs`. In the compiled function,
+this is just a closure-time-captured argument list. The codegen
+adds an extra parameter to the compiled function (or threads
+varargs through state, depending on what's cleaner; the
+interpreter currently uses `proto.varargs`, which works because
+proto is a runtime value).
+
+### Multi-return `:return` (count > 1)
+
+B5a covered count = 1. For count > 1, the compiled function
+returns `{Values, State}` where Values is a list of length `count`
+constructed from the register range. `continue_after_call/11`
+unpacks the list into the caller's registers.
+
+For the `{:multi, _}` count form (caller wants all available
+returns), the compiled function returns `{Values, State}` with
+exactly the multi-return values; the caller's `:call` opcode
+handles slot expansion.
+
+### Generic for (`:generic_for`)
+
+Like `:numeric_for` (B5a) but the loop helper calls the iterator
+function on every iteration via `Executor.call_function/3` rather
+than incrementing a counter. The CPS frame logic from the
+interpreter (executor.ex:518-547) translates cleanly to a tail-
+recursive Erlang helper.
+
+### Files
+
+- `lib/lua/compiler/erlang/opcodes.ex` — lowering for every
+  remaining opcode.
+- `lib/lua/compiler/erlang.ex` — remove these from the fallback
+  set.
+- `test/lua/compiler/erlang_test.exs` — golden tests per opcode.
+- `test/lua/compiler/erlang_closures_test.exs` (new) — focused
+  tests on closure construction + upvalue lifecycle, since these
+  are the trickiest to get right.
+
+## Verification
+
+```bash
+mix format
+mix compile --warnings-as-errors
+mix test
+mix test --only lua53
+
+LUA_BENCH_MODE=full mix run benchmarks/closures.exs
+LUA_BENCH_MODE=full mix run benchmarks/oop.exs
+LUA_BENCH_MODE=full mix run benchmarks/fibonacci.exs   # no regression
+LUA_BENCH_MODE=full mix run benchmarks/table_ops.exs   # no regression
+```
+
+## Risks
+
+- **Open upvalue lifetime is the trickiest concept in the VM.**
+  Cells move from "open" (still referencing a live register) to
+  "closed" (value snapshotted into `state.upvalue_cells`) when
+  the owning frame returns. The compiled version must replicate
+  this transition. The existing `close_open_upvalues_at_or_above/2`
+  helper handles it for the interpreter; the compiled `:return`
+  opcode needs to call it (promote to `def` if currently `defp`).
+- **`:closure` with a fall-through sub-prototype.** A parent
+  prototype that compiled but contains an uncompiled sub-prototype
+  produces a `:lua_closure` value for the inner function. Mixed-
+  mode-in-the-value-graph is fine; mixed-mode-within-a-prototype
+  is what we ruled out.
+- **Stress test: upvalue chains.** A closure capturing a closure
+  capturing a closure tests the upvalue-descriptor walking
+  exhaustively. Existing tests in
+  `test/lua/compiler/integration_test.exs` cover this; rerun
+  against compiled mode.
+- **Multi-return with `{:multi, fixed_count}`.** Codegen has to
+  match the exact slot-counting the interpreter does for
+  expressions like `return f(), g()` where g returns N values.
+  Test against the existing multi-return tests.
+
+## Discoveries
+
+(populated during implementation)
diff --git a/.agents/plans/B5e-error-position-fidelity.md b/.agents/plans/B5e-error-position-fidelity.md
new file mode 100644
index 0000000..e199036
--- /dev/null
+++ b/.agents/plans/B5e-error-position-fidelity.md
@@ -0,0 +1,171 @@
+---
+id: B5e
+title: Error position fidelity for compiled prototypes
+issue: null
+pr: null
+branch: perf/erlang-codegen-errors
+base: main
+status: ready
+direction: B
+unlocks:
+  - parity with interpreter on every error message test
+  - removes the only remaining semantic gap between compiled and
+    interpreted execution
+---
+
+## Blocked on
+
+- B5a (foundation), B5b (lifecycle), B5c (tables), B5d (closures).
+  Error fidelity is the last piece — easier to do once every
+  opcode has a compiled lowering.
+
+## Goal
+
+Make compiled prototypes raise exceptions with the same `line:`,
+`source:`, and stack-trace information the interpreter raises with.
+After this PR, no error-message test can distinguish a compiled
+prototype from an interpreted one.
+
+## Why now
+
+Earlier B5 plans (B5a through B5d) ship a placeholder: compiled
+prototypes raise errors carrying the line of the last `:source_line`
+opcode they passed through. This is approximately right but misses
+detail: a raise from inside a metamethod calls back through the
+interpreter, which already threads the right position via the
+process dictionary — but pure-compiled raises don't. Tests that
+assert specific line numbers in raises may pin the compiled path to
+a slightly different line than the interpreter.
+
+This PR uses the parent plan's recommended try/catch approach (B5
+plan lines 191-198): pay nothing on the success path, restore line
+info on the failure path from a pc-to-line table that lives on the
+prototype.
+
+## Out of scope
+
+- Improving the interpreter's error positions. Already done in A18
+  and A19.
+- Adding error-context tracking that the interpreter doesn't have.
+  This is fidelity, not enhancement.
+
+## Success criteria
+
+- [ ] `Lua.Compiler.Prototype` gains a `pc_to_line` field (or
+      similar) mapping the compiled function's internal label
+      structure back to source lines. Populated at codegen time.
+- [ ] Every codegen lowering wraps potentially-raising operations
+      (arithmetic on non-numeric, index into non-table, call of
+      non-callable, etc.) with a try/catch that, on raise,
+      re-throws with corrected `line:` / `source:` info from the
+      pc_to_line table.
+- [ ] Every error-message test in
+      `test/lua/error_message_test.exs` and similar passes against
+      a compiled prototype with the same line numbers as the
+      interpreter produces.
+- [ ] No measurable performance regression — the try/catch costs
+      nothing on the success path.
+- [ ] `mix test` passes; no regression.
+- [ ] `mix test --only lua53` does not regress (suite has many
+      error-position tests).
+
+## Implementation notes
+
+### Strategy
+
+Wrap each body that contains potentially-raising opcodes:
+
+```erlang
+try
+    %% opcode lowering as usual
+catch
+    error:Reason:Stack ->
+        Line = maps:get(PcOrLabel, PcToLine),
+        Source = proto:source(),
+        erlang:raise(error, augment_reason(Reason, Line, Source), Stack)
+end
+```
+
+`augment_reason/3` updates the exception struct's `line:` and
+`source:` fields. For raises that already include line info
+(e.g. those that came from `Lua.VM.Executor.index_value/6`), this
+is a no-op. For raises from purely-compiled code (e.g. an `:add`
+on two non-numeric registers), this is where the position is
+attached.
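+
+A minimal sketch of that no-op-or-attach behaviour, assuming the
+exception is a map or struct with `:line` and `:source` keys and
+that a missing position is represented as `nil`. Both are
+assumptions about `Lua.RuntimeException`, not confirmed here:
+
+```elixir
+defmodule AugmentSketch do
+  # Hypothetical version of augment_reason/3.
+  # Attaches the position only when the reason carries none yet.
+  def augment_reason(%{line: nil, source: _} = reason, line, source) do
+    %{reason | line: line, source: source}
+  end
+
+  # Reasons that already carry a line, or that are not maps at all
+  # (e.g. :badarith), pass through unchanged.
+  def augment_reason(reason, _line, _source), do: reason
+end
+```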
+
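+The try/catch lives **per loop body**, not per opcode. Erlang's
+JIT optimises try/catch well at function-scope granularity but
+penalises tight per-statement nesting. One try around each
+recursive helper body, one try around the main function body.
+Keep each helper's recursive call outside the `try`: a call made
+inside a `try` body is not a tail call, so the loop would grow
+the stack.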
+
+### pc_to_line table
+
+A map from "codegen-time label" to source line. Built during
+codegen as it walks the instruction stream. Stored on
+`%Prototype{}` as `pc_to_line :: %{atom() => non_neg_integer()}`.
+
+Each `:source_line` opcode in the instruction stream becomes the
+authoritative line for every subsequent opcode until the next
+`:source_line`. The codegen tracks this.
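+
+The tracking rule can be sketched as a single pass over the
+instruction stream. This assumes line markers have the shape
+`{:source_line, line}` and keys the table by instruction index
+rather than by label atom; both are simplifications for
+illustration, and `PcToLineSketch` is a hypothetical name:
+
+```elixir
+defmodule PcToLineSketch do
+  # Maps each non-marker instruction index to the line set by the
+  # most recent {:source_line, line} marker (0 before any marker).
+  def build(instructions) do
+    {table, _line} =
+      instructions
+      |> Enum.with_index()
+      |> Enum.reduce({%{}, 0}, fn
+        {{:source_line, line}, _pc}, {acc, _current} ->
+          {acc, line}
+
+        {_op, pc}, {acc, current} ->
+          {Map.put(acc, pc, current), current}
+      end)
+
+    table
+  end
+end
+```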
+
+### Stack trace shape
+
+Compiled modules show up in stack traces as
+`:lua_proto_<hash>.execute/3`. This is noise from a user's
+perspective. `Lua.RuntimeException`'s stack pruning
+(`lib/lua/runtime_exception.ex:prune_internal_frames/1` —
+introduced in A20/A21) already trims known internal frames. Extend
+the prune list to include any module starting with
+`lua_proto_<build_hash>_`. Frames stay informative (the calling
+`Lua.eval!/2` is still visible) without exposing compilation
+internals.
+
+### Files
+
+- `lib/lua/compiler/prototype.ex` — add `pc_to_line` field.
+- `lib/lua/compiler/erlang.ex` — emit try/catch wrappers around
+  loop bodies; populate `pc_to_line` during codegen walk.
+- `lib/lua/compiler/erlang/errors.ex` (new) — `augment_reason/3`
+  and friends. Pure functions, no state.
+- `lib/lua/runtime_exception.ex` — extend prune list.
+- `test/lua/compiler/erlang_errors_test.exs` (new) — golden tests
+  asserting that compiled raises produce identical line/source to
+  interpreted raises.
+
+## Verification
+
+```bash
+mix format
+mix compile --warnings-as-errors
+mix test
+mix test --only lua53
+mix test test/lua/error_message_test.exs
+
+# Confirm zero perf cost on success path.
+LUA_BENCH_MODE=full mix run benchmarks/fibonacci.exs   # no regression
+LUA_BENCH_MODE=full mix run benchmarks/table_ops.exs   # no regression
+```
+
+## Risks
+
+- **try/catch granularity.** Per-statement try/catch tanks
+  performance. Per-function is fine. There's a middle ground (per
+  loop body) that may be necessary if function-scope try/catch
+  proves too coarse for correct attribution. Profile during
+  implementation; adjust.
+- **Stack-trace pruning could hide useful info.** If the prune
+  list accidentally trims user code, debugging gets harder. Test
+  with a stack trace that contains user code + compiled code +
+  stdlib; assert user code is preserved.
+- **Hot-reload may produce stale stack-trace prune patterns.**
+  The build hash is already part of module names as of B5b, so the
+  prune patterns stay consistent across reloads as long as B5b's
+  build-hash logic is correct.
+- **Some Lua 5.3 suite tests assert specific error messages
+  including line numbers.** These should all match the interpreter
+  after this PR. If they don't, it means the codegen has a subtle
+  line-tracking bug; fix the bug, don't change the test.
+
+## Discoveries
+
+(populated during implementation)
diff --git a/benchmarks/b5_spike.exs b/benchmarks/b5_spike.exs
new file mode 100644
index 0000000..fb74b71
--- /dev/null
+++ b/benchmarks/b5_spike.exs
@@ -0,0 +1,126 @@
+## B5 spike — does compiling fib to a BEAM module beat interpreting it?
+##
+## Compares, on identical fib(N) work:
+##
+##   1. lua (chunk)         — current interpreter (baseline)
+##   2. native elixir       — hand-written Elixir; BEAMASM ceiling, no Lua
+##      semantics overhead. Establishes the upper bound for what
+##      BEAM-side optimisation can possibly buy.
+##   3. compiled erlang     — Erlang module generated at runtime via
+##      :compile.forms/2, called from the VM. This is the realistic
+##      proxy for what B5's codegen could plausibly emit, modulo Lua
+##      semantics that the spike strips out.
+##   4. luerl               — Erlang-based Lua 5.3 (reference for the
+##      Direction B "perf parity with Luerl ±10%" target).
+##   5. C Lua via luaport   — out-of-process; included for context.
+##
+## The point is to bound the win. If (3) is close to (2) we know the
+## BEAM JIT path delivers most of its theoretical headroom and B5 is
+## worth its multi-month build. If (3) is closer to (1) than to (2),
+## the BEAM doesn't actually optimise this kind of generated code
+## meaningfully, and the strategic story changes.
+
+Code.require_file("helpers.exs", __DIR__)
+
+Application.ensure_all_started(:luerl)
+
+n = String.to_integer(System.get_env("FIB_N") || "25")
+
+fib_def = """
+function fib(n)
+  if n < 2 then return n end
+  return fib(n-1) + fib(n-2)
+end
+"""
+
+call_fib = "return fib(#{n})"
+
+# --- 1. Interpreter ---
+lua = Lua.new()
+{_, lua} = Lua.eval!(lua, fib_def)
+{fib_chunk, _} = Lua.load_chunk!(lua, call_fib)
+
+# --- 2. Native Elixir (BEAMASM ceiling) ---
+defmodule SpikeFib do
+  def fib(n) when n < 2, do: n
+  def fib(n), do: fib(n - 1) + fib(n - 2)
+end
+
+# --- 3. Compiled Erlang via compile:forms/2 ---
+# We hand-write the abstract forms for:
+#
+#   -module(spike_fib_compiled).
+#   -export([fib/1]).
+#   fib(N) when N < 2 -> N;
+#   fib(N) -> fib(N-1) + fib(N-2).
+#
+# This is structurally what B5's codegen would produce for the fib
+# prototype if it stripped Lua tagging (no register tuple, no upvalue
+# lookup, no get_field on _ENV). The interesting question is whether
+# the BEAM treats this as well as it treats the same code written
+# directly in Elixir.
+forms = [
+  {:attribute, 1, :module, :spike_fib_compiled},
+  {:attribute, 2, :export, [{:fib, 1}]},
+  {:function, 3, :fib, 1,
+   [
+     {:clause, 3, [{:var, 3, :N}], [[{:op, 3, :<, {:var, 3, :N}, {:integer, 3, 2}}]],
+      [{:var, 3, :N}]},
+     {:clause, 4, [{:var, 4, :N}], [],
+      [
+        {:op, 4, :+,
+         {:call, 4, {:atom, 4, :fib}, [{:op, 4, :-, {:var, 4, :N}, {:integer, 4, 1}}]},
+         {:call, 4, {:atom, 4, :fib}, [{:op, 4, :-, {:var, 4, :N}, {:integer, 4, 2}}]}}
+      ]}
+   ]}
+]
+
+{:ok, mod_name, bin, _warnings} = :compile.forms(forms, [:return])
+{:module, ^mod_name} = :code.load_binary(mod_name, ~c"spike_fib_compiled.beam", bin)
+
+# Sanity: all three give the same answer.
+expected = SpikeFib.fib(n)
+{[interp_result], _} = Lua.eval!(lua, call_fib)
+^expected = round(interp_result)
+^expected = :spike_fib_compiled.fib(n)
+IO.puts("All implementations agree: fib(#{n}) = #{expected}\n")
+
+# --- 4. Luerl ---
+luerl_state = :luerl.init()
+{:ok, _, luerl_state} = :luerl.do(fib_def, luerl_state)
+
+# --- 5. C Lua via luaport (optional) ---
+{c_lua_benchmarks, c_lua_cleanup} =
+  case Application.ensure_all_started(:luaport) do
+    {:ok, _} ->
+      scripts_dir = Path.join(__DIR__, "scripts")
+      {:ok, port_pid, _} = :luaport.spawn(:b5_spike_bench, to_charlist(scripts_dir))
+      :luaport.load(port_pid, fib_def)
+
+      benchmarks = %{
+        "C Lua (luaport)" => fn -> :luaport.call(port_pid, :fib, [n]) end
+      }
+
+      {benchmarks, fn -> :luaport.despawn(:b5_spike_bench) end}
+
+    {:error, reason} ->
+      IO.puts("luaport not available (#{inspect(reason)}) — skipping C Lua benchmarks")
+      {%{}, fn -> :ok end}
+  end
+
+Bench.banner("b5 spike: fib(#{n})")
+
+Benchee.run(
+  Map.merge(
+    %{
+      "lua (chunk)" => fn -> Lua.eval!(lua, fib_chunk) end,
+      "native elixir" => fn -> SpikeFib.fib(n) end,
+      "compiled erlang" => fn -> :spike_fib_compiled.fib(n) end,
+      "luerl" => fn -> :luerl.do(call_fib, luerl_state) end
+    },
+    c_lua_benchmarks
+  ),
+  Bench.opts()
+)
+
+c_lua_cleanup.()
diff --git a/benchmarks/b5_spike_faithful.exs b/benchmarks/b5_spike_faithful.exs
new file mode 100644
index 0000000..6ed8060
--- /dev/null
+++ b/benchmarks/b5_spike_faithful.exs
@@ -0,0 +1,279 @@
+## B5 spike — *faithful* translation
+##
+## Companion to benchmarks/b5_spike.exs. That first spike answered "is
+## there headroom?" with a stripped-down fib that called itself directly
+## as `:spike_fib_compiled.fib/1`. This one answers the follow-up:
+## **how much of that headroom survives once we add back the Lua-VM
+## machinery a real B5 codegen could not skip?**
+##
+## What "faithful" means here. The compiled fib module:
+##
+##   1. Receives `(args :: [number()], upvalues :: tuple(), state)` and
+##      returns `{results :: [number()], state}` — the same shape as a
+##      :lua_closure interpreted call.
+##   2. Performs the recursive call via the actual VM dispatch path:
+##      look up `_ENV` through the upvalue cell, fetch `_ENV.fib` from
+##      the globals table's `:data` map, then call
+##      `Lua.VM.Executor.call_function/3` with the resolved callable.
+##      That callable is `{:compiled_closure, ...}` (itself), so it
+##      re-enters the same path the `:call` opcode uses on Lua closures.
+##   3. Threads `state` through both recursive calls — the same
+##      mutable-state ABI Luerl and our interpreter use.
+##   4. Returns a result list `[value]`, not a bare number — matching
+##      the call protocol used by `continue_after_call/11`.
+##
+## What it does *not* model (in scope for B5 proper, out of scope for
+## the spike):
+##
+##   - Integer overflow narrowing (`Numeric.narrow_if_integer/1`).
+##   - Metamethod fallbacks for `<` and `+`.
+##   - Line/source threading for runtime errors.
+##   - Open-upvalue close on return.
+##
+## A real B5 codegen would either inline guards for the common integer
+## path (avoiding the fallback cost) or emit conditional dispatch. The
+## fib hot path uses the integer fast path on every iteration, so
+## omitting these costs reflects the *intended* B5 fast path, not a
+## cheat.
+
+Code.require_file("helpers.exs", __DIR__)
+
+Application.ensure_all_started(:luerl)
+
+n = String.to_integer(System.get_env("FIB_N") || "25")
+
+fib_def = """
+function fib(n)
+  if n < 2 then return n end
+  return fib(n-1) + fib(n-2)
+end
+"""
+
+call_fib = "return fib(#{n})"
+
+# --- Interpreter baseline ---
+lua = Lua.new()
+{_, lua} = Lua.eval!(lua, fib_def)
+{fib_chunk, _} = Lua.load_chunk!(lua, call_fib)
+
+# --- Native Elixir (BEAMASM ceiling, no Lua semantics) ---
+defmodule SpikeFib do
+  def fib(n) when n < 2, do: n
+  def fib(n), do: fib(n - 1) + fib(n - 2)
+end
+
+# --- Stripped compiled erlang (from the first spike, for reference) ---
+stripped_forms = [
+  {:attribute, 1, :module, :spike_fib_stripped},
+  {:attribute, 2, :export, [{:fib, 1}]},
+  {:function, 3, :fib, 1,
+   [
+     {:clause, 3, [{:var, 3, :N}], [[{:op, 3, :<, {:var, 3, :N}, {:integer, 3, 2}}]],
+      [{:var, 3, :N}]},
+     {:clause, 4, [{:var, 4, :N}], [],
+      [
+        {:op, 4, :+,
+         {:call, 4, {:atom, 4, :fib}, [{:op, 4, :-, {:var, 4, :N}, {:integer, 4, 1}}]},
+         {:call, 4, {:atom, 4, :fib}, [{:op, 4, :-, {:var, 4, :N}, {:integer, 4, 2}}]}}
+      ]}
+   ]}
+]
+
+{:ok, stripped_mod, stripped_bin, _} = :compile.forms(stripped_forms, [:return])
+{:module, ^stripped_mod} =
+  :code.load_binary(stripped_mod, ~c"spike_fib_stripped.beam", stripped_bin)
+
+# --- Faithful compiled erlang ---
+#
+# Hand-rolled abstract forms, roughly equivalent to this Erlang source:
+#
+#   -module(spike_fib_faithful).
+#   -export([fib/3]).
+#
+#   fib([N | _], Upvalues, State) when N < 2 ->
+#       {[N], State};
+#   fib([N | _], Upvalues, State) ->
+#       %% _ENV.fib lookup — what {:get_upvalue, ...} + {:get_field, ...}
+#       %% do in the interpreter.
+#       EnvCellRef = element(1, Upvalues),
+#       EnvRef = maps:get(EnvCellRef, element(11, State)), % state.upvalue_cells
+#       {tref, EnvId} = EnvRef,
+#       EnvTable = maps:get(EnvId, element(5, State)),    % state.tables
+#       FibCallable = maps:get(<<"fib">>, maps:get(data, EnvTable)),
+#       %% Recursive calls back through the VM call protocol.
+#       {R1List, S1} = 'Elixir.Lua.VM.Executor':call_function(
+#           FibCallable, [N - 1], State),
+#       {R2List, S2} = 'Elixir.Lua.VM.Executor':call_function(
+#           FibCallable, [N - 2], S1),
+#       [V1 | _] = R1List,
+#       [V2 | _] = R2List,
+#       {[V1 + V2], S2}.
+#
+# Caveat on the sketch above: %Lua.VM.State{} is an Elixir struct, i.e.
+# a map, so `element(N, State)` would not work on it as written. Read
+# those two lines as shorthand for the `state.upvalue_cells` and
+# `state.tables` field reads. The lookups inside those maps use
+# maps:get/2, the same pattern the interpreter uses (`Map.get/2` /
+# `:erlang.map_get/2`).
+#
+# The abstract forms below do not inline the `_ENV.fib` lookup.
+# Spelling out the struct field reads at the abstract-forms level
+# would make the forms larger without changing the cost story: the
+# interpreter's `state.upvalue_cells` and `state.tables` reads are
+# plain struct field accesses too.
+#
+# Instead, the compiled module calls back into a tiny Elixir helper,
+# SpikeFib.Helpers.resolve_env_fib/2 (defined below), which reads the
+# two maps from the state struct and resolves the callable. That
+# helper is one remote call per recursive step, and it does the
+# struct decomposition in one place.
+#
+# This keeps the abstract forms small and isolates the question to
+# "dispatch + call protocol cost", not "struct-shape pattern matching
+# cost". The recursive call path is what we care about
+# measuring.
+
+defmodule SpikeFib.Helpers do
+  @moduledoc false
+
+  # Returns the resolved `_ENV.fib` callable from current state.
+  # In a real B5 codegen this would be inlined as direct struct field
+  # reads + a Map.get/2 — same cost as the interpreter's
+  # {:get_upvalue, ...} + {:get_field, ...} pair.
+  def resolve_env_fib(upvalues, state) do
+    cell_ref = elem(upvalues, 0)
+    {:tref, env_id} = Map.fetch!(state.upvalue_cells, cell_ref)
+    env = :erlang.map_get(env_id, state.tables)
+    :erlang.map_get("fib", :erlang.map_get(:data, env))
+  end
+end
+
+faithful_forms = [
+  {:attribute, 1, :module, :spike_fib_faithful},
+  {:attribute, 2, :export, [{:fib, 3}]},
+  {:function, 3, :fib, 3,
+   [
+     # Base case: fib([N | _], _, State) when N < 2 -> {[N], State}.
+     {:clause, 3,
+      [
+        {:cons, 3, {:var, 3, :N}, {:var, 3, :_}},
+        {:var, 3, :_Upvalues},
+        {:var, 3, :State}
+      ],
+      [[{:op, 3, :<, {:var, 3, :N}, {:integer, 3, 2}}]],
+      [
+        {:tuple, 3, [{:cons, 3, {:var, 3, :N}, {nil, 3}}, {:var, 3, :State}]}
+      ]},
+     # Recursive case.
+     {:clause, 4,
+      [
+        {:cons, 4, {:var, 4, :N}, {:var, 4, :_}},
+        {:var, 4, :Upvalues},
+        {:var, 4, :State}
+      ],
+      [],
+      [
+        # Fib = Elixir.SpikeFib.Helpers:resolve_env_fib(Upvalues, State).
+        {:match, 4, {:var, 4, :Fib},
+         {:call, 4, {:remote, 4, {:atom, 4, :"Elixir.SpikeFib.Helpers"}, {:atom, 4, :resolve_env_fib}},
+          [{:var, 4, :Upvalues}, {:var, 4, :State}]}},
+        # {R1, S1} = Elixir.Lua.VM.Executor:call_function(Fib, [N-1], State).
+        {:match, 5, {:tuple, 5, [{:var, 5, :R1}, {:var, 5, :S1}]},
+         {:call, 5, {:remote, 5, {:atom, 5, :"Elixir.Lua.VM.Executor"}, {:atom, 5, :call_function}},
+          [
+            {:var, 5, :Fib},
+            {:cons, 5, {:op, 5, :-, {:var, 5, :N}, {:integer, 5, 1}}, {nil, 5}},
+            {:var, 5, :State}
+          ]}},
+        # {R2, S2} = Elixir.Lua.VM.Executor:call_function(Fib, [N-2], S1).
+        {:match, 6, {:tuple, 6, [{:var, 6, :R2}, {:var, 6, :S2}]},
+         {:call, 6, {:remote, 6, {:atom, 6, :"Elixir.Lua.VM.Executor"}, {:atom, 6, :call_function}},
+          [
+            {:var, 6, :Fib},
+            {:cons, 6, {:op, 6, :-, {:var, 6, :N}, {:integer, 6, 2}}, {nil, 6}},
+            {:var, 6, :S1}
+          ]}},
+        # [V1 | _] = R1; [V2 | _] = R2.
+        {:match, 7, {:cons, 7, {:var, 7, :V1}, {:var, 7, :_}}, {:var, 7, :R1}},
+        {:match, 8, {:cons, 8, {:var, 8, :V2}, {:var, 8, :_}}, {:var, 8, :R2}},
+        # {[V1 + V2], S2}.
+        {:tuple, 9,
+         [
+           {:cons, 9, {:op, 9, :+, {:var, 9, :V1}, {:var, 9, :V2}}, {nil, 9}},
+           {:var, 9, :S2}
+         ]}
+      ]}
+   ]}
+]
+
+{:ok, faithful_mod, faithful_bin, _} = :compile.forms(faithful_forms, [:return])
+{:module, ^faithful_mod} =
+  :code.load_binary(faithful_mod, ~c"spike_fib_faithful.beam", faithful_bin)
+
+# --- Install the compiled fib into the Lua state ---
+#
+# We grab the existing `:lua_closure` value bound to `fib` in _G,
+# extract its upvalues tuple, and rebind `fib` to a `:compiled_closure`
+# that uses the same upvalues. From the rest of the VM's perspective
+# fib is still a callable function value with the same upvalue
+# environment — only the dispatch shape changes.
+
+state = lua.state
+{:tref, g_id} = state.g_ref
+g_table = :erlang.map_get(g_id, state.tables)
+{:lua_closure, _proto, fib_upvalues} = :erlang.map_get("fib", g_table.data)
+
+compiled_fib = {:compiled_closure, :spike_fib_faithful, :fib, fib_upvalues}
+
+new_g_data = :maps.put("fib", compiled_fib, g_table.data)
+new_g_table = %{g_table | data: new_g_data}
+new_tables = :maps.put(g_id, new_g_table, state.tables)
+state = %{state | tables: new_tables}
+lua_compiled = %{lua | state: state}
+
+# Sanity: interpreter, stripped, native, and faithful all agree on the result.
+expected = SpikeFib.fib(n)
+{[interp_result], _} = Lua.eval!(lua, call_fib)
+^expected = round(interp_result)
+^expected = :spike_fib_stripped.fib(n)
+{[faithful_result], _} = Lua.eval!(lua_compiled, call_fib)
+^expected = round(faithful_result)
+IO.puts("All implementations agree: fib(#{n}) = #{expected}\n")
+
+# --- Luerl reference ---
+luerl_state = :luerl.init()
+{:ok, _, luerl_state} = :luerl.do(fib_def, luerl_state)
+
+# --- C Lua via luaport (optional) ---
+{c_lua_benchmarks, c_lua_cleanup} =
+  case Application.ensure_all_started(:luaport) do
+    {:ok, _} ->
+      scripts_dir = Path.join(__DIR__, "scripts")
+      {:ok, port_pid, _} = :luaport.spawn(:b5_faithful_bench, to_charlist(scripts_dir))
+      :luaport.load(port_pid, fib_def)
+
+      {%{"C Lua (luaport)" => fn -> :luaport.call(port_pid, :fib, [n]) end},
+       fn -> :luaport.despawn(:b5_faithful_bench) end}
+
+    {:error, reason} ->
+      IO.puts("luaport not available (#{inspect(reason)}) — skipping")
+      {%{}, fn -> :ok end}
+  end
+
+Bench.banner("b5 faithful spike: fib(#{n})")
+
+Benchee.run(
+  Map.merge(
+    %{
+      "lua (interpreter)" => fn -> Lua.eval!(lua, fib_chunk) end,
+      "lua (compiled-faithful)" => fn -> Lua.eval!(lua_compiled, fib_chunk) end,
+      "lua (compiled-stripped)" => fn -> :spike_fib_stripped.fib(n) end,
+      "native elixir" => fn -> SpikeFib.fib(n) end,
+      "luerl" => fn -> :luerl.do(call_fib, luerl_state) end
+    },
+    c_lua_benchmarks
+  ),
+  Bench.opts()
+)
+
+c_lua_cleanup.()
diff --git a/benchmarks/b5_spike_tables.exs b/benchmarks/b5_spike_tables.exs
new file mode 100644
index 0000000..fa0e0eb
--- /dev/null
+++ b/benchmarks/b5_spike_tables.exs
@@ -0,0 +1,206 @@
+## B5 spike — table-heavy workload
+##
+## Third spike in the series. The first two answered "is there headroom?"
+## (yes, 278x stripped) and "how much survives Lua semantics?"
+## (12.4x faithful, on fib). Both used pure integer arithmetic.
+##
+## This spike answers: does the win generalise to table-heavy code?
+## fib is the friendliest possible benchmark — no allocation, no map
+## traversal, no metamethod dispatch path. Real Lua programs spend
+## significant time in `t[i] = v` and `t[i]` operations, both of
+## which go through:
+##
+##   - Allocation (`State.alloc_table` -> new map in state.tables)
+##   - Lookup (table struct -> :data map -> key fetch)
+##   - Mutation (table struct -> new :data map -> new state.tables map)
+##
+## All three allocate. Lua programs that touch a 1000-entry table will
+## allocate a comparable number of intermediate maps. B5 fully
+## eliminates the interpreter's register-tuple churn on fib, but that
+## does *not* remove this cost: it lives in the state struct's :tables
+## field, not in registers.
+##
+## The workload: `run_table_sum(n)` from benchmarks/table_ops.exs.
+## Builds a 1..n table via `:set_table` in a `:numeric_for` loop, then
+## sums it via `:get_table` in a second `:numeric_for`. Two loops, two
+## table operations per iteration, no recursion.
+##
+## What "faithful" means here, same shape as the second spike:
+##
+##   - Receives `(args, upvalues, state)`, returns `{results, state}`.
+##   - `:new_table` -> `State.alloc_table(state)` (full allocation cost).
+##   - `:set_table` -> `Executor.table_newindex/4` (the public path the
+##     interpreter takes; includes metatable check and Table.put).
+##   - `:get_table` -> inlined fast-path: `:erlang.map_get(:data, table)`
+##     then map fetch. Matches the interpreter's fast path verbatim.
+##   - Loops compiled to recursive helpers (the BEAM-native loop
+##     idiom; this is what B5's codegen would emit for a Lua
+##     `:numeric_for` once it knows the bounds).
+##   - State threads through every operation that mutates it (allocation,
+##     set_table). The read-only get_table loop takes state as an
+##     argument but does not thread it back out; a real codegen would
+##     have to, because :get_table may call __index via a metamethod.
+##
+## What it does *not* model (out of scope; same caveats as second spike):
+##
+##   - Integer overflow narrowing.
+##   - Metamethod fallbacks for `__newindex` / `__index`.
+##   - Line/source threading for runtime errors.
+##
+## The compiled function is written in Elixir, not :compile.forms-built
+## Erlang. Justification: the BEAM compiles Elixir modules with the same
+## BEAMASM JIT that processes `:compile.forms/2` output. The second spike
+## verified `:compile.forms` output runs at near-native Elixir speed (1.13x
+## slower). Writing this spike in Elixir saves ~200 lines of abstract-forms
+## machinery and isolates the question to "compiled vs interpreted dispatch
+## of the same opcodes", which is what we care about. If you want to verify
+## the equivalence claim, compare the second spike's compiled-stripped vs
+## native-elixir columns: 1.08x in quick mode, 1.13x in full mode.
+
+Code.require_file("helpers.exs", __DIR__)
+
+Application.ensure_all_started(:luerl)
+
+table_def = """
+function run_table_sum(n)
+  local t = {}
+  for i = 1, n do
+    t[i] = i
+  end
+  local sum = 0
+  for j = 1, n do
+    sum = sum + t[j]
+  end
+  return sum
+end
+"""
+
+# --- Interpreter baseline ---
+lua = Lua.new()
+{_, lua} = Lua.eval!(lua, table_def)
+
+# --- Compiled run_table_sum ---
+#
+# Equivalent to the Lua source above. Structurally what B5 codegen
+# would emit for the prototype's instruction stream, with each
+# interpreter opcode lowered to a direct call.
+defmodule SpikeTableSum do
+  @moduledoc false
+
+  alias Lua.VM.Executor
+  alias Lua.VM.State
+
+  # Entry point. Matches the call protocol used by :compiled_closure
+  # in lib/lua/vm/executor.ex.
+  @spec run([number()], tuple(), State.t()) :: {[number()], State.t()}
+  def run([n | _], _upvalues, state) do
+    # local t = {}
+    {tref, state} = State.alloc_table(state)
+
+    # for i = 1, n do t[i] = i end
+    state = build_loop(1, n, tref, state)
+
+    # local sum = 0; for j = 1, n do sum = sum + t[j] end
+    sum = sum_loop(1, n, tref, state, 0)
+
+    {[sum], state}
+  end
+
+  # First numeric_for: t[i] = i, i=1..n.
+  defp build_loop(i, n, _tref, state) when i > n, do: state
+  defp build_loop(i, n, tref, state) do
+    state = Executor.table_newindex(tref, i, i, state)
+    build_loop(i + 1, n, tref, state)
+  end
+
+  # Second numeric_for: sum = sum + t[j], j=1..n.
+  # State is read-only here (no metatable, no __index), so it's not
+  # threaded back out — but we still have to dereference it on every
+  # iteration to fetch the current table. That's the realistic cost.
+  defp sum_loop(j, n, _tref, _state, sum) when j > n, do: sum
+  defp sum_loop(j, n, {:tref, id} = tref, state, sum) do
+    table = :erlang.map_get(id, state.tables)
+    value = :erlang.map_get(j, :erlang.map_get(:data, table))
+    sum_loop(j + 1, n, tref, state, sum + value)
+  end
+end
+
+# --- Install the compiled run_table_sum into _G ---
+state = lua.state
+{:tref, g_id} = state.g_ref
+g = :erlang.map_get(g_id, state.tables)
+{:lua_closure, _proto, rts_upvalues} = :erlang.map_get("run_table_sum", g.data)
+
+compiled = {:compiled_closure, SpikeTableSum, :run, rts_upvalues}
+
+new_g_data = :maps.put("run_table_sum", compiled, g.data)
+new_g = %{g | data: new_g_data}
+new_tables = :maps.put(g_id, new_g, state.tables)
+state = %{state | tables: new_tables}
+lua_compiled = %{lua | state: state}
+
+# --- Luerl reference ---
+luerl_state = :luerl.init()
+{:ok, _, luerl_state} = :luerl.do(table_def, luerl_state)
+
+# --- C Lua via luaport (optional) ---
+{c_lua_call, c_lua_cleanup} =
+  case Application.ensure_all_started(:luaport) do
+    {:ok, _} ->
+      scripts_dir = Path.join(__DIR__, "scripts")
+      {:ok, port_pid, _} = :luaport.spawn(:b5_tables_bench, to_charlist(scripts_dir))
+      :luaport.load(port_pid, table_def)
+
+      {fn n -> :luaport.call(port_pid, :run_table_sum, [n]) end,
+       fn -> :luaport.despawn(:b5_tables_bench) end}
+
+    {:error, reason} ->
+      IO.puts("luaport not available (#{inspect(reason)}) — skipping")
+      {nil, fn -> :ok end}
+  end
+
+# --- Pre-build chunks for each n ---
+sizes =
+  case System.get_env("LUA_BENCH_MODE") do
+    "full" -> [{"small (n=100)", 100}, {"medium (n=500)", 500}, {"large (n=1000)", 1000}]
+    _ -> [{"medium (n=500)", 500}]
+  end
+
+inputs =
+  Map.new(sizes, fn {label, n} ->
+    call_str = "return run_table_sum(#{n})"
+    {chunk, _} = Lua.load_chunk!(lua, call_str)
+    {label, {chunk, call_str, n}}
+  end)
+
+# --- Sanity ---
+for {label, {chunk, call_str, n}} <- inputs do
+  expected = div(n * (n + 1), 2)
+  {[interp_result], _} = Lua.eval!(lua, chunk)
+  ^expected = round(interp_result)
+  {[compiled_result], _} = Lua.eval!(lua_compiled, chunk)
+  ^expected = round(compiled_result)
+  IO.puts("#{label}: all implementations agree (sum = #{expected})")
+  _ = call_str
+end
+
+IO.puts("")
+
+Bench.banner("b5 tables spike: run_table_sum")
+
+jobs = %{
+  "lua (interpreter)" => fn {chunk, _, _} -> Lua.eval!(lua, chunk) end,
+  "lua (compiled)" => fn {chunk, _, _} -> Lua.eval!(lua_compiled, chunk) end,
+  "luerl" => fn {_, call_str, _} -> :luerl.do(call_str, luerl_state) end
+}
+
+jobs =
+  if c_lua_call do
+    Map.put(jobs, "C Lua (luaport)", fn {_, _, n} -> c_lua_call.(n) end)
+  else
+    jobs
+  end
+
+Benchee.run(jobs, [{:inputs, inputs} | Bench.opts()])
+
+c_lua_cleanup.()

From 74090fc66f22e06186893831c7ddf604a7f7c711 Mon Sep 17 00:00:00 2001
From: Dave Lucia <davelucianyc@gmail.com>
Date: Fri, 22 May 2026 08:48:31 -0700
Subject: [PATCH 2/3] perf(vm): compile Lua prototypes to BEAM modules
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Introduces Lua.Compiler.Erlang — a codegen that translates supported
%Prototype{} values into Erlang abstract forms via :compile.forms/2,
loaded as fresh BEAM modules at runtime. The dispatch path through
{:compiled_closure, mod, fun, upvalues, proto} bypasses the interpreter's
register-tuple construction and per-opcode dispatch loop entirely.

Coverage in this PR (B5a — foundation):
- arithmetic, comparison, logical ops (with integer fast paths)
- control flow: :test (terminating branches), :test_true, early return
- upvalues: :get_upvalue, :get_open_upvalue, :load_env, :get_global
- :get_field on _ENV (inline no-metatable fast path; metatable case
  delegates to Executor.index_value/6)
- :call with single-result returns; routes through
  call_function_with_position which bridges native-callback position
  tracking but no-ops for Lua-to-Lua calls.
- :scope (transparent block inlining)
- :move, :load_constant, :load_nil, :load_boolean, :source_line

Out of scope (B5c/B5d/B5e):
- table opcodes (:new_table, :get_table, :set_table, :set_list,
  :set_field, non-env :get_field)
- closure construction (:closure), upvalue mutation
  (:set_upvalue, :set_open_upvalue), varargs, multi-value returns
- error position fidelity for raises inside compiled code
- :goto/:label, loops (:numeric_for, :while_loop, :repeat_loop,
  :generic_for, :break)

The all-or-nothing rule applies per prototype: if any opcode in a
prototype is unsupported, that prototype falls back to interpretation.
Sub-prototypes compile or fall back independently, and the :closure
opcode emits the appropriate value type per child.

Suite: 1705 tests + 51 properties + 55 doctests, 0 failures.
       29 lua53 tests, 0 failures.

Perf (fib(30)):
- main:           ~970 ms
- with B5a:       ~670 ms (1.4x faster than main, 1.07x vs Luerl)

The 5x-vs-Luerl stretch target from the plan is not met by this PR
alone — most of the remaining gap is throw/catch overhead on the
non-tail :return forms, register-tuple setelement churn, and the
Process.put bridge on calls. Each closes incrementally as B5b through
B5e land.

Plan: B5a
---
 lib/lua.ex                         |  13 +-
 lib/lua/api.ex                     |   3 +-
 lib/lua/compiler.ex                |  16 +-
 lib/lua/compiler/erlang.ex         | 119 +++++
 lib/lua/compiler/erlang/codegen.ex | 279 ++++++++++
 lib/lua/compiler/erlang/opcodes.ex | 798 +++++++++++++++++++++++++++++
 lib/lua/compiler/erlang/runtime.ex |  35 ++
 lib/lua/compiler/prototype.ex      |   6 +-
 lib/lua/util.ex                    |   1 +
 lib/lua/vm.ex                      |  17 +-
 lib/lua/vm/display.ex              |  16 +
 lib/lua/vm/executor.ex             | 134 ++++-
 lib/lua/vm/stdlib.ex               |  12 +-
 lib/lua/vm/stdlib/debug.ex         |  12 +
 lib/lua/vm/stdlib/string.ex        |   3 +-
 lib/lua/vm/stdlib/util.ex          |   1 +
 lib/lua/vm/value.ex                |   2 +
 test/lua/vm/display_test.exs       |  15 +-
 18 files changed, 1466 insertions(+), 16 deletions(-)
 create mode 100644 lib/lua/compiler/erlang.ex
 create mode 100644 lib/lua/compiler/erlang/codegen.ex
 create mode 100644 lib/lua/compiler/erlang/opcodes.ex
 create mode 100644 lib/lua/compiler/erlang/runtime.ex

diff --git a/lib/lua.ex b/lib/lua.ex
index 089f454..968aa23 100644
--- a/lib/lua.ex
+++ b/lib/lua.ex
@@ -9,6 +9,7 @@ defmodule Lua do
   alias Lua.Util
   alias Lua.VM.AssertionError
   alias Lua.VM.Display
+  alias Lua.VM.Executor
   alias Lua.VM.InternalError
   alias Lua.VM.RuntimeError
   alias Lua.VM.State
@@ -713,13 +714,20 @@ defmodule Lua do
       end)
 
     {results, _regs, new_state} =
-      Lua.VM.Executor.execute(proto.instructions, callee_regs, upvalues, proto, state)
+      Executor.execute(proto.instructions, callee_regs, upvalues, proto, state)
 
     {:ok, results, new_state}
   rescue
     e -> {:error, Exception.message(e), state}
   end
 
+  defp do_call_function({:compiled_closure, _, _, _, _} = closure, args, state) do
+    {results, new_state} = Executor.call_function(closure, args, state)
+    {:ok, results, new_state}
+  rescue
+    e -> {:error, Exception.message(e), state}
+  end
+
   defp do_call_function(other, _args, state) do
     {:error, "undefined function '#{inspect(other)}'", state}
   end
@@ -757,7 +765,8 @@ defmodule Lua do
       true
 
       iex> {[c], _} = Lua.eval!(Lua.new(), "return function() end")
-      iex> match?({:lua_closure, _, _}, Lua.unwrap(c))
+      iex> match?({:lua_closure, _, _}, Lua.unwrap(c)) or
+      ...>   match?({:compiled_closure, _, _, _, _}, Lua.unwrap(c))
       true
 
       iex> Lua.unwrap(42)
diff --git a/lib/lua/api.ex b/lib/lua/api.ex
index 28df930..96e22bb 100644
--- a/lib/lua/api.ex
+++ b/lib/lua/api.ex
@@ -141,7 +141,8 @@ defmodule Lua.API do
   Is the value a reference to a Lua function?
   """
   defguard is_lua_func(value)
-           when is_tuple(value) and tuple_size(value) == 3 and elem(value, 0) == :lua_closure
+           when (is_tuple(value) and tuple_size(value) == 3 and elem(value, 0) == :lua_closure) or
+                  (is_tuple(value) and tuple_size(value) == 5 and elem(value, 0) == :compiled_closure)
 
   @doc """
   Is the value a reference to an Erlang / Elixir function?
diff --git a/lib/lua/compiler.ex b/lib/lua/compiler.ex
index a5004b8..e7f30cc 100644
--- a/lib/lua/compiler.ex
+++ b/lib/lua/compiler.ex
@@ -16,14 +16,26 @@ defmodule Lua.Compiler do
 
   @doc """
   Compiles a Lua AST chunk into a prototype.
+
+  Prototypes that the Erlang codegen can handle (see
+  `Lua.Compiler.Erlang`) are returned with `compiled_module:` set
+  and dispatched directly to a BEAM module at runtime. Prototypes
+  containing opcodes not yet covered by the codegen fall back to
+  interpretation transparently.
   """
   @spec compile(Chunk.t(), compile_opts()) :: {:ok, Prototype.t()} | {:error, term()}
   def compile(%Chunk{} = chunk, opts \\ []) do
-    with {:ok, scope_state} <- Scope.resolve(chunk, opts) do
-      Codegen.generate(chunk, scope_state, opts)
+    with {:ok, scope_state} <- Scope.resolve(chunk, opts),
+         {:ok, prototype} <- Codegen.generate(chunk, scope_state, opts) do
+      {:ok, maybe_compile_to_erlang(prototype)}
     end
   end
 
+  defp maybe_compile_to_erlang(%Prototype{} = proto) do
+    {:ok, compiled} = Lua.Compiler.Erlang.compile(proto)
+    compiled
+  end
+
   @doc """
   Compiles a Lua AST chunk, raising on error.
   """
diff --git a/lib/lua/compiler/erlang.ex b/lib/lua/compiler/erlang.ex
new file mode 100644
index 0000000..09343b8
--- /dev/null
+++ b/lib/lua/compiler/erlang.ex
@@ -0,0 +1,119 @@
+defmodule Lua.Compiler.Erlang do
+  @moduledoc """
+  Compiles `Lua.Compiler.Prototype` values to BEAM modules via
+  `:compile.forms/2`.
+
+  A compiled prototype is dispatched through the `:compiled_closure`
+  value type (a 5-element tagged tuple carrying, among other fields,
+  the module, function, and upvalues) recognised by
+  `Lua.VM.Executor.call_function/3` and the `:call` opcode. The compiled
+  function takes `(args, upvalues, state)` and returns `{results, state}`.
+
+  ## Scope (B5a — opcode coverage)
+
+  This first revision covers arithmetic, comparison, control flow,
+  loops, bitwise ops, string concat/length, source-line tracking,
+  calls, single-value returns, and upvalue reads. Prototypes that
+  contain table opcodes (B5c), closure construction (B5d), varargs
+  (B5d), or multi-value returns (B5d) fall back to the interpreter
+  via `:fallback`.
+
+  All-or-nothing per prototype: if any opcode in the instruction
+  stream is uncovered, the whole prototype falls back.
+
+  ## Module lifecycle
+
+  Each accepted prototype gets a fresh module name in B5a; loaded modules are never unloaded, so they leak.
+  B5b introduces a content-addressable ref-counted cache.
+  """
+
+  alias Lua.Compiler.Erlang.Codegen
+  alias Lua.Compiler.Prototype
+
+  require Logger
+
+  @doc """
+  Attempts to compile a prototype (and its sub-prototypes) to BEAM
+  modules.
+
+  Always returns `{:ok, prototype}`. `:compiled_module` is set on the
+  returned prototype if the codegen succeeds. If any opcode in the
+  prototype is not yet supported, the prototype is returned without
+  it, but with its sub-prototypes compiled where possible.
+
+  On a compilation failure (`:compile.forms/2` error,
+  `:code.load_binary/3` error), logs a warning and likewise returns
+  the prototype without `:compiled_module` rather than raising, so
+  the caller (the public Lua compile path) falls back to interpretation.
+  """
+  @spec compile(Prototype.t()) :: {:ok, Prototype.t()}
+  def compile(%Prototype{} = proto) do
+    # Sub-prototypes compile independently — bottom-up. Each
+    # sub-prototype's compile-or-fallback status is set on its
+    # `compiled_module` field. The closure-construction opcode in the
+    # *parent* checks that field at codegen time and emits either
+    # `:compiled_closure` or `:lua_closure` accordingly.
+    #
+    # This lets a parent compile even if some children don't, and
+    # vice versa. The B5a codegen sets up the wiring; B5d's `:closure`
+    # opcode lowering picks the right closure type.
+    #
+    # Returns `{:ok, proto_with_subs_compiled}` even if the parent
+    # itself can't compile — the caller still wants the updated
+    # sub-prototype tree so interpreter-driven closure construction
+    # can emit `:compiled_closure` for sub-prototypes that did compile.
+    compiled_subs =
+      Enum.map(proto.prototypes, fn sub ->
+        {:ok, compiled} = compile(sub)
+        compiled
+      end)
+
+    proto = %{proto | prototypes: compiled_subs}
+
+    case Codegen.generate(proto) do
+      {:ok, module_name, function_name, forms} ->
+        load_or_pass_through(module_name, function_name, forms, proto)
+
+      :fallback ->
+        # Parent prototype itself isn't covered; pass through with
+        # subs intact so the interpreter can still close them as
+        # compiled.
+        {:ok, proto}
+    end
+  end
+
+  defp load_or_pass_through(module_name, function_name, forms, proto) do
+    case load_module(module_name, function_name, forms, proto) do
+      {:ok, _} = ok -> ok
+      :fallback -> {:ok, proto}
+    end
+  end
+
+  defp load_module(module_name, function_name, forms, proto) do
+    case :compile.forms(forms, [:return, :no_spawn_compiler_process]) do
+      {:ok, ^module_name, binary, _warnings} ->
+        beam_path = ~c"#{module_name}.beam"
+
+        case :code.load_binary(module_name, beam_path, binary) do
+          {:module, ^module_name} ->
+            {:ok, %{proto | compiled_module: {module_name, function_name}}}
+
+          {:error, reason} ->
+            Logger.warning(
+              "Lua.Compiler.Erlang: load_binary failed for #{inspect(module_name)}: " <>
+                inspect(reason)
+            )
+
+            :fallback
+        end
+
+      error ->
+        Logger.warning(
+          "Lua.Compiler.Erlang: compile.forms failed for #{inspect(module_name)}: " <>
+            inspect(error)
+        )
+
+        :fallback
+    end
+  end
+end
diff --git a/lib/lua/compiler/erlang/codegen.ex b/lib/lua/compiler/erlang/codegen.ex
new file mode 100644
index 0000000..3cef285
--- /dev/null
+++ b/lib/lua/compiler/erlang/codegen.ex
@@ -0,0 +1,279 @@
+defmodule Lua.Compiler.Erlang.Codegen do
+  @moduledoc false
+  # Walks a `Lua.Compiler.Prototype` and produces Erlang abstract forms
+  # ready for `:compile.forms/2`.
+  #
+  # Strategy: the compiled function keeps registers in a tuple identical
+  # in shape to the interpreter's. Each opcode emits Erlang code that
+  # reads from the tuple via `element/2` and writes via `setelement/3`.
+  # State threads as a single Erlang variable through every opcode that
+  # can mutate it.
+  #
+  # This is the conservative shape from the parent B5 plan (Option 1,
+  # plan lines 159-162): keep the register tuple, pay one `setelement/3`
+  # per write, but eliminate the entire interpreter dispatch loop. The
+  # faithful fib(25) spike (12.4x faster than the interpreter) used this
+  # shape and confirmed the win.
+  #
+  # SSA register promotion is a follow-on (deferred B5c-style work) and
+  # would buy another large chunk on top.
+
+  alias Lua.Compiler.Erlang.Opcodes
+  alias Lua.Compiler.Prototype
+
+  # Variable names used in the generated function body. `__` prefixes
+  # avoid collisions with anything the codegen might want to introduce
+  # later.
+  @args_var :__Args
+  @upvalues_var :__Upvalues
+  @state_var :__State
+  @regs_var :__Regs
+
+  defmodule Ctx do
+    @moduledoc false
+    # Codegen context threaded through every opcode lowering. Each
+    # opcode's lowering function returns `{forms, updated_ctx}`.
+
+    defstruct [
+      # Counter used to mint fresh helper-function names for loop
+      # bodies, labels, etc.
+      :next_label,
+      # Counter used to mint fresh state variable versions
+      # (State_0, State_1, …).
+      :next_state_version,
+      # Atom for the current state variable name.
+      :state_var,
+      # Counter used to mint fresh register-tuple variable versions
+      # (Regs_0, Regs_1, …).
+      :next_regs_version,
+      # Atom for the current registers variable name.
+      :regs_var,
+      # Map of label name → helper function name. Populated as we
+      # walk and encounter `:label` opcodes. `:goto` resolves
+      # against this map at codegen time, not at runtime.
+      :labels,
+      # Accumulator for helper function clauses (loop bodies,
+      # label targets) that the lowering emits as side-effects of
+      # the main walk.
+      :helpers,
+      # The prototype being compiled — for source position, max_registers,
+      # etc.
+      :proto,
+      # Current source line, updated by `:source_line` opcodes. Used as
+      # the `line` arg in calls to `Executor.apply_arith_op` and friends
+      # so runtime errors carry the right position.
+      :line
+    ]
+
+    def new(proto, state_var, regs_var) do
+      %__MODULE__{
+        next_label: 0,
+        next_state_version: 0,
+        state_var: state_var,
+        next_regs_version: 0,
+        regs_var: regs_var,
+        labels: %{},
+        helpers: [],
+        proto: proto,
+        line: elem(proto.lines, 0) || 1
+      }
+    end
+
+    def fresh_state_var(%__MODULE__{next_state_version: n} = ctx) do
+      var = String.to_atom("State_#{n}")
+      {var, %{ctx | next_state_version: n + 1, state_var: var}}
+    end
+
+    def fresh_regs_var(%__MODULE__{next_regs_version: n} = ctx) do
+      var = String.to_atom("Regs_#{n}")
+      {var, %{ctx | next_regs_version: n + 1, regs_var: var}}
+    end
+
+    def fresh_label(%__MODULE__{next_label: n} = ctx, prefix) do
+      name = String.to_atom("#{prefix}_#{n}")
+      {name, %{ctx | next_label: n + 1}}
+    end
+
+    def add_helper(%__MODULE__{helpers: helpers} = ctx, helper_form) do
+      %{ctx | helpers: [helper_form | helpers]}
+    end
+  end
+
+  # Module names use `:erlang.unique_integer/1` so concurrent compiles
+  # do not collide. Replaced by content-addressable hashing in B5b.
+
+  @doc """
+  Walks a prototype and returns either `{:ok, module, function, forms}`
+  ready to feed to `:compile.forms/2`, or `:fallback` if any opcode is
+  not yet covered by the codegen.
+  """
+  @spec generate(Prototype.t()) ::
+          {:ok, module(), atom(), list()} | :fallback
+  def generate(%Prototype{} = proto) do
+    module_name = next_module_name()
+    function_name = :execute
+
+    ctx = Ctx.new(proto, @state_var, @regs_var)
+
+    # Separate the tail `:return` (if present) so it can be emitted as a
+    # natural return value, bypassing the throw/catch round-trip. This
+    # saves about half of the throws in functions with early-exit branches, like fib.
+    {body_instructions, tail_return} = split_tail_return(proto.instructions)
+
+    case lower_instructions(body_instructions, ctx) do
+      {:ok, body_forms, ctx_after} ->
+        tail_form = build_tail_return(tail_return, ctx_after)
+        forms = build_module(module_name, function_name, proto, body_forms ++ tail_form, ctx_after)
+        {:ok, module_name, function_name, forms}
+
+      :fallback ->
+        :fallback
+    end
+  end
+
+  defp split_tail_return(instructions) do
+    case List.last(instructions) do
+      {:return, base, 1} ->
+        {Enum.drop(instructions, -1), {:return, base, 1}}
+
+      _ ->
+        {instructions, nil}
+    end
+  end
+
+  defp build_tail_return(nil, _ctx), do: []
+
+  defp build_tail_return({:return, base, 1}, %{state_var: state_var, regs_var: regs_var, line: line}) do
+    # Direct `{[element(base+1, Regs)], State}` — no throw.
+    [
+      {:tuple, line,
+       [
+         {:cons, line, {:call, line, {:atom, line, :element}, [{:integer, line, base + 1}, {:var, line, regs_var}]},
+          {nil, line}},
+         {:var, line, state_var}
+       ]}
+    ]
+  end
+
+  defp next_module_name do
+    n = :erlang.unique_integer([:positive, :monotonic])
+    :"lua_proto_b5a_#{n}"
+  end
+
+  # Build the full module: attribute headers + the execute/3 function.
+  defp build_module(module_name, function_name, %Prototype{} = proto, body_forms, ctx) do
+    line = elem(proto.lines, 0) || 1
+
+    function_clauses = [
+      build_execute_clause(proto, body_forms, line, ctx)
+    ]
+
+    [
+      {:attribute, line, :module, module_name},
+      {:attribute, line, :export, [{function_name, 3}]}
+      | Enum.reverse(ctx.helpers)
+    ] ++
+      [{:function, line, function_name, 3, function_clauses}]
+  end
+
+  defp build_execute_clause(%Prototype{} = proto, body_forms, line, ctx) do
+    head_patterns = [
+      {:var, line, @args_var},
+      {:var, line, @upvalues_var},
+      {:var, line, @state_var}
+    ]
+
+    prelude = build_register_prelude(proto, line)
+
+    # The body is wrapped in a try/catch that catches `throw/1` payloads
+    # of the shape `{:b5_return, Results, State}`. This is how we model
+    # Lua's "return from anywhere" semantics in Erlang's
+    # expression-oriented language. `:return` opcode forms emit `throw`s
+    # (except for a tail-position `:return` which we lift out as a
+    # natural return — that's `body_forms`' last element when the
+    # generator decided to optimise it).
+    #
+    # If the body's last form is *not* a return tuple, append the
+    # implicit `{[], State_curr}` so a function that falls off the end
+    # still has a return value.
+    body_block =
+      case List.last(body_forms) do
+        {:tuple, _, [_cons_or_nil, _state]} ->
+          # Last form is a natural-tail return tuple — don't override.
+          body_forms
+
+        _ ->
+          body_forms ++ [{:tuple, line, [{nil, line}, {:var, line, ctx.state_var}]}]
+      end
+
+    try_body = make_block(body_block, line)
+
+    return_var = :__B5ReturnResults
+    return_state_var = :__B5ReturnState
+
+    catch_clauses = [
+      {:clause, line,
+       [
+         {:tuple, line,
+          [
+            {:atom, line, :throw},
+            {:tuple, line, [{:atom, line, :b5_return}, {:var, line, return_var}, {:var, line, return_state_var}]},
+            {:var, line, :_}
+          ]}
+       ], [], [{:tuple, line, [{:var, line, return_var}, {:var, line, return_state_var}]}]}
+    ]
+
+    try_form =
+      {:try, line, [try_body], [], catch_clauses, []}
+
+    {:clause, line, head_patterns, [], prelude ++ [try_form]}
+  end
+
+  # Wrap a list of forms in a `begin … end` block to keep them as a
+  # single expression. If there's only one form, no wrapping needed.
+  defp make_block([single], _line), do: single
+  defp make_block(forms, line), do: {:block, line, forms}
+
+  # Builds the initial register tuple `__Regs`.
+  #
+  # Uses `erlang:make_tuple/2` to allocate a nil-filled tuple, then
+  # `Lua.Compiler.Erlang.Runtime.copy_args/4` to install the args.
+  # Simple and fast for now; B5b or later could share a pre-built
+  # nil-tuple constant across calls, since the size is fixed at codegen time.
+  defp build_register_prelude(%Prototype{} = proto, line) do
+    max_regs = proto.max_registers + 16
+    param_count = proto.param_count
+
+    init_var = :Regs_init
+
+    make_tuple_call =
+      {:call, line, {:remote, line, {:atom, line, :erlang}, {:atom, line, :make_tuple}},
+       [{:integer, line, max_regs}, {:atom, line, nil}]}
+
+    init_match = {:match, line, {:var, line, init_var}, make_tuple_call}
+
+    copy_call =
+      {:call, line, {:remote, line, {:atom, line, :"Elixir.Lua.Compiler.Erlang.Runtime"}, {:atom, line, :copy_args}},
+       [
+         {:var, line, @args_var},
+         {:var, line, init_var},
+         {:integer, line, 0},
+         {:integer, line, param_count}
+       ]}
+
+    regs_match = {:match, line, {:var, line, @regs_var}, copy_call}
+
+    [init_match, regs_match]
+  end
+
+  # Lowers a list of instructions. Returns `{:ok, forms, ctx}` or
+  # `:fallback`.
+  def lower_instructions(instructions, %Ctx{} = ctx) do
+    Enum.reduce_while(instructions, {:ok, [], ctx}, fn instr, {:ok, acc, ctx} ->
+      case Opcodes.lower(instr, ctx) do
+        {:ok, new_forms, new_ctx} -> {:cont, {:ok, acc ++ new_forms, new_ctx}}
+        :fallback -> {:halt, :fallback}
+      end
+    end)
+  end
+end
diff --git a/lib/lua/compiler/erlang/opcodes.ex b/lib/lua/compiler/erlang/opcodes.ex
new file mode 100644
index 0000000..86307f9
--- /dev/null
+++ b/lib/lua/compiler/erlang/opcodes.ex
@@ -0,0 +1,798 @@
+defmodule Lua.Compiler.Erlang.Opcodes do
+  @moduledoc false
+  # Per-opcode lowering for `Lua.Compiler.Erlang.Codegen`.
+  #
+  # Each `lower/2` clause matches one opcode tuple shape and returns
+  # either `{:ok, [erlang_form], updated_ctx}` or `:fallback`.
+  #
+  # Conventions:
+  #   - Erlang forms use the abstract syntax tree shape consumed by
+  #     `:compile.forms/2`. See `:erl_parse` for the grammar.
+  #   - All forms carry a line number for the BEAM debugger.
+  #   - Reads from registers use `element(N+1, Regs_curr)`.
+  #   - Writes thread a fresh `Regs_n` via `setelement(N+1, Regs_curr, Value)`.
+  #   - Writes to state thread a fresh `State_n` likewise.
+
+  alias Lua.Compiler.Erlang.Codegen.Ctx
+
+  # ── Public entry ──────────────────────────────────────────────────
+
+  def lower({:return, base, 1}, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    value_form = get_register(base, line, ctx)
+
+    # `throw({:b5_return, Results, State})` — wrapped in a `try/catch`
+    # at the function level. This is how we model Lua's "return from
+    # anywhere in the body" in Erlang's expression-oriented semantics.
+    # The overhead of throw/catch is small (sub-microsecond) and is
+    # paid only when a return is actually executed.
+    return_payload =
+      {:tuple, line,
+       [
+         {:atom, line, :b5_return},
+         {:cons, line, value_form, {nil, line}},
+         {:var, line, ctx.state_var}
+       ]}
+
+    throw_form =
+      {:call, line, {:atom, line, :throw}, [return_payload]}
+
+    {:ok, [throw_form], ctx}
+  end
+
+  def lower({:load_constant, dest, value}, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    value_form = literal_to_form(value, line)
+    {forms, ctx} = set_register(dest, value_form, line, ctx)
+    {:ok, forms, ctx}
+  end
+
+  def lower({:move, dest, source}, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    src_form = get_register(source, line, ctx)
+    {forms, ctx} = set_register(dest, src_form, line, ctx)
+    {:ok, forms, ctx}
+  end
+
+  def lower({:source_line, line, _source}, %Ctx{} = ctx) do
+    # No runtime effect — just update the codegen-tracked current line
+    # so subsequent opcodes' raise sites get the right position.
+    {:ok, [], %{ctx | line: line}}
+  end
+
+  def lower({:load_env, dest}, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    # _ENV is `state.g_ref`. Emit `state.g_ref` via `maps:get(g_ref, State_curr)`.
+    g_ref_form =
+      {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+       [{:atom, line, :g_ref}, {:var, line, ctx.state_var}]}
+
+    {forms, ctx} = set_register(dest, g_ref_form, line, ctx)
+    {:ok, forms, ctx}
+  end
+
+  def lower({:load_boolean, dest, value}, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    bool = if value, do: true, else: false
+    {forms, ctx} = set_register(dest, {:atom, line, bool}, line, ctx)
+    {:ok, forms, ctx}
+  end
+
+  def lower({:load_nil, dest, count}, %Ctx{} = ctx) when is_integer(count) and count > 0 do
+    Enum.reduce_while(0..(count - 1), {:ok, [], ctx}, fn offset, {:ok, acc, ctx} ->
+      line = current_line(ctx)
+      {f, ctx} = set_register(dest + offset, {:atom, line, nil}, line, ctx)
+      {:cont, {:ok, acc ++ f, ctx}}
+    end)
+  end
+
+  # ── Arithmetic ────────────────────────────────────────────────────
+  #
+  # `:add`, `:subtract`, and `:multiply` inline an integer fast path;
+  # non-integer operands fall through to
+  # `Lua.VM.Executor.apply_arith_op/6`, which handles all coercion and metamethod dispatch.
+
+  def lower({:add, dest, a, b}, ctx), do: arith_binop(:add, dest, a, b, ctx)
+  def lower({:subtract, dest, a, b}, ctx), do: arith_binop(:subtract, dest, a, b, ctx)
+  def lower({:multiply, dest, a, b}, ctx), do: arith_binop(:multiply, dest, a, b, ctx)
+  def lower({:divide, dest, a, b}, ctx), do: arith_binop_slow(:divide, dest, a, b, ctx)
+  def lower({:floor_divide, dest, a, b}, ctx), do: arith_binop_slow(:floor_divide, dest, a, b, ctx)
+  def lower({:modulo, dest, a, b}, ctx), do: arith_binop_slow(:modulo, dest, a, b, ctx)
+  def lower({:power, dest, a, b}, ctx), do: arith_binop_slow(:power, dest, a, b, ctx)
+  def lower({:negate, dest, source}, ctx), do: arith_unop(:negate, dest, source, ctx)
+
+  # ── Comparison ────────────────────────────────────────────────────
+
+  # Comparisons with a fast path for two numeric operands (the common
+  # case for `if n < 2` and friends). Lua only consults the `__lt` and
+  # `__le` metamethods when the operands are not both numbers (or both
+  # strings), so the metamethod path is pure overhead for two numbers.
+  def lower({:less_than, dest, a, b}, ctx), do: cmp_binop_with_fastpath(:<, :less_than, dest, a, b, ctx)
+  def lower({:less_equal, dest, a, b}, ctx), do: cmp_binop_with_fastpath(:"=<", :less_equal, dest, a, b, ctx)
+  def lower({:greater_than, dest, a, b}, ctx), do: cmp_binop_with_fastpath(:>, :greater_than, dest, a, b, ctx)
+  def lower({:greater_equal, dest, a, b}, ctx), do: cmp_binop_with_fastpath(:>=, :greater_equal, dest, a, b, ctx)
+  def lower({:equal, dest, a, b}, ctx), do: cmp_binop(:equal, dest, a, b, ctx)
+  def lower({:not_equal, dest, a, b}, ctx), do: cmp_binop(:not_equal, dest, a, b, ctx)
+
+  # ── Upvalues and globals ──────────────────────────────────────────
+
+  def lower({:get_open_upvalue, dest, reg}, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    # case maps:get(reg, state.open_upvalues, nil) of
+    #   nil -> element(reg+1, Regs);
+    #   CellRef -> maps:get(CellRef, state.upvalue_cells)
+    # end
+    open_upvalues_map =
+      {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+       [{:atom, line, :open_upvalues}, {:var, line, ctx.state_var}]}
+
+    cell_ref_or_nil =
+      {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+       [{:integer, line, reg}, open_upvalues_map, {:atom, line, nil}]}
+
+    upvalue_cells_map =
+      {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+       [{:atom, line, :upvalue_cells}, {:var, line, ctx.state_var}]}
+
+    cell_var = fresh_atom(:OpenCell)
+    # Fresh local binder for the non-nil clause; scoped to that clause
+    # only, so no `unsafe_var` warning.
+    ref_var = fresh_atom(:OpenRef)
+
+    case_form =
+      {:case, line, {:var, line, cell_var},
+       [
+         {:clause, line, [{:atom, line, nil}], [], [get_register(reg, line, ctx)]},
+         {:clause, line, [{:var, line, ref_var}], [],
+          [
+            {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+             [{:var, line, ref_var}, upvalue_cells_map]}
+          ]}
+       ]}
+
+    cell_match = {:match, line, {:var, line, cell_var}, cell_ref_or_nil}
+
+    value_var = fresh_atom(:OpenValue)
+    value_match = {:match, line, {:var, line, value_var}, case_form}
+
+    {set_forms, ctx} = set_register(dest, {:var, line, value_var}, line, ctx)
+    {:ok, [cell_match, value_match | set_forms], ctx}
+  end
+
+  def lower({:get_upvalue, dest, index}, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    # CellRef = element(Index+1, Upvalues),
+    # Value = maps:get(CellRef, maps:get(upvalue_cells, State_curr)),
+    # set_register dest <- Value.
+    cell_ref =
+      {:call, line, {:atom, line, :element}, [{:integer, line, index + 1}, {:var, line, :__Upvalues}]}
+
+    upvalue_cells =
+      {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+       [{:atom, line, :upvalue_cells}, {:var, line, ctx.state_var}]}
+
+    value_form =
+      {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}}, [cell_ref, upvalue_cells]}
+
+    {forms, ctx} = set_register(dest, value_form, line, ctx)
+    {:ok, forms, ctx}
+  end
+
+  def lower({:get_global, dest, name}, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    # globals = state.tables[state.g_ref id].data
+    # value = globals[name] or nil
+    g_ref =
+      {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+       [{:atom, line, :g_ref}, {:var, line, ctx.state_var}]}
+
+    g_id = {:call, line, {:atom, line, :element}, [{:integer, line, 2}, g_ref]}
+
+    tables =
+      {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+       [{:atom, line, :tables}, {:var, line, ctx.state_var}]}
+
+    g_table = {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}}, [g_id, tables]}
+
+    g_data =
+      {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}}, [{:atom, line, :data}, g_table]}
+
+    value =
+      {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+       [literal_to_form(name, line), g_data, {:atom, line, nil}]}
+
+    {forms, ctx} = set_register(dest, value, line, ctx)
+    {:ok, forms, ctx}
+  end
+
+  # `:set_global` mutates state — falls back. Most globals are written
+  # via `:set_field` on `_ENV`; pure `:set_global` opcodes are rare in
+  # compiled code. B5c picks this up alongside the table opcodes.
+
+  # `:get_field` with a binary literal name — the bread-and-butter
+  # global lookup pattern (`_ENV.print`). Inlines the no-metatable
+  # fast path from `executor.ex` and falls through to
+  # `Executor.index_value/6` for the metatable or non-tref case.
+  def lower({:get_field, dest, table_reg, name, name_hint}, %Ctx{} = ctx) when is_binary(name) do
+    line = current_line(ctx)
+    table_form = get_register(table_reg, line, ctx)
+
+    # Inline fast path:
+    #   case TableForm of
+    #     {tref, Id} ->
+    #         T = maps:get(Id, maps:get(tables, State)),
+    #         case maps:get(metatable, T) of
+    #             nil ->
+    #                 %% No metatable: plain lookup, missing keys
+    #                 %% read as nil.
+    #                 D = maps:get(data, T),
+    #                 {maps:get(Name, D, nil), State};
+    #             _ -> Executor:index_value(...)  %% metatable case
+    #         end;
+    #     _ -> Executor:index_value(...)  %% non-tref
+    #   end
+
+    {state_var, ctx} = Ctx.fresh_state_var(ctx)
+    prev_state = previous_state_atom(ctx.state_var)
+
+    # Slow path (metatable present or non-tref).
+    slow_call =
+      {:call, line, {:remote, line, {:atom, line, :"Elixir.Lua.VM.Executor"}, {:atom, line, :index_value}},
+       [
+         table_form,
+         literal_to_form(name, line),
+         {:var, line, prev_state},
+         {:integer, line, line},
+         literal_to_form(ctx.proto.source, line),
+         term_to_form(name_hint, line)
+       ]}
+
+    id_var = fresh_atom(:GFId)
+    table_var = fresh_atom(:GFTable)
+    data_var = fresh_atom(:GFData)
+    value_var = fresh_atom(:GFValue)
+
+    fast_path_body =
+      {:block, line,
+       [
+         # T = maps:get(Id, maps:get(tables, State))
+         {:match, line, {:var, line, table_var},
+          {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+           [
+             {:var, line, id_var},
+             {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+              [{:atom, line, :tables}, {:var, line, prev_state}]}
+           ]}},
+         # case maps:get(metatable, T) of nil -> data lookup; _ -> slow_call end
+         {:case, line,
+          {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+           [{:atom, line, :metatable}, {:var, line, table_var}]},
+          [
+            {:clause, line, [{:atom, line, nil}], [],
+             [
+               # D = maps:get(data, T)
+               {:match, line, {:var, line, data_var},
+                {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+                 [{:atom, line, :data}, {:var, line, table_var}]}},
+               # {maps:get(Name, D, nil), State}
+               {:tuple, line,
+                [
+                  {:call, line, {:remote, line, {:atom, line, :maps}, {:atom, line, :get}},
+                   [literal_to_form(name, line), {:var, line, data_var}, {:atom, line, nil}]},
+                  {:var, line, prev_state}
+                ]}
+             ]},
+            {:clause, line, [{:var, line, :_}], [], [slow_call]}
+          ]}
+       ]}
+
+    tref_clause =
+      {:clause, line, [{:tuple, line, [{:atom, line, :tref}, {:var, line, id_var}]}], [], [fast_path_body]}
+
+    other_clause = {:clause, line, [{:var, line, :_}], [], [slow_call]}
+
+    case_form = {:case, line, table_form, [tref_clause, other_clause]}
+
+    match_form =
+      {:match, line, {:tuple, line, [{:var, line, value_var}, {:var, line, state_var}]}, case_form}
+
+    {set_forms, ctx} = set_register(dest, {:var, line, value_var}, line, ctx)
+    {:ok, [match_form | set_forms], ctx}
+  end
+
+  # ── Calls ─────────────────────────────────────────────────────────
+
+  def lower({:call, base, arg_count, 1, _hint}, %Ctx{} = ctx) when is_integer(arg_count) and arg_count >= 0 do
+    line = current_line(ctx)
+    callable_form = get_register(base, line, ctx)
+    args_list = build_args_list(base + 1, arg_count, line, ctx)
+
+    {state_var, ctx} = Ctx.fresh_state_var(ctx)
+    prev_state = previous_state_atom(ctx.state_var)
+
+    # Bridge native callbacks the same way the interpreter does:
+    # before calling, push the current (line, source) into the process
+    # dict via `Lua.VM.Executor.set_call_position/2`, and restore the
+    # previous value afterwards (or on raise). Both paths share the
+    # `call_function_with_position/5` helper for this.
+    invoke_call =
+      {:call, line,
+       {:remote, line, {:atom, line, :"Elixir.Lua.VM.Executor"}, {:atom, line, :call_function_with_position}},
+       [
+         callable_form,
+         args_list,
+         {:var, line, prev_state},
+         {:integer, line, line},
+         literal_to_form(ctx.proto.source, line)
+       ]}
+
+    results_var = fresh_atom(:CallResults)
+
+    match_form =
+      {:match, line, {:tuple, line, [{:var, line, results_var}, {:var, line, state_var}]}, invoke_call}
+
+    # First-result extraction: `case Results of [V|_] -> V; [] -> nil end`.
+    # Lua single-result calls coerce missing results to nil.
+    first_var = fresh_atom(:CallResult0)
+
+    first_extract =
+      {:case, line, {:var, line, results_var},
+       [
+         {:clause, line, [{:cons, line, {:var, line, first_var}, {:var, line, :_}}], [], [{:var, line, first_var}]},
+         {:clause, line, [{nil, line}], [], [{:atom, line, nil}]}
+       ]}
+
+    extract_var = fresh_atom(:CallFirst)
+
+    extract_match = {:match, line, {:var, line, extract_var}, first_extract}
+
+    {set_forms, ctx} = set_register(base, {:var, line, extract_var}, line, ctx)
+    {:ok, [match_form, extract_match | set_forms], ctx}
+  end
+
+  # ── Conditional branch ────────────────────────────────────────────
+  #
+  # `:test` is the workhorse for `if`/`while`/`repeat` conditions. We
+  # lower it to an Erlang `case` over `Lua.VM.Value.truthy?/1`.
+  #
+  # Critical: any registers or state mutated inside either branch
+  # become "exported" from the case, which Erlang's linter flags as
+  # `unsafe_var` unless every clause writes the same set of variables.
+  # To keep this safe, the codegen passes a fresh ctx into each branch
+  # (forking) and only commits the new state/regs vars from the branch
+  # if it falls through (doesn't return). For B5a the simplification:
+  # the then-branch must terminate (via throw from `:return`) and the
+  # else-branch must be empty or terminate too; anything else falls
+  # back. The `terminates_with_return?/1` check enforces this.
+
+  def lower({:test, reg, then_body, else_body}, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    reg_form = get_register(reg, line, ctx)
+
+    truthy_call =
+      {:call, line, {:remote, line, {:atom, line, :"Elixir.Lua.VM.Value"}, {:atom, line, :truthy?}}, [reg_form]}
+
+    then_returns? = terminates_with_return?(then_body)
+    else_returns? = terminates_with_return?(else_body)
+
+    if then_returns? and (else_body == [] or else_returns?) do
+      # Both branches terminate (or else is empty/falls through).
+      # Easy case — emit a case where each branch's forms are
+      # self-contained.
+      lower_terminating_test(line, truthy_call, then_body, else_body, ctx)
+    else
+      # Mixed shape (one branch returns, the other writes state and
+      # falls through to subsequent opcodes). Handling this needs
+      # SSA-merge semantics on case branches, which B5a defers.
+      :fallback
+    end
+  end
+
+  def lower({:test_true, reg, then_body}, %Ctx{} = ctx) do
+    # Single-branch variant — desugar to :test with empty else.
+    lower({:test, reg, then_body, []}, ctx)
+  end
+
+  # ── Logical NOT ───────────────────────────────────────────────────
+
+  def lower({:not, dest, source}, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    src_form = get_register(source, line, ctx)
+
+    truthy_call =
+      {:call, line, {:remote, line, {:atom, line, :"Elixir.Lua.VM.Value"}, {:atom, line, :truthy?}}, [src_form]}
+
+    not_form = {:op, line, :not, truthy_call}
+    {forms, ctx} = set_register(dest, not_form, line, ctx)
+    {:ok, forms, ctx}
+  end
+
+  # ── Fallback ──────────────────────────────────────────────────────
+
+  def lower(_other, _ctx) do
+    :fallback
+  end
+
+  defp lower_terminating_test(line, truthy_call, then_body, else_body, ctx) do
+    # Fork ctx for each branch — fresh state/regs counters inside the
+    # branch don't leak out (the branch terminates via throw).
+    case lower_branch_body(then_body, ctx) do
+      {:ok, then_forms} ->
+        case lower_branch_body(else_body, ctx) do
+          {:ok, else_forms} ->
+            else_clause_body =
+              if else_forms == [] do
+                # Empty else: fall through to the rest of the function.
+                # Emit `ok` as a placeholder expression. The case
+                # yields nothing useful; subsequent opcodes don't read
+                # from this case.
+                [{:atom, line, :ok}]
+              else
+                else_forms
+              end
+
+            case_form =
+              {:case, line, truthy_call,
+               [
+                 {:clause, line, [{:atom, line, true}], [], then_forms},
+                 {:clause, line, [{:atom, line, false}], [], else_clause_body}
+               ]}
+
+            {:ok, [case_form], ctx}
+
+          :fallback ->
+            :fallback
+        end
+
+      :fallback ->
+        :fallback
+    end
+  end
+
+  defp lower_branch_body([], _ctx), do: {:ok, []}
+
+  defp lower_branch_body(body, ctx) do
+    case Lua.Compiler.Erlang.Codegen.lower_instructions(body, ctx) do
+      {:ok, forms, _ctx_after} -> {:ok, forms}
+      :fallback -> :fallback
+    end
+  end
+
+  defp terminates_with_return?([]), do: false
+
+  defp terminates_with_return?(instructions) do
+    case List.last(instructions) do
+      {:return, _, _} -> true
+      :return -> true
+      _ -> false
+    end
+  end
+
+  # ── Arithmetic lowering helpers ───────────────────────────────────
+
+  # Integer-fast-path opcode (add/subtract/multiply). Inlines a case
+  # that checks both operands are integers, does the operation
+  # directly with `+`/`-`/`*` plus `Numeric.to_signed_int64/1` for
+  # wrap-around, and falls through to `apply_arith_op` on any other
+  # operand shape.
+  defp arith_binop(op, dest, a, b, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    a_form = get_register(a, line, ctx)
+    b_form = get_register(b, line, ctx)
+
+    erl_op =
+      case op do
+        :add -> :+
+        :subtract -> :-
+        :multiply -> :*
+      end
+
+    # Emitted shape (integer fast path first, slow path otherwise):
+    #   case {A, B} of
+    #     {Ai, Bi} when is_integer(Ai), is_integer(Bi) ->
+    #         {'Elixir.Lua.VM.Numeric':to_signed_int64(Ai OP Bi), State_curr};
+    #     _ ->
+    #         'Elixir.Lua.VM.Executor':apply_arith_op(Op, A, B, State_curr, Line, Source)
+    #   end
+    #
+    # The case yields `{Value, NewState}`. Match-bind it to fresh vars.
+
+    {state_var, ctx} = Ctx.fresh_state_var(ctx)
+    prev_state = previous_state_atom(ctx.state_var)
+
+    int_ai = fresh_atom(:Ai)
+    int_bi = fresh_atom(:Bi)
+
+    fast_clause =
+      {:clause, line, [{:tuple, line, [{:var, line, int_ai}, {:var, line, int_bi}]}],
+       [
+         [
+           {:call, line, {:atom, line, :is_integer}, [{:var, line, int_ai}]},
+           {:call, line, {:atom, line, :is_integer}, [{:var, line, int_bi}]}
+         ]
+       ],
+       [
+         {:tuple, line,
+          [
+            {:call, line, {:remote, line, {:atom, line, :"Elixir.Lua.VM.Numeric"}, {:atom, line, :to_signed_int64}},
+             [{:op, line, erl_op, {:var, line, int_ai}, {:var, line, int_bi}}]},
+            {:var, line, prev_state}
+          ]}
+       ]}
+
+    slow_clause =
+      {:clause, line, [{:var, line, :_}], [],
+       [
+         {:call, line, {:remote, line, {:atom, line, :"Elixir.Lua.VM.Executor"}, {:atom, line, :apply_arith_op}},
+          [
+            {:atom, line, op},
+            a_form,
+            b_form,
+            {:var, line, prev_state},
+            {:integer, line, line},
+            literal_to_form(ctx.proto.source, line)
+          ]}
+       ]}
+
+    case_form =
+      {:case, line, {:tuple, line, [a_form, b_form]}, [fast_clause, slow_clause]}
+
+    value_var = fresh_atom(:ArithValue)
+
+    match_form =
+      {:match, line, {:tuple, line, [{:var, line, value_var}, {:var, line, state_var}]}, case_form}
+
+    {set_forms, ctx} = set_register(dest, {:var, line, value_var}, line, ctx)
+    {:ok, [match_form | set_forms], ctx}
+  end
+
+  # Slow-path-only opcode (divide, floor_divide, modulo, power). No
+  # integer fast path because the operation requires Lua-specific
+  # handling of edge cases (zero divisor, float coercion, etc.).
+  # All cases go through `apply_arith_op`.
+  defp arith_binop_slow(op, dest, a, b, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    a_form = get_register(a, line, ctx)
+    b_form = get_register(b, line, ctx)
+
+    {state_var, ctx} = Ctx.fresh_state_var(ctx)
+    prev_state = previous_state_atom(ctx.state_var)
+
+    call_form =
+      {:call, line, {:remote, line, {:atom, line, :"Elixir.Lua.VM.Executor"}, {:atom, line, :apply_arith_op}},
+       [
+         {:atom, line, op},
+         a_form,
+         b_form,
+         {:var, line, prev_state},
+         {:integer, line, line},
+         literal_to_form(ctx.proto.source, line)
+       ]}
+
+    value_var = fresh_atom(:ArithValue)
+
+    match_form =
+      {:match, line, {:tuple, line, [{:var, line, value_var}, {:var, line, state_var}]}, call_form}
+
+    {set_forms, ctx} = set_register(dest, {:var, line, value_var}, line, ctx)
+    {:ok, [match_form | set_forms], ctx}
+  end
+
+  defp arith_unop(op, dest, source, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    src_form = get_register(source, line, ctx)
+
+    {state_var, ctx} = Ctx.fresh_state_var(ctx)
+    prev_state = previous_state_atom(ctx.state_var)
+
+    call_form =
+      {:call, line, {:remote, line, {:atom, line, :"Elixir.Lua.VM.Executor"}, {:atom, line, :apply_unary_op}},
+       [
+         {:atom, line, op},
+         src_form,
+         {:var, line, prev_state},
+         {:integer, line, line},
+         literal_to_form(ctx.proto.source, line)
+       ]}
+
+    value_var = fresh_atom(:UnaryValue)
+
+    match_form =
+      {:match, line, {:tuple, line, [{:var, line, value_var}, {:var, line, state_var}]}, call_form}
+
+    {set_forms, ctx} = set_register(dest, {:var, line, value_var}, line, ctx)
+    {:ok, [match_form | set_forms], ctx}
+  end
+
+  # Two-number fast path for less_than/less_equal/greater_than/
+  # greater_equal. Bypasses `apply_compare_op` entirely when both
+  # operands are integers or floats — numbers don't carry metatables so
+  # there's nothing to dispatch.
+  defp cmp_binop_with_fastpath(erl_op, op, dest, a, b, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    a_form = get_register(a, line, ctx)
+    b_form = get_register(b, line, ctx)
+
+    {state_var, ctx} = Ctx.fresh_state_var(ctx)
+    prev_state = previous_state_atom(ctx.state_var)
+
+    num_a = fresh_atom(:CmpAi)
+    num_b = fresh_atom(:CmpBi)
+
+    fast_clause =
+      {:clause, line, [{:tuple, line, [{:var, line, num_a}, {:var, line, num_b}]}],
+       [
+         [
+           {:call, line, {:atom, line, :is_number}, [{:var, line, num_a}]},
+           {:call, line, {:atom, line, :is_number}, [{:var, line, num_b}]}
+         ]
+       ],
+       [
+         {:tuple, line, [{:op, line, erl_op, {:var, line, num_a}, {:var, line, num_b}}, {:var, line, prev_state}]}
+       ]}
+
+    slow_clause =
+      {:clause, line, [{:var, line, :_}], [],
+       [
+         {:call, line, {:remote, line, {:atom, line, :"Elixir.Lua.VM.Executor"}, {:atom, line, :apply_compare_op}},
+          [
+            {:atom, line, op},
+            a_form,
+            b_form,
+            {:var, line, prev_state},
+            {:integer, line, line},
+            literal_to_form(ctx.proto.source, line)
+          ]}
+       ]}
+
+    case_form = {:case, line, {:tuple, line, [a_form, b_form]}, [fast_clause, slow_clause]}
+
+    value_var = fresh_atom(:CmpValue)
+
+    match_form =
+      {:match, line, {:tuple, line, [{:var, line, value_var}, {:var, line, state_var}]}, case_form}
+
+    {set_forms, ctx} = set_register(dest, {:var, line, value_var}, line, ctx)
+    {:ok, [match_form | set_forms], ctx}
+  end
+
+  defp cmp_binop(op, dest, a, b, %Ctx{} = ctx) do
+    line = current_line(ctx)
+    a_form = get_register(a, line, ctx)
+    b_form = get_register(b, line, ctx)
+
+    {state_var, ctx} = Ctx.fresh_state_var(ctx)
+    prev_state = previous_state_atom(ctx.state_var)
+
+    call_form =
+      {:call, line, {:remote, line, {:atom, line, :"Elixir.Lua.VM.Executor"}, {:atom, line, :apply_compare_op}},
+       [
+         {:atom, line, op},
+         a_form,
+         b_form,
+         {:var, line, prev_state},
+         {:integer, line, line},
+         literal_to_form(ctx.proto.source, line)
+       ]}
+
+    value_var = fresh_atom(:CmpValue)
+
+    match_form =
+      {:match, line, {:tuple, line, [{:var, line, value_var}, {:var, line, state_var}]}, call_form}
+
+    {set_forms, ctx} = set_register(dest, {:var, line, value_var}, line, ctx)
+    {:ok, [match_form | set_forms], ctx}
+  end
+
+  # Given the current state_var ctx field (already incremented by
+  # fresh_state_var), return the atom of the *previous* state version
+  # — that's what the slow-path call reads from.
+  defp previous_state_atom(:__State), do: :__State
+
+  defp previous_state_atom(state_var_atom) do
+    # state vars are State_0, State_1, …; we want the one before
+    # ctx.state_var. Since fresh_state_var sets ctx.state_var to the
+    # new name, the previous version is at counter-1. But we've lost
+    # the counter here, so parse from the atom.
+    case Atom.to_string(state_var_atom) do
+      "__State" ->
+        :__State
+
+      "State_0" ->
+        :__State
+
+      "State_" <> n_str ->
+        n = String.to_integer(n_str)
+        String.to_atom("State_#{n - 1}")
+    end
+  end
+
+  defp fresh_atom(prefix) do
+    String.to_atom("#{prefix}_#{:erlang.unique_integer([:positive, :monotonic])}")
+  end
+
+  # Builds an Erlang cons-cell expression `[R_start, R_{start+1}, ..., R_{start+count-1}]`
+  # by reading from the current register tuple.
+  defp build_args_list(_start, 0, line, _ctx), do: {nil, line}
+
+  defp build_args_list(start, count, line, ctx) do
+    head = get_register(start, line, ctx)
+    tail = build_args_list(start + 1, count - 1, line, ctx)
+    {:cons, line, head, tail}
+  end
+
+  # ── Internal helpers ──────────────────────────────────────────────
+
+  defp set_register(idx, value_form, line, %Ctx{} = ctx) do
+    # Capture the current register var BEFORE minting a fresh one — that's
+    # the version we read from.
+    prev_var = ctx.regs_var
+    {new_var, ctx} = Ctx.fresh_regs_var(ctx)
+
+    setel_form =
+      {:call, line, {:atom, line, :setelement}, [{:integer, line, idx + 1}, {:var, line, prev_var}, value_form]}
+
+    match_form = {:match, line, {:var, line, new_var}, setel_form}
+    {[match_form], ctx}
+  end
+
+  defp get_register(idx, line, %Ctx{} = ctx) do
+    {:call, line, {:atom, line, :element}, [{:integer, line, idx + 1}, {:var, line, ctx.regs_var}]}
+  end
+
+  defp current_line(%Ctx{line: line}), do: line
+
+  # ── Literal → Erlang abstract form ────────────────────────────────
+
+  defp literal_to_form(value, line) when is_integer(value), do: {:integer, line, value}
+  defp literal_to_form(value, line) when is_float(value), do: {:float, line, value}
+
+  defp literal_to_form(value, line) when is_binary(value) do
+    # Lua strings can contain arbitrary bytes (not just UTF-8). Emit
+    # each byte as a separate `bin_element` so binaries with embedded
+    # non-UTF-8 bytes round-trip correctly.
+    bin_elements =
+      for <<byte <- value>> do
+        {:bin_element, line, {:integer, line, byte}, :default, :default}
+      end
+
+    {:bin, line, bin_elements}
+  end
+
+  defp literal_to_form(nil, line), do: {:atom, line, nil}
+  defp literal_to_form(true, line), do: {:atom, line, true}
+  defp literal_to_form(false, line), do: {:atom, line, false}
+
+  defp literal_to_form(atom, line) when is_atom(atom), do: {:atom, line, atom}
+
+  # Generic term-to-abstract-form for arbitrary Erlang terms.
+  # Used for `name_hint` and other opaque tags that need to round-trip
+  # through codegen as-is. There is no catch-all clause: terms not
+  # handled below (maps, pids, funs) raise `FunctionClauseError`.
+  defp term_to_form(value, line) when is_integer(value), do: {:integer, line, value}
+  defp term_to_form(value, line) when is_float(value), do: {:float, line, value}
+  defp term_to_form(nil, line), do: {:atom, line, nil}
+  defp term_to_form(true, line), do: {:atom, line, true}
+  defp term_to_form(false, line), do: {:atom, line, false}
+  defp term_to_form(atom, line) when is_atom(atom), do: {:atom, line, atom}
+
+  defp term_to_form(value, line) when is_binary(value) do
+    bin_elements =
+      for <<byte <- value>> do
+        {:bin_element, line, {:integer, line, byte}, :default, :default}
+      end
+
+    {:bin, line, bin_elements}
+  end
+
+  defp term_to_form(tuple, line) when is_tuple(tuple) do
+    elements = Enum.map(Tuple.to_list(tuple), &term_to_form(&1, line))
+    {:tuple, line, elements}
+  end
+
+  defp term_to_form([], line), do: {nil, line}
+
+  defp term_to_form([head | tail], line) do
+    {:cons, line, term_to_form(head, line), term_to_form(tail, line)}
+  end
+end
diff --git a/lib/lua/compiler/erlang/runtime.ex b/lib/lua/compiler/erlang/runtime.ex
new file mode 100644
index 0000000..ee3e435
--- /dev/null
+++ b/lib/lua/compiler/erlang/runtime.ex
@@ -0,0 +1,35 @@
+defmodule Lua.Compiler.Erlang.Runtime do
+  @moduledoc false
+  # Runtime helpers called by code generated by `Lua.Compiler.Erlang`.
+  #
+  # The codegen emits remote calls into this module rather than inlining
+  # small loops as Erlang abstract forms — much easier to maintain when
+  # the helper is non-trivial.
+  #
+  # Functions here must stay backward-compatible with previously-loaded
+  # compiled modules (no signature changes) — those modules can outlive
+  # the current build's `Lua` deps until B5b's build-hash purging is in
+  # place.
+
+  @doc """
+  Copies up to `count` args into `regs`, starting at zero-based slot `i`.
+
+  Mirrors the interpreter's argument-binding behaviour (see
+  `Lua.VM.Executor.copy_args_to_regs/5`). Missing args land as `nil`.
+  Extra args are ignored (they go into `proto.varargs` for vararg
+  functions; the codegen handles that case in B5d).
+  """
+  @spec copy_args([term()], tuple(), non_neg_integer(), non_neg_integer()) :: tuple()
+  def copy_args(_args, regs, _i, 0), do: regs
+
+  def copy_args([], regs, _i, count) when count > 0 do
+    # Out of args; remaining param slots are nil. `make_tuple/2` already
+    # initialised the tuple with nil, so there is nothing left to write
+    # and the register tuple is returned unchanged.
+    regs
+  end
+
+  def copy_args([arg | rest], regs, i, count) do
+    copy_args(rest, :erlang.setelement(i + 1, regs, arg), i + 1, count - 1)
+  end
+end
diff --git a/lib/lua/compiler/prototype.ex b/lib/lua/compiler/prototype.ex
index 68835e0..a88ff45 100644
--- a/lib/lua/compiler/prototype.ex
+++ b/lib/lua/compiler/prototype.ex
@@ -20,7 +20,8 @@ defmodule Lua.Compiler.Prototype do
           is_vararg: boolean(),
           max_registers: non_neg_integer(),
           source: binary(),
-          lines: {non_neg_integer(), non_neg_integer()}
+          lines: {non_neg_integer(), non_neg_integer()},
+          compiled_module: {module(), atom()} | nil
         }
 
   defstruct instructions: [],
@@ -31,7 +32,8 @@ defmodule Lua.Compiler.Prototype do
             max_registers: 0,
             source: <<"-no-source-">>,
             lines: {0, 0},
-            varargs: []
+            varargs: [],
+            compiled_module: nil
 
   @doc """
   Creates a new prototype with the given options.
diff --git a/lib/lua/util.ex b/lib/lua/util.ex
index 3ef75d8..f4e5d52 100644
--- a/lib/lua/util.ex
+++ b/lib/lua/util.ex
@@ -16,6 +16,7 @@ defmodule Lua.Util do
   def encoded?(number) when is_number(number), do: true
   def encoded?({:tref, _}), do: true
   def encoded?({:lua_closure, _, _}), do: true
+  def encoded?({:compiled_closure, _, _, _, _}), do: true
   def encoded?({:native_func, _}), do: true
   def encoded?({:udref, _}), do: true
   def encoded?(_), do: false
diff --git a/lib/lua/vm.ex b/lib/lua/vm.ex
index 80184d9..440bbcd 100644
--- a/lib/lua/vm.ex
+++ b/lib/lua/vm.ex
@@ -13,9 +13,22 @@ defmodule Lua.VM do
   Executes a compiled prototype.
 
   Returns {:ok, results, state} on success.
+
+  When `proto.compiled_module` is set (the Erlang codegen accepted
+  the prototype) execution dispatches directly to the loaded BEAM
+  module. Otherwise the interpreter executes the instruction stream
+  as usual.
   """
   @spec execute(Prototype.t(), State.t()) :: {:ok, list(), State.t()}
-  def execute(%Prototype{} = proto, state \\ State.new()) do
+  def execute(%Prototype{compiled_module: {mod, fun}}, state) do
+    # No upvalues at the top-level chunk; the chunk's `_ENV` is set up
+    # at codegen time via the upvalue chain on inner prototypes. For
+    # the chunk itself, pass an empty upvalues tuple.
+    {results, final_state} = apply(mod, fun, [[], {}, state])
+    {:ok, results, final_state}
+  end
+
+  def execute(%Prototype{} = proto, state) do
     # Create register file sized to the prototype's needs.
     # The +16 buffer covers multi-return expansion slots that the codegen doesn't
     # always track in max_registers (call results can land beyond the stated max).
@@ -27,4 +40,6 @@ defmodule Lua.VM do
 
     {:ok, results, final_state}
   end
+
+  def execute(%Prototype{} = proto), do: execute(proto, State.new())
 end
diff --git a/lib/lua/vm/display.ex b/lib/lua/vm/display.ex
index dbe16b7..b4f2c56 100644
--- a/lib/lua/vm/display.ex
+++ b/lib/lua/vm/display.ex
@@ -97,6 +97,10 @@ defmodule Lua.VM.Display do
     wrap_closure(ref)
   end
 
+  def wrap_value({:compiled_closure, _, _, _, _} = ref, _state, _decode?) do
+    wrap_closure(ref)
+  end
+
   def wrap_value({:native_func, fun} = ref, _state, _decode?) do
     %NativeFunc{fun: fun, ref: ref}
   end
@@ -129,6 +133,18 @@ defmodule Lua.VM.Display do
     }
   end
 
+  defp wrap_closure({:compiled_closure, _mod, _fun, _upvalues, proto} = ref) do
+    {first_line, _last_line} = proto.lines || {0, 0}
+
+    %Closure{
+      source: proto.source,
+      line: first_line,
+      arity: proto.param_count,
+      vararg?: proto.is_vararg,
+      ref: ref
+    }
+  end
+
   # Build a `peek` value for an unencoded table reference. Sequences
   # (1..N keys) render as a list; mixed-key tables render as a map.
   # Nested tables/closures are recursively wrapped so `Inspect` does
diff --git a/lib/lua/vm/executor.ex b/lib/lua/vm/executor.ex
index dc291b0..2b7d9d7 100644
--- a/lib/lua/vm/executor.ex
+++ b/lib/lua/vm/executor.ex
@@ -134,6 +134,17 @@ defmodule Lua.VM.Executor do
     end
   end
 
+  # Compiled prototype — dispatched directly to a BEAM module generated by
+  # Lua.Compiler.Erlang. Bypasses register-tuple construction entirely.
+  # The upvalues tuple threads through the same way as for a :lua_closure
+  # so opcode-level upvalue resolution stays consistent across compiled
+  # and interpreted prototypes. The trailing `_proto` element is the
+  # source `%Prototype{}` carried for introspection (used by Display,
+  # debug.getinfo, etc.) — execution itself only needs the module.
+  def call_function({:compiled_closure, mod, fun, upvalues, _proto}, args, state) do
+    apply(mod, fun, [args, upvalues, state])
+  end
+
   def call_function(nil, _args, _state) do
     raise TypeError,
       value: "attempt to call a nil value",
@@ -166,6 +177,97 @@ defmodule Lua.VM.Executor do
     end
   end
 
+  # ── Public dispatch helpers for compiled prototypes ────────────────────────
+  #
+  # Compiled code generated by `Lua.Compiler.Erlang` calls into these
+  # functions for the slow paths of arithmetic, comparison, and unary ops.
+  # `@doc false` keeps them out of the user-facing API. Fast paths
+  # (integer-integer add/sub/mul, number-number ordering comparisons)
+  # are inlined into the compiled module bodies; only slow paths come here.
+
+  @doc false
+  # Calls into `call_function/3` after stashing `(line, source)` in the
+  # process dict so native callbacks (assert/error/stdlib raises) pick
+  # up the correct caller position. Mirrors the interpreter's
+  # `:native_func` branch in the `:call` opcode.
+  #
+  # Lua-to-Lua and compiled-to-compiled calls skip the bridge entirely
+  # — only `:native_func` invocations need the process-dict
+  # bookkeeping. Pure-Lua call chains pay nothing on the success path.
+  @spec call_function_with_position(term(), list(), State.t(), integer(), binary()) ::
+          {list(), State.t()}
+  def call_function_with_position({:lua_closure, _, _} = callable, args, state, _line, _source) do
+    call_function(callable, args, state)
+  end
+
+  def call_function_with_position({:compiled_closure, _, _, _, _} = callable, args, state, _line, _source) do
+    call_function(callable, args, state)
+  end
+
+  def call_function_with_position(callable, args, state, line, source) do
+    prev_pos = Process.get(@position_key, @unset)
+    set_position(line, source)
+
+    try do
+      call_function(callable, args, state)
+    after
+      restore_position(prev_pos)
+    end
+  end
+
+  @doc false
+  @spec apply_arith_op(atom(), term(), term(), State.t(), integer(), binary()) ::
+          {term(), State.t()}
+  def apply_arith_op(op, a, b, state, line, source) do
+    {mm_name, safe_fn} =
+      case op do
+        :add -> {"__add", fn -> safe_add(a, b, line, source) end}
+        :subtract -> {"__sub", fn -> safe_subtract(a, b, line, source) end}
+        :multiply -> {"__mul", fn -> safe_multiply(a, b, line, source) end}
+        :divide -> {"__div", fn -> safe_divide(a, b, line, source) end}
+        :floor_divide -> {"__idiv", fn -> safe_floor_divide(a, b, line, source) end}
+        :modulo -> {"__mod", fn -> safe_modulo(a, b, line, source) end}
+        :power -> {"__pow", fn -> safe_power(a, b, line, source) end}
+      end
+
+    try_binary_metamethod(mm_name, a, b, state, safe_fn)
+  end
+
+  @doc false
+  @spec apply_unary_op(atom(), term(), State.t(), integer(), binary()) ::
+          {term(), State.t()}
+  def apply_unary_op(:negate, a, state, line, source) do
+    try_unary_metamethod("__unm", a, state, fn -> safe_negate(a, line, source) end)
+  end
+
+  @doc false
+  @spec apply_compare_op(atom(), term(), term(), State.t(), integer(), binary()) ::
+          {boolean(), State.t()}
+  def apply_compare_op(:equal, a, b, state, _line, _source) do
+    try_equality_metamethod(a, b, state, fn -> lua_equal(a, b) end)
+  end
+
+  def apply_compare_op(:not_equal, a, b, state, _line, _source) do
+    {eq, state} = try_equality_metamethod(a, b, state, fn -> lua_equal(a, b) end)
+    {not eq, state}
+  end
+
+  def apply_compare_op(:less_than, a, b, state, line, source) do
+    try_binary_metamethod("__lt", a, b, state, fn -> safe_compare_lt(a, b, line, source) end)
+  end
+
+  def apply_compare_op(:less_equal, a, b, state, line, source) do
+    compare_le(a, b, state, line, source)
+  end
+
+  def apply_compare_op(:greater_than, a, b, state, line, source) do
+    try_binary_metamethod("__lt", b, a, state, fn -> safe_compare_lt(b, a, line, source) end)
+  end
+
+  def apply_compare_op(:greater_equal, a, b, state, line, source) do
+    compare_le(b, a, state, line, source)
+  end
+
   # ── Break ──────────────────────────────────────────────────────────────────
 
   defp do_execute([:break | _rest], regs, upvalues, proto, state, cont, frames, line) do
@@ -574,7 +676,14 @@ defmodule Lua.VM.Executor do
       end)
 
     captured_upvalues = Enum.reverse(captured_upvalues_reversed)
-    closure = {:lua_closure, nested_proto, List.to_tuple(captured_upvalues)}
+    upvalues_tuple = List.to_tuple(captured_upvalues)
+
+    closure =
+      case nested_proto.compiled_module do
+        {mod, fun} -> {:compiled_closure, mod, fun, upvalues_tuple, nested_proto}
+        nil -> {:lua_closure, nested_proto, upvalues_tuple}
+      end
+
     regs = put_elem(regs, dest, closure)
     do_execute(rest, regs, upvalues, proto, state, cont, frames, line)
   end
@@ -654,6 +763,15 @@ defmodule Lua.VM.Executor do
           line
         )
 
+      {:compiled_closure, mod, fun, callee_upvalues, _callee_proto} ->
+        # Compiled prototype — bypass register-tuple construction entirely.
+        # The compiled module receives (args, upvalues, state) and returns
+        # {results, state}. Upvalues thread through just like for a
+        # :lua_closure.
+        args = collect_args(regs, base + 1, total_args)
+        {results, state} = apply(mod, fun, [args, callee_upvalues, state])
+        continue_after_call(results, regs, rest, upvalues, proto, state, cont, frames, line, base, result_count)
+
       {:native_func, fun} ->
         # Native callbacks still consume args as a list — materialize it here.
         args = collect_args(regs, base + 1, total_args)
@@ -1690,6 +1808,10 @@ defmodule Lua.VM.Executor do
     {results, state}
   end
 
+  defp call_value({:compiled_closure, _, _, _, _} = closure, args, _proto, state, _line) do
+    call_function(closure, args, state)
+  end
+
   defp call_value({:native_func, fun}, args, proto, state, line) do
     # Same source-position bridge as the `:call` opcode's native dispatch.
     # Used by `for` loop iteration when the iterator is native.
@@ -1779,11 +1901,12 @@ defmodule Lua.VM.Executor do
 
   defp get_metatable(_value, _state), do: nil
 
-  defp index_value({:tref, _} = tref, key, state, _line, _source, _name_hint) do
+  @doc false
+  def index_value({:tref, _} = tref, key, state, _line, _source, _name_hint) do
     table_index(tref, key, state)
   end
 
-  defp index_value(value, key, state, line, source, name_hint) do
+  def index_value(value, key, state, line, source, name_hint) do
     case get_metatable(value, state) do
       nil ->
         raise_index_type_error(value, line, source, name_hint)
@@ -2057,6 +2180,10 @@ defmodule Lua.VM.Executor do
         {results, new_state} = call_function(func, args, state)
         {List.first(results), new_state}
 
+      {:compiled_closure, _, _, _, _} = func ->
+        {results, new_state} = call_function(func, args, state)
+        {List.first(results), new_state}
+
       _ ->
         {default_fn.(), state}
     end
@@ -2400,6 +2527,7 @@ defmodule Lua.VM.Executor do
   defp value_type(v) when is_binary(v), do: :string
   defp value_type({:tref, _}), do: :table
   defp value_type({:lua_closure, _, _}), do: :function
+  defp value_type({:compiled_closure, _, _, _, _}), do: :function
   defp value_type({:native_func, _}), do: :function
   defp value_type(_), do: :unknown
 
diff --git a/lib/lua/vm/stdlib.ex b/lib/lua/vm/stdlib.ex
index dd9d8c5..265bd91 100644
--- a/lib/lua/vm/stdlib.ex
+++ b/lib/lua/vm/stdlib.ex
@@ -421,6 +421,10 @@ defmodule Lua.VM.Stdlib do
     load_from_reader(reader, state)
   end
 
+  defp lua_load([{:compiled_closure, _, _, _, _} = reader | _rest], state) do
+    load_from_reader(reader, state)
+  end
+
   defp lua_load([{:native_func, _} = reader | _rest], state) do
     load_from_reader(reader, state)
   end
@@ -476,7 +480,13 @@ defmodule Lua.VM.Stdlib do
         # Compiler currently never returns errors, always succeeds — see
         # `Lua.Compiler.compile!/2` for the matching note.
         {:ok, prototype} = Lua.Compiler.compile(ast)
-        closure = {:lua_closure, prototype, {}}
+
+        closure =
+          case prototype.compiled_module do
+            {mod, fun} -> {:compiled_closure, mod, fun, {}, prototype}
+            nil -> {:lua_closure, prototype, {}}
+          end
+
         {[closure], state}
 
       {:error, reason} ->
diff --git a/lib/lua/vm/stdlib/debug.ex b/lib/lua/vm/stdlib/debug.ex
index d54ab8c..bb05e26 100644
--- a/lib/lua/vm/stdlib/debug.ex
+++ b/lib/lua/vm/stdlib/debug.ex
@@ -67,6 +67,18 @@ defmodule Lua.VM.Stdlib.Debug do
             "isvararg" => if(Map.get(proto, :is_vararg, false), do: true, else: false)
           }
 
+        {:compiled_closure, _mod, _fun, _upvalues, proto} ->
+          %{
+            "source" => Map.get(proto, :source, "=?"),
+            "currentline" => -1,
+            "what" => "Lua",
+            "name" => nil,
+            "linedefined" => elem(Map.get(proto, :lines, {0, 0}), 0),
+            "lastlinedefined" => elem(Map.get(proto, :lines, {0, 0}), 1),
+            "nparams" => Map.get(proto, :param_count, 0),
+            "isvararg" => if(Map.get(proto, :is_vararg, false), do: true, else: false)
+          }
+
         {:native_func, _} ->
           %{
             "source" => "=[C]",
diff --git a/lib/lua/vm/stdlib/string.ex b/lib/lua/vm/stdlib/string.ex
index 21c86b0..fcddfdb 100644
--- a/lib/lua/vm/stdlib/string.ex
+++ b/lib/lua/vm/stdlib/string.ex
@@ -776,7 +776,8 @@ defmodule Lua.VM.Stdlib.String do
             {value, st}
           end
 
-        match?({:lua_closure, _, _}, repl) or match?({:native_func, _}, repl) ->
+        match?({:lua_closure, _, _}, repl) or match?({:compiled_closure, _, _, _, _}, repl) or
+            match?({:native_func, _}, repl) ->
           fn args, st ->
             {results, st} = Executor.call_function(repl, args, st)
             result = List.first(results)
diff --git a/lib/lua/vm/stdlib/util.ex b/lib/lua/vm/stdlib/util.ex
index 3523e99..9005801 100644
--- a/lib/lua/vm/stdlib/util.ex
+++ b/lib/lua/vm/stdlib/util.ex
@@ -12,6 +12,7 @@ defmodule Lua.VM.Stdlib.Util do
   def typeof(v) when is_binary(v), do: "string"
   def typeof({:tref, _}), do: "table"
   def typeof({:lua_closure, _, _}), do: "function"
+  def typeof({:compiled_closure, _, _, _, _}), do: "function"
   def typeof({:native_func, _}), do: "function"
   def typeof(_), do: "unknown"
 
diff --git a/lib/lua/vm/value.ex b/lib/lua/vm/value.ex
index 250407c..74e4176 100644
--- a/lib/lua/vm/value.ex
+++ b/lib/lua/vm/value.ex
@@ -21,6 +21,7 @@ defmodule Lua.VM.Value do
   def type_name(v) when is_binary(v), do: "string"
   def type_name({:tref, _}), do: "table"
   def type_name({:lua_closure, _, _}), do: "function"
+  def type_name({:compiled_closure, _, _, _, _}), do: "function"
   def type_name({:native_func, _}), do: "function"
   def type_name({:udref, _}), do: "userdata"
   def type_name(_), do: "userdata"
@@ -58,6 +59,7 @@ defmodule Lua.VM.Value do
   def to_string({:tref, id}), do: "table: 0x#{String.pad_leading(Integer.to_string(id, 16), 14, "0")}"
 
   def to_string({:lua_closure, _, _}), do: "function"
+  def to_string({:compiled_closure, _, _, _, _}), do: "function"
   def to_string({:native_func, _}), do: "function"
   def to_string(other), do: inspect(other)
 
diff --git a/test/lua/vm/display_test.exs b/test/lua/vm/display_test.exs
index b0c2ffd..2c3878e 100644
--- a/test/lua/vm/display_test.exs
+++ b/test/lua/vm/display_test.exs
@@ -73,16 +73,23 @@ defmodule Lua.VM.DisplayTest do
                line: 1,
                arity: 2,
                vararg?: false,
-               ref: {:lua_closure, _, _}
+               ref: ref
              } = c
 
+      assert match?({:lua_closure, _, _}, ref) or
+               match?({:compiled_closure, _, _, _, _}, ref)
+
       assert inspect(c) == "#Lua.Closure<source: \"<eval>\", line: 1, arity: 2>"
     end
 
     test "wraps Lua closures returned in decode: false mode" do
       {[c], _} = Lua.eval!(Lua.new(), "return function() end", decode: false)
 
-      assert %Closure{ref: {:lua_closure, _, _}} = c
+      assert %Closure{ref: ref} = c
+
+      assert match?({:lua_closure, _, _}, ref) or
+               match?({:compiled_closure, _, _, _, _}, ref)
+
       assert inspect(c) =~ "#Lua.Closure<"
     end
 
@@ -157,8 +164,10 @@ defmodule Lua.VM.DisplayTest do
 
     test "returns the underlying lua_closure for closures" do
       {[c], _} = Lua.eval!(Lua.new(), "return function() end")
+      unwrapped = Lua.unwrap(c)
 
-      assert match?({:lua_closure, _, _}, Lua.unwrap(c))
+      assert match?({:lua_closure, _, _}, unwrapped) or
+               match?({:compiled_closure, _, _, _, _}, unwrapped)
     end
 
     test "returns the underlying native_func for native funcs" do

From 4a7cfac739cb4ff07555ad142d85abe7a46390e1 Mon Sep 17 00:00:00 2001
From: Dave Lucia <davelucianyc@gmail.com>
Date: Fri, 22 May 2026 08:49:58 -0700
Subject: [PATCH 3/3] chore(B5a): mark plan as review and record discoveries

---
 .../plans/B5a-erlang-codegen-foundation.md    | 75 ++++++++++++++++++-
 1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/.agents/plans/B5a-erlang-codegen-foundation.md b/.agents/plans/B5a-erlang-codegen-foundation.md
index c11e930..3807c36 100644
--- a/.agents/plans/B5a-erlang-codegen-foundation.md
+++ b/.agents/plans/B5a-erlang-codegen-foundation.md
@@ -2,10 +2,10 @@
 id: B5a
 title: Erlang codegen foundation — compile arithmetic + control flow prototypes to BEAM modules
 issue: null
-pr: null
+pr: 235
 branch: perf/erlang-codegen-foundation
 base: main
-status: in-progress
+status: review
 direction: B
 unlocks:
   - B5b (lifecycle), B5c (tables), B5d (closures), B5e (errors)
@@ -360,4 +360,73 @@ IO.puts("table fallback OK")
 
 ## Discoveries
 
-(populated during implementation)
+### Perf reality vs spike — the 5x target was not hit
+
+Spike measured 12.4x faster than interpreter on fib(25). Production
+codegen achieves only ~1.4x faster on fib(30) (1.07x vs Luerl). The
+gap traces to three sources:
+
+1. **`throw/catch` for non-tail `:return`** — every `:return` inside
+   a `:test` branch becomes `throw({:b5_return, _, _})` caught at the
+   function entry. Spike fib uses Erlang clause-matching to express
+   the base case, so it never throws. Tail-position `:return` is now
+   optimised to a natural return, saving roughly half the throws on
+   fib (the recursive-case return). Returns inside branches still
+   throw — fib hits this on every base-case exit.
+
+2. **`setelement/3` per register write** — 22% of profile time, ~2.2M
+   calls for fib(25). Equivalent to the interpreter's register-tuple
+   cost; eliminated only by SSA promotion of registers to Erlang
+   variables (deferred follow-up).
+
+3. **Slow-path fallback for `apply_arith_op` etc.** — the integer
+   fast path is inlined for `:add`/`:subtract`/`:multiply` and
+   comparison, but `:divide` and friends always call into Executor.
+   For fib, all arithmetic stays on the fast path, so this cost is small.
+   `apply_compare_op` is consulted only for `:equal`/`:not_equal`.
+
+### Sub-prototype compile-status cascade
+
+The original B5 plan said "if any sub-prototype falls back, the
+parent falls back too." The spike honoured this rule. Real-world Lua
+almost always wraps function definitions in chunks that use unsupported
+opcodes (`:set_field` for `function f(...) end` writing to `_ENV`).
+That cascade made every compile-eligible function fall back.
+
+Fix: sub-prototypes compile independently. The parent's `:closure`
+opcode (interpreter side, since `:closure` itself isn't B5a-covered
+yet) checks `nested_proto.compiled_module` and emits either
+`{:compiled_closure, ...}` or `{:lua_closure, ...}`. After this
+change fib's `function fib(...)` compiles even though the chunk that
+defines it doesn't.
+
+### `:compiled_closure` is a 5-tuple, not a 4-tuple
+
+Initial design: `{:compiled_closure, mod, fun, upvalues}`. Display
+needed the prototype back (for source/line/arity metadata). Rather
+than carry a separate proto lookup table, the value tuple gained a
+5th element holding the source `%Prototype{}`. Execution itself
+ignores it; only Display and `debug.getinfo` use it.
+
+### `unsafe_var` lint warning in some `:test` shapes
+
+When a `:test` branch writes a register and the function continues
+past the branch, Erlang's lint reports `unsafe_var` (the register
+variable is "exported" from a case branch). Currently those
+prototypes fail to load and fall back. The `:test` lowering should
+fork ctx per branch and emit phi-style register reconciliation;
+deferred to a follow-up.
+
+### Open-cell upvalue lowering needed per-clause variables
+
+`:get_open_upvalue` initially used `:__OpenCellRef` as the bind name
+in both case clauses. Erlang's lint flagged this as unsafe (variable
+defined in one clause used in another). Fixed by minting a fresh
+per-call `OpenRef_<n>` atom.
+
+### Lua binary literals must round-trip byte-by-byte
+
+`String.to_charlist/1` raises on non-UTF-8 binaries. Lua strings can
+hold arbitrary bytes. The codegen's binary-literal lowering now emits
+each byte as a separate `bin_element` rather than going through the
+string-as-charlist encoding.