Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 33 additions & 2 deletions .agents/plans/B6-direct-table-refs.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ issue: null
pr: null
branch: perf/direct-table-refs
base: main
status: ready
status: deferred
direction: B
unlocks:
- measurable reduction in Map.get/2,3 cost across all table-heavy
Expand Down Expand Up @@ -209,4 +209,35 @@ mix run benchmarks/oop.exs

## Discoveries

(populated during implementation)
Deferred after profile re-baseline post-PR #223 / PR #227. The plan's
hypothesis (per-tref `Map.fetch!(state.tables, id)` is the dominant
table-access cost) no longer matches the data:

- **fib(22)**: `Map.get/2 + Map.get/3` combined is **3.28%** (1.62 + 1.66),
not the plan's claimed 6.4%. PR #223's fast paths already trimmed the
worst offenders. The plan's stretch target was "< 3% combined" — we're
already 0.3 percentage points away from that with no work.
- **OOP (50 instances)**: `Map.get/2 + Map.get/3` is **2.81%**. Matches the
plan's 2.6% claim. A 0.3-point improvement at best, on a workload where
the bottleneck is elsewhere (`do_execute` 42.7%, `setelement` 20.9%,
`continue_after_call` 2.7%).
- **table_build (n=500)**: `Map.get` combined is **0.04%**. Not a target
worth chasing. The real hot spots on this workload are:
- `:erlang.setelement/3` — 17.5%
- `Lua.VM.Table.insert/3` — 11.8%
- `Lua.VM.Table.put/3` — 6.5%
- `Lua.VM.Value.do_sequence_length/2` — 4.0%
- `Lua.VM.Table.normalize_key/1` — 3.3%

Those are all internal to `Lua.VM.Table` and `Lua.VM.Value` — B7's
territory (array+hash split eliminates `do_sequence_length`,
`normalize_key`, and the per-key `Map.put` on contiguous integer
keys).

B6 would touch ~12 sites in `executor.ex` and ~15 more in stdlib for a
projected wall-clock win below 1% — inside benchee's deviation band on
every measured workload. Audit cleanup is still arguably worth it
later, but not as a perf plan and not before B7.

Skip-to-B7 decision recorded by Dave on 2026-05-21 after re-profiling
post-B8.
77 changes: 74 additions & 3 deletions .agents/plans/B7-table-array-hash-split.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,10 @@
id: B7
title: Split table storage into array + hash parts
issue: null
pr: null
pr: 229
branch: perf/table-array-hash-split
base: main
status: ready
status: review
direction: B
unlocks:
- O(1) `t[#t + 1] = x` (supersedes A10b)
Expand Down Expand Up @@ -250,4 +250,75 @@ IO.puts("delta=#{after_mem - before_mem}B")

## Discoveries

(populated during implementation)
- The first iteration used `:erlang.append_element/2` for every
append. That's O(n) per call (copies the tuple), so a 500-element
build is O(n²). Result: +12% slower than baseline, +160% memory.
The plan's risks section explicitly warned about this; exponential
growth was the cited mitigation. Switching to capacity-doubling
(floor 4) with `setelement/3` into pre-nil slots made the path
amortized O(1) and is what landed.
- The first version eagerly demoted array slots to the hash part when
`t[k] = nil` punched a hole. That broke
`for k,v in pairs(t) do t[k] = nil end` because the cleared key was
no longer findable for `next(t, k)` to advance past it. PUC-Lua's
nil-as-hole semantics (set the slot to nil in place, flip an
`array_has_holes` flag) is both simpler and correct. `Table.length/1`
consults the flag and stays O(1) for the dominant no-holes case.
- Most of the plan's projected wins assumed the dominant
per-write cost was `Map.put` on the data map. In the post-B8 profile
the picture was more diffuse: `setelement/3` (register writes) is
still the biggest line item, and the per-key `Map.put`/`order_tail`/
`dead` pipeline only accounts for ~25% of write cost. So the array
split removed the most fixable share but couldn't move the bigger
cost line.
- B6 (direct table refs) was deferred in this same branch after the
post-B8 profile showed `Map.get/2,3` combined was already at 3.28%
on fib(22) (plan claimed 6.4%) and 0.04% on table_build. Recorded
in `.agents/plans/B6-direct-table-refs.md`.

## What changed

- `lib/lua/vm/table.ex` — bulk of the change. Added `array`,
`array_len`, `array_has_holes` fields; added `get/2`, `has?/2`,
`length/1`, `to_map/1`, `keys/1` helpers; rewrote `put/3` to route
integer writes through the array part with exponential growth;
rewrote `next_entry/2` to walk array then hash, skipping nil slots.
- `lib/lua/vm/executor.ex` — `get_table` fast path now checks the
array part for positive integer keys; `table_index`, `table_newindex`,
`table_length`, and the `:length` opcode call the new helpers.
- `lib/lua/vm/stdlib.ex` — `rawget`, `rawlen`, `ipairs` migrated to
helpers. `cache_module_result` now goes through `Table.put/3` instead
of mutating `data` directly.
- `lib/lua.ex` — `set_in_table`/`get_in_table` traversal calls
`Table.get/2` so integer-key paths see array entries.
- `lib/lua/vm/state.ex` — `globals/1` returns `Table.to_map(table)`.
- `lib/lua/vm/value.ex`, `lib/lua/vm/display.ex` — full-table decoders
use `Table.to_map/1`.
- `test/lua/vm/value_test.exs` — two implementation-coupled assertions
on `table.data[N]` for integer keys were updated to use `Table.get/2`.

PR: #229

Also in this PR:
- B6 deferred (`.agents/plans/B6-direct-table-refs.md`) — profile
doesn't support the hypothesis after PR #223 / #227 / #229.
- B8 marked merged (`.agents/plans/B8-inline-numeric-narrowing.md`,
shipped via #227).

Suite delta: 1692 tests passing → 1692 tests passing (no regression).
lua53 suite: 29 tests, 0 failures (matches main).

Benchmarks vs baseline (lua chunk path):

| workload | baseline | after B7 | delta | beats luerl? |
|---|---|---|---|---|
| Table Build | 89.45 µs | 83.80 µs | -6.3% | **yes** |
| Table Sort | 245.45 µs | 191.93 µs | -21.8% | no (was 2.2x, now 1.7x) |
| Iterate/Sum | 129.91 µs | 117.19 µs | -9.8% | **yes** |
| Map+Reduce | 277.32 µs | 249.02 µs | -10.2% | **yes** |
| OOP | 135.69 µs | 122.26 µs | -10% | no (was 1.27x, now 1.14x) |
| table.concat | 44.21 µs | 32.22 µs | -27% | **yes** |
| fib(30) chunk | 873 ms | ~860 ms | within noise (±3%) | — |

Memory regressed ~2-3x on table-heavy workloads (e.g. table_build
0.65 MB → 1.68 MB). Bounded by BEAM immutable-tuple semantics.
2 changes: 1 addition & 1 deletion .agents/plans/B8-inline-numeric-narrowing.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ issue: null
pr: 227
branch: perf/inline-numeric-narrowing
base: main
status: review
status: merged
direction: B
unlocks:
- small but free win on all integer-arithmetic workloads
Expand Down
6 changes: 3 additions & 3 deletions lib/lua.ex
Original file line number Diff line number Diff line change
Expand Up @@ -326,7 +326,7 @@ defmodule Lua do
defp set_in_table(state, tref, [key | rest], value) do
table = State.get_table(state, tref)

case Map.get(table.data, key) do
case Table.get(table, key) do
{:tref, _} = child_tref ->
set_in_table(state, child_tref, rest, value)

Expand Down Expand Up @@ -399,13 +399,13 @@ defmodule Lua do

defp get_in_table(state, tref, [key]) do
table = State.get_table(state, tref)
Map.get(table.data, key)
Table.get(table, key)
end

defp get_in_table(state, tref, [key | rest]) do
table = State.get_table(state, tref)

case Map.get(table.data, key) do
case Table.get(table, key) do
{:tref, _} = child_tref ->
get_in_table(state, child_tref, rest)

Expand Down
2 changes: 1 addition & 1 deletion lib/lua/vm/display.ex
Original file line number Diff line number Diff line change
Expand Up @@ -136,7 +136,7 @@ defmodule Lua.VM.Display do
defp peek_table(state, id, decode?) do
case Map.fetch(state.tables, id) do
{:ok, table} ->
data = table.data
data = Lua.VM.Table.to_map(table)

if sequence_like?(data) do
Enum.map(1..map_size(data), &wrap_value(Map.fetch!(data, &1), state, decode?))
Expand Down
63 changes: 54 additions & 9 deletions lib/lua/vm/executor.ex
Original file line number Diff line number Diff line change
Expand Up @@ -1230,7 +1230,7 @@ defmodule Lua.VM.Executor do
case value do
{:tref, id} ->
table = Map.fetch!(state.tables, id)
Value.sequence_length(table.data)
Table.length(table)

v when is_binary(v) ->
byte_size(v)
Expand Down Expand Up @@ -1262,10 +1262,55 @@ defmodule Lua.VM.Executor do
key = elem(regs, key_reg)

case table_val do
{:tref, id} when is_integer(key) or is_binary(key) ->
# Fast path mirroring get_field: integer/string key on a tref. Skip
# normalize_key (no-op for these) and the full index_value pipeline
# when the entry is present or no metatable is set.
{:tref, id} when is_integer(key) and key >= 1 ->
# Fast path: positive integer key on a tref. Check the array part
# first (constant-time `element/2`); fall back to the hash side or
# the full index_value pipeline if missing.
table = :erlang.map_get(id, state.tables)
array_len = :erlang.map_get(:array_len, table)

if key <= array_len do
case :erlang.element(key, :erlang.map_get(:array, table)) do
nil ->
# Hole in the array part — fall through to metatable check.
case :erlang.map_get(:metatable, table) do
nil ->
regs = put_elem(regs, dest, nil)
do_execute(rest, regs, upvalues, proto, state, cont, frames, line)

_ ->
{value, state} = index_value(table_val, key, state, line, proto.source)
regs = put_elem(regs, dest, value)
do_execute(rest, regs, upvalues, proto, state, cont, frames, line)
end

value ->
regs = put_elem(regs, dest, value)
do_execute(rest, regs, upvalues, proto, state, cont, frames, line)
end
else
case :erlang.map_get(:data, table) do
%{^key => value} ->
regs = put_elem(regs, dest, value)
do_execute(rest, regs, upvalues, proto, state, cont, frames, line)

_data ->
case :erlang.map_get(:metatable, table) do
nil ->
regs = put_elem(regs, dest, nil)
do_execute(rest, regs, upvalues, proto, state, cont, frames, line)

_ ->
{value, state} = index_value(table_val, key, state, line, proto.source)
regs = put_elem(regs, dest, value)
do_execute(rest, regs, upvalues, proto, state, cont, frames, line)
end
end
end

{:tref, id} when is_binary(key) ->
# Binary keys never live in the array part, so the hash-side fast
# path is unchanged.
table = :erlang.map_get(id, state.tables)

case :erlang.map_get(:data, table) do
Expand Down Expand Up @@ -1769,7 +1814,7 @@ defmodule Lua.VM.Executor do

table = Map.fetch!(state.tables, id)

case Table.get_data(table.data, key) do
case Table.get(table, key) do
nil ->
case table.metatable do
nil ->
Expand Down Expand Up @@ -1828,7 +1873,7 @@ defmodule Lua.VM.Executor do
%{state | tables: Map.put(state.tables, id, updated)}

{:tref, mt_id} ->
if Table.has_data?(table.data, key) do
if Table.has?(table, key) do
updated = Table.put(table, key, value)
%{state | tables: Map.put(state.tables, id, updated)}
else
Expand All @@ -1854,7 +1899,7 @@ defmodule Lua.VM.Executor do
Returns the integer length of `tref`, honoring `__len`.

When `__len` is defined the metamethod is invoked and its result is
coerced to an integer. Otherwise falls back to `Value.sequence_length/1`.
coerced to an integer. Otherwise falls back to `Lua.VM.Table.length/1`.

Returns `{integer_length, state}`. Raises `TypeError` when `__len`
returns a value that cannot be coerced to an integer (matching
Expand All @@ -1867,7 +1912,7 @@ defmodule Lua.VM.Executor do
try_unary_metamethod("__len", tref, state, fn ->
{:tref, id} = tref
table = Map.fetch!(state.tables, id)
Value.sequence_length(table.data)
Table.length(table)
end)

case raw do
Expand Down
2 changes: 1 addition & 1 deletion lib/lua/vm/state.ex
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,7 @@ defmodule Lua.VM.State do
@spec globals(t()) :: map()
def globals(%__MODULE__{g_ref: g_ref} = state) when not is_nil(g_ref) do
table = get_table(state, g_ref)
table.data
Table.to_map(table)
end

@doc """
Expand Down
8 changes: 4 additions & 4 deletions lib/lua/vm/stdlib.ex
Original file line number Diff line number Diff line change
Expand Up @@ -257,7 +257,7 @@ defmodule Lua.VM.Stdlib do
# rawget(table, key) — get without metamethods
defp lua_rawget([{:tref, id}, key | _], state) do
table = Map.fetch!(state.tables, id)
{[Table.get_data(table.data, key)], state}
{[Table.get(table, key)], state}
end

# rawset(table, key, value) — set without metamethods
Expand All @@ -274,7 +274,7 @@ defmodule Lua.VM.Stdlib do
# values, which broke `pcall(rawlen, ...)` patterns in events.lua.
defp lua_rawlen([{:tref, id} | _], state) do
table = Map.fetch!(state.tables, id)
{[Value.sequence_length(table.data)], state}
{[Table.length(table)], state}
end

defp lua_rawlen([v | _], state) when is_binary(v) do
Expand Down Expand Up @@ -359,7 +359,7 @@ defmodule Lua.VM.Stdlib do
{:tref, id} = table_ref
table = Map.fetch!(state.tables, id)

case Map.get(table.data, i) do
case Table.get(table, i) do
nil -> {[nil], state}
v -> {[i, v], state}
end
Expand Down Expand Up @@ -722,7 +722,7 @@ defmodule Lua.VM.Stdlib do
{:tref, loaded_id} = get_package_loaded_ref(state)

State.update_table(state, {:tref, loaded_id}, fn loaded_table ->
%{loaded_table | data: Map.put(loaded_table.data, modname, result)}
Table.put(loaded_table, modname, result)
end)
end

Expand Down
Loading
Loading