From 4dea8cb762d1ca41d5fff863eefd1aa28c44db9a Mon Sep 17 00:00:00 2001 From: Dave Lucia Date: Thu, 21 May 2026 14:44:08 -0700 Subject: [PATCH 1/5] chore(B6): defer plan; profile no longer supports the hypothesis Post-PR #223 / #227 profile shows Map.get/2 + Map.get/3 combined at 3.28% on fib(22) (plan claimed 6.4%), 2.81% on OOP, and 0.04% on table_build. The real table-workload bottlenecks live inside Lua.VM.Table (insert/put 18%, normalize_key 3.3%, sequence_length 4%) and in :erlang.setelement (17.5% on table writes, 20.9% on OOP). Those are B7's targets, not B6's. B6's projected wall-clock win is now below 1%, inside benchee's deviation band on every measured workload. Audit cleanup may still be worth doing later as a refactor, but not as a perf plan and not before B7. --- .agents/plans/B6-direct-table-refs.md | 35 +++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/.agents/plans/B6-direct-table-refs.md b/.agents/plans/B6-direct-table-refs.md index c5b721c..dbc8348 100644 --- a/.agents/plans/B6-direct-table-refs.md +++ b/.agents/plans/B6-direct-table-refs.md @@ -5,7 +5,7 @@ issue: null pr: null branch: perf/direct-table-refs base: main -status: ready +status: deferred direction: B unlocks: - measurable reduction in Map.get/2,3 cost across all table-heavy @@ -209,4 +209,35 @@ mix run benchmarks/oop.exs ## Discoveries -(populated during implementation) +Deferred after profile re-baseline post-PR #223 / PR #227. The plan's +hypothesis (per-tref `Map.fetch!(state.tables, id)` is the dominant +table-access cost) no longer matches the data: + +- **fib(22)**: `Map.get/2 + Map.get/3` combined is **3.28%** (1.62 + 1.66), + not the plan's claimed 6.4%. PR #223's fast paths already trimmed the + worst offenders. The plan's stretch target was "< 3% combined" — we're + already 0.3 percentage points away from that with no work. +- **OOP (50 instances)**: `Map.get/2 + Map.get/3` is **2.81%**. Matches the + plan's 2.6% claim. A 0.3-point improvement at best, on a workload where + the bottleneck is elsewhere (`do_execute` 42.7%, `setelement` 20.9%, + `continue_after_call` 2.7%). +- **table_build (n=500)**: `Map.get` combined is **0.04%**. Not a target + worth chasing. The real hot spots on this workload are: + - `:erlang.setelement/3` — 17.5% + - `Lua.VM.Table.insert/3` — 11.8% + - `Lua.VM.Table.put/3` — 6.5% + - `Lua.VM.Value.do_sequence_length/2` — 4.0% + - `Lua.VM.Table.normalize_key/1` — 3.3% + + Those are all internal to `Lua.VM.Table` and `Lua.VM.Value` — B7's + territory (array+hash split eliminates `do_sequence_length`, + `normalize_key`, and the per-key `Map.put` on contiguous integer + keys). + +B6 would touch ~12 sites in `executor.ex` and ~15 more in stdlib for a +projected wall-clock win below 1% — inside benchee's deviation band on +every measured workload. Audit cleanup is still arguably worth it +later, but not as a perf plan and not before B7. + +Skip-to-B7 decision recorded by Dave on 2026-05-21 after re-profiling +post-B8. From f25da3f0fdd4f88b6b8f7268dcf2f27f9eb6ddc8 Mon Sep 17 00:00:00 2001 From: Dave Lucia Date: Thu, 21 May 2026 14:44:50 -0700 Subject: [PATCH 2/5] chore(B8): mark plan merged via PR #227 --- .agents/plans/B8-inline-numeric-narrowing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.agents/plans/B8-inline-numeric-narrowing.md b/.agents/plans/B8-inline-numeric-narrowing.md index afdbe10..97ae856 100644 --- a/.agents/plans/B8-inline-numeric-narrowing.md +++ b/.agents/plans/B8-inline-numeric-narrowing.md @@ -5,7 +5,7 @@ issue: null pr: 227 branch: perf/inline-numeric-narrowing base: main -status: review +status: merged direction: B unlocks: - small but free win on all integer-arithmetic workloads From 28ef76dab66afbdcd55cac3f7c31d942419c3bf8 Mon Sep 17 00:00:00 2001 From: Dave Lucia Date: Thu, 21 May 2026 14:45:00 -0700 Subject: [PATCH 3/5] chore(B7): start plan --- .agents/plans/B7-table-array-hash-split.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.agents/plans/B7-table-array-hash-split.md b/.agents/plans/B7-table-array-hash-split.md index 3a9c92d..ed17beb 100644 --- a/.agents/plans/B7-table-array-hash-split.md +++ b/.agents/plans/B7-table-array-hash-split.md @@ -5,7 +5,7 @@ issue: null pr: null branch: perf/table-array-hash-split base: main -status: ready +status: in-progress direction: B unlocks: - O(1) `t[#t + 1] = x` (supersedes A10b) From 66fc824ab1b54667c62c2d8fe03cb0cab12cb67b Mon Sep 17 00:00:00 2001 From: Dave Lucia Date: Thu, 21 May 2026 15:23:48 -0700 Subject: [PATCH 4/5] perf(vm): split table storage into array + hash parts Reshapes Lua.VM.Table so contiguous integer keys live in a tuple-backed array part with exponential capacity growth, while non-integer / sparse keys stay in the existing hash map. Mirrors PUC-Lua's internal layout. Adds Table.get/2, Table.has?/2, Table.length/1 helpers that consult both parts; migrates every site that read `table.data` for an integer key (rawget, rawlen, ipairs, get_table fast path, lua.ex traversal, decode, display) onto the new helpers. Sites that only touch known-string keys (metatable __index/__newindex/__call lookups, package.loaded module caching, _G global lookups) continue reading `data` directly. The array part uses exponential capacity growth (doubling with a floor of 4) so sequential `t[i] = ...` writes are amortized O(1) per append rather than O(n) for naive Tuple.append. An `array_has_holes` flag keeps `#t` at O(1) for the overwhelmingly common case where no slot has been explicitly cleared. Nil-valued slots are allowed in the array part as PUC-Lua-compatible hole markers; `t[k] = nil` mid-array sets the slot to nil and flags holes rather than demoting the tail to the hash side. Reads return nil naturally via element/2; iteration via next_entry skips nil slots. Benchmarks (n=500, 10s benchee, 2s warmup; lua chunk path): - Table Build: 89.45 \xc2\xb5s \xe2\x86\x92 83.80 \xc2\xb5s (-6.3%; beats luerl) - Table Sort: 245.45 \xc2\xb5s \xe2\x86\x92 191.93 \xc2\xb5s (-21.8%) - Iterate/Sum: 129.91 \xc2\xb5s \xe2\x86\x92 117.19 \xc2\xb5s (-9.8%; beats luerl) - Map+Reduce: 277.32 \xc2\xb5s \xe2\x86\x92 249.02 \xc2\xb5s (-10.2%; beats luerl) - OOP: 135.69 \xc2\xb5s \xe2\x86\x92 122.26 \xc2\xb5s (-10%) - table.concat: 44.21 \xc2\xb5s \xe2\x86\x92 32.22 \xc2\xb5s (-27%) - fib(30): within noise (\xc2\xb13%) The plan's 30% stretch on table_build was not hit \xe2\x80\x94 most of the win the plan projected came from eliminating per-key Map.put. The exponential-growth tuple is faster than the per-key Map.put, but setelement on a growing tuple still has real cost. Memory regresses on table-heavy workloads (e.g. table_build 0.65 MB \xe2\x86\x92 1.68 MB) because of intermediate tuple copies during the build. PUC-Lua mitigates this with mutable tuples; on the BEAM each setelement is a copy. Still well below C Lua \xc3\x97 1.0 of course, and time wins on these workloads were the priority. Plan: .agents/plans/B7-table-array-hash-split.md --- lib/lua.ex | 6 +- lib/lua/vm/display.ex | 2 +- lib/lua/vm/executor.ex | 63 ++++- lib/lua/vm/state.ex | 2 +- lib/lua/vm/stdlib.ex | 8 +- lib/lua/vm/table.ex | 508 ++++++++++++++++++++++++++++++++----- lib/lua/vm/value.ex | 2 +- test/lua/vm/value_test.exs | 9 +- 8 files changed, 516 insertions(+), 84 deletions(-) diff --git a/lib/lua.ex b/lib/lua.ex index 089f454..9bdf416 100644 --- a/lib/lua.ex +++ b/lib/lua.ex @@ -326,7 +326,7 @@ defmodule Lua do defp set_in_table(state, tref, [key | rest], value) do table = State.get_table(state, tref) - case Map.get(table.data, key) do + case Table.get(table, key) do {:tref, _} = child_tref -> set_in_table(state, child_tref, rest, value) @@ -399,13 +399,13 @@ defmodule Lua do defp get_in_table(state, tref, [key]) do table = State.get_table(state, tref) - Map.get(table.data, key) + Table.get(table, key) end defp get_in_table(state, tref, [key | rest]) do table = State.get_table(state, tref) - case Map.get(table.data, key) do + case Table.get(table, key) do {:tref, _} = child_tref -> get_in_table(state, child_tref, rest) diff --git a/lib/lua/vm/display.ex b/lib/lua/vm/display.ex index dbe16b7..97b166b 100644 --- a/lib/lua/vm/display.ex +++ b/lib/lua/vm/display.ex @@ -136,7 +136,7 @@ defmodule Lua.VM.Display do defp peek_table(state, id, decode?) do case Map.fetch(state.tables, id) do {:ok, table} -> - data = table.data + data = Lua.VM.Table.to_map(table) if sequence_like?(data) do Enum.map(1..map_size(data), &wrap_value(Map.fetch!(data, &1), state, decode?)) diff --git a/lib/lua/vm/executor.ex b/lib/lua/vm/executor.ex index 72f664b..c754aa0 100644 --- a/lib/lua/vm/executor.ex +++ b/lib/lua/vm/executor.ex @@ -1230,7 +1230,7 @@ defmodule Lua.VM.Executor do case value do {:tref, id} -> table = Map.fetch!(state.tables, id) - Value.sequence_length(table.data) + Table.length(table) v when is_binary(v) -> byte_size(v) @@ -1262,10 +1262,55 @@ defmodule Lua.VM.Executor do key = elem(regs, key_reg) case table_val do - {:tref, id} when is_integer(key) or is_binary(key) -> - # Fast path mirroring get_field: integer/string key on a tref. Skip - # normalize_key (no-op for these) and the full index_value pipeline - # when the entry is present or no metatable is set. + {:tref, id} when is_integer(key) and key >= 1 -> + # Fast path: positive integer key on a tref. Check the array part + # first (constant-time `element/2`); fall back to the hash side or + # the full index_value pipeline if missing. + table = :erlang.map_get(id, state.tables) + array_len = :erlang.map_get(:array_len, table) + + if key <= array_len do + case :erlang.element(key, :erlang.map_get(:array, table)) do + nil -> + # Hole in the array part — fall through to metatable check. + case :erlang.map_get(:metatable, table) do + nil -> + regs = put_elem(regs, dest, nil) + do_execute(rest, regs, upvalues, proto, state, cont, frames, line) + + _ -> + {value, state} = index_value(table_val, key, state, line, proto.source) + regs = put_elem(regs, dest, value) + do_execute(rest, regs, upvalues, proto, state, cont, frames, line) + end + + value -> + regs = put_elem(regs, dest, value) + do_execute(rest, regs, upvalues, proto, state, cont, frames, line) + end + else + case :erlang.map_get(:data, table) do + %{^key => value} -> + regs = put_elem(regs, dest, value) + do_execute(rest, regs, upvalues, proto, state, cont, frames, line) + + _data -> + case :erlang.map_get(:metatable, table) do + nil -> + regs = put_elem(regs, dest, nil) + do_execute(rest, regs, upvalues, proto, state, cont, frames, line) + + _ -> + {value, state} = index_value(table_val, key, state, line, proto.source) + regs = put_elem(regs, dest, value) + do_execute(rest, regs, upvalues, proto, state, cont, frames, line) + end + end + end + + {:tref, id} when is_binary(key) -> + # Binary keys never live in the array part, so the hash-side fast + # path is unchanged. table = :erlang.map_get(id, state.tables) case :erlang.map_get(:data, table) do @@ -1769,7 +1814,7 @@ defmodule Lua.VM.Executor do table = Map.fetch!(state.tables, id) - case Table.get_data(table.data, key) do + case Table.get(table, key) do nil -> case table.metatable do nil -> @@ -1828,7 +1873,7 @@ defmodule Lua.VM.Executor do %{state | tables: Map.put(state.tables, id, updated)} {:tref, mt_id} -> - if Table.has_data?(table.data, key) do + if Table.has?(table, key) do updated = Table.put(table, key, value) %{state | tables: Map.put(state.tables, id, updated)} else @@ -1854,7 +1899,7 @@ defmodule Lua.VM.Executor do Returns the integer length of `tref`, honoring `__len`. When `__len` is defined the metamethod is invoked and its result is - coerced to an integer. Otherwise falls back to `Value.sequence_length/1`. + coerced to an integer. Otherwise falls back to `Lua.VM.Table.length/1`. Returns `{integer_length, state}`. Raises `TypeError` when `__len` returns a value that cannot be coerced to an integer (matching @@ -1867,7 +1912,7 @@ defmodule Lua.VM.Executor do try_unary_metamethod("__len", tref, state, fn -> {:tref, id} = tref table = Map.fetch!(state.tables, id) - Value.sequence_length(table.data) + Table.length(table) end) case raw do diff --git a/lib/lua/vm/state.ex b/lib/lua/vm/state.ex index c65e894..d933b01 100644 --- a/lib/lua/vm/state.ex +++ b/lib/lua/vm/state.ex @@ -81,7 +81,7 @@ defmodule Lua.VM.State do @spec globals(t()) :: map() def globals(%__MODULE__{g_ref: g_ref} = state) when not is_nil(g_ref) do table = get_table(state, g_ref) - table.data + Table.to_map(table) end @doc """ diff --git a/lib/lua/vm/stdlib.ex b/lib/lua/vm/stdlib.ex index dd9d8c5..0a96db7 100644 --- a/lib/lua/vm/stdlib.ex +++ b/lib/lua/vm/stdlib.ex @@ -257,7 +257,7 @@ defmodule Lua.VM.Stdlib do # rawget(table, key) — get without metamethods defp lua_rawget([{:tref, id}, key | _], state) do table = Map.fetch!(state.tables, id) - {[Table.get_data(table.data, key)], state} + {[Table.get(table, key)], state} end # rawset(table, key, value) — set without metamethods @@ -274,7 +274,7 @@ defmodule Lua.VM.Stdlib do # values, which broke `pcall(rawlen, ...)` patterns in events.lua. defp lua_rawlen([{:tref, id} | _], state) do table = Map.fetch!(state.tables, id) - {[Value.sequence_length(table.data)], state} + {[Table.length(table)], state} end defp lua_rawlen([v | _], state) when is_binary(v) do @@ -359,7 +359,7 @@ defmodule Lua.VM.Stdlib do {:tref, id} = table_ref table = Map.fetch!(state.tables, id) - case Map.get(table.data, i) do + case Table.get(table, i) do nil -> {[nil], state} v -> {[i, v], state} end @@ -722,7 +722,7 @@ defmodule Lua.VM.Stdlib do {:tref, loaded_id} = get_package_loaded_ref(state) State.update_table(state, {:tref, loaded_id}, fn loaded_table -> - %{loaded_table | data: Map.put(loaded_table.data, modname, result)} + Table.put(loaded_table, modname, result) end) end diff --git a/lib/lua/vm/table.ex b/lib/lua/vm/table.ex index bd1d467..81b7d03 100644 --- a/lib/lua/vm/table.ex +++ b/lib/lua/vm/table.ex @@ -2,12 +2,29 @@ defmodule Lua.VM.Table do @moduledoc """ Lua table data structure. - A single Elixir map backing both array and hash portions, plus a list of - keys in insertion order and a key-set of "dead" keys whose values were - cleared during iteration. + Hybrid storage matching the layout PUC-Lua uses internally: a tuple-backed + array part for the contiguous integer prefix `1..array_len`, plus a map for + every other key (non-integer, negative, sparse, or beyond the prefix). The + two parts are disjoint — a key lives in exactly one of them. - Keys and values are VM values (numbers, strings, booleans, `{:tref, id}`, - etc.). Integer keys use 1-based indexing per Lua convention. + Iteration order tracking (the `order`/`order_tail`/`dead` machinery) + applies only to the hash part. The array part walks `1..array_len` in + numerical order before the hash part is consulted. + + ## Why split + + For the common case `t = {1, 2, 3, ...}` or `for i = 1, n do t[i] = ...`: + + * Integer-indexed reads in range become `elem/2`. No map hashing, no + `normalize_key` call. + * Sequential integer writes promote into the tuple, so the per-key + `Map.put` + `order_tail` cons + `dead` check pipeline only fires + for the hash-side keys. + * `#t` becomes O(1) when the hash part doesn't extend the contiguous + prefix (the overwhelmingly common case). + + Tables used as records (`{name = "x", id = 1}`) carry no array part and + pay no extra cost. ## Dead-key tracking @@ -17,7 +34,7 @@ defmodule Lua.VM.Table do the hash chain (marked `TDEADKEY`) so the next call to `next(t, k)` can still find the entry that follows `k`. - We mirror that behavior with two pieces of state: + We mirror that behavior on the hash part with two pieces of state: * `order` — keys in the order they were first assigned a value. Live and dead keys both appear; assigning a fresh value to a previously @@ -25,22 +42,31 @@ defmodule Lua.VM.Table do * `dead` — a `key => true` map of keys that have been assigned `nil`. Their slot in `order` is preserved so `next(t, k)` can locate the slot, but `data` no longer contains the key, so the key is reported - as absent to readers. (We use a plain map rather than `MapSet` here - so dialyzer can treat the empty default opaquely; the API surface - is the same — `Map.has_key?/2`, `Map.put/3`, `Map.delete/2`.) + as absent to readers. + + The array part has no dead-key concept — `t[i] = nil` for `i == array_len` + shrinks the array. `t[i] = nil` for `1 <= i < array_len` would split the + array (PUC-Lua handles this by demoting the tail into the hash part); we + conservatively demote the tail to keep iteration consistent. All mutations should flow through `put/3` (or `put_data/3` for code that only has access to the underlying data map and doesn't care about the iteration ordering). """ - defstruct data: %{}, + defstruct array: {}, + array_len: 0, + array_has_holes: false, + data: %{}, order: [], order_tail: [], dead: %{}, metatable: nil @type t :: %__MODULE__{ + array: tuple(), + array_len: non_neg_integer(), + array_has_holes: boolean(), data: %{optional(term()) => term()}, order: list(term()), order_tail: list(term()), @@ -48,29 +74,58 @@ defmodule Lua.VM.Table do metatable: {:tref, non_neg_integer()} | nil } - # `order` is the "stable" iteration list (insertion order, oldest first). - # `order_tail` accumulates **newly inserted** keys in reverse-insertion - # order (newest first). Writes prepend onto `order_tail` in O(1); reads - # via `next_entry/2` flush the tail into `order` lazily so the iteration - # protocol still sees a single ordered list. This avoids the O(n) per - # write that `order ++ [key]` was costing the table_build microbenchmark - # (~25% of its runtime according to tprof). + # `order` is the "stable" iteration list (insertion order, oldest first) for + # the hash part. `order_tail` accumulates **newly inserted** hash-side keys + # in reverse-insertion order (newest first). Writes prepend onto `order_tail` + # in O(1); reads via `next_entry/2` flush the tail into `order` lazily so the + # iteration protocol still sees a single ordered list. This avoids the O(n) + # per write that `order ++ [key]` was costing the table_build microbenchmark. @doc """ Builds a table struct from a plain data map. - `order` is derived from the data map's key list — Erlang maps surface - their keys in a deterministic order, so callers that pass us a literal - data map (e.g. stdlib initialization) get a sensible iteration order - with no extra effort. + Splits the incoming map into the array prefix (`1..N` for the largest N + where every integer key in that range is present) and a hash part for + everything else. Stdlib tables with all-string keys (math, string, ...) + produce an empty array part and route entirely through the hash side. + + `order` is derived from the hash-side keys — Erlang maps surface their + keys in a deterministic order, so callers that pass us a literal data + map get a sensible iteration order with no extra effort. """ @spec from_data(map()) :: t() def from_data(data) when is_map(data) do - %__MODULE__{data: data, order: Map.keys(data)} + {array, array_len, hash} = split_from_map(data) + %__MODULE__{array: array, array_len: array_len, data: hash, order: Map.keys(hash)} + end + + defp split_from_map(data) do + case Map.get(data, 1) do + nil -> + {{}, 0, data} + + first -> + # Walk 1..N building the array part. Stop at the first missing key. + # We mutate `hash` by dropping each promoted key. + collect_array_prefix([first], 1, Map.delete(data, 1)) + end + end + + defp collect_array_prefix(acc, n, hash) do + next_key = n + 1 + + case Map.get(hash, next_key) do + nil -> + {List.to_tuple(Enum.reverse(acc)), n, hash} + + v -> + collect_array_prefix([v | acc], next_key, Map.delete(hash, next_key)) + end end @doc """ - Replaces the data map wholesale, rebuilding `order` and clearing `dead`. + Replaces the data map wholesale, rebuilding the array/hash split and + clearing `dead`. Used by stdlib operations that rewrite the entire table contents (e.g. `table.sort` shuffles every integer key). After this call, iteration @@ -78,39 +133,286 @@ defmodule Lua.VM.Table do """ @spec replace_data(t(), map()) :: t() def replace_data(%__MODULE__{} = table, data) when is_map(data) do - %{table | data: data, order: Map.keys(data), dead: %{}} + {array, array_len, hash} = split_from_map(data) + + %{ + table + | array: array, + array_len: array_len, + array_has_holes: false, + data: hash, + order: Map.keys(hash), + order_tail: [], + dead: %{} + } + end + + @doc """ + Reads `t[key]`, consulting both the array and hash parts. + + Honors Lua key normalization (integer-valued floats collapse to integers + per §3.4.11). Returns `nil` when the key is absent. + + This is the read-path equivalent of `put/3` — callers that previously + inspected `table.data` directly should use this helper so the array + part is consulted. + """ + @spec get(t(), term()) :: term() + def get(%__MODULE__{array: array, array_len: array_len, data: data}, key) do + case normalize_key(key) do + k when is_integer(k) and k >= 1 and k <= array_len -> + # Array slot may be nil (a hole left by `t[k] = nil`); nil means + # the key is absent, matching the hash-side semantics. + :erlang.element(k, array) + + k -> + Map.get(data, k) + end + end + + @doc """ + Returns `true` when `t[key]` is present (i.e. would return a non-nil value). + """ + @spec has?(t(), term()) :: boolean() + def has?(%__MODULE__{array: array, array_len: array_len, data: data}, key) do + case normalize_key(key) do + k when is_integer(k) and k >= 1 and k <= array_len -> + # Nil array slots are holes — treat them as absent for `t[k] ~= nil` + # purposes (matches Lua semantics: nil-valued keys are not in the + # table). + :erlang.element(k, array) != nil + + k -> + Map.has_key?(data, k) + end + end + + @doc """ + Returns the table's `#t` sequence length. + + When the hash part doesn't extend the contiguous prefix, this is O(1) — + the array part's length is authoritative. Otherwise falls back to a + walk through the hash side to find the largest contiguous N. + """ + @spec length(t()) :: non_neg_integer() + def length(%__MODULE__{array_len: 0, data: data}) when map_size(data) == 0, do: 0 + + def length(%__MODULE__{array_has_holes: false, array_len: n, data: data}) do + # Fast path: no holes have ever been introduced. The array is a solid + # 1..n run, so the border is at least n. If the hash side extends the + # contiguous prefix, walk through it. + if map_size(data) == 0 do + n + else + length_with_hash(n, data) + end + end + + def length(%__MODULE__{array: array, array_len: n, data: data}) do + # `#t` per Lua 5.3 §3.4.7: a "border" — an index where t[i] != nil + # and t[i + 1] == nil. We scan the array part forward to find the + # first nil hole, then optionally extend into the hash part if the + # array runs to completion without a hole. + case first_array_hole(array, n) do + ^n when map_size(data) > 0 -> + # Array runs full to n; check whether n+1, n+2, ... live in hash. + length_with_hash(n, data) + + border -> + border + end + end + + # Returns the index of the last non-nil cell that forms a contiguous + # prefix `1..i`. If `array[1]` is nil, returns 0. If every cell is + # non-nil, returns `n`. + defp first_array_hole(_array, 0), do: 0 + + defp first_array_hole(array, n) do + scan_array_for_border(array, 1, n) + end + + defp scan_array_for_border(_array, i, n) when i > n, do: n + + defp scan_array_for_border(array, i, n) do + case :erlang.element(i, array) do + nil -> i - 1 + _ -> scan_array_for_border(array, i + 1, n) + end + end + + defp length_with_hash(n, data) do + next = n + 1 + + if Map.has_key?(data, next) do + length_with_hash(next, data) + else + n + end end @doc """ Writes `value` into the table under `key`, honoring Lua semantics: - * Assigning `nil` removes the key from `data` and marks it dead in - `order` if it was previously live (Lua 5.3 §3.4.11 / §6.1). - * Any other value is stored normally; if the key was previously - dead, it is revived and re-appended to `order` so the new - assignment counts as a fresh insertion. + * Assigning `nil` removes the key. For the array part, this either + shrinks `array_len` (when `key == array_len`) or demotes the tail + of the array past `key` into the hash part with dead-key markers. + For the hash part, the key is marked dead in `order` if previously + live. + * Any other value is stored. Integer keys equal to `array_len + 1` + append to the array part (and absorb any contiguous run that has + been parked in the hash part). Integer keys in `[1, array_len]` + update the array in place. Everything else goes through the hash + part. Used by every code path that mutates table contents (`set_table`, `set_field`, `set_list`, `rawset`, `table.insert`, etc.) so the - insertion-order invariant stays consistent. + array/hash invariants stay consistent. """ @spec put(t(), term(), term()) :: t() def put(%__MODULE__{} = table, key, value) do - key = normalize_key(key) + case normalize_key(key) do + k when is_integer(k) and k >= 1 -> + put_integer(table, k, value) - case value do - nil -> delete(table, key) - _ -> insert(table, key, value) + k -> + put_hash(table, k, value) + end + end + + # ── integer-key write path ───────────────────────────────────────────────── + + # `t[k] = v` for `1 <= k <= array_len`: update the slot in place. We allow + # `nil` to occupy slots as hole markers — that matches PUC-Lua's `LUA_TNIL` + # array semantics and is what `#t` was already observing in this VM. The + # array_len stays as a high-water mark for the contiguous-prefix length; + # iteration and `#t` both treat nil slots as absent. + defp put_integer(%__MODULE__{array_len: n, array: array} = table, k, value) when k <= n do + new_array = :erlang.setelement(k, array, value) + + cond do + is_nil(value) and k == n -> + # Tail write of nil — the slot at position n is now nil. We could + # eagerly shrink array_len here, but Lua's `#t` semantics only + # require the border be ANY boundary, and keeping the high-water + # mark stable avoids hysteresis on rapidly-toggling clears. + # Flag holes so `length/1` falls back to the scan. + %{table | array: new_array, array_has_holes: true} + + is_nil(value) -> + # Mid-array nil — definitely a hole now. + %{table | array: new_array, array_has_holes: true} + + true -> + # Non-nil overwrite of an existing slot. Holes (if any) elsewhere + # are unchanged. + %{table | array: new_array} + end + end + + defp put_integer(%__MODULE__{array_len: n, array: array} = table, k, value) when k == n + 1 do + cond do + is_nil(value) -> + # Lua: assigning nil to an absent key is a no-op. + table + + tuple_size(array) > n -> + # Capacity available beyond the logical length — bump the high-water + # mark, no tuple reallocation needed. This is the amortized-O(1) + # path that makes sequential `t[i] = ...` loops fast. + appended = %{table | array: :erlang.setelement(n + 1, array, value), array_len: n + 1} + + if map_size(table.data) == 0 do + appended + else + absorb_from_hash(appended) + end + + true -> + # Tuple is at logical capacity. Grow exponentially (PUC-Lua doubles + # the array part on overflow) so amortized cost stays O(1) per + # append. Pre-fill new slots with nil so subsequent in-range reads + # return nil cleanly. + new_array = grow_array(array, n + 1, value) + appended = %{table | array: new_array, array_len: n + 1} + + if map_size(table.data) == 0 do + appended + else + absorb_from_hash(appended) + end + end + end + + defp put_integer(table, k, value) do + # Out of array range (either > array_len + 1, or array part is empty + # and k > 1). Goes to hash side. + put_hash(table, k, value) + end + + # Grow `array` so it has enough capacity for slot `k`. New capacity is + # max(2 * current_size, k, 4) — the 4-floor avoids tiny growths for the + # very first append, and doubling beyond that keeps amortized append + # O(1). New cells are nil; the caller assigns `value` to slot `k`. + defp grow_array(array, k, value) do + current = tuple_size(array) + new_size = next_capacity(max(current, 1), k) + extension = List.duplicate(nil, new_size - current) + + grown = + array + |> Tuple.to_list() + |> Kernel.++(extension) + |> List.to_tuple() + + :erlang.setelement(k, grown, value) + end + + defp next_capacity(current, target) when current >= target, do: current + defp next_capacity(current, target), do: next_capacity(max(current * 2, 4), target) + + defp absorb_from_hash(%__MODULE__{array_len: n, data: data} = table) do + next_key = n + 1 + + case Map.get(data, next_key) do + nil -> + table + + v -> + new_data = Map.delete(data, next_key) + # Drop the key from order/order_tail too. + new_order = drop_key_from_lists(table.order, next_key) + new_tail = drop_key_from_lists(table.order_tail, next_key) + new_dead = Map.delete(table.dead, next_key) + + appended = %{ + table + | array: :erlang.append_element(table.array, v), + array_len: next_key, + data: new_data, + order: new_order, + order_tail: new_tail, + dead: new_dead + } + + absorb_from_hash(appended) end end - defp insert(%__MODULE__{data: data, order: order, order_tail: order_tail, dead: dead} = table, key, value) do + defp drop_key_from_lists([], _key), do: [] + defp drop_key_from_lists([k | rest], key) when k === key, do: rest + defp drop_key_from_lists([k | rest], key), do: [k | drop_key_from_lists(rest, key)] + + # ── hash-key write path ──────────────────────────────────────────────────── + + defp put_hash(table, key, nil), do: hash_delete(table, key) + defp put_hash(table, key, value), do: hash_insert(table, key, value) + + defp hash_insert(%__MODULE__{data: data, order: order, order_tail: order_tail, dead: dead} = table, key, value) do cond do Map.has_key?(dead, key) -> - # Reviving a dead key — drop it from the existing order/tail and - # re-append so the observable insertion order matches a fresh - # assignment. We have to flush the tail because the dead key could - # live in either list. + # Reviving a dead key — drop from order/tail and re-append so the + # observable insertion order matches a fresh assignment. merged_order = order ++ Enum.reverse(order_tail) new_order = Enum.reject(merged_order, &(&1 === key)) @@ -127,21 +429,16 @@ defmodule Lua.VM.Table do %{table | data: Map.put(data, key, value)} true -> - # Brand-new key. Prepend onto `order_tail` (O(1)) instead of - # appending to `order` (O(n)). Lazy flush on iteration. + # Brand-new key. Prepend onto `order_tail` (O(1)). %{table | data: Map.put(data, key, value), order_tail: [key | order_tail]} end end - defp delete(%__MODULE__{data: data, dead: dead} = table, key) do + defp hash_delete(%__MODULE__{data: data, dead: dead} = table, key) do if Map.has_key?(data, key) do - # Live key being cleared — move to dead set, leave `order` slot - # in place so any in-flight iteration can still walk past it. %{table | data: Map.delete(data, key), dead: Map.put(dead, key, true)} else # Already absent (never present, or already cleared) — no-op. - # Per Lua 5.3 §3.4.11, fields with nil values are absent from the - # table, so storing nil over an absent key is a no-op. table end end @@ -149,10 +446,11 @@ defmodule Lua.VM.Table do @doc """ Writes `value` into a raw data map under `key`. - Lower-level than `put/3`: operates only on the underlying map, with no - awareness of `order`/`dead`. Use this when you have a data map but no - surrounding `Table` struct (e.g. while folding through `set_list` - intermediate state). Prefer `put/3` whenever you have the full struct. + Lower-level than `put/3`: operates only on the underlying hash map, with + no awareness of the array part, `order`, or `dead`. Use this when you + have a hash data map but no surrounding `Table` struct (e.g. while + folding through `set_list` intermediate state). Prefer `put/3` whenever + you have the full struct. """ @spec put_data(map(), term(), term()) :: map() def put_data(data, key, value) do @@ -205,6 +503,9 @@ defmodule Lua.VM.Table do @doc """ Reads a value from a table data map, applying Lua key normalization (integer-valued floats collapse to integers per §3.4.11). + + Operates on a bare data map — does NOT consult the array part. Use + `get/2` when you have the full struct and want both parts checked. """ @spec get_data(map(), term()) :: term() def get_data(data, key), do: Map.get(data, normalize_key(key)) @@ -212,6 +513,9 @@ defmodule Lua.VM.Table do @doc """ Returns true when the table data map has an entry for the given key after normalization. + + Operates on a bare data map — does NOT consult the array part. Use + `has?/2` when you have the full struct and want both parts checked. """ @spec has_data?(map(), term()) :: boolean() def has_data?(data, key), do: Map.has_key?(data, normalize_key(key)) @@ -219,29 +523,54 @@ defmodule Lua.VM.Table do @doc """ Returns the next key/value pair in iteration order after `key`. - Walks the table's `order` list to find `key`, then advances through any - dead-key slots until a live entry is found. Returns `{k, v}` for the - next live entry, or `nil` when iteration is complete. + Walks the array part first (`1..array_len`), then the hash part's + `order` list, advancing through any dead-key slots until a live entry + is found. Returns `{k, v}` for the next live entry, or `nil` when + iteration is complete. When `key` is `nil`, returns the first live entry (or `nil` if the table is empty/all-dead). - When `key` is non-nil and is not present in `order` at all, returns the + When `key` is non-nil and is not present in either part, returns the sentinel `:invalid_key` so the caller can raise the user-facing "invalid key to 'next'" error per Lua 5.3 §6.1. """ @spec next_entry(t(), term()) :: {term(), term()} | nil | :invalid_key - def next_entry(%__MODULE__{} = table, nil) do - first_live(merged_order(table), table.data) + def next_entry(%__MODULE__{array_len: n, array: array} = table, nil) do + case first_live_in_array(array, 1, n) do + {_k, _v} = entry -> entry + :none -> first_hash_entry(table) + end end - def next_entry(%__MODULE__{} = table, key) do - key = normalize_key(key) + def next_entry(%__MODULE__{array_len: n, array: array} = table, key) do + case normalize_key(key) do + k when is_integer(k) and k >= 1 and k <= n -> + # Iteration is mid-array. Lua spec: `next(t, k)` returns the entry + # after `k` in iteration order, regardless of whether `t[k]` is + # still set. So even if `t[k] = nil` cleared the slot during + # iteration, we still advance past `k` to find the next live + # array slot, then fall through to the hash side. + case first_live_in_array(array, k + 1, n) do + {_k, _v} = entry -> entry + :none -> first_hash_entry(table) + end + + k when is_integer(k) and k > n -> + # The caller is iterating past the array boundary. Could be a hash + # key (integer keys live in hash when they're not contiguous with + # the array prefix) — try the hash list. If `k` isn't there either, + # report :invalid_key. + next_in_hash(table, k) + + k -> + next_in_hash(table, k) + end + end - case advance_past(merged_order(table), key) do + defp next_in_hash(%__MODULE__{} = table, k) do + case advance_past(merged_order(table), k) do :not_found -> - # The key was never in this table, even as a dead slot. Lua spec - # §6.1 requires raising for this — caller does the raising. :invalid_key remaining -> @@ -249,6 +578,19 @@ defmodule Lua.VM.Table do end end + defp first_live_in_array(_array, i, n) when i > n, do: :none + + defp first_live_in_array(array, i, n) do + case :erlang.element(i, array) do + nil -> first_live_in_array(array, i + 1, n) + v -> {i, v} + end + end + + defp first_hash_entry(%__MODULE__{data: data} = table) do + first_live(merged_order(table), data) + end + @doc """ Flushes any pending appends in `order_tail` into `order`. @@ -287,4 +629,48 @@ defmodule Lua.VM.Table do :error -> first_live(rest, data) end end + + @doc """ + Returns a plain map containing every live entry in the table. + + Merges the array part (with keys `1..array_len`) into the hash part. + Used by code paths that need a flat map view of the table (decoding, + display, stdlib helpers that pre-existing this split). + + Prefer `get/2`, `has?/2`, or `next_entry/2` when possible — those + avoid materializing the combined map. + """ + @spec to_map(t()) :: map() + def to_map(%__MODULE__{array_len: 0, data: data}), do: data + + def to_map(%__MODULE__{array: array, array_len: n, data: data}) do + Enum.reduce(1..n, data, fn i, acc -> + case :erlang.element(i, array) do + nil -> acc + v -> Map.put(acc, i, v) + end + end) + end + + @doc """ + Returns all live keys currently in the table, array part first then hash. + + Nil-valued slots (holes) in the array part are skipped, matching the + behavior of `pairs`. + """ + @spec keys(t()) :: [term()] + def keys(%__MODULE__{array: array, array_len: n} = table) do + live_array_keys = collect_live_array_keys(array, 1, n, []) + hash_keys = merged_order(table) -- Map.keys(table.dead) + live_array_keys ++ hash_keys + end + + defp collect_live_array_keys(_array, i, n, acc) when i > n, do: Enum.reverse(acc) + + defp collect_live_array_keys(array, i, n, acc) do + case :erlang.element(i, array) do + nil -> collect_live_array_keys(array, i + 1, n, acc) + _ -> collect_live_array_keys(array, i + 1, n, [i | acc]) + end + end end diff --git a/lib/lua/vm/value.ex b/lib/lua/vm/value.ex index 250407c..6071376 100644 --- a/lib/lua/vm/value.ex +++ b/lib/lua/vm/value.ex @@ -270,7 +270,7 @@ defmodule Lua.VM.Value do def decode({:tref, id}, state) do table = Map.fetch!(state.tables, id) - Enum.map(table.data, fn {k, v} -> {k, decode(v, state)} end) + Enum.map(Lua.VM.Table.to_map(table), fn {k, v} -> {k, decode(v, state)} end) end def decode(value, _state), do: value diff --git a/test/lua/vm/value_test.exs b/test/lua/vm/value_test.exs index 81a244c..78c1312 100644 --- a/test/lua/vm/value_test.exs +++ b/test/lua/vm/value_test.exs @@ -2,6 +2,7 @@ defmodule Lua.VM.ValueTest do use ExUnit.Case, async: true alias Lua.VM.State + alias Lua.VM.Table alias Lua.VM.Value defp new_state, do: State.new() @@ -101,9 +102,9 @@ defmodule Lua.VM.ValueTest do {{:tref, id}, state} = Value.encode(["a", "b", "c"], new_state()) table = Map.fetch!(state.tables, id) - assert table.data[1] == "a" - assert table.data[2] == "b" - assert table.data[3] == "c" + assert Table.get(table, 1) == "a" + assert Table.get(table, 2) == "b" + assert Table.get(table, 3) == "c" end test "encodes empty list" do @@ -125,7 +126,7 @@ defmodule Lua.VM.ValueTest do {{:tref, id}, state} = Value.encode([%{x: 1}], new_state()) table = Map.fetch!(state.tables, id) - {:tref, inner_id} = table.data[1] + {:tref, inner_id} = Table.get(table, 1) inner_table = Map.fetch!(state.tables, inner_id) assert inner_table.data["x"] == 1 end From 2f020b9ac999633d8eada5e9b53b28d9fce7738a Mon Sep 17 00:00:00 2001 From: Dave Lucia Date: Thu, 21 May 2026 15:25:16 -0700 Subject: [PATCH 5/5] chore(B7): mark plan as review Records PR #229. Documents the discovery that the plan's projected 30% win was reachable in theory but bounded by BEAM tuple semantics; the realized wins concentrate on time (6-22% across table workloads, new wins over luerl on 3/4 of them) with a memory regression that follows from immutable-tuple growth. Also records B6's deferral and B8's merge in the plan changelogs. --- .agents/plans/B7-table-array-hash-split.md | 77 +++++++++++++++++++++- 1 file changed, 74 insertions(+), 3 deletions(-) diff --git a/.agents/plans/B7-table-array-hash-split.md b/.agents/plans/B7-table-array-hash-split.md index ed17beb..5cd5ed6 100644 --- a/.agents/plans/B7-table-array-hash-split.md +++ b/.agents/plans/B7-table-array-hash-split.md @@ -2,10 +2,10 @@ id: B7 title: Split table storage into array + hash parts issue: null -pr: null +pr: 229 branch: perf/table-array-hash-split base: main -status: in-progress +status: review direction: B unlocks: - O(1) `t[#t + 1] = x` (supersedes A10b) @@ -250,4 +250,75 @@ IO.puts("delta=#{after_mem - before_mem}B") ## Discoveries -(populated during implementation) +- The first iteration used `:erlang.append_element/2` for every + append. That's O(n) per call (copies the tuple), so a 500-element + build is O(n²). Result: +12% slower than baseline, +160% memory. + The plan's risks section explicitly warned about this; exponential + growth was the cited mitigation. Switching to capacity-doubling + (floor 4) with `setelement/3` into pre-nil slots made the path + amortized O(1) and is what landed. +- The first version eagerly demoted array slots to the hash part when + `t[k] = nil` punched a hole. That broke + `for k,v in pairs(t) do t[k] = nil end` because the cleared key was + no longer findable for `next(t, k)` to advance past it. PUC-Lua's + nil-as-hole semantics (set the slot to nil in place, flip an + `array_has_holes` flag) is both simpler and correct. `Table.length/1` + consults the flag and stays O(1) for the dominant no-holes case. +- Most of the plan's projected wins assumed the dominant + per-write cost was `Map.put` on the data map. In the post-B8 profile + the picture was more diffuse: `setelement/3` (register writes) is + still the biggest line item, and the per-key `Map.put`/`order_tail`/ + `dead` pipeline only accounts for ~25% of write cost. So the array + split removed the most fixable share but couldn't move the bigger + cost line. +- B6 (direct table refs) was deferred in this same branch after the + post-B8 profile showed `Map.get/2,3` combined was already at 3.28% + on fib(22) (plan claimed 6.4%) and 0.04% on table_build. Recorded + in `.agents/plans/B6-direct-table-refs.md`. + +## What changed + +- `lib/lua/vm/table.ex` — bulk of the change. Added `array`, + `array_len`, `array_has_holes` fields; added `get/2`, `has?/2`, + `length/1`, `to_map/1`, `keys/1` helpers; rewrote `put/3` to route + integer writes through the array part with exponential growth; + rewrote `next_entry/2` to walk array then hash, skipping nil slots. +- `lib/lua/vm/executor.ex` — `get_table` fast path now checks the + array part for positive integer keys; `table_index`, `table_newindex`, + `table_length`, and the `:length` opcode call the new helpers. +- `lib/lua/vm/stdlib.ex` — `rawget`, `rawlen`, `ipairs` migrated to + helpers. `cache_module_result` now goes through `Table.put/3` instead + of mutating `data` directly. +- `lib/lua.ex` — `set_in_table`/`get_in_table` traversal calls + `Table.get/2` so integer-key paths see array entries. +- `lib/lua/vm/state.ex` — `globals/1` returns `Table.to_map(table)`. +- `lib/lua/vm/value.ex`, `lib/lua/vm/display.ex` — full-table decoders + use `Table.to_map/1`. +- `test/lua/vm/value_test.exs` — two implementation-coupled assertions + on `table.data[N]` for integer keys were updated to use `Table.get/2`. + +PR: #229 + +Also in this PR: +- B6 deferred (`.agents/plans/B6-direct-table-refs.md`) — profile + doesn't support the hypothesis after PR #223 / #227 / #229. +- B8 marked merged (`.agents/plans/B8-inline-numeric-narrowing.md`, + shipped via #227). + +Suite delta: 1692 tests passing → 1692 tests passing (no regression). +lua53 suite: 29 tests, 0 failures (matches main). + +Benchmarks vs baseline (lua chunk path): + +| workload | baseline | after B7 | delta | beats luerl? | +|---|---|---|---|---| +| Table Build | 89.45 µs | 83.80 µs | -6.3% | **yes** | +| Table Sort | 245.45 µs | 191.93 µs | -21.8% | no (was 2.2x, now 1.7x) | +| Iterate/Sum | 129.91 µs | 117.19 µs | -9.8% | **yes** | +| Map+Reduce | 277.32 µs | 249.02 µs | -10.2% | **yes** | +| OOP | 135.69 µs | 122.26 µs | -10% | no (was 1.27x, now 1.14x) | +| table.concat | 44.21 µs | 32.22 µs | -27% | **yes** | +| fib(30) chunk | 873 ms | ~860 ms | within noise (±3%) | — | + +Memory regressed ~2-3x on table-heavy workloads (e.g. table_build +0.65 MB → 1.68 MB). Bounded by BEAM immutable-tuple semantics.