Generate client-ready Ethereum databases for geth, reth, besu, and nethermind — without going through each client's init path.
Tip: If you want Claude, Codex, Cursor, or another coding agent to operate state-actor for you, point it at AGENTS.md (the entry pointer) or directly at docs/SKILL.md (the deep doc).

The canonical syntax reference for what --spec can express is examples/full-matrix-spec-feature.yaml; CI keeps it correct.
You want a pre-populated Ethereum database that a client can boot against directly — for cross-client determinism tests, devnet bootstrapping, EIP-7702 / ERC-20 fixtures, or state-bloat experiments. The alternative (run the client's init against a genesis with millions of alloc entries) is slow and client-specific. State Actor writes each client's on-disk format directly: Pebble for geth, MDBX + RocksDB + nippy-jar for reth, single RocksDB + 8 Bonsai column families for besu, seven RocksDB instances for nethermind.
Three flags carry most of the weight: --client (which client's format to write), --spec (concrete entities to include, declared in YAML), --target-size (the DB-size budget; auto-fills mainnet-shaped 20 % / 10 % / 70 % across account-trie / bytecode / storage up to the cap). One of --spec or --target-size is required; everything else has a sane default.
```sh
# Smoke test: a small geth DB, no spec. Auto-fill emits mainnet-shaped state.
go run . --client=geth --db=/tmp/sa-geth/geth/chaindata --target-size=100MB

# Spec-driven: load a curated YAML verbatim (no --target-size, so the
# spec is never truncated and no auto-fill runs on top).
go run . --client=geth --db=/tmp/sa-spec/geth/chaindata \
  --spec=examples/spec-minimal.yaml

# Pick a different client (Docker required for cgo clients).
go run . --client=reth --db=/tmp/sa-reth \
  --spec=examples/spec-minimal.yaml
```

After the run, boot the client against the produced datadir; the per-client recipes are in docs/RUNBOOK.md.
```sh
git clone https://github.com/nerolation/state-actor.git
cd state-actor
go build -o state-actor .                                # geth client only (pure Go)
docker build -f Dockerfile.reth -t state-actor-reth .    # cgo clients
```

besu, nethermind, and reth need cgo bindings (RocksDB / MDBX). On macOS, build them via Docker; per-client Dockerfile.<client> files ship in this repo.
Geth has a pure-Go writer. The path you pass to --db must end in /geth/chaindata — geth itself appends that suffix to its --datadir.
```sh
state-actor --client=geth --db=/tmp/sa-geth/geth/chaindata --target-size=1GB
```

```sh
docker run --rm -v /tmp/sa-reth:/data state-actor-reth \
  ./state-actor --client=reth --db=/data --target-size=1GB
```

Substitute besu / nethermind for reth (and pick the matching Dockerfile). The on-disk layout is documented per client in docs/RUNBOOK.md.
The --spec flag points at a YAML file describing exactly which contracts and EOAs to write — ERC-20 tokens with chosen sizes, EIP-7702 delegating EOAs, raw bytecode contracts, address-mode demonstrations. See docs/SPEC.md for the schema; examples/README.md is the picker.
```sh
state-actor --client=geth --db=/tmp/sa/geth/chaindata \
  --spec=examples/spec-erc20-mixed-sizes.yaml
```

Without --target-size, only the spec entities are written. No synthetic fill runs on top, so there's no risk of random EOAs colliding with spec-derived addresses.
--target-size is an upper bound on the projected trie footprint of the whole generated database. When set, the auto-fill emits mainnet-shaped synthetic state (20 % account-trie / 10 % bytecode / 70 % storage) up to the cap. With --spec, the spec entities count first; the auto-fill fills the headroom after their projected cost. If the spec alone would exceed the budget, the spec is truncated to the longest prefix that fits, with a warning on stderr; no auto-fill runs in that case. To generate a spec verbatim with no synthetic fill, omit --target-size.
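The interaction between spec entities and the budget can be sketched in a few lines of Go. This is a simplified model under stated assumptions: each spec entity is represented only by a projected byte cost (how state-actor actually projects that cost is not shown here), and `plan` and `planBudget` are hypothetical names.

```go
package main

import "fmt"

// plan describes how a --target-size budget would be spent in this model.
type plan struct {
	Kept      int    // spec entities written (longest prefix that fits)
	Truncated bool   // true if trailing spec entities were dropped
	Headroom  uint64 // bytes left over for synthetic auto-fill
}

// planBudget walks the spec in order. Entities count first; if one does not
// fit, the spec is cut at that point and no auto-fill runs. Otherwise the
// remaining bytes are the auto-fill headroom.
func planBudget(budget uint64, costs []uint64) plan {
	var used uint64
	for i, c := range costs {
		if c > budget-used { // used <= budget always holds, so no underflow
			return plan{Kept: i, Truncated: true, Headroom: 0}
		}
		used += c
	}
	return plan{Kept: len(costs), Headroom: budget - used}
}

func main() {
	// Spec fits: 30 units remain for auto-fill.
	fmt.Printf("%+v\n", planBudget(100, []uint64{40, 30}))
	// Spec alone exceeds the budget: third entity is dropped, no auto-fill.
	fmt.Printf("%+v\n", planBudget(100, []uint64{40, 30, 50}))
}
```

In the truncated case a real run would also print a warning on stderr, as described above.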
```sh
state-actor --client=reth --db=/tmp/sa --target-size=10GB
```

Accepted suffixes: KB, MB, GB, TB (base-1024). Bare numbers are bytes.
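As an illustration of the suffix rules, the following Go sketch converts a size string to bytes. It is not state-actor's parser: `parseSize` is a hypothetical name, and the sketch assumes upper-case suffixes and whole-number values only, which the real flag may or may not require.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts strings such as "100MB" or "4096" to a byte count.
// Suffixes are base-1024; a bare number is taken as bytes.
// Assumptions: upper-case suffixes, integer values, no overflow check.
func parseSize(s string) (uint64, error) {
	units := []struct {
		suffix string
		mult   uint64
	}{
		{"KB", 1 << 10}, {"MB", 1 << 20}, {"GB", 1 << 30}, {"TB", 1 << 40},
	}
	num := strings.TrimSpace(s)
	mult := uint64(1)
	for _, u := range units {
		if strings.HasSuffix(num, u.suffix) {
			num = strings.TrimSuffix(num, u.suffix)
			mult = u.mult
			break
		}
	}
	n, err := strconv.ParseUint(num, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"100MB", "10GB", "4096", "10XB"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err)
	}
}
```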
```sh
state-actor --client=geth --db=/tmp/sa/geth/chaindata \
  --chain-id=12345 \
  --fork=osaka \
  --gas-limit=60000000 \
  --timestamp=1700000000 \
  --extra-data=0xdeadbeef
```

Run state-actor --list-forks for accepted --fork values. The default fork is the latest one each --client supports (currently osaka across all four).
The boot command differs per client. The full recipes — verbatim from each client's TestE2ESuite — are in docs/RUNBOOK.md.
| Client | Recipe |
|---|---|
| geth | docs/RUNBOOK.md#geth |
| reth | docs/RUNBOOK.md#reth |
| besu | docs/RUNBOOK.md#besu |
| nethermind | docs/RUNBOOK.md#nethermind |
A spec file lists entities — EOAs or contracts — with explicit addresses, name-derived addresses (keccak256(seed || name)[12:]), or position-derived addresses. Contracts can use a template (erc20 is shipped today) or carry raw bytecode. EIP-7702 delegating EOAs are first-class.
```yaml
entities:
  - kind: eoa
    address: 0x1111111111111111111111111111111111111111
    balance: "1000000000000000000"
  - kind: contract
    name: usdc-mock                      # name-derived address
    template: erc20
    parameters: { symbol: USDC, name: USD Coin, decimals: 18, total_owners: 1000 }
    approximate_size_bytes: 100000000    # ~100 MB synthetic storage
```

Full schema: docs/SPEC.md. Curated examples: examples/README.md.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| --db | string | (required) | Path to the database directory |
| --client | string | geth | Target client: geth, nethermind, besu, reth |
| --spec | string | (none) | Path to a YAML state-spec file; see docs/SPEC.md. Required if --target-size is unset. |
| --target-size | string | (none) | Upper bound on the whole DB (5GB, 500MB, …). Required if --spec is unset. With --spec, auto-fill fills the headroom after the spec; without --spec, drives the whole DB. Auto-fill emits a fixed 20 % / 10 % / 70 % split across account-trie / bytecode / storage. |
| --seed | int | 1 | Random seed; --seed=0 randomises (footgun) |
| --fork | string | (latest) | Hard fork active at genesis; --list-forks lists choices |
| --chain-id | int | 1337 | Chain ID embedded in the synthesized chainspec |
| --gas-limit | uint | 30000000 | Genesis block gas limit |
| --timestamp | uint | 0 | Genesis block timestamp (unix seconds) |
| --extra-data | string | (empty) | Genesis block extraData (hex) |
| --archive | bool | false | Archive-mode metadata; geth + reth only |
| --binary-trie | bool | false | EIP-7864 binary trie; geth only |
| --group-depth | int | 8 | Binary-trie group depth (1-8) |
| --list-forks | bool | false | Print accepted --fork values and exit |
| --verbose | bool | false | Verbose output |
| --benchmark | bool | false | Print detailed timing stats |
Run state-actor --help for the canonical list (this table is a snapshot).
State Actor is not the right tool when:

- You need a real testnet's history. State Actor writes one genesis block; there is no chain to replay.
- You need post-genesis transactions in the DB. The output is genesis state; drive the chain forward with a separate tool (spamoor, your own test harness).
- The exact byte-shape of mainnet trie nodes matters more than state size. State Actor synthesises shapes; it does not mirror live state.
See docs/ARCHITECTURE.md for the deep dive (writer phases, per-client format notes, the streaming-trie / streaming-sort packages). The short version: entity generation streams into a per-client writer that produces the client-native database directly, with cross-client determinism guaranteed by internal/sizecal/'s single global bytesPerSlot constant (identical across all four clients by design).
```sh
go test ./...                             # full suite
go test -run TestE2ESuite ./client/...    # per-client end-to-end (Docker required for cgo clients)
go test -short ./...                      # skip the e2e suites
```

The CI matrix lives in .github/workflows/ci.yml. The cross-client-genesis-root job pins the cross-client invariant: same --seed + same spec → identical state root across all four MPT clients.
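The shape of that invariant check can be sketched as below. This is not the CI job's code: `checkRoots` is a hypothetical helper, and the root strings in `main` are placeholders rather than real state roots. It assumes each client's genesis state root has already been collected as a hex string.

```go
package main

import (
	"fmt"
	"sort"
)

// checkRoots returns an error naming the first client whose state root
// differs from the reference client (the alphabetically first one).
func checkRoots(roots map[string]string) error {
	if len(roots) == 0 {
		return fmt.Errorf("no state roots supplied")
	}
	clients := make([]string, 0, len(roots))
	for c := range roots {
		clients = append(clients, c)
	}
	sort.Strings(clients) // deterministic order for stable error messages
	ref := clients[0]
	for _, c := range clients[1:] {
		if roots[c] != roots[ref] {
			return fmt.Errorf("state root mismatch: %s=%s, %s=%s",
				ref, roots[ref], c, roots[c])
		}
	}
	return nil
}

func main() {
	// Placeholder values for illustration only.
	roots := map[string]string{
		"geth":       "0xaaaa",
		"reth":       "0xaaaa",
		"besu":       "0xaaaa",
		"nethermind": "0xaaaa",
	}
	fmt.Println("all equal:", checkRoots(roots) == nil)

	roots["reth"] = "0xbbbb"
	fmt.Println(checkRoots(roots))
}
```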
Read CONTRIBUTING.md. Agent collaborators should start at AGENTS.md.
MIT — see LICENSE.
- go-ethereum for the database and state primitives.
- reth for the MDBX writer reference.
- hyperledger/besu and nethermind for the chainspec formats.
- ethereum-package for Kurtosis integration patterns.