feat(evmrpc): bound debug_trace struct-logger memory via max_trace_struct_log_bytes (PLT-205)#3677
feat(evmrpc): bound debug_trace struct-logger memory via max_trace_struct_log_bytes (PLT-205)#3677amir-deris wants to merge 4 commits into
Conversation
PR SummaryMedium Risk Overview
Tests cover clamp branches and config parsing for the new flag. Reviewed by Cursor Bugbot for commit a1c065d. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3677 +/- ##
==========================================
- Coverage 59.29% 58.28% -1.01%
==========================================
Files 2272 2185 -87
Lines 188210 178194 -10016
==========================================
- Hits 111594 103859 -7735
+ Misses 66565 65104 -1461
+ Partials 10051 9231 -820
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
A clean, well-tested, well-documented change that adds an operator cap on default struct-logger trace output. However, Codex flags a likely-correctness issue that defeats the feature's purpose: geth's logger.Config.Limit may count retained struct-log entries, not bytes, which would make the byte-named MaxTraceStructLogBytes config and its 256 MiB default ineffective — this must be verified against the pinned geth version before merge.
Findings: 2 blocking | 4 non-blocking | 1 posted inline
Blockers
- Verify the units of geth's
eth/tracers/logger.Config.Limitagainst the pinned dependency (sei-protocol/go-ethereum v1.15.7-sei-17). Codex reports (and historically geth implemented)Limitas a count of retained StructLog entries (len(l.logs) >= cfg.Limit), not bytes. If that holds at this version, the config nameMaxTraceStructLogBytes, the doc comments ("in bytes"), and the 256 MiB default are misleading and provide essentially no memory protection (256M entries × hundreds of bytes each = tens of GB), so the PR would not accomplish its stated goal. If verified that this geth version bounds by output bytes (resultSize/logs.Size()), the PR is correct and this resolves to a no-op. I could not verify this in-sandbox (module cache,gh api, and web access were all blocked), so this needs author confirmation. If it is entry-count, fix by either renaming the knob to reflect entries and choosing a sane entry-count default, or by enforcing a true byte cap. - 1 blocking issue(s) flagged inline on specific lines.
Non-blocking
- Behavior change: the default debug_trace* path was previously unlimited and now truncates at the cap. When the struct logger hits its Limit it silently stops appending entries — the trace result is returned incomplete with no error/flag. Tools consuming full opcode traces may now get partial results without indication. Worth a CHANGELOG/release note and possibly surfacing the truncation in the response.
- Consider an integration-level test that actually exercises a trace exceeding the cap and asserts the output is bounded/truncated end-to-end; current tests only unit-test the
clampDefaultStructLogLimitbranch logic (which is good but doesn't prove the limit takes effect in a real trace). - Cursor's review file (
cursor-review.md) was empty — that pass produced no output. Codex contributed the single (High) finding noted above. - Nit:
int(debugCfg.MaxTraceStructLogBytes)narrows a uint64 to int. On 64-bit nodes (the only supported target) this is fine and the//nolint:gosecis reasonable, but an operator-set value above 2^31 on a 32-bit build would overflow to a negative int and disable the cap. A clamp tomath.MaxInt(or documenting 64-bit-only) would be more defensive.
| MaxConcurrentSimulationCalls: runtime.NumCPU(), | ||
| MaxTraceLookbackBlocks: 10000, | ||
| TraceTimeout: 30 * time.Second, | ||
| MaxTraceStructLogBytes: 256 * 1024 * 1024, // 256 MiB |
There was a problem hiding this comment.
[blocker] The 256 MiB default and the MaxTraceStructLogBytes name assume geth's logger.Config.Limit is measured in bytes. Codex flags (and historical geth implements) Limit as a count of StructLog entries (len(l.logs) >= cfg.Limit), not bytes. Please verify against the pinned sei-protocol/go-ethereum v1.15.7-sei-17 source: if OnOpcode/CaptureState compares against len(logs) rather than an output byte size (resultSize/logs.Size()), then this default permits ~256M entries (tens of GB) and the cap does not bound memory as intended — the config would need to either be renamed to reflect entry-count with an appropriate default, or be changed to enforce an actual byte limit. If this geth version does bound by output bytes, disregard — the PR is correct as written. (I could not verify the dependency source in this review environment.)
There was a problem hiding this comment.
LGTM — small operational guardrail on debug trace memory with conservative defaults and clear test coverage.
Extended reasoning...
Overview
Adds MaxTraceStructLogBytes (default 256 MiB, 0=unlimited) to evmrpc/config and wires it into DebugAPI via a new clampDefaultStructLogLimit helper invoked by TraceTransaction, TraceBlockByNumber, TraceBlockByHash, and TraceCall. The clamp only affects the default geth struct logger; named/custom tracers are skipped, and the cap only ever tightens — never loosens — a caller-supplied Limit.
Security risks
None of note. This is a memory bound that tightens an existing behavior; it does not touch auth, crypto, consensus, or any external surface. Worst-case it returns a smaller trace than a caller requested, which is the documented intent. Defaults preserve upstream geth behavior when set to 0.
Level of scrutiny
Low. This is operator-facing tunable behind a config flag, with the default selected to be conservative (256 MiB). Verified Config.Limit semantics in the vendored geth (eth/tracers/logger/logger.go:49: "maximum size of output, but zero means unlimited") match the PR — so the "honor smaller caller limit, override 0/larger" logic is correct.
Other factors
Test coverage in tracers_structlog_limit_test.go hits the edges that matter (nil config, nil nested Config, larger caller limit, smaller caller limit, custom-tracer no-op, disabled-cap no-op, nil-safe). Config plumbing follows the established pattern in the file (flag constant, default value, ReadConfig branch, app.toml template snippet, bad-type test). No outstanding reviewer comments to address.
There was a problem hiding this comment.
The config plumbing, clamp logic, and tests are clean and well-structured, but the feature does not achieve its stated goal: go-ethereum's logger.Config.Limit bounds the number of retained struct-log entries (opcodes), not bytes, so treating it as a byte count means the advertised 256 MiB memory cap is not actually enforced and the OOM this PR targets can still occur.
Findings: 2 blocking | 3 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
- 2 blocking issue(s) flagged inline on specific lines.
Non-blocking
int(debugCfg.MaxTraceStructLogBytes)at tracers.go:208 (with//nolint:gosec): a very large operator-configureduint64(> math.MaxInt, or > MaxInt32 on 32-bit builds) wraps to a negativeint, whichclampDefaultStructLogLimitthen treats as<= 0and silently disables the cap — the opposite of the operator's intent. Consider clamping tomath.MaxIntinstead of a raw conversion.- Cursor second-opinion review file (
cursor-review.md) was empty — no findings were contributed from that pass. - Codex's single finding (Limit is an entry count, not bytes) is corroborated here and captured as the primary inline blocker.
| config.Config = &traceLogger.Config{} | ||
| } | ||
| if config.Config.Limit <= 0 || config.Config.Limit > api.maxStructLogBytes { | ||
| config.Config.Limit = api.maxStructLogBytes |
There was a problem hiding this comment.
[blocker] This assigns a byte budget to logger.Config.Limit, but in go-ethereum that field caps the number of retained struct-log entries (one per executed opcode), not bytes — the struct logger compares len(l.logs) against Limit. With the 256 MiB default this becomes Limit = 268,435,456 entries. Since each struct-log entry (PC, op, gas, and optionally stack/memory/storage snapshots) is far larger than one byte, a node would OOM long before accumulating 268M entries, so the intended memory bound is effectively never enforced and the OOM this PR targets can still happen. The cap needs to be expressed in entries with a realistic default (order 10^4–10^6, tuned to the desired memory ceiling), or memory must be bounded some other way — the byte framing does not map onto Limit.
| // Timeout for each trace call | ||
| TraceTimeout time.Duration `mapstructure:"trace_timeout"` | ||
|
|
||
| // MaxTraceStructLogBytes bounds the retained struct-logger output (in bytes) for a |
There was a problem hiding this comment.
[blocker] The Bytes/"in bytes" naming and documentation are misleading given that this value flows into logger.Config.Limit, which is an entry (opcode) count rather than a byte count (see tracers.go:224). Both the config name (max_trace_struct_log_bytes / MaxTraceStructLogBytes) and the 256 MiB default should be reworked to reflect entry-count semantics, and the default chosen to actually bound memory to the intended ceiling. As written the default provides essentially no protection.
| config.Config = &traceLogger.Config{} | ||
| } | ||
| if config.Limit <= 0 || config.Limit > api.maxStructLogBytes { | ||
| config.Limit = api.maxStructLogBytes |
There was a problem hiding this comment.
Struct log cap wrong field
High Severity
clampDefaultStructLogLimit updates config.Limit on TraceConfig, but the default struct logger reads the byte cap from config.Config (traceLogger.Config.Limit). After clamping, Config.Limit often stays zero or above the node cap, so traces can still grow unbounded and the new max_trace_struct_log_bytes setting may not protect the RPC node.
Reviewed by Cursor Bugbot for commit d21f39d. Configure here.
There was a problem hiding this comment.
The PR adds an operator-configurable cap on default struct-logger output for debug_trace* calls, wired cleanly with good test coverage. However, the value is applied to geth's logger.Config.Limit, which bounds the NUMBER of struct-log entries rather than bytes, so the byte-denominated config name/docs and the 256 MiB default do not deliver the memory bound they advertise.
Findings: 1 blocking | 5 non-blocking | 3 posted inline
Blockers
- None at the file/PR level.
- 1 blocking issue(s) flagged inline on specific lines.
Non-blocking
cursor-review.mdwas empty — the Cursor second-opinion pass produced no output.REVIEW_GUIDELINES.md(from the base branch) was also empty, so no repo-specific review standards were applied.- Positive:
TestClampDefaultStructLogLimitcovers all clamp/no-op branches (no limit, nil nested config, larger limit clamped, smaller limit honored, custom tracer, disabled cap, nil config), and the clamp is correctly applied before the profiled-vs-standard block-trace branch so both paths are covered. - Consider a test that actually runs a default struct-logger trace end-to-end and asserts the effective cap is enforced during execution — the current unit tests only exercise config mutation, not the geth logger's runtime behavior, which is exactly where the entries-vs-bytes mismatch hides.
- 2 suggestion(s)/nit(s) flagged inline on specific lines.
| config.Config = &traceLogger.Config{} | ||
| } | ||
| if config.Limit <= 0 || config.Limit > api.maxStructLogBytes { | ||
| config.Limit = api.maxStructLogBytes |
There was a problem hiding this comment.
[blocker] config.Limit is geth's logger.Config.Limit, which caps the number of retained struct-log entries (it is compared against len(l.logs) inside the StructLogger), not the number of bytes. Assigning api.maxStructLogBytes here means the default 256 * 1024 * 1024 permits ~268 million opcode log entries. Since each entry retains stack/storage snapshots (hundreds of bytes to KBs each), the actual memory ceiling is tens to hundreds of GB — so the advertised "256 MiB" OOM protection is effectively not enforced. Either (a) rename the config to an entry count and choose a default that maps to a sane memory budget, or (b) implement true byte accounting via a custom logger. As-is the feature does not achieve its stated purpose.
| // single default debug_trace* call (debug_traceCall/traceTransaction/traceBlock*), | ||
| // guarding against quadratic memory growth from traces that read many distinct | ||
| // storage slots. Set to 0 for unlimited (matches upstream geth behavior). | ||
| MaxTraceStructLogBytes uint64 `mapstructure:"max_trace_struct_log_bytes"` |
There was a problem hiding this comment.
[suggestion] This field, its mapstructure key max_trace_struct_log_bytes, the doc comment ("in bytes"), and the TOML template (config.go:700-703) all describe bytes, but the value is consumed as geth's entry-count logger.Config.Limit (see tracers.go:224). Rename to something like MaxTraceStructLogEntries / max_trace_struct_log_entries and update the comments + default so operators aren't misled about what unit they're configuring.
| traceCallSemaphore: sem, | ||
| maxBlockLookback: debugCfg.MaxTraceLookbackBlocks, | ||
| traceTimeout: debugCfg.TraceTimeout, | ||
| maxStructLogBytes: int(debugCfg.MaxTraceStructLogBytes), //nolint:gosec // bounded operator config |
There was a problem hiding this comment.
[nit] int(debugCfg.MaxTraceStructLogBytes) on a uint64: a configured value above math.MaxInt64 wraps to a negative int, which clampDefaultStructLogLimit then treats as <= 0 ("disabled"), silently turning off the cap instead of rejecting the invalid config. Consider validating/clamping the range at parse time in ReadConfig rather than relying on the //nolint:gosec cast. Low likelihood in practice, but it's a silent failure mode.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit a1c065d. Configure here.
| // smaller caller-supplied Limit. | ||
| func (api *DebugAPI) clampDefaultStructLogLimit(config *tracers.TraceConfig) { | ||
| if config == nil || config.Tracer != nil || api.maxStructLogBytes <= 0 { | ||
| return |
There was a problem hiding this comment.
Empty tracer skips memory cap
High Severity
clampDefaultStructLogLimit returns early whenever config.Tracer is non-nil, but elsewhere default struct-logger tracing treats a non-nil empty tracer string the same as no tracer. RPC configs with "tracer":"" therefore skip max_trace_struct_log_bytes while still using the opcode struct logger on the legacy trace path, leaving the OOM guard ineffective for that input shape.
Reviewed by Cursor Bugbot for commit a1c065d. Configure here.
| if config == nil { | ||
| config = &tracers.TraceConfig{} | ||
| } | ||
| api.clampDefaultStructLogLimit(config) |
There was a problem hiding this comment.
Profile trace skips struct log cap
High Severity
This PR wires clampDefaultStructLogLimit into debug_traceTransaction, block traces, and debug_traceCall, but debug_traceTransactionProfile still calls profiledTraceTx with the raw config. That path uses the default struct logger via NewStructLogger and never applies max_trace_struct_log_bytes, so the new cap can be bypassed on a exposed debug RPC.
Reviewed by Cursor Bugbot for commit a1c065d. Configure here.
There was a problem hiding this comment.
The PR wires a new operator config into geth's struct-logger Limit, but that field bounds the count of retained struct-log entries, not bytes — so the config named/documented as max_trace_struct_log_bytes ("256 MiB") is semantically incorrect and its default (~268M entries) does not enforce the memory cap it advertises. Plumbing, tests, and clamp logic are otherwise sound.
Findings: 4 blocking | 3 non-blocking | 2 posted inline
Blockers
- Feature does not achieve its stated goal. geth's
logger.Config.Limitgates retention by entry count (l.cfg.Limit <= len(l.logs)), not bytes. Mappingmax_trace_struct_log_bytesontoLimitmeans the default of 25610241024 = 268,435,456 is interpreted as ~268M retained struct-log entries — effectively unlimited for the OOM scenario described. The quadratic memory growth (each entry re-copying the accumulated storage map) blows up long before that entry count, so the default provides no meaningful protection. Either bound by actual retained bytes, or rename the config/field/docs (e.g.max_trace_struct_log_entries) and choose an entry-count default that reflects real memory limits. - The docstrings, TOML template comment, PR description, and default value all state the cap is in bytes / '256 MiB'. This misdocumentation can cause operators to badly misconfigure: setting
max_trace_struct_log_bytes = 1048576expecting a 1 MiB cap would actually retain up to 1,048,576 entries (potentially many GiB), i.e. under-protection while believing they are protected. - 2 blocking issue(s) flagged inline on specific lines.
Non-blocking
cursor-review.mdwas empty — the Cursor second-opinion pass produced no output.- Codex's single finding (tracers.go bytes-vs-count) matches the primary blocker here and is corroborated.
int(debugCfg.MaxTraceStructLogBytes)is silenced with//nolint:gosec; on 32-bit platforms a configured value > 2^31-1 would overflow/negative-wrap. Low risk given operator-controlled config and the default fitting in int32, but worth clamping tomath.MaxIntdefensively.
| config.Config = &traceLogger.Config{} | ||
| } | ||
| if config.Limit <= 0 || config.Limit > api.maxStructLogBytes { | ||
| config.Limit = api.maxStructLogBytes |
There was a problem hiding this comment.
[blocker] config.Limit (geth logger.Config.Limit) bounds the number of retained struct-log entries, not bytes — the struct logger gates on l.cfg.Limit <= len(l.logs). Assigning maxStructLogBytes (default 268,435,456) here caps entry count at ~268M, which is effectively unlimited and does not enforce the advertised 256 MiB byte cap. Because struct-log memory grows quadratically (storage map re-copied per entry), OOM occurs far below 268M entries, so the default gives no real protection. Bound by actual retained bytes, or rename to an entries-based config with a realistic default.
| // guarding against quadratic memory growth from traces that read many distinct | ||
| // storage slots. Set to 0 for unlimited (matches upstream geth behavior). | ||
| MaxTraceStructLogBytes uint64 `mapstructure:"max_trace_struct_log_bytes"` | ||
|
|
There was a problem hiding this comment.
[blocker] Naming/documentation is misleading: this value is applied to geth's Limit, which counts retained struct-log entries, not bytes. The (in bytes) / 256 MiB framing will lead operators to set values expecting a byte cap while actually configuring an entry-count cap orders of magnitude larger in memory. Rename (e.g. MaxTraceStructLogEntries) and fix the docstring, TOML comment, and default accordingly, or convert to a genuine byte-based bound.


Summary
debug_trace*calls that use the default struct logger retain one structured log entry per executed opcode. Traces that touch many distinct storage slots grow memory quadratically, and an unbounded trace can OOM the RPC node. Upstream geth exposes a per-callLimiton the struct logger but leaves it unset (unlimited) by default.This PR adds an operator-configurable, node-side cap on the retained default struct-logger output so a single trace cannot grow without bound.
Changes
evm.max_trace_struct_log_bytes(MaxTraceStructLogBytes) bounding the bytes of retained struct-logger output for a single defaultdebug_trace*call. Defaults to 256 MiB;0disables the cap (matches upstream geth's unlimited behavior).DebugAPI.clampDefaultStructLogLimitapplies the cap acrossTraceTransaction,TraceBlockByNumber,TraceBlockByHash, andTraceCall. It:0),Limitdown to the node cap,Limit.ReadConfig), the TOML template, andDefaultConfig.Tests
TestClampDefaultStructLogLimitcovers all clamp/no-op branches (no limit set, nil nested config, larger limit clamped, smaller limit honored, custom tracer, disabled cap, nil config).config_test.goasserts the default value and rejects a bad type for the new flag.PLT-205