
feat: structured benchmark result output via BENCH_RESULT_OUTPUT#3195

Open
chatton wants to merge 14 commits into main from
cian/benchmark-results-artifact

Conversation

@chatton (Contributor) commented Mar 24, 2026

Summary

  • Add run_result.go with structured JSON output for benchmark runs, written to BENCH_RESULT_OUTPUT when set
  • Each test emits full config, metrics (throughput, block fill, intervals, trace-derived timings), tags, and field descriptions
  • Wire up emitRunResult in TestGasBurner, TestERC20Throughput, and TestDeFiSimulation

Context

Supports the external-mode matrix runner in the benchmarks repo. Each matrix entry sets BENCH_RESULT_OUTPUT to collect structured results that can be aggregated into comparison tables.

This output will be used across many runs with different permutations of configurations to generate markdown tables.

local test output

  {
    "schema_version": 1,
    "test": "TestResult",
    "timestamp": "2026-03-24T11:56:17Z",
    "platform": "darwin/arm64",
    "config": {
      "block_time": "100ms",
      "slot_duration": "50ms",
      "gas_limit": "0x5F5E100",
      "scrape_interval": "25ms",
      "num_spammers": 4,
      "count_per_spammer": 10000,
      "throughput": 60,
      "warmup_txs": 100,
      "gas_units_to_burn": 0,
      "max_wallets": 200,
      "wait_timeout": "5m0s"
    },
    "tags": {
      "ev_reth": "latest",
      "ev_node": "unknown"
    },
    "metrics": {
      "mgas_per_sec": 150,
      "tps": 2250,
      "blocks_per_sec": 7.5,
      "non_empty_ratio_pct": 75,
      "avg_gas_per_block": 20000000,
      "avg_tx_per_block": 300,
      "gas_block_p50": 20000000,
      "gas_block_p99": 20000000,
      "tx_block_p50": 300,
      "tx_block_p99": 300,
      "avg_block_interval_ms": 100,
      "block_interval_p50_ms": 100,
      "block_interval_p99_ms": 100,
      "steady_state_sec": 2,
      "wall_clock_sec": 10,
      "overhead_pct": 23.52941176470588,
      "ev_reth_ggas_per_sec": 6,
      "secs_per_gigagas": 6.666666666666667,
      "produce_block_avg_ms": 85,
      "produce_block_min_ms": 80,
      "produce_block_max_ms": 90,
      "get_payload_avg_ms": 40,
      "get_payload_min_ms": 40,
      "get_payload_max_ms": 40,
      "new_payload_avg_ms": 50,
      "new_payload_min_ms": 50,
      "new_payload_max_ms": 50
    },
    "block_range": {
      "start": 100,
      "end": 120,
      "total": 20,
      "non_empty": 15
    },
    "spamoor": {
      "sent": 40000,
      "failed": 5
    },
    "field_descriptions": { ... }
  }

Summary by CodeRabbit

  • Tests

    • Enhanced benchmark test infrastructure with structured result emission and configurable parameters.
    • Improved metrics collection and resource attribute extraction from execution traces.
    • Added support for transaction spamming configuration (pending limits, rebroadcast settings, base and tip fees).
  • Refactor

    • Refactored trace processing helpers for improved composability and maintainability.

emit full benchmark run metadata (config, tags, metrics, block range,
spamoor stats) as JSON when BENCH_RESULT_OUTPUT is set. consumed by
external matrix runner for table generation.
@claude bot (Contributor) commented Mar 24, 2026

Claude finished @chatton's task in 1m 54s.


Code Review

  • Gather context
  • Read changed files
  • Analyze code quality & architecture
  • Review error handling & resilience
  • Review security
  • Review performance & resource efficiency
  • Review testing & documentation
  • Post final review

Overall, this is a clean addition of structured benchmark output. The architecture is well-considered — the env-var guard makes it a true no-op when unused, span stats are aggregated once and reused, and the fieldDescriptions map serves as inline documentation for consumers. A few issues worth addressing:


Issues

1. runSpamoorStats.Sent/Failed should be integer types

run_result.go:100-103

type runSpamoorStats struct {
    Sent   float64 `json:"sent"`
    Failed float64 `json:"failed"`
}

Transaction counts are integers. Using float64 loses precision above 2^53 (~9 quadrillion) and produces semantically wrong JSON (e.g. 40000.0 instead of 40000). The underlying Prometheus counters are float64, so this struct is the right boundary at which to convert to int64; the existing call sites already hold these values as float64 from sumCounter.


2. Timestamp records end-of-test, not start

run_result.go:168

Timestamp: time.Now().UTC().Format(time.RFC3339),

buildRunResult is called from the deferred emitRunResult, so time.Now() reflects when the test finished writing results — not when the benchmark started. For aggregation/comparison tables, a start timestamp is more useful. Consider passing the start time (e.g., loadStart) through to emitRunResult.


3. extractResourceAttrs unmarshals the same JSON line 7 times

traces.go:357-367

func extractResourceAttrs(line []byte) resourceAttrs {
    return resourceAttrs{
        HostName:    extractResourceAttr(line, "host.name"),
        HostCPU:     extractResourceAttr(line, "host.cpu"),
        ...
    }
}

Each extractResourceAttr call does a full json.Unmarshal into map[string]string. This runs once per benchmark run, so the performance impact is negligible, but it is still wasteful: unmarshal once and look up all keys from the resulting map.
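The single-unmarshal refactor might look like the following. This assumes the span line decodes to a flat string map, which mirrors what the comment says extractResourceAttr already does; the field set is trimmed for illustration:

```go
// Sketch of decoding the span line once and reading all resource attributes
// from the resulting map. Assumes a flat {"key":"value"} line, as the
// per-attribute helper reportedly does; fields are trimmed for illustration.
package main

import (
	"encoding/json"
	"fmt"
)

type resourceAttrs struct {
	HostName string
	HostCPU  string
	OSType   string
}

func extractResourceAttrs(line []byte) resourceAttrs {
	var kv map[string]string
	if err := json.Unmarshal(line, &kv); err != nil {
		return resourceAttrs{} // best-effort: attrs are optional metadata
	}
	return resourceAttrs{
		HostName: kv["host.name"],
		HostCPU:  kv["host.cpu"],
		OSType:   kv["os.type"],
	}
}

func main() {
	line := []byte(`{"host.name":"bench-1","host.cpu":"arm64","os.type":"darwin"}`)
	fmt.Println(extractResourceAttrs(line).HostName) // bench-1
}
```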


4. runConfig omits several benchConfig fields

run_result.go:173-185

MaxPending, Rebroadcast, BaseFee, and TipFee are in benchConfig but not serialized. These affect load shape and could matter when comparing runs. Worth either including them or calling out in a comment that they're intentionally excluded.


5. 0644 permission literal — use 0o644

run_result.go:120

if err := os.WriteFile(outputPath, data, 0644); err != nil {

Go 1.13+ prefers the explicit octal notation 0o644. The old form still works but will likely trigger a linter warning.


Minor Observations

  • emitRunResult is called for TestStatePressure but not TestSpamoorSmoke — this looks intentional and is correct.

TestERC20Throughput uses waitForSpamoorDone for stats while GasBurner/DeFi/StatePressure use a separate GetMetrics() call. The inconsistency is harmless but worth noting: ERC20 gets the final counts from the drain-wait return values, while the others poll explicitly post-drain. Both are correct.

  • evNodeTag() VCS fallback is well-implemented — the 8-char truncation of the git SHA is clean.

  • fieldDescriptions allocated on every call — this is called once per run and the result is embedded in JSON, so there is no meaningful cost. Fine as-is.

  • runConfig.WaitTimeout is serialized via .String() (e.g. "5m0s") rather than seconds. Since other duration fields in runMetrics use _ms suffixed floats, consider whether the external consumer expects a consistent format.

@coderabbitai bot (Contributor) commented Mar 24, 2026

📝 Walkthrough

Walkthrough

Adds structured JSON "run result" emission for benchmarks, expands benchConfig with spammer parameters, captures final spamoor metrics and resource attributes, refactors trace/stat helpers, and integrates result emission into multiple spamoor/gasburner benchmark tests.

Changes

Cohort / File(s) Summary
Result emission infrastructure
test/e2e/benchmark/run_result.go
New file: defines runResult schema, builders, and emitRunResult() to write JSON when BENCH_RESULT_OUTPUT is set. Populates metrics, block-range, tags, host attributes, and optional spamoor stats and trace-derived fields.
Benchmark config
test/e2e/benchmark/config.go
Added benchConfig fields MaxPending, Rebroadcast, BaseFee, TipFee and env-var parsing for BENCH_MAX_PENDING, BENCH_REBROADCAST, BENCH_BASE_FEE, BENCH_TIP_FEE.
Test integrations (emit + spamoor metrics)
test/e2e/benchmark/gasburner_test.go, test/e2e/benchmark/spamoor_defi_test.go, test/e2e/benchmark/spamoor_erc20_test.go, test/e2e/benchmark/spamoor_state_pressure_test.go
Declare run-level result, wallClock, spamoorStats; defer conditional emitRunResult(...); replace hardcoded spamoor params with cfg.*; collect final spamoor metrics via api.GetMetrics() or waitForSpamoorDone() and populate runSpamoorStats.
Trace/resource attr handling
test/e2e/benchmark/traces.go, test/e2e/benchmark/suite_test.go, test/e2e/benchmark/spamoor_smoke_test.go
Introduce optional resourceAttrCollector and resourceAttrs; implement fetchResourceAttrs for victoriaTraceProvider; env now stores evNodeServiceName/evRethServiceName; collectTraces no longer takes serviceName argument and fetches resource attrs. Update smoke test to call s.collectTraces(e) (no explicit service name).
Span/stat helpers & result shape
test/e2e/benchmark/helpers.go, test/e2e/benchmark/result.go
Add overheadFromStats and rethRateFromStats, refactor evNodeOverhead/rethExecutionRate to use them; adjust engine span timing units (use microseconds->milliseconds float); extend traceResult with evNodeAttrs and evRethAttrs.
Minor test updates
test/e2e/benchmark/gasburner_test.go (also)
Adjustments to variable scoping, trace collection calls, and benchmark CreateSpammer parameters to use cfg fields.

Sequence Diagram(s)

sequenceDiagram
  participant Test as Benchmark Test
  participant SpammerAPI as Spamoor API
  participant Traces as VictoriaTraces
  participant FS as Filesystem

  Test->>SpammerAPI: run workload / waitForSpamoorDone
  SpammerAPI-->>Test: final sent/failed counters
  Test->>Traces: collect spans & resource attrs
  Traces-->>Test: spans + resource attributes
  Test->>Test: buildRunResult(result, wallClock, spans, attrs, spamoorStats)
  Test->>FS: emitRunResult -> write JSON file (BENCH_RESULT_OUTPUT)
  FS-->>Test: write success/failure

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • alpe
  • tac0turtle

Poem

🐰
I hopped through tests with a pen of light,
Counting gas and traces through the night,
JSON carrots left for curious eyes,
Metrics snug beneath moonlit skies,
Hop—benchmark done, and all data bright.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Description check: ❓ Inconclusive. The PR description provides a comprehensive summary with context, but lacks the explicit structure and sections required by the template (an Overview section with issue/background/goal). Resolution: restructure the description to follow the template, adding an Overview section with the goal, background, and relevant context/issue references before the Summary section.

✅ Passed checks (2 passed)

Title check: ✅ Passed. The title 'feat: structured benchmark result output via BENCH_RESULT_OUTPUT' clearly and concisely summarizes the main change: adding structured JSON benchmark output written to the path named by an environment variable.
Docstring Coverage: ✅ Passed. Docstring coverage is 92.86%, which is sufficient; the required threshold is 80.00%.


@github-actions bot commented Mar 24, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Mar 30, 2026, 10:41 AM

@codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.45%. Comparing base (889da9a) to head (f0f71bd).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3195      +/-   ##
==========================================
- Coverage   61.46%   61.45%   -0.02%     
==========================================
  Files         120      120              
  Lines       12462    12462              
==========================================
- Hits         7660     7658       -2     
- Misses       3944     3945       +1     
- Partials      858      859       +1     
Flag: combined — Coverage Δ: 61.45% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@chatton chatton marked this pull request as draft March 24, 2026 11:49
chatton added 13 commits March 24, 2026 12:17
- Deduplicate overhead/reth-rate computation: move stats-based helpers
  to helpers.go, make span-based wrappers delegate to them
- Fix sub-millisecond precision loss in engine span timings by using
  microsecond-based float division instead of integer truncation
- Add spamoor stats to TestGasBurner for consistency with other tests
- Add MaxPending, Rebroadcast, BaseFee, TipFee to benchConfig
- Fix ERC20 test hardcoding max_wallets=200 instead of using cfg
- Replace all hardcoded spamoor params with cfg fields across tests
- Add resourceAttrs struct with host, OS, and service fields
- Extract attributes from VictoriaTraces LogsQL span data via
  resourceAttrCollector interface
- Include host metadata in structured benchmark result JSON
Move emitRunResult into a deferred closure in all three test functions.
If the test fails after metrics are collected, the structured JSON is
still written. If it fails before result data exists, the defer is a
no-op.
Remove the 3-second sleep before requireSpammersRunning that caused all
transactions to be mined before the measurement window started, leaving
SteadyState at 0s. Also add deferred emitRunResult, configurable spamoor
params, and spamoorStats collection to match the other benchmark tests.
… mode

In external mode the sequencer reports spans as "ev-node" (not the
test-specific name like "ev-node-erc20"), so trace queries returned
zero spans. Store service names on env: local mode uses the
test-specific name, external mode defaults to "ev-node"/"ev-reth"
with BENCH_EVNODE_SERVICE_NAME/BENCH_EVRETH_SERVICE_NAME overrides.
fetchResourceAttrs only needs one span but was streaming the full
result set from VictoriaTraces. Add limit=1 to the LogsQL query to
avoid wasting bandwidth on long-lived instances with many spans.
@chatton chatton marked this pull request as ready for review March 30, 2026 10:41
@alpe (Contributor) left a comment

👍 uAck

@coderabbitai bot (Contributor) left a comment
Actionable comments posted: 3

🧹 Nitpick comments (1)
test/e2e/benchmark/helpers.go (1)

390-395: Avoid silent metric disappearance on “zero-count” paths.

At Line 390 / Line 394 / Line 413, returning ok=false drops the metric downstream with no diagnostic signal (field omitted in JSON/log paths). For matrix comparisons, this can make runs harder to interpret. Consider surfacing an explicit reason (or a stable placeholder field) so missing values are distinguishable from truly unavailable traces.

Also applies to: 413-414

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/helpers.go` around lines 390 - 395, The current
metric-extraction code returns ok=false whenever a metric exists but has
Count==0 (e.g., the checks around produce, execute and stats[spanExecuteTxs]),
which causes the metric to be dropped downstream and indistinguishable from a
truly missing metric; change the logic in the function that inspects stats (the
blocks referencing produce, execute, stats and spanExecuteTxs) so that instead
of returning ok=false for zero-count cases you return a stable placeholder value
(e.g., NaN or a sentinel like -1) or set ok=true plus an explicit reason/flag
(e.g., zeroCount=true or reason="zero_count") so downstream JSON/loggers always
include the field and can differentiate “zero count” from “no data.” Ensure the
change updates all similar branches (including the checks at lines handling
produce/execute and the block noted at 413–414) and preserves the original
non-zero behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/benchmark/config.go`:
- Around line 50-53: The BENCH_* knobs can be negative because envInt allows -1;
update the benchConfig construction (where MaxPending, Rebroadcast, BaseFee,
TipFee are set) to validate or clamp values returned by envInt so negatives are
rejected or set to sane defaults—e.g., ensure MaxPending >= 0, Rebroadcast >= 0,
BaseFee >= 0, TipFee >= 0—and return an error (or panic) from the bench config
constructor (the function that builds benchConfig) on invalid inputs so
misconfiguration fails fast before hitting Spamoor.

In `@test/e2e/benchmark/suite_test.go`:
- Around line 246-248: collectTraces currently populates tr.evRethAttrs but
buildRunResult only includes traces.evNodeAttrs in the serialized
BENCH_RESULT_OUTPUT, so evRethAttrs is never exported; to fix, update
buildRunResult (or the serialization path that produces BENCH_RESULT_OUTPUT) to
include tr.evRethAttrs alongside tr.evNodeAttrs (e.g., add a field for
evRethAttrs in the RunResult/traces struct and serialize it), and ensure any
consumers expecting the output are updated to read the new field; touch the
functions buildRunResult and the traces/RunResult type so evRethAttrs is carried
through to output.

In `@test/e2e/benchmark/traces.go`:
- Around line 266-304: fetchResourceAttrs can block forever because it uses the
caller context (t.Context()) for the HTTP request; wrap the incoming ctx in a
short child timeout and use that for the request to ensure the optional
resource-attrs query cannot hang the whole test. Modify fetchResourceAttrs to
create a child context with a small timeout (e.g. 5s) via
context.WithTimeout(ctx, 5*time.Second) and defer cancel(), then pass that child
context to http.NewRequestWithContext / http.DefaultClient.Do so the request and
response read will be cancelled if it stalls; ensure the cancel is deferred
immediately after creating the timed context.


📥 Commits

Reviewing files that changed from the base of the PR and between 23d0718 and f0f71bd.

📒 Files selected for processing (11)
  • test/e2e/benchmark/config.go
  • test/e2e/benchmark/gasburner_test.go
  • test/e2e/benchmark/helpers.go
  • test/e2e/benchmark/result.go
  • test/e2e/benchmark/run_result.go
  • test/e2e/benchmark/spamoor_defi_test.go
  • test/e2e/benchmark/spamoor_erc20_test.go
  • test/e2e/benchmark/spamoor_smoke_test.go
  • test/e2e/benchmark/spamoor_state_pressure_test.go
  • test/e2e/benchmark/suite_test.go
  • test/e2e/benchmark/traces.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/e2e/benchmark/spamoor_erc20_test.go
  • test/e2e/benchmark/spamoor_defi_test.go
  • test/e2e/benchmark/gasburner_test.go
  • test/e2e/benchmark/run_result.go

Comment on lines +50 to +53
MaxPending: envInt("BENCH_MAX_PENDING", 50_000),
Rebroadcast: envInt("BENCH_REBROADCAST", 0),
BaseFee: envInt("BENCH_BASE_FEE", 20),
TipFee: envInt("BENCH_TIP_FEE", 2),

⚠️ Potential issue | 🟡 Minor

Reject negative BENCH_* load knobs up front.

envInt accepts -1 here, so max_pending, rebroadcast, and the fee fields can go negative and only fail much later inside Spamoor. Validate or clamp these values when building benchConfig so misconfigured runs fail fast.

Possible fix
+func envNonNegativeInt(key string, fallback int) int {
+	n := envInt(key, fallback)
+	if n < 0 {
+		return fallback
+	}
+	return n
+}
+
-		MaxPending:      envInt("BENCH_MAX_PENDING", 50_000),
-		Rebroadcast:     envInt("BENCH_REBROADCAST", 0),
-		BaseFee:         envInt("BENCH_BASE_FEE", 20),
-		TipFee:          envInt("BENCH_TIP_FEE", 2),
+		MaxPending:      envNonNegativeInt("BENCH_MAX_PENDING", 50_000),
+		Rebroadcast:     envNonNegativeInt("BENCH_REBROADCAST", 0),
+		BaseFee:         envNonNegativeInt("BENCH_BASE_FEE", 20),
+		TipFee:          envNonNegativeInt("BENCH_TIP_FEE", 2),
As per coding guidelines, "Validate all inputs from external sources in Go code."

Comment on lines +246 to +248
if rac, ok := e.traces.(resourceAttrCollector); ok {
tr.evNodeAttrs = rac.fetchResourceAttrs(ctx, e.evNodeServiceName)
tr.evRethAttrs = rac.fetchResourceAttrs(ctx, e.evRethServiceName)

⚠️ Potential issue | 🟡 Minor

evRethAttrs is fetched but never reaches the artifact.

collectTraces fills tr.evRethAttrs here, but buildRunResult only serializes traces.evNodeAttrs (test/e2e/benchmark/run_result.go:166-172). Right now the second lookup is dead work and the reth-side host metadata is silently dropped from BENCH_RESULT_OUTPUT.


Comment on lines +266 to +304
func (v *victoriaTraceProvider) fetchResourceAttrs(ctx context.Context, serviceName string) *resourceAttrs {
end := time.Now()
query := fmt.Sprintf(`_stream:{resource_attr:service.name="%s"}`, serviceName)
baseURL := strings.TrimRight(v.queryURL, "/")
url := fmt.Sprintf("%s/select/logsql/query?query=%s&start=%s&end=%s&limit=1",
baseURL,
neturl.QueryEscape(query),
v.startTime.Format(time.RFC3339Nano),
end.Format(time.RFC3339Nano))

req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
v.t.Logf("warning: failed to create resource attrs request: %v", err)
return nil
}

resp, err := http.DefaultClient.Do(req)
if err != nil {
v.t.Logf("warning: failed to fetch resource attrs: %v", err)
return nil
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
v.t.Logf("warning: unexpected status %d fetching resource attrs", resp.StatusCode)
return nil
}

scanner := bufio.NewScanner(resp.Body)
scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
for scanner.Scan() {
line := scanner.Bytes()
if len(line) == 0 {
continue
}
attrs := extractResourceAttrs(line)
return &attrs
}
return nil

⚠️ Potential issue | 🟠 Major

Give the optional resource-attrs query its own timeout.

collectTraces passes t.Context() into this method, so a stalled VictoriaTraces endpoint can keep http.DefaultClient.Do blocked until the whole test times out. That makes optional host metadata capable of hanging the benchmark and delaying artifact emission after span collection already succeeded.

Possible fix
 func (v *victoriaTraceProvider) fetchResourceAttrs(ctx context.Context, serviceName string) *resourceAttrs {
+	ctx, cancel := context.WithTimeout(ctx, 15*time.Second)
+	defer cancel()
+
 	end := time.Now()
 	query := fmt.Sprintf(`_stream:{resource_attr:service.name="%s"}`, serviceName)
As per coding guidelines, "Use context.Context for cancellation in Go."
