
feat: structured benchmark result output via BENCH_RESULT_OUTPUT#3195

Open
chatton wants to merge 14 commits into main from
cian/benchmark-results-artifact

Conversation

@chatton (Contributor) commented Mar 24, 2026

Summary

  • Add run_result.go with structured JSON output for benchmark runs, written to BENCH_RESULT_OUTPUT when set
  • Each test emits full config, metrics (throughput, block fill, intervals, trace-derived timings), tags, and field descriptions
  • Wire up emitRunResult in TestGasBurner, TestERC20Throughput, and TestDeFiSimulation

Context

Supports the external-mode matrix runner in the benchmarks repo. Each matrix entry sets BENCH_RESULT_OUTPUT to collect structured results that can be aggregated into comparison tables.

This output will be used across many runs with different permutations of configurations to generate markdown tables.

local test output

  {
    "schema_version": 1,
    "test": "TestResult",
    "timestamp": "2026-03-24T11:56:17Z",
    "platform": "darwin/arm64",
    "config": {
      "block_time": "100ms",
      "slot_duration": "50ms",
      "gas_limit": "0x5F5E100",
      "scrape_interval": "25ms",
      "num_spammers": 4,
      "count_per_spammer": 10000,
      "throughput": 60,
      "warmup_txs": 100,
      "gas_units_to_burn": 0,
      "max_wallets": 200,
      "wait_timeout": "5m0s"
    },
    "tags": {
      "ev_reth": "latest",
      "ev_node": "unknown"
    },
    "metrics": {
      "mgas_per_sec": 150,
      "tps": 2250,
      "blocks_per_sec": 7.5,
      "non_empty_ratio_pct": 75,
      "avg_gas_per_block": 20000000,
      "avg_tx_per_block": 300,
      "gas_block_p50": 20000000,
      "gas_block_p99": 20000000,
      "tx_block_p50": 300,
      "tx_block_p99": 300,
      "avg_block_interval_ms": 100,
      "block_interval_p50_ms": 100,
      "block_interval_p99_ms": 100,
      "steady_state_sec": 2,
      "wall_clock_sec": 10,
      "overhead_pct": 23.52941176470588,
      "ev_reth_ggas_per_sec": 6,
      "secs_per_gigagas": 6.666666666666667,
      "produce_block_avg_ms": 85,
      "produce_block_min_ms": 80,
      "produce_block_max_ms": 90,
      "get_payload_avg_ms": 40,
      "get_payload_min_ms": 40,
      "get_payload_max_ms": 40,
      "new_payload_avg_ms": 50,
      "new_payload_min_ms": 50,
      "new_payload_max_ms": 50
    },
    "block_range": {
      "start": 100,
      "end": 120,
      "total": 20,
      "non_empty": 15
    },
    "spamoor": {
      "sent": 40000,
      "failed": 5
    },
    "field_descriptions": { ... }
  }

Summary by CodeRabbit

  • Tests

    • Enhanced benchmark test infrastructure with structured result emission and configurable parameters.
    • Improved metrics collection and resource attribute extraction from execution traces.
    • Added support for transaction spamming configuration (pending limits, rebroadcast settings, base and tip fees).
  • Refactor

    • Refactored trace processing helpers for improved composability and maintainability.

emit full benchmark run metadata (config, tags, metrics, block range,
spamoor stats) as JSON when BENCH_RESULT_OUTPUT is set. consumed by
external matrix runner for table generation.
@claude bot (Contributor) commented Mar 24, 2026

Claude finished @chatton's task in 1m 54s.


Code Review

  • Gather context
  • Read changed files
  • Analyze code quality & architecture
  • Review error handling & resilience
  • Review security
  • Review performance & resource efficiency
  • Review testing & documentation
  • Post final review

Overall, this is a clean addition of structured benchmark output. The architecture is well-considered — the env-var guard makes it a true no-op when unused, span stats are aggregated once and reused, and the fieldDescriptions map serves as inline documentation for consumers. A few issues worth addressing:


Issues

1. runSpamoorStats.Sent/Failed should be integer types

run_result.go:100-103

type runSpamoorStats struct {
    Sent   float64 `json:"sent"`
    Failed float64 `json:"failed"`
}

Transaction counts are integers. Using float64 loses precision above 2^53 (~9 quadrillion) and produces semantically wrong JSON (e.g. 40000.0 instead of 40000). The underlying Prometheus counters are float64, so this struct is the right boundary at which to convert to int64; the existing call sites already hold these values as float64 from sumCounter.


2. Timestamp records end-of-test, not start

run_result.go:168

Timestamp: time.Now().UTC().Format(time.RFC3339),

buildRunResult is called from the deferred emitRunResult, so time.Now() reflects when the test finished writing results — not when the benchmark started. For aggregation/comparison tables, a start timestamp is more useful. Consider passing the start time (e.g., loadStart) through to emitRunResult.


3. extractResourceAttrs unmarshals the same JSON line 7 times

traces.go:357-367

func extractResourceAttrs(line []byte) resourceAttrs {
    return resourceAttrs{
        HostName:    extractResourceAttr(line, "host.name"),
        HostCPU:     extractResourceAttr(line, "host.cpu"),
        ...
    }
}

Each extractResourceAttr call does a full json.Unmarshal into map[string]string. This runs once per benchmark run, so the performance impact is negligible, but it is still wasteful: unmarshal once and look up all keys from the resulting map.
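The single-unmarshal refactor might look like the following. This assumes the span line decodes to a flat string map, which mirrors what the comment says extractResourceAttr already does; the field set is trimmed for illustration:

```go
// Sketch of decoding the span line once and reading all resource attributes
// from the resulting map. Assumes a flat {"key":"value"} line, as the
// per-attribute helper reportedly does; fields are trimmed for illustration.
package main

import (
	"encoding/json"
	"fmt"
)

type resourceAttrs struct {
	HostName string
	HostCPU  string
	OSType   string
}

func extractResourceAttrs(line []byte) resourceAttrs {
	var kv map[string]string
	if err := json.Unmarshal(line, &kv); err != nil {
		return resourceAttrs{} // best-effort: attrs are optional metadata
	}
	return resourceAttrs{
		HostName: kv["host.name"],
		HostCPU:  kv["host.cpu"],
		OSType:   kv["os.type"],
	}
}

func main() {
	line := []byte(`{"host.name":"bench-1","host.cpu":"arm64","os.type":"darwin"}`)
	fmt.Println(extractResourceAttrs(line).HostName) // bench-1
}
```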


4. runConfig omits several benchConfig fields

run_result.go:173-185

MaxPending, Rebroadcast, BaseFee, and TipFee are in benchConfig but not serialized. These affect load shape and could matter when comparing runs. Worth either including them or calling out in a comment that they're intentionally excluded.


5. 0644 permission literal — use 0o644

run_result.go:120

if err := os.WriteFile(outputPath, data, 0644); err != nil {

Go 1.13+ prefers the explicit octal notation 0o644. The old form still works but will likely trigger a linter warning.


Minor Observations

  • emitRunResult is called for TestStatePressure but not TestSpamoorSmoke — this looks intentional and is correct.

TestERC20Throughput uses waitForSpamoorDone for stats while GasBurner/DeFi/StatePressure use a separate GetMetrics() call. The inconsistency is harmless but worth noting: ERC20 gets the final counts from the drain-wait return values, while the others poll explicitly post-drain. Both are correct.

  • evNodeTag() VCS fallback is well-implemented — the 8-char truncation of the git SHA is clean.

  • fieldDescriptions allocated on every call — this is called once per run and the result is embedded in JSON, so there is no meaningful cost. Fine as-is.

  • runConfig.WaitTimeout is serialized via .String() (e.g. "5m0s") rather than seconds. Since other duration fields in runMetrics use _ms suffixed floats, consider whether the external consumer expects a consistent format.

@coderabbitai bot (Contributor) commented Mar 24, 2026

📝 Walkthrough

Walkthrough

Adds structured JSON "run result" emission for benchmarks, expands benchConfig with spammer parameters, captures final spamoor metrics and resource attributes, refactors trace/stat helpers, and integrates result emission into multiple spamoor/gasburner benchmark tests.

Changes

Cohort / File(s) Summary
Result emission infrastructure
test/e2e/benchmark/run_result.go
New file: defines runResult schema, builders, and emitRunResult() to write JSON when BENCH_RESULT_OUTPUT is set. Populates metrics, block-range, tags, host attributes, and optional spamoor stats and trace-derived fields.
Benchmark config
test/e2e/benchmark/config.go
Added benchConfig fields MaxPending, Rebroadcast, BaseFee, TipFee and env-var parsing for BENCH_MAX_PENDING, BENCH_REBROADCAST, BENCH_BASE_FEE, BENCH_TIP_FEE.
Test integrations (emit + spamoor metrics)
test/e2e/benchmark/gasburner_test.go, test/e2e/benchmark/spamoor_defi_test.go, test/e2e/benchmark/spamoor_erc20_test.go, test/e2e/benchmark/spamoor_state_pressure_test.go
Declare run-level result, wallClock, spamoorStats; defer conditional emitRunResult(...); replace hardcoded spamoor params with cfg.*; collect final spamoor metrics via api.GetMetrics() or waitForSpamoorDone() and populate runSpamoorStats.
Trace/resource attr handling
test/e2e/benchmark/traces.go, test/e2e/benchmark/suite_test.go, test/e2e/benchmark/spamoor_smoke_test.go
Introduce optional resourceAttrCollector and resourceAttrs; implement fetchResourceAttrs for victoriaTraceProvider; env now stores evNodeServiceName/evRethServiceName; collectTraces no longer takes serviceName argument and fetches resource attrs. Update smoke test to call s.collectTraces(e) (no explicit service name).
Span/stat helpers & result shape
test/e2e/benchmark/helpers.go, test/e2e/benchmark/result.go
Add overheadFromStats and rethRateFromStats, refactor evNodeOverhead/rethExecutionRate to use them; adjust engine span timing units (use microseconds->milliseconds float); extend traceResult with evNodeAttrs and evRethAttrs.
Minor test updates
test/e2e/benchmark/gasburner_test.go (also)
Adjustments to variable scoping, trace collection calls, and benchmark CreateSpammer parameters to use cfg fields.

Sequence Diagram(s)

sequenceDiagram
  participant Test as Benchmark Test
  participant SpammerAPI as Spamoor API
  participant Traces as VictoriaTraces
  participant FS as Filesystem

  Test->>SpammerAPI: run workload / waitForSpamoorDone
  SpammerAPI-->>Test: final sent/failed counters
  Test->>Traces: collect spans & resource attrs
  Traces-->>Test: spans + resource attributes
  Test->>Test: buildRunResult(result, wallClock, spans, attrs, spamoorStats)
  Test->>FS: emitRunResult -> write JSON file (BENCH_RESULT_OUTPUT)
  FS-->>Test: write success/failure

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • alpe
  • tac0turtle

Poem

🐰
I hopped through tests with a pen of light,
Counting gas and traces through the night,
JSON carrots left for curious eyes,
Metrics snug beneath moonlit skies,
Hop—benchmark done, and all data bright.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Description check: ❓ Inconclusive. The PR description provides a comprehensive summary with context, but lacks the explicit structure and sections required by the template (an Overview section with issue/background/goal). Resolution: restructure the description to follow the template, adding an Overview section with the goal, background, and relevant context/issue references before the Summary section.

✅ Passed checks (2 passed)

Title check: ✅ Passed. The title 'feat: structured benchmark result output via BENCH_RESULT_OUTPUT' clearly and concisely summarizes the main change: adding structured JSON benchmark output written to the path named by an environment variable.
Docstring Coverage: ✅ Passed. Docstring coverage is 92.86%, which is sufficient; the required threshold is 80.00%.


@github-actions bot commented Mar 24, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Mar 30, 2026, 10:41 AM

@codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.45%. Comparing base (889da9a) to head (f0f71bd).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3195      +/-   ##
==========================================
- Coverage   61.46%   61.45%   -0.02%     
==========================================
  Files         120      120              
  Lines       12462    12462              
==========================================
- Hits         7660     7658       -2     
- Misses       3944     3945       +1     
- Partials      858      859       +1     
Flag: combined — Coverage Δ: 61.45% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@chatton chatton marked this pull request as draft March 24, 2026 11:49
chatton added 13 commits March 24, 2026 12:17
- Deduplicate overhead/reth-rate computation: move stats-based helpers
  to helpers.go, make span-based wrappers delegate to them
- Fix sub-millisecond precision loss in engine span timings by using
  microsecond-based float division instead of integer truncation
- Add spamoor stats to TestGasBurner for consistency with other tests
- Add MaxPending, Rebroadcast, BaseFee, TipFee to benchConfig
- Fix ERC20 test hardcoding max_wallets=200 instead of using cfg
- Replace all hardcoded spamoor params with cfg fields across tests
- Add resourceAttrs struct with host, OS, and service fields
- Extract attributes from VictoriaTraces LogsQL span data via
  resourceAttrCollector interface
- Include host metadata in structured benchmark result JSON
Move emitRunResult into a deferred closure in all three test functions.
If the test fails after metrics are collected, the structured JSON is
still written. If it fails before result data exists, the defer is a
no-op.
Remove the 3-second sleep before requireSpammersRunning that caused all
transactions to be mined before the measurement window started, leaving
SteadyState at 0s. Also add deferred emitRunResult, configurable spamoor
params, and spamoorStats collection to match the other benchmark tests.
… mode

In external mode the sequencer reports spans as "ev-node" (not the
test-specific name like "ev-node-erc20"), so trace queries returned
zero spans. Store service names on env: local mode uses the
test-specific name, external mode defaults to "ev-node"/"ev-reth"
with BENCH_EVNODE_SERVICE_NAME/BENCH_EVRETH_SERVICE_NAME overrides.
fetchResourceAttrs only needs one span but was streaming the full
result set from VictoriaTraces. Add limit=1 to the LogsQL query to
avoid wasting bandwidth on long-lived instances with many spans.
@chatton chatton marked this pull request as ready for review March 30, 2026 10:41
@alpe (Contributor) left a comment

👍 uAck

@coderabbitai bot (Contributor) left a comment
Actionable comments posted: 3

🧹 Nitpick comments (1)
test/e2e/benchmark/helpers.go (1)

390-395: Avoid silent metric disappearance on “zero-count” paths.

At Line 390 / Line 394 / Line 413, returning ok=false drops the metric downstream with no diagnostic signal (field omitted in JSON/log paths). For matrix comparisons, this can make runs harder to interpret. Consider surfacing an explicit reason (or a stable placeholder field) so missing values are distinguishable from truly unavailable traces.

Also applies to: 413-414

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/helpers.go` around lines 390 - 395, The current
metric-extraction code returns ok=false whenever a metric exists but has
Count==0 (e.g., the checks around produce, execute and stats[spanExecuteTxs]),
which causes the metric to be dropped downstream and indistinguishable from a
truly missing metric; change the logic in the function that inspects stats (the
blocks referencing produce, execute, stats and spanExecuteTxs) so that instead
of returning ok=false for zero-count cases you return a stable placeholder value
(e.g., NaN or a sentinel like -1) or set ok=true plus an explicit reason/flag
(e.g., zeroCount=true or reason="zero_count") so downstream JSON/loggers always
include the field and can differentiate “zero count” from “no data.” Ensure the
change updates all similar branches (including the checks at lines handling
produce/execute and the block noted at 413–414) and preserves the original
non-zero behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/benchmark/config.go`:
- Around line 50-53: The BENCH_* knobs can be negative because envInt allows -1;
update the benchConfig construction (where MaxPending, Rebroadcast, BaseFee,
TipFee are set) to validate or clamp values returned by envInt so negatives are
rejected or set to sane defaults—e.g., ensure MaxPending >= 0, Rebroadcast >= 0,
BaseFee >= 0, TipFee >= 0—and return an error (or panic) from the bench config
constructor (the function that builds benchConfig) on invalid inputs so
misconfiguration fails fast before hitting Spamoor.

In `@test/e2e/benchmark/suite_test.go`:
- Around line 246-248: collectTraces currently populates tr.evRethAttrs but
buildRunResult only includes traces.evNodeAttrs in the serialized
BENCH_RESULT_OUTPUT, so evRethAttrs is never exported; to fix, update
buildRunResult (or the serialization path that produces BENCH_RESULT_OUTPUT) to
include tr.evRethAttrs alongside tr.evNodeAttrs (e.g., add a field for
evRethAttrs in the RunResult/traces struct and serialize it), and ensure any
consumers expecting the output are updated to read the new field; touch the
functions buildRunResult and the traces/RunResult type so evRethAttrs is carried
through to output.

In `@test/e2e/benchmark/traces.go`:
- Around line 266-304: fetchResourceAttrs can block forever because it uses the
caller context (t.Context()) for the HTTP request; wrap the incoming ctx in a
short child timeout and use that for the request to ensure the optional
resource-attrs query cannot hang the whole test. Modify fetchResourceAttrs to
create a child context with a small timeout (e.g. 5s) via
context.WithTimeout(ctx, 5*time.Second) and defer cancel(), then pass that child
context to http.NewRequestWithContext / http.DefaultClient.Do so the request and
response read will be cancelled if it stalls; ensure the cancel is deferred
immediately after creating the timed context.


📥 Commits

Reviewing files that changed from the base of the PR and between 23d0718 and f0f71bd.

📒 Files selected for processing (11)
  • test/e2e/benchmark/config.go
  • test/e2e/benchmark/gasburner_test.go
  • test/e2e/benchmark/helpers.go
  • test/e2e/benchmark/result.go
  • test/e2e/benchmark/run_result.go
  • test/e2e/benchmark/spamoor_defi_test.go
  • test/e2e/benchmark/spamoor_erc20_test.go
  • test/e2e/benchmark/spamoor_smoke_test.go
  • test/e2e/benchmark/spamoor_state_pressure_test.go
  • test/e2e/benchmark/suite_test.go
  • test/e2e/benchmark/traces.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/e2e/benchmark/spamoor_erc20_test.go
  • test/e2e/benchmark/spamoor_defi_test.go
  • test/e2e/benchmark/gasburner_test.go
  • test/e2e/benchmark/run_result.go

Comment on lines +50 to +53
MaxPending: envInt("BENCH_MAX_PENDING", 50_000),
Rebroadcast: envInt("BENCH_REBROADCAST", 0),
BaseFee: envInt("BENCH_BASE_FEE", 20),
TipFee: envInt("BENCH_TIP_FEE", 2),

⚠️ Potential issue | 🟡 Minor

Reject negative BENCH_* load knobs up front.

envInt accepts -1 here, so max_pending, rebroadcast, and the fee fields can go negative and only fail much later inside Spamoor. Validate or clamp these values when building benchConfig so misconfigured runs fail fast.

Possible fix
+func envNonNegativeInt(key string, fallback int) int {
+	n := envInt(key, fallback)
+	if n < 0 {
+		return fallback
+	}
+	return n
+}
+
-		MaxPending:      envInt("BENCH_MAX_PENDING", 50_000),
-		Rebroadcast:     envInt("BENCH_REBROADCAST", 0),
-		BaseFee:         envInt("BENCH_BASE_FEE", 20),
-		TipFee:          envInt("BENCH_TIP_FEE", 2),
+		MaxPending:      envNonNegativeInt("BENCH_MAX_PENDING", 50_000),
+		Rebroadcast:     envNonNegativeInt("BENCH_REBROADCAST", 0),
+		BaseFee:         envNonNegativeInt("BENCH_BASE_FEE", 20),
+		TipFee:          envNonNegativeInt("BENCH_TIP_FEE", 2),
As per coding guidelines, "Validate all inputs from external sources in Go code."

Comment on lines +246 to +248
if rac, ok := e.traces.(resourceAttrCollector); ok {
tr.evNodeAttrs = rac.fetchResourceAttrs(ctx, e.evNodeServiceName)
tr.evRethAttrs = rac.fetchResourceAttrs(ctx, e.evRethServiceName)

⚠️ Potential issue | 🟡 Minor

evRethAttrs is fetched but never reaches the artifact.

collectTraces fills tr.evRethAttrs here, but buildRunResult only serializes traces.evNodeAttrs (test/e2e/benchmark/run_result.go:166-172). Right now the second lookup is dead work and the reth-side host metadata is silently dropped from BENCH_RESULT_OUTPUT.


Comment on lines +266 to +304
func (v *victoriaTraceProvider) fetchResourceAttrs(ctx context.Context, serviceName string) *resourceAttrs {
end := time.Now()
query := fmt.Sprintf(`_stream:{resource_attr:service.name="%s"}`, serviceName)
baseURL := strings.TrimRight(v.queryURL, "/")
url := fmt.Sprintf("%s/select/logsql/query?query=%s&start=%s&end=%s&limit=1",
baseURL,
neturl.QueryEscape(query),
v.startTime.Format(time.RFC3339Nano),
end.Format(time.RFC3339Nano))

req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
v.t.Logf("warning: failed to create resource attrs request: %v", err)
return nil
}

resp, err := http.DefaultClient.Do(req)
if err != nil {
v.t.Logf("warning: failed to fetch resource attrs: %v", err)
return nil
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
v.t.Logf("warning: unexpected status %d fetching resource attrs", resp.StatusCode)
return nil
}

scanner := bufio.NewScanner(resp.Body)
scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
for scanner.Scan() {
line := scanner.Bytes()
if len(line) == 0 {
continue
}
attrs := extractResourceAttrs(line)
return &attrs
}
return nil

⚠️ Potential issue | 🟠 Major

Give the optional resource-attrs query its own timeout.

collectTraces passes t.Context() into this method, so a stalled VictoriaTraces endpoint can keep http.DefaultClient.Do blocked until the whole test times out. That makes optional host metadata capable of hanging the benchmark and delaying artifact emission after span collection already succeeded.

Possible fix
 func (v *victoriaTraceProvider) fetchResourceAttrs(ctx context.Context, serviceName string) *resourceAttrs {
+	ctx, cancel := context.WithTimeout(ctx, 15*time.Second)
+	defer cancel()
+
 	end := time.Now()
 	query := fmt.Sprintf(`_stream:{resource_attr:service.name="%s"}`, serviceName)
As per coding guidelines, "Use context.Context for cancellation in Go."
