Skip to content

perf: Expand Benchmarks vs Upstream OpenTelemetry & CI Regression#1500

Open
JacksonWeber wants to merge 4 commits into
microsoft:mainfrom
JacksonWeber:perf/expand-tests-and-regression-gate
Open

perf: Expand Benchmarks vs Upstream OpenTelemetry & CI Regression#1500
JacksonWeber wants to merge 4 commits into
microsoft:mainfrom
JacksonWeber:perf/expand-tests-and-regression-gate

Conversation

@JacksonWeber
Copy link
Copy Markdown
Contributor

Adds four new perf scenarios so we can measure overhead of this package against equivalent upstream OpenTelemetry calls:

  • AzureMonitorSpanTest / AzureMonitorLogTest (useAzureMonitor + direct OTel API)

  • OtelSpanTest / OtelLogTest (plain @opentelemetry/sdk-trace-base & sdk-logs reference, informational only)

Introduces a deterministic benchmark runner (bench.mjs + runBenchmarks.mjs) that bypasses the @azure-tools/test-perf worker pool, runs each scenario in a fresh Node child process to avoid OTel global-state contamination, and emits structured JSON with median/mean/stdev across N samples.

Adds .github/workflows/performance.yml: packs both PR and base branch as tarballs via npm pack, installs each in turn under the PR's perf harness, runs the benchmark suite, and fails the job (blocking merge when set as a required check) if any gating scenario regresses by more than PERF_REGRESSION_THRESHOLD percent (default 15%). Posts a sticky PR comment with the comparison table.

… gate

Adds four new perf scenarios so we can measure overhead of this package against equivalent upstream OpenTelemetry calls:

- AzureMonitorSpanTest / AzureMonitorLogTest (useAzureMonitor + direct OTel API)

- OtelSpanTest / OtelLogTest (plain @opentelemetry/sdk-trace-base & sdk-logs reference, informational only)

Introduces a deterministic benchmark runner (bench.mjs + runBenchmarks.mjs) that bypasses the @azure-tools/test-perf worker pool, runs each scenario in a fresh Node child process to avoid OTel global-state contamination, and emits structured JSON with median/mean/stdev across N samples.

Adds .github/workflows/performance.yml: packs both PR and base branch as tarballs via npm pack, installs each in turn under the PR's perf harness, runs the benchmark suite, and fails the job (blocking merge when set as a required check) if any gating scenario regresses by more than PERF_REGRESSION_THRESHOLD percent (default 15%). Posts a sticky PR comment with the comparison table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JacksonWeber JacksonWeber requested a review from Copilot May 22, 2026 02:36
@JacksonWeber JacksonWeber changed the title perf: expand benchmarks vs upstream OpenTelemetry + add CI regression… perf: Expand Benchmarks vs Upstream OpenTelemetry & CI Regression May 22, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands the test/performanceTests harness to add new OpenTelemetry-based span/log scenarios, introduces a deterministic multi-scenario benchmark runner (per-scenario child process + JSON output), and adds a GitHub Actions workflow to compare candidate vs baseline performance and gate PRs on regressions.

Changes:

  • Add four new perf scenarios: AzureMonitorSpanTest/AzureMonitorLogTest (gating) and OtelSpanTest/OtelLogTest (informational baseline).
  • Add bench.mjs, runBenchmarks.mjs, and comparePerf.mjs to run isolated benchmarks and produce/compare structured results.
  • Add .github/workflows/performance.yml to run baseline vs candidate benchmarks on PRs and post a sticky comparison comment.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/performanceTests/test/otelSpan.spec.ts New upstream OTel span baseline perf scenario.
test/performanceTests/test/otelLog.spec.ts New upstream OTel log baseline perf scenario.
test/performanceTests/test/azureMonitorSpan.spec.ts New useAzureMonitor() + OTel span perf scenario (gating).
test/performanceTests/test/azureMonitorLog.spec.ts New useAzureMonitor() + OTel log perf scenario (gating).
test/performanceTests/test/index.spec.ts Registers new scenarios and updates perf output capture + telemetry reporting.
test/performanceTests/test/appInsightsShim.spec.ts Makes shim startup idempotent across multiple test instantiations.
test/performanceTests/bench.mjs Single-scenario deterministic benchmark runner (no worker pool).
test/performanceTests/runBenchmarks.mjs Multi-scenario runner: per-scenario child process + sampling + JSON summary.
test/performanceTests/comparePerf.mjs Compares baseline vs candidate JSON and returns a gating exit code + Markdown.
test/performanceTests/README.md Documents scenarios, tiers, manual runs, and regression CI behavior.
test/performanceTests/package.json Updates perf harness deps and adds benchmark/compare scripts.
test/performanceTests/package-lock.json Lockfile updates for new/updated perf harness dependencies.
.github/workflows/performance.yml Adds PR perf regression workflow (pack+install both versions, benchmark, compare, comment).
Files not reviewed (1)
  • test/performanceTests/package-lock.json: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +70 to +74
- name: Install perf harness deps (PR)
working-directory: pr/test/performanceTests
run: npm install

- name: Install CANDIDATE applicationinsights into perf harness
Comment on lines +14 to +16
"@opentelemetry/api-logs": "^0.208.0",
"@opentelemetry/sdk-logs": "^0.208.0",
"@opentelemetry/sdk-trace-base": "^2.2.0",
Comment on lines 18 to 20
console.log = function (message: string) {
perfTestData += message + "\n";
};
cwd: __dirname,
stdio: ["ignore", "inherit", "inherit"],
},
);
Comment on lines +78 to +82
- name: Run candidate benchmarks
working-directory: pr/test/performanceTests
run: |
node runBenchmarks.mjs \
--out "$GITHUB_WORKSPACE/candidate.json" \
JacksonWeber and others added 2 commits May 21, 2026 21:36
Fixes CI: add explicit npm run build of the perf harness before running benchmarks; previous run died with ERR_MODULE_NOT_FOUND because dist-esm was never produced.

Review feedback:

- workflow: switch perf harness install to npm ci; add --no-package-lock to tarball installs so the lockfile is not rewritten mid-run

- AzureMonitor scenarios: acquire @opentelemetry/api and @opentelemetry/api-logs via createRequire resolved from the installed applicationinsights, so the Tracer/Logger we benchmark is backed by the SAME api / api-logs instance that useAzureMonitor() mutated (otherwise a duplicate hoisted copy at the harness level would yield a no-op proxy and we'd silently measure nothing)

- OTel reference scenarios: use provider.getTracer / provider.getLogger directly instead of going through the global registry, eliminating dual-instance concerns for these

- index.spec.ts: capture console.log with rest args + util.format so multi-arg / non-string calls are formatted the same way Node would print them

- runBenchmarks.mjs: propagate child.error and child.signal in failure messages (spawnSync status can be null on spawn error or signal exit)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous run cancelled at 25min job timeout: candidate ran 12min (5 samples x 6 scenarios x ~24s/sample on CI runners; AzureMonitor scenarios pay a 5-10s SDK init cost per fresh child process), baseline got 12min in before cancellation.

Cut samples 5 -> 3, duration 8s -> 5s, warmup 2s -> 1s. New estimate: ~5min per side, ~12min total with install/build. Bumped job timeout 25 -> 40 min for safety margin.

Median of 3 samples is still robust to a single outlier (the main source of CI flake), and 5s is enough sustained measurement time for even the slowest scenario (AzureMonitorLog at ~12k ops/s yields ~60k ops per sample).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants