perf: Expand Benchmarks vs Upstream OpenTelemetry & CI Regression#1500
Open
JacksonWeber wants to merge 4 commits into
Open
perf: Expand Benchmarks vs Upstream OpenTelemetry & CI Regression#1500JacksonWeber wants to merge 4 commits into
JacksonWeber wants to merge 4 commits into
Conversation
… gate Adds four new perf scenarios so we can measure overhead of this package against equivalent upstream OpenTelemetry calls: - AzureMonitorSpanTest / AzureMonitorLogTest (useAzureMonitor + direct OTel API) - OtelSpanTest / OtelLogTest (plain @opentelemetry/sdk-trace-base & sdk-logs reference, informational only) Introduces a deterministic benchmark runner (bench.mjs + runBenchmarks.mjs) that bypasses the @azure-tools/test-perf worker pool, runs each scenario in a fresh Node child process to avoid OTel global-state contamination, and emits structured JSON with median/mean/stdev across N samples. Adds .github/workflows/performance.yml: packs both PR and base branch as tarballs via npm pack, installs each in turn under the PR's perf harness, runs the benchmark suite, and fails the job (blocking merge when set as a required check) if any gating scenario regresses by more than PERF_REGRESSION_THRESHOLD percent (default 15%). Posts a sticky PR comment with the comparison table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR expands the test/performanceTests harness to add new OpenTelemetry-based span/log scenarios, introduces a deterministic multi-scenario benchmark runner (per-scenario child process + JSON output), and adds a GitHub Actions workflow to compare candidate vs baseline performance and gate PRs on regressions.
Changes:
- Add four new perf scenarios:
AzureMonitorSpanTest/AzureMonitorLogTest(gating) andOtelSpanTest/OtelLogTest(informational baseline). - Add
bench.mjs,runBenchmarks.mjs, andcomparePerf.mjsto run isolated benchmarks and produce/compare structured results. - Add
.github/workflows/performance.ymlto run baseline vs candidate benchmarks on PRs and post a sticky comparison comment.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| test/performanceTests/test/otelSpan.spec.ts | New upstream OTel span baseline perf scenario. |
| test/performanceTests/test/otelLog.spec.ts | New upstream OTel log baseline perf scenario. |
| test/performanceTests/test/azureMonitorSpan.spec.ts | New useAzureMonitor() + OTel span perf scenario (gating). |
| test/performanceTests/test/azureMonitorLog.spec.ts | New useAzureMonitor() + OTel log perf scenario (gating). |
| test/performanceTests/test/index.spec.ts | Registers new scenarios and updates perf output capture + telemetry reporting. |
| test/performanceTests/test/appInsightsShim.spec.ts | Makes shim startup idempotent across multiple test instantiations. |
| test/performanceTests/bench.mjs | Single-scenario deterministic benchmark runner (no worker pool). |
| test/performanceTests/runBenchmarks.mjs | Multi-scenario runner: per-scenario child process + sampling + JSON summary. |
| test/performanceTests/comparePerf.mjs | Compares baseline vs candidate JSON and returns a gating exit code + Markdown. |
| test/performanceTests/README.md | Documents scenarios, tiers, manual runs, and regression CI behavior. |
| test/performanceTests/package.json | Updates perf harness deps and adds benchmark/compare scripts. |
| test/performanceTests/package-lock.json | Lockfile updates for new/updated perf harness dependencies. |
| .github/workflows/performance.yml | Adds PR perf regression workflow (pack+install both versions, benchmark, compare, comment). |
Files not reviewed (1)
- test/performanceTests/package-lock.json: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+70
to
+74
| - name: Install perf harness deps (PR) | ||
| working-directory: pr/test/performanceTests | ||
| run: npm install | ||
|
|
||
| - name: Install CANDIDATE applicationinsights into perf harness |
Comment on lines
+14
to
+16
| "@opentelemetry/api-logs": "^0.208.0", | ||
| "@opentelemetry/sdk-logs": "^0.208.0", | ||
| "@opentelemetry/sdk-trace-base": "^2.2.0", |
Comment on lines
18
to
20
| console.log = function (message: string) { | ||
| perfTestData += message + "\n"; | ||
| }; |
| cwd: __dirname, | ||
| stdio: ["ignore", "inherit", "inherit"], | ||
| }, | ||
| ); |
Comment on lines
+78
to
+82
| - name: Run candidate benchmarks | ||
| working-directory: pr/test/performanceTests | ||
| run: | | ||
| node runBenchmarks.mjs \ | ||
| --out "$GITHUB_WORKSPACE/candidate.json" \ |
Fixes CI: add explicit npm run build of the perf harness before running benchmarks; previous run died with ERR_MODULE_NOT_FOUND because dist-esm was never produced. Review feedback: - workflow: switch perf harness install to npm ci; add --no-package-lock to tarball installs so the lockfile is not rewritten mid-run - AzureMonitor scenarios: acquire @opentelemetry/api and @opentelemetry/api-logs via createRequire resolved from the installed applicationinsights, so the Tracer/Logger we benchmark is backed by the SAME api / api-logs instance that useAzureMonitor() mutated (otherwise a duplicate hoisted copy at the harness level would yield a no-op proxy and we'd silently measure nothing) - OTel reference scenarios: use provider.getTracer / provider.getLogger directly instead of going through the global registry, eliminating dual-instance concerns for these - index.spec.ts: capture console.log with rest args + util.format so multi-arg / non-string calls are formatted the same way Node would print them - runBenchmarks.mjs: propagate child.error and child.signal in failure messages (spawnSync status can be null on spawn error or signal exit) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous run cancelled at 25min job timeout: candidate ran 12min (5 samples x 6 scenarios x ~24s/sample on CI runners; AzureMonitor scenarios pay a 5-10s SDK init cost per fresh child process), baseline got 12min in before cancellation. Cut samples 5 -> 3, duration 8s -> 5s, warmup 2s -> 1s. New estimate: ~5min per side, ~12min total with install/build. Bumped job timeout 25 -> 40 min for safety margin. Median of 3 samples is still robust to a single outlier (the main source of CI flake), and 5s is enough sustained measurement time for even the slowest scenario (AzureMonitorLog at ~12k ops/s yields ~60k ops per sample). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds four new perf scenarios so we can measure overhead of this package against equivalent upstream OpenTelemetry calls:
AzureMonitorSpanTest / AzureMonitorLogTest (useAzureMonitor + direct OTel API)
OtelSpanTest / OtelLogTest (plain @opentelemetry/sdk-trace-base & sdk-logs reference, informational only)
Introduces a deterministic benchmark runner (bench.mjs + runBenchmarks.mjs) that bypasses the @azure-tools/test-perf worker pool, runs each scenario in a fresh Node child process to avoid OTel global-state contamination, and emits structured JSON with median/mean/stdev across N samples.
Adds .github/workflows/performance.yml: packs both PR and base branch as tarballs via npm pack, installs each in turn under the PR's perf harness, runs the benchmark suite, and fails the job (blocking merge when set as a required check) if any gating scenario regresses by more than PERF_REGRESSION_THRESHOLD percent (default 15%). Posts a sticky PR comment with the comparison table.