5 changes: 3 additions & 2 deletions performance-test-server/Dockerfile
@@ -71,7 +71,8 @@ ENV NODE_ENV=production
ENV TLS_DIR=/app/certs

# Ports: 8080 (full chain), 8081 (baseline), 8082 (validation),
# 8083 (logger), 8084 (otel)
EXPOSE 8080 8081 8082 8083 8084
# 8083 (logger), 8084 (otel no-op), 8085 (otel real OTLP export —
# only bound when OTEL_EXPORT_ENABLED=1)
EXPOSE 8080 8081 8082 8083 8084 8085

CMD ["node", "src/index.ts"]
71 changes: 62 additions & 9 deletions performance-test-server/README.md
@@ -4,17 +4,18 @@ Dedicated server for k6 performance benchmarking with configurable interceptor c

## Purpose

This server runs **5 parallel instances** on different ports, each with a different interceptor configuration:
This server runs **5 parallel instances** on different ports, each with a different interceptor configuration, plus an **optional 6th instance** for measuring OTLP export overhead end-to-end:

| Port | Configuration | Purpose |
|------|---------------|---------|
| 8081 | **Baseline** (no interceptors) | Measure baseline latency without any overhead |
| 8082 | **Validation only** | Measure validation interceptor overhead |
| 8083 | **Logger only** | Measure logger interceptor overhead |
| 8084 | **OTel (tracing + metrics) only** | Measure OTel interceptor overhead |
| 8080 | **Full chain** (all interceptors) | Measure total overhead with all interceptors |
| 8084 | **OTel (tracing + metrics) only** (no-op exporter) | Measure OTel interceptor overhead |
| 8080 | **Full chain** (all interceptors, no-op exporter) | Measure total overhead with all interceptors |
| 8085 | **OTel export** — full chain + real OTLP exporter (opt-in via `OTEL_EXPORT_ENABLED=1`) | Measure end-to-end cost of the stock `@connectum/otel` export path (BatchSpanProcessor + otlp-transformer + OTLP/gRPC) |

This allows k6 benchmarks to accurately measure the overhead introduced by each interceptor.
This allows k6 benchmarks to accurately measure the overhead introduced by each interceptor, and — with the OTel export scenario — the CPU cost of actually shipping spans over the wire.

## Requirements

@@ -94,10 +95,45 @@ Stress-tests the full-chain configuration with 100 concurrent VUs for 7 minutes:
docker compose --profile load up k6-basic-load --build --abort-on-container-exit
```

### OTel OTLP Export Overhead

Measures the p50/p95/p99 latency delta between the baseline (port 8081) and the full-chain-with-real-OTLP-exporter configuration (port 8085). Runs for ~5 minutes at 100 VUs:

```bash
OTEL_EXPORT_ENABLED=1 docker compose --profile otel-export up \
server otel-collector k6-otel-export --build --abort-on-container-exit
```
Comment on lines +102 to +105

⚠️ Potential issue | 🟠 Major

Target k6-otel-export explicitly to avoid running the default benchmark too.

docker compose --profile otel-export up starts all unprofiled services as well, so k6-interceptor-overhead can run concurrently and skew this benchmark. Specify the service name.

🐛 Proposed fix
 OTEL_EXPORT_ENABLED=1 docker compose --profile otel-export up \
-  --build --abort-on-container-exit
+  --build --abort-on-container-exit k6-otel-export
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```bash
OTEL_EXPORT_ENABLED=1 docker compose --profile otel-export up \
--build --abort-on-container-exit k6-otel-export
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@performance-test-server/README.md` around lines 102 - 105, The docker compose
command in the README starts unprofiled services too; change the example to
explicitly target the k6-otel-export service by adding its service name to the
up command (i.e., run docker compose --profile otel-export up k6-otel-export
--build --abort-on-container-exit with OTEL_EXPORT_ENABLED=1) so the default
benchmark (k6-interceptor-overhead) does not run concurrently and skew results.


Naming the three services explicitly is deliberate. `k6-interceptor-overhead`
has no profile, so a bare `docker compose --profile otel-export up` would start
it as well. The interceptor benchmark would then run concurrently, compete for
CPU, and skew the OTLP-export measurement. Listing only the services this
scenario needs keeps the run isolated.

What this measures that the `k6-interceptor-overhead` scenario does *not*:

- Real `BatchSpanProcessor` + `@opentelemetry/otlp-transformer` serialization cost per exported span
- OTLP/gRPC wire transport cost (`@grpc/grpc-js`)
- End-to-end CPU pressure of the full OTel export pipeline under sustained load

The collector runs locally in Docker and drops all telemetry via a `debug` exporter — the goal is export-side CPU profiling, not backend write throughput. See `otel-collector-config.yaml`.

k6 writes machine-readable JSON results to `k6/results/otel-export-overhead.json` (gitignored) for CI / bench-tracking tooling. The destination is set through `K6_OUT` in `docker-compose.yml`.
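
Tooling that consumes this file could extract latency percentiles along the following lines. This is a minimal sketch, not code from the repository. It assumes the file holds k6's newline-delimited `json` output, where each sample is a record with `type: "Point"`, and that latency is recorded under k6's built-in `http_req_duration` metric. The helper names `collectSamples` and `nearestRankPercentile` are hypothetical, and the nearest-rank method is one of several valid percentile definitions.

```typescript
// Hypothetical helpers; not part of the repository.
// Assumes k6's `json` output: one JSON object per line, with samples shaped as
// {"type":"Point","metric":"<name>","data":{"time":"...","value":<number>,"tags":{...}}}.

interface K6Point {
  type: string;
  metric: string;
  data: { value: number };
}

/** Collect sample values for one metric from newline-delimited k6 JSON output. */
function collectSamples(ndjson: string, metric: string): number[] {
  const values: number[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    const record = JSON.parse(trimmed) as Partial<K6Point>;
    const value = record.data?.value;
    // "Metric" records are declarations; only "Point" records carry samples.
    if (record.type === "Point" && record.metric === metric && typeof value === "number") {
      values.push(value);
    }
  }
  return values;
}

/** Nearest-rank percentile (p in (0, 100]) of a non-empty sample set. */
function nearestRankPercentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1] as number;
}
```

To compare the two ports, the samples would first need to be split by target, for example by filtering on a tag in `data.tags`. Which tag distinguishes port 8081 from port 8085 depends on how `k6/otel-export-overhead.js` labels its requests.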

**Expected overhead range** (informational — actual numbers depend on the installed `@opentelemetry/otlp-transformer` version):

| Metric | Baseline (8081) | OTel export (8085) | Overhead | Relative |
|--------|-----------------|--------------------|----------|----------|
| p50 latency | ~1–3 ms | ~1.5–4 ms | +0.5–1 ms | 1.2×–1.5× |
| p95 latency | ~2–5 ms | ~3–8 ms | +1–3 ms | 1.3×–2× |
| p99 latency | ~5–10 ms | ~8–20 ms | +3–10 ms | 1.5×–2.5× |

A **relative overhead >1.5×** on p95, or any sudden jump from a previous run, is a signal to investigate the `@opentelemetry/otlp-transformer` version. See Connectum recommendations R1.2 and upstream issue [#6221](https://github.com/open-telemetry/opentelemetry-js/issues/6221), PR [#6225](https://github.com/open-telemetry/opentelemetry-js/pull/6225), PR [#6390](https://github.com/open-telemetry/opentelemetry-js/pull/6390), and issue [#6570](https://github.com/open-telemetry/opentelemetry-js/issues/6570).
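
The p95 check could be automated in bench-tracking tooling roughly as follows. This is a sketch under stated assumptions: the 1.5 threshold is the one given here, but `maxJump` (how much the relative overhead may grow versus the previous run before it counts as a "sudden jump") is an assumed knob with an arbitrary default, and the name `checkP95Overhead` is hypothetical.

```typescript
// Hypothetical helper; not part of the repository.
// `threshold` follows the README guidance (relative overhead > 1.5x on p95).
// `maxJump` is an ASSUMED tolerance for run-to-run growth; tune it to your noise level.

interface OverheadCheck {
  relative: number;     // export latency divided by baseline latency
  absoluteMs: number;   // export latency minus baseline latency
  investigate: boolean; // true when the run warrants a closer look
}

function checkP95Overhead(
  baselineMs: number,
  exportMs: number,
  previousRelative?: number,
  threshold = 1.5,
  maxJump = 1.2,
): OverheadCheck {
  if (baselineMs <= 0) throw new Error("baseline latency must be positive");
  const relative = exportMs / baselineMs;
  const overThreshold = relative > threshold;
  // Only compare against history when a previous run is available.
  const jumped = previousRelative !== undefined && relative > previousRelative * maxJump;
  return {
    relative,
    absoluteMs: exportMs - baselineMs,
    investigate: overThreshold || jumped,
  };
}
```

For example, a baseline p95 of 4 ms against an export p95 of 8 ms gives a relative overhead of 2 and sets `investigate`. With 4 ms against 5 ms (relative 1.25) the flag is set only if the previous run's relative overhead was low enough for this to count as a jump.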

### Cleanup

```bash
docker compose --profile load down --rmi local -v
docker compose --profile load --profile otel-export down --rmi local -v
```

### Environment Variables
@@ -106,14 +142,30 @@ k6 scripts accept the following environment variables (set via `docker-compose.y

| Variable | Default | Used by |
|----------|---------|---------|
| `PROTOCOL` | `https` | interceptor-overhead |
| `BASE_HOST` | `server` | interceptor-overhead |
| `PROTOCOL` | `https` | interceptor-overhead, otel-export-overhead |
| `BASE_HOST` | `server` | interceptor-overhead, otel-export-overhead |
| `BASE_URL` | `https://server:8080` | basic-load |
| `BASELINE_PORT` | `8081` | interceptor-overhead |
| `BASELINE_PORT` | `8081` | interceptor-overhead, otel-export-overhead |
| `VALIDATION_PORT` | `8082` | interceptor-overhead |
| `LOGGER_PORT` | `8083` | interceptor-overhead |
| `TRACING_PORT` | `8084` | interceptor-overhead |
| `FULLCHAIN_PORT` | `8080` | interceptor-overhead |
| `OTEL_EXPORT_PORT` | `8085` | otel-export-overhead |

The server-side OTel export scenario (port 8085) is controlled via standard `OTEL_*` env vars. Defaults are set in `docker-compose.yml`; override by exporting before `docker compose up`:

| Variable | Default | Meaning |
|----------|---------|---------|
| `OTEL_EXPORT_ENABLED` | `0` | Set to `1` to bind port 8085 and initialize the OTel provider |
| `OTEL_SERVICE_NAME` | `performance-test-server` | Resource `service.name` attribute |
| `OTEL_TRACES_EXPORTER` | `otlp/grpc` | `console`, `otlp/http`, `otlp/grpc`, or `none` |
| `OTEL_METRICS_EXPORTER` | `otlp/grpc` | same values as above |
| `OTEL_LOGS_EXPORTER` | `none` | same values as above |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://otel-collector:4317` | Collector endpoint |
| `OTEL_BSP_MAX_EXPORT_BATCH_SIZE` | `512` | BatchSpanProcessor batch size |
| `OTEL_BSP_MAX_QUEUE_SIZE` | `2048` | BatchSpanProcessor queue size |
| `OTEL_BSP_SCHEDULE_DELAY` | `1000` | BatchSpanProcessor flush interval (ms) |
| `OTEL_BSP_EXPORT_TIMEOUT` | `10000` | Single export attempt timeout (ms) |
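
For reference, the defaults in this table could be resolved from the environment as sketched below. This is illustrative only and not how `src/index.ts` or `@connectum/otel` necessarily reads them. The helper names are hypothetical, treating only the exact string `1` as enabled is an assumption, and clamping the batch size to the queue size is a design choice of this sketch (the OpenTelemetry specification requires the batch size to be less than or equal to the queue size).

```typescript
// Hypothetical helpers; not part of the repository.
// Defaults mirror the table above. Pass `process.env` in a Node.js program.

interface BspSettings {
  maxExportBatchSize: number;
  maxQueueSize: number;
  scheduleDelayMs: number;
  exportTimeoutMs: number;
}

type Env = Record<string, string | undefined>;

/** Read a positive integer from env, falling back when unset or malformed. */
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function readBspSettings(env: Env): BspSettings {
  const settings: BspSettings = {
    maxExportBatchSize: readPositiveInt(env, "OTEL_BSP_MAX_EXPORT_BATCH_SIZE", 512),
    maxQueueSize: readPositiveInt(env, "OTEL_BSP_MAX_QUEUE_SIZE", 2048),
    scheduleDelayMs: readPositiveInt(env, "OTEL_BSP_SCHEDULE_DELAY", 1000),
    exportTimeoutMs: readPositiveInt(env, "OTEL_BSP_EXPORT_TIMEOUT", 10000),
  };
  // Sketch-level choice: keep the batch size within the queue size.
  settings.maxExportBatchSize = Math.min(settings.maxExportBatchSize, settings.maxQueueSize);
  return settings;
}

/** ASSUMPTION: only the exact value "1" enables the export scenario. */
function isExportEnabled(env: Env): boolean {
  return env["OTEL_EXPORT_ENABLED"] === "1";
}
```

Taking the environment as a parameter instead of reading `process.env` directly keeps the helpers easy to exercise with different override combinations.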

## Testing

@@ -132,7 +184,7 @@ curl -X POST http://localhost:8081/greeter.v1.GreeterService/SayHello \
-d '{"name":"health"}'
```

Repeat for the other ports (8082 validation, 8083 logger, 8084 OTel, 8080 full chain).
Repeat for the other ports (8082 validation, 8083 logger, 8084 OTel, 8080 full chain, and 8085 OTel export when `OTEL_EXPORT_ENABLED=1`).

### Manual Test

@@ -269,6 +321,7 @@ Benchmark scripts are located in the `k6/` directory:

- `k6/interceptor-overhead.js` - Uses **all ports** to compare interceptor overhead
- `k6/basic-load.js` - Uses port 8080 (full chain) with ramping VUs
- `k6/otel-export-overhead.js` - Uses **port 8081 (baseline) + port 8085 (OTel export)** to measure end-to-end OTLP export cost under 100 VUs sustained load

## Troubleshooting

74 changes: 74 additions & 0 deletions performance-test-server/docker-compose.yml
@@ -1,6 +1,22 @@
services:
server:
build: .
# OTel export scenario (port 8085) is opt-in. When OTEL_EXPORT_ENABLED=1
# the server initializes the @connectum/otel provider with env-driven OTLP
# exporter settings and binds the extra port. All other scenarios work
# without these env vars; the provider stays uninitialized.
environment:
OTEL_EXPORT_ENABLED: "${OTEL_EXPORT_ENABLED:-0}"
OTEL_SERVICE_NAME: "${OTEL_SERVICE_NAME:-performance-test-server}"
OTEL_TRACES_EXPORTER: "${OTEL_TRACES_EXPORTER:-otlp/grpc}"
OTEL_METRICS_EXPORTER: "${OTEL_METRICS_EXPORTER:-otlp/grpc}"
OTEL_LOGS_EXPORTER: "${OTEL_LOGS_EXPORTER:-none}"
OTEL_EXPORTER_OTLP_ENDPOINT: "${OTEL_EXPORTER_OTLP_ENDPOINT:-http://otel-collector:4317}"
# BatchSpanProcessor tuning — realistic production-ish defaults
OTEL_BSP_MAX_EXPORT_BATCH_SIZE: "${OTEL_BSP_MAX_EXPORT_BATCH_SIZE:-512}"
OTEL_BSP_MAX_QUEUE_SIZE: "${OTEL_BSP_MAX_QUEUE_SIZE:-2048}"
OTEL_BSP_SCHEDULE_DELAY: "${OTEL_BSP_SCHEDULE_DELAY:-1000}"
OTEL_BSP_EXPORT_TIMEOUT: "${OTEL_BSP_EXPORT_TIMEOUT:-10000}"
healthcheck:
test: >
node -e "const h=require('node:http2'),c=h.connect('https://localhost:8080',
@@ -15,6 +31,29 @@ services:
retries: 10
start_period: 5s

# =========================================================================
# OpenTelemetry Collector (only started for the otel-export profile)
# =========================================================================
# Accepts OTLP/gRPC on :4317 and OTLP/HTTP on :4318, then drops everything
# via a debug exporter. The goal is to measure export-side CPU cost, not
# backend write throughput — see otel-collector-config.yaml for rationale.
otel-collector:
image: otel/opentelemetry-collector-contrib:0.120.0
profiles: ["otel-export"]
volumes:
- ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro
command: ["--config=/etc/otelcol-contrib/config.yaml"]
# No healthcheck: the contrib image is built FROM scratch — it has no shell,
# curl, or wget — so the OTLP listeners (:4317/:4318) and the health_check
# extension (:13133) cannot be probed from inside the container, and the only
# in-image binary (`otelcol-contrib components`) exits 0 without ever opening
# a socket, which would be a false "healthy" signal. We do NOT claim to prove
# collector readiness here. Instead, k6-otel-export gates on `service_started`
# only, and correctness does not depend on the collector being up at t=0:
# the BatchSpanProcessor buffers and retries exports, and over the ~5-minute
# steady-state run any sub-second collector startup gap is negligible relative
# to total spans exported.

k6-interceptor-overhead:
image: grafana/k6:latest
volumes:
@@ -36,3 +75,38 @@ services:
server: { condition: service_healthy }
command: run /scripts/basic-load.js
profiles: ["load"]

# =========================================================================
# OTel OTLP export overhead scenario (profile: otel-export)
# =========================================================================
# Measures the p50/p95/p99 latency delta between the baseline (port 8081)
# and full-chain-with-real-OTLP-exporter (port 8085).
# Requires OTEL_EXPORT_ENABLED=1 on the server and a running otel-collector.
#
# Readiness note: the server healthcheck below probes port 8080 only. All
# ports (incl. 8085) are bound in a single Promise.all in src/index.ts, so
# 8080 being healthy means 8085 is almost certainly up too — and the k6
# script closes the remaining gap deterministically: its setup() health-checks
# 8081 AND 8085 and aborts the run if 8085 is not yet serving. So a brief
# bind-order race cannot produce silent, partially-measured results.
k6-otel-export:
image: grafana/k6:latest
volumes:
- ./k6:/scripts
- ./k6/results:/results
environment:
PROTOCOL: https
BASE_HOST: server
BASELINE_PORT: "8081"
OTEL_EXPORT_PORT: "8085"
# k6 writes a machine-readable summary here for CI/bench tracking.
K6_OUT: "json=/results/otel-export-overhead.json"
depends_on:
server: { condition: service_healthy }
# service_started, not service_healthy: the scratch-based collector image
# cannot be health-probed (see the otel-collector comment above). The
# BatchSpanProcessor tolerates a not-yet-ready collector via buffering and
# retries, so a started container is a sufficient precondition here.
otel-collector: { condition: service_started }
command: run /scripts/otel-export-overhead.js
Comment on lines +104 to +111

⚠️ Potential issue | 🟡 Minor

server: service_healthy does not prove port 8085 is ready.

The server healthcheck only exercises port 8080, but this k6 service immediately requires port 8085. If 8085 is not bound yet or OTEL_EXPORT_ENABLED was missed, the benchmark starts and fails in setup. Consider making the server healthcheck include 8085 when OTEL_EXPORT_ENABLED=1.

🩺 Proposed direction
-        node -e "const h=require('node:http2'),c=h.connect('https://localhost:8080',
+        node -e "const h=require('node:http2');const ports=process.env.OTEL_EXPORT_ENABLED==='1'?[8080,8085]:[8080];
+        let pending=ports.length, failed=false;
+        for (const port of ports){const c=h.connect(`https://localhost:${port}`,
         {rejectUnauthorized:false});const r=c.request({':method':'POST',
         ':path':'/greeter.v1.GreeterService/SayHello','content-type':'application/json',
         'connect-protocol-version':'1'});r.end(JSON.stringify({name:'health'}));
-        let d='';r.on('data',x=>d+=x);r.on('end',()=>{c.close();process.exit(d?0:1)});
-        r.on('error',()=>{c.close();process.exit(1)});
-        setTimeout(()=>{c.close();process.exit(1)},3000)"
+        let d='';r.on('data',x=>d+=x);r.on('end',()=>{c.close();failed ||= !d;if(--pending===0)process.exit(failed?1:0)});
+        r.on('error',()=>{c.close();process.exit(1)});
+        setTimeout(()=>{c.close();process.exit(1)},3000)}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@performance-test-server/docker-compose.yml` around lines 93 - 96, The server
healthcheck currently only verifies port 8080 but the k6 job (command: run
/scripts/otel-export-overhead.js) needs port 8085 when OTEL_EXPORT_ENABLED=1;
update the server healthcheck (or add a new healthcheck probe used by
docker-compose) to verify port 8085 as well when OTEL_EXPORT_ENABLED is set, or
make the k6 service depend on a new healthcheck target that checks both ports;
specifically modify the server healthcheck block (and/or the depends_on entry
referenced by the k6 service) so it conditionally checks port 8085 when
OTEL_EXPORT_ENABLED=1 to ensure the benchmark waits for 8085 to be bound before
starting.

profiles: ["otel-export"]