feat(otel-export): exportEvalRuns — ship self-improvement provenance to Tangle Intelligence#73

Merged
drewstone merged 1 commit into main from feat/intelligence-eval-runs-exporter on May 29, 2026

@tangletools
Contributor

What

Adds exportEvalRuns(events, config) to @tangle-network/agent-runtime — a reusable client for Tangle Intelligence's first-class self-improvement record, POST /v1/ingest/eval-runs ('Mode D'), alongside the existing OTLP span exporter (createOtelExporter).

Why

Intelligence is becoming the de facto store for agentic optimization and self-improvement. Today only OTLP traces have a first-class exporter; self-improvement runs (propose → gate → promote, with provenance) have no reusable client, so each consumer hand-rolls a fetch. This change makes shipping a self-improvement loop's provenance a one-import capability for every consumer.

Shape

import { exportEvalRuns, type EvalRunEvent } from '@tangle-network/agent-runtime'

// runId, runDir, timestamp, and surfaceHash are assumed to be defined by the calling loop.
await exportEvalRuns([{
  runId, runDir, timestamp, status: 'generation-complete',
  labels: { stage: 'proposed', measured: 'false' },
  generations: [{ index: 0, surfaceHash, surface: { /* provenance */ }, cells: [], compositeMean: 0, costUsd: 0, durationMs: 0 }],
  totalCostUsd: 0, totalDurationMs: 0,
}], { apiKey: process.env.TANGLE_API_KEY, idempotencyKey: runId })
  • Each generation's surfaceHash is the identity of the proposed change; surface carries arbitrary provenance; labels.measured: 'false' marks a proposal that has not been measured yet.
  • A later gate-decided event re-emits the same runId (idempotent upsert) with a real gateDecision and holdoutLift, so proposal → verdict is one diffable record (/v1/runs/diff).
  • Unlike the best-effort span exporter, this resolves with the ingest verdict ({ ok, status, accepted, rejected }), so a loop can assert that its provenance landed.
  • Reads TANGLE_API_KEY and INTELLIGENCE_BASE from the environment; the tenant is resolved server-side from the key; the X-Tangle-Wire-Version header is handled by the client.
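
The proposal → verdict lifecycle described in the bullets above can be sketched roughly as follows. This is a self-contained illustration, not the package's code: the Sketch* types, the SketchRunStore class, the 'gate-decided' status string, the 'promote' | 'reject' values, the placement of gateDecision and holdoutLift on a generation, and the ISO timestamp format are all assumptions made for the example. The in-memory store only models what "idempotent upsert by runId" means; the real behavior lives server-side.

```typescript
// Hypothetical, trimmed-down shapes for illustration only. The real
// EvalRunEvent type exported by @tangle-network/agent-runtime may differ.
interface SketchGeneration {
  index: number
  surfaceHash: string
  surface: Record<string, unknown>
  gateDecision?: 'promote' | 'reject' // assumed values and placement
  holdoutLift?: number // assumed placement
}

interface SketchEvalRunEvent {
  runId: string
  timestamp: string // assumed to be an ISO-8601 string
  status: string
  labels: Record<string, string>
  generations: SketchGeneration[]
}

// In-memory stand-in for an idempotent store keyed by runId: a second write
// with the same runId replaces the record instead of adding a new one.
class SketchRunStore {
  private runs: Record<string, SketchEvalRunEvent> = {}

  upsert(event: SketchEvalRunEvent): 'created' | 'updated' {
    const exists = Object.prototype.hasOwnProperty.call(this.runs, event.runId)
    this.runs[event.runId] = event
    return exists ? 'updated' : 'created'
  }

  get(runId: string): SketchEvalRunEvent | undefined {
    return this.runs[runId]
  }

  count(): number {
    return Object.keys(this.runs).length
  }
}

// First emission: an unmeasured proposal (labels taken from the Shape snippet).
function proposalEvent(
  runId: string,
  surfaceHash: string,
  surface: Record<string, unknown>,
): SketchEvalRunEvent {
  return {
    runId,
    timestamp: new Date().toISOString(),
    status: 'generation-complete',
    labels: { stage: 'proposed', measured: 'false' },
    generations: [{ index: 0, surfaceHash, surface }],
  }
}

// Second emission: same runId, now carrying the gate verdict.
function gateDecidedEvent(
  proposal: SketchEvalRunEvent,
  gateDecision: 'promote' | 'reject',
  holdoutLift: number,
): SketchEvalRunEvent {
  return {
    ...proposal,
    timestamp: new Date().toISOString(),
    status: 'gate-decided', // assumed status value
    labels: { stage: 'gate-decided', measured: 'true' }, // assumed label values
    generations: proposal.generations.map((g) => ({ ...g, gateDecision, holdoutLift })),
  }
}
```

Because both events share a runId, upserting the proposal and then the gate-decided event leaves a single record in the store, which is the property that would make the two states comparable as one run.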

Tests

+4 in tests/otel-export.test.ts (extended, not forked): payload/header/wire-version shape, idempotency key, 400 per-event rejection passthrough, empty no-op. Full file 12/12 green. Typecheck clean. Additive — no change to existing exports. Version 0.31.0.
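
The four behaviors these tests cover (request shape, idempotency key, 400 rejection passthrough, empty no-op) can be illustrated with a minimal stand-in client. This is a sketch under stated assumptions, not the implementation in this PR: the function names, the Bearer authorization scheme, the Idempotency-Key header name, the { events } body envelope, the wire version value '1', and the field types of the verdict are all hypothetical. Only the endpoint path, the X-Tangle-Wire-Version header name, and the verdict keys come from the description above. The transport is injected so the sketch does not depend on any particular HTTP client.

```typescript
// All names below are hypothetical; they do not mirror the package's exports.
interface SketchConfig {
  apiKey: string
  baseUrl: string
  idempotencyKey?: string
  wireVersion?: string
}

interface IngestRequest {
  method: 'POST'
  url: string
  headers: Record<string, string>
  body: string
}

interface RejectedEvent {
  index: number
  reason: string
}

// Verdict keys come from the PR text; their types are assumed.
interface IngestVerdict {
  ok: boolean
  status: number
  accepted: number
  rejected: RejectedEvent[]
}

// Injected transport, so the sketch needs no global fetch.
type Transport = (req: IngestRequest) => Promise<{ status: number; body: unknown }>

// Pure request builder: easy to assert on in a unit test.
function buildIngestRequest(events: unknown[], config: SketchConfig): IngestRequest {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${config.apiKey}`, // assumed auth scheme
    'X-Tangle-Wire-Version': config.wireVersion || '1', // assumed version value
  }
  if (config.idempotencyKey) {
    headers['Idempotency-Key'] = config.idempotencyKey // assumed header name
  }
  return {
    method: 'POST',
    url: `${config.baseUrl.replace(/\/+$/, '')}/v1/ingest/eval-runs`,
    headers,
    body: JSON.stringify({ events }), // assumed envelope
  }
}

// Maps an HTTP status and parsed body to a verdict, passing rejections through.
function toVerdict(status: number, body: unknown): IngestVerdict {
  const b = (typeof body === 'object' && body !== null ? body : {}) as {
    accepted?: unknown
    rejected?: unknown
  }
  return {
    ok: status >= 200 && status < 300,
    status,
    accepted: typeof b.accepted === 'number' ? b.accepted : 0,
    rejected: Array.isArray(b.rejected) ? (b.rejected as RejectedEvent[]) : [],
  }
}

async function sketchExportEvalRuns(
  events: unknown[],
  config: SketchConfig,
  transport: Transport,
): Promise<IngestVerdict> {
  // Empty input is a no-op: nothing is sent (the returned shape is assumed).
  if (events.length === 0) return { ok: true, status: 0, accepted: 0, rejected: [] }
  const res = await transport(buildIngestRequest(events, config))
  return toVerdict(res.status, res.body)
}
```

Splitting the request builder and the verdict mapping from the network call keeps both testable without a live endpoint, which is one way a test could check header and payload shape directly.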

Verified end-to-end against prod from a downstream consumer: a benchmark's RSI loop shipped 2 proposal-provenance records → 200 OK, queryable at /v1/runs/<id>.

…to Tangle Intelligence (0.31.0)

Adds a reusable client for Intelligence's first-class self-improvement record
(POST /v1/ingest/eval-runs, 'Mode D'), alongside the existing OTLP span exporter.
A consumer's RSI loop emits one EvalRunEvent per proposal generation (surfaceHash =
proposed-change identity, surface = arbitrary provenance, labels.measured flags it
unmeasured); a later gate-decided event re-emits the same runId (idempotent upsert)
with a real gateDecision + holdoutLift, so proposal→verdict is one diffable record.

Unlike the best-effort span exporter, exportEvalRuns RESOLVES with the ingest verdict
(accepted/rejected per event) so a loop can assert its provenance landed. Reads
TANGLE_API_KEY + INTELLIGENCE_BASE from env; tenant resolved server-side from the key.
Wire version + X-Tangle-Wire-Version header handled. +4 tests (payload/header shape,
400 rejection passthrough, empty no-op). Makes Intelligence the de-facto provenance
store for any agent-runtime consumer's self-improvement loop, not just one benchmark.
@drewstone merged commit 70ff62a into main on May 29, 2026
1 check failed