feat(otel-export): exportEvalRuns — ship self-improvement provenance to Tangle Intelligence by tangletools · Pull Request #73 · tangle-network/agent-runtime

tangletools · 2026-05-29T13:33:13Z

What

Adds exportEvalRuns(events, config) to @tangle-network/agent-runtime — a reusable client for Tangle Intelligence's first-class self-improvement record, POST /v1/ingest/eval-runs ('Mode D'), alongside the existing OTLP span exporter (createOtelExporter).

Why

Intelligence is becoming the de facto store for agentic optimization and self-improvement. Today only OTLP traces have a first-class exporter. Self-improvement runs (propose → gate → promote, with provenance) had no reusable client, so each consumer hand-rolled its own fetch. This change makes shipping a self-improvement loop's provenance a one-import capability for every consumer.

Shape

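```typescript
import { exportEvalRuns, type EvalRunEvent } from '@tangle-network/agent-runtime'

// runId, runDir, timestamp and surfaceHash come from the caller's RSI loop.
const events: EvalRunEvent[] = [{
  runId, runDir, timestamp, status: 'generation-complete',
  // A proposal that has not been measured yet is flagged as such.
  labels: { stage: 'proposed', measured: 'false' },
  generations: [{ index: 0, surfaceHash, surface: { /* provenance */ }, cells: [], compositeMean: 0, costUsd: 0, durationMs: 0 }],
  totalCostUsd: 0, totalDurationMs: 0,
}]

// Resolves with the ingest verdict: { ok, status, accepted, rejected }.
const verdict = await exportEvalRuns(events, { apiKey: process.env.TANGLE_API_KEY, idempotencyKey: runId })
```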
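The example takes surfaceHash as given. One way a consumer might derive a stable proposed-change identity is to hash a canonical serialization of the surface, so the same proposal always maps to the same hash regardless of object key order. This is a sketch under assumptions: canonicalJson and surfaceHashOf are hypothetical helpers, not part of agent-runtime, and the PR does not prescribe a hashing scheme.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical helper: serialize plain JSON data with object keys sorted at
// every level, so key order never changes the output.
function canonicalJson(value: unknown): string {
  if (value === undefined) return 'null'
  if (Array.isArray(value)) {
    // Arrays keep their order; undefined entries become null, as in JSON.stringify.
    return `[${value.map((v) => canonicalJson(v)).join(',')}]`
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const entries = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // JSON.stringify drops undefined fields too
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`)
    return `{${entries.join(',')}}`
  }
  // Strings, numbers, booleans and null.
  return JSON.stringify(value)
}

// Hypothetical helper: SHA-256 hex digest of the canonical form.
function surfaceHashOf(surface: unknown): string {
  return createHash('sha256').update(canonicalJson(surface)).digest('hex')
}
```

A loop would pass surfaceHashOf(surface) as the generation's surfaceHash next to the same surface object. Sorting keys is the design choice that matters here: two proposals that differ only in serialization order should not look like distinct changes.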

Each generation's surfaceHash is the identity of the proposed change; surface carries arbitrary provenance; labels.measured: 'false' honestly marks a proposal that has not been measured yet.
A later gate-decided event re-emits the same runId (an idempotent upsert) with a real gateDecision and holdoutLift, so proposal→verdict is a single diffable record (/v1/runs/diff).
Unlike the best-effort span exporter, exportEvalRuns resolves with the ingest verdict ({ ok, status, accepted, rejected }), so a loop can assert that its provenance landed.
Reads TANGLE_API_KEY and INTELLIGENCE_BASE from the environment; the tenant is resolved server-side from the key, and the X-Tangle-Wire-Version header is handled by the client.
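The proposal→verdict flow can be sketched as a pure function that derives the gate-decided record from the proposed one while keeping its runId. The RunEvent and Generation types below are hypothetical local mirrors of the fields in the Shape example, not the library's EvalRunEvent; placing gateDecision and holdoutLift on each generation, the 'gate-decided' stage label, and their value types are all assumptions.

```typescript
// Hypothetical local types mirroring only the fields shown in the Shape example.
interface Generation {
  index: number
  surfaceHash: string
  surface: Record<string, unknown>
  cells: unknown[]
  compositeMean: number
  costUsd: number
  durationMs: number
  gateDecision?: string // assumption: a string verdict such as 'promote' or 'reject'
  holdoutLift?: number // assumption: numeric lift measured on the holdout set
}

interface RunEvent {
  runId: string
  runDir: string
  timestamp: string // assumption: ISO-8601 string
  status: string
  labels: Record<string, string>
  generations: Generation[]
  totalCostUsd: number
  totalDurationMs: number
}

// Build the follow-up record once the gate has ruled. Reusing runId is what
// lets the server upsert instead of creating a second run. The input is not mutated.
function gateDecided(
  proposed: RunEvent,
  gateDecision: string,
  holdoutLift: number,
  now: Date = new Date(),
): RunEvent {
  return {
    ...proposed,
    timestamp: now.toISOString(),
    labels: { ...proposed.labels, stage: 'gate-decided', measured: 'true' },
    generations: proposed.generations.map((g) => ({ ...g, gateDecision, holdoutLift })),
  }
}
```

Both the proposed event and the returned event would then be passed to exportEvalRuns; because they share a runId, the upsert keeps them as one record.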

Tests

+4 tests in tests/otel-export.test.ts (the existing file is extended, not forked): payload, header, and wire-version shape; idempotency key; passthrough of 400 per-event rejections; empty-input no-op. All 12 tests in the file pass and typecheck is clean. The change is additive, with no change to existing exports. Version 0.31.0.
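The behaviors these tests cover (empty no-op, header and wire-version shape, idempotency key, 400 passthrough) can be illustrated with a minimal client that takes an injectable fetch. This is a hypothetical sketch, not the agent-runtime source: postEvalRuns is an illustrative name, and Bearer auth, the Idempotency-Key header name, the { events } JSON body, the wire-version value '1', and the response body shape are all assumptions.

```typescript
// Hypothetical sketch of an eval-runs client; NOT the agent-runtime implementation.
interface Verdict {
  ok: boolean
  status: number
  accepted: number
  rejected: unknown[]
}

interface ExportConfig {
  apiKey?: string
  baseUrl?: string
  idempotencyKey?: string
  fetchImpl?: typeof fetch // injectable so tests can run without a network
}

const WIRE_VERSION = '1' // assumption: placeholder value

async function postEvalRuns(events: unknown[], config: ExportConfig = {}): Promise<Verdict> {
  // Empty batch: resolve without touching the network (status 0 = no request sent).
  if (events.length === 0) return { ok: true, status: 0, accepted: 0, rejected: [] }

  const apiKey = config.apiKey ?? process.env.TANGLE_API_KEY
  const baseUrl = config.baseUrl ?? process.env.INTELLIGENCE_BASE
  if (!apiKey || !baseUrl) throw new Error('TANGLE_API_KEY and INTELLIGENCE_BASE are required')

  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${apiKey}`, // assumption: Bearer scheme
    'X-Tangle-Wire-Version': WIRE_VERSION,
  }
  if (config.idempotencyKey) headers['Idempotency-Key'] = config.idempotencyKey // assumed header name

  const url = `${baseUrl.replace(/\/+$/, '')}/v1/ingest/eval-runs`
  const res = await (config.fetchImpl ?? fetch)(url, {
    method: 'POST',
    headers,
    body: JSON.stringify({ events }),
  })

  // Surface the server's verdict, including a 400's per-event rejections,
  // instead of swallowing it as a best-effort exporter would.
  const body = (await res.json().catch(() => ({}))) as { accepted?: number; rejected?: unknown[] }
  return { ok: res.ok, status: res.status, accepted: body.accepted ?? 0, rejected: body.rejected ?? [] }
}
```

A loop would treat provenance as landed only when verdict.ok is true and verdict.rejected is empty. Injecting fetchImpl is what lets tests exercise the 400 path deterministically.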

Verified end to end against production from a downstream consumer: a benchmark's RSI loop shipped 2 proposal-provenance records, received 200 OK, and the records are queryable at /v1/runs/<id>.
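A similar end-to-end check can be scripted by reading the record back after ingest. The sketch below assumes Bearer auth and a JSON response from GET /v1/runs/<id>; fetchRun and its options are hypothetical, not an agent-runtime export.

```typescript
// Hypothetical read-back helper; the auth scheme and JSON response are assumptions.
interface FetchRunOptions {
  baseUrl: string // e.g. the value of INTELLIGENCE_BASE
  apiKey: string
  fetchImpl?: typeof fetch
}

async function fetchRun(runId: string, opts: FetchRunOptions): Promise<unknown> {
  // Trim trailing slashes so the joined URL never contains a double slash.
  const base = opts.baseUrl.replace(/\/+$/, '')
  const url = `${base}/v1/runs/${encodeURIComponent(runId)}`
  const res = await (opts.fetchImpl ?? fetch)(url, {
    headers: { Authorization: `Bearer ${opts.apiKey}` },
  })
  // Fail loudly so a loop can tell "not stored" apart from "stored".
  if (!res.ok) throw new Error(`GET ${url} failed with status ${res.status}`)
  return res.json()
}
```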

…to Tangle Intelligence (0.31.0) Adds a reusable client for Intelligence's first-class self-improvement record (POST /v1/ingest/eval-runs, 'Mode D'), alongside the existing OTLP span exporter. A consumer's RSI loop emits one EvalRunEvent per proposal generation (surfaceHash = proposed-change identity, surface = arbitrary provenance, labels.measured flags it unmeasured); a later gate-decided event re-emits the same runId (idempotent upsert) with a real gateDecision + holdoutLift, so proposal→verdict is one diffable record. Unlike the best-effort span exporter, exportEvalRuns RESOLVES with the ingest verdict (accepted/rejected per event) so a loop can assert its provenance landed. Reads TANGLE_API_KEY + INTELLIGENCE_BASE from env; tenant resolved server-side from the key. Wire version + X-Tangle-Wire-Version header handled. +4 tests (payload/header shape, 400 rejection passthrough, empty no-op). Makes Intelligence the de-facto provenance store for any agent-runtime consumer's self-improvement loop, not just one benchmark.

…#73)

drewstone merged commit 70ff62a into main May 29, 2026
1 check failed

tangletools pushed a commit that referenced this pull request May 29, 2026

style(otel-export): biome format exportEvalRuns + tests (fix red CI on …

076afbe

…#73)

drewstone merged 1 commit into main from feat/intelligence-eval-runs-exporter