Runs domain agents and closes the loop that makes them better — capturing every run as a trace and turning those traces into eval-gated improvements, automatically.
Two halves. Execution: the chat turn, the loop kernel (refine, fanout-vote, agent-authored dynamic topologies), delegated sub-agent loops (build, research, review, audit), and OpenTelemetry GenAI tracing of all of it. Self-improvement: the declarative defineAgent manifest that names an agent's mutable surfaces, an analyst loop that mines real traces into findings, surface adapters that apply them to prompts and knowledge, identity-gated prompt optimization, outcome measurement, and createSandboxAct so the agent is evaluated through its actual production profile. Domain behavior — models, tools, knowledge — lives in adapters; the scoring/judge/ship-gate engine in @tangle-network/agent-eval; durable long-running execution in @tangle-network/sandbox.
pnpm add @tangle-network/agent-runtime @tangle-network/agent-eval @tangle-network/sandbox

- Getting started: the 20-line production chat turn
- Which entry point do I reach for?
- Capabilities
  - 1. Chat turns: handleChatTurn
  - 2. Driven loops + topology drivers
  - 3. Agent-authored topology: createDynamicDriver
  - 4. Delegated loop-runner: runDelegatedLoop
  - 5. Reliable build-in-a-loop: the coder delegate
  - 6. Valid-only research: createKbGate
  - 7. Identity-gated prompt optimization: optimizePrompt
  - 8. OpenTelemetry GenAI topology tracing
  - 9. MCP delegation server: agent-runtime-mcp
- Defaults
- Composition with the stack
- Subpath exports
- Adoption skill
- Stability · Tests · Docs
Every product agent is a handleChatTurn call inside a route. This is what gtm / creative / legal / tax all run in production:
import { handleChatTurn } from '@tangle-network/agent-runtime'

export async function POST({ request, env, ctx }: { request: Request; env: Env; ctx: ExecutionContext }) {
  const { workspaceId, threadId, userMessage } = await request.json()
  const box = await ensureWorkspaceSandbox(workspaceId)
  const result = handleChatTurn({
    identity: { tenantId: workspaceId, sessionId: threadId, userId: 'demo', turnIndex: 0 },
    hooks: {
      produce: () => ({
        stream: box.streamPrompt(userMessage),
        finalText: () => box.lastResponse(),
      }),
      persistAssistantMessage: async ({ identity, finalText }) => env.db.insertMessage(identity, finalText),
      traceFlush: () => env.traceSink.flush(),
    },
    waitUntil: ctx.waitUntil.bind(ctx),
  })
  return new Response(result.body, { headers: { 'content-type': result.contentType } })
}

That's the centerpiece. Everything below is "when one chat turn isn't enough": multi-shot loops, delegation, optimization, and the telemetry that makes them auditable.
| You want to… | Reach for | Subpath |
|---|---|---|
| Run a production chat turn (90% of products) | handleChatTurn | root |
| Declare an agent (profile + surfaces + adapters) | defineAgent | /agent |
| One-shot task with verification + eval | runAgentTask | root |
| Multi-shot loop (refine / fanout-vote) | runLoop + a driver | /loops |
| Let the agent choose the loop shape per round | createDynamicDriver + createSandboxPlanner | /loops |
| Delegate a disciplined loop by mode (code/research/…) | runDelegatedLoop / agent-runtime-loop | root |
| Build code reliably (reviewed, gated) | createDefaultCoderDelegate | /mcp |
| Grow a KB with only grounded facts | createKbGate | /mcp |
| Improve a prompt safely (identity-gated) | optimizePrompt | /improvement |
| Ship loop traces to a GenAI viewer | buildLoopOtelSpans + createOtelExporter | root |
| Expose delegation as MCP tools to a sandbox agent | createMcpServer / agent-runtime-mcp | /mcp |
| Mutate surfaces from trace findings | runAnalystLoop | /analyst-loop |
| Persist a run + cost ledger | startRuntimeRun | root |
handleChatTurn is the production turn envelope. It frames a producer with the session.run.* NDJSON protocol, runs the hooks in persist → post-process → trace-flush order, and carries a stable execution id for client-retry replay. See Getting started and examples/chat-handler/.
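To make the envelope concrete, here is a minimal, self-contained sketch of NDJSON framing followed by the three hooks in the order described above. It does not use the real handleChatTurn API: the event names (`session.run.started`, `session.run.delta`, `session.run.completed`), the `TurnHooks` shape, and `frameTurn` are all hypothetical, and the real envelope streams and runs its hooks asynchronously.

```typescript
// Hypothetical event shapes: the real session.run.* event names and payloads
// are not specified in this README.
type RunEvent =
  | { type: 'session.run.started'; executionId: string }
  | { type: 'session.run.delta'; executionId: string; text: string }
  | { type: 'session.run.completed'; executionId: string; finalText: string }

// Hypothetical, synchronous stand-ins for the real async hooks.
interface TurnHooks {
  persist(finalText: string): void
  postProcess?(finalText: string): void
  traceFlush?(): void
}

// Frame producer chunks as NDJSON (one JSON object per line), then run the
// hooks in the documented order: persist -> post-process -> trace-flush.
function frameTurn(executionId: string, chunks: string[], hooks: TurnHooks): string {
  const events: RunEvent[] = [{ type: 'session.run.started', executionId }]
  let finalText = ''
  for (const text of chunks) {
    finalText += text
    events.push({ type: 'session.run.delta', executionId, text })
  }
  events.push({ type: 'session.run.completed', executionId, finalText })

  hooks.persist(finalText)
  hooks.postProcess?.(finalText)
  hooks.traceFlush?.()

  // Every event carries the same execution id, so a retried request can be
  // matched to the turn it replays.
  return events.map((e) => JSON.stringify(e)).join('\n') + '\n'
}
```

Keeping the execution id on every line is what would let a client that reconnects mid-stream recognise a replay of the same turn rather than a new one.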
runLoop is a topology-agnostic kernel: each iteration spawns a sandbox on an AgentRunSpec, decodes the output, validates it, and asks a driver what to do next. The driver owns topology; the validator owns scoring; the kernel owns iteration accounting, concurrency, cost/token aggregation, and trace emission.
import { runLoop, createFanoutVoteDriver } from '@tangle-network/agent-runtime/loops'

const result = await runLoop({
  driver: createFanoutVoteDriver({ n: 3 }), // 3 parallel attempts, pick the best valid one
  agentRuns: [claudeSpec, codexSpec, glmSpec], // heterogeneous: one harness per branch
  output, // events → typed Output
  validator, // Output → { valid, score }
  task,
  ctx: { sandboxClient: sandbox },
})
result.winner // highest-scoring valid attempt

Shipped drivers (/loops/drivers): createRefineDriver (single task, iterate until valid) and createFanoutVoteDriver (N parallel, vote). See examples/coder-loop/ and examples/researcher-loop/.
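The split of responsibilities between kernel, driver, and validator can be sketched without the library. The following is a simplified model, not the real runLoop: `SketchDriver`, `refineDriver`, `fanoutVoteDriver`, and `runSketchLoop` are hypothetical names, attempts run sequentially instead of in parallel sandboxes, and cost/token aggregation and trace emission are left out.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Verdict { valid: boolean; score: number }
interface Attempt<O> { output: O; verdict: Verdict }
type Move = { kind: 'run'; branches: number } | { kind: 'stop' }

// The driver owns topology: given the history, it decides what happens next.
interface SketchDriver<O> {
  next(history: Attempt<O>[]): Move
}

// Refine: one attempt per round until something valid appears.
function refineDriver<O>(): SketchDriver<O> {
  return {
    next: (history) =>
      history.some((a) => a.verdict.valid) ? { kind: 'stop' } : { kind: 'run', branches: 1 },
  }
}

// Fanout-vote: n attempts in a single round, then stop.
function fanoutVoteDriver<O>(n: number): SketchDriver<O> {
  return {
    next: (history) => (history.length === 0 ? { kind: 'run', branches: n } : { kind: 'stop' }),
  }
}

// The kernel owns iteration accounting; the validator owns scoring.
function runSketchLoop<O>(opts: {
  driver: SketchDriver<O>
  attempt: (iteration: number) => O // stands in for spawning a sandbox run
  validator: (output: O) => Verdict
  maxIterations: number
}): { attempts: Attempt<O>[]; winner: Attempt<O> | undefined } {
  const attempts: Attempt<O>[] = []
  while (attempts.length < opts.maxIterations) {
    const move = opts.driver.next(attempts)
    if (move.kind === 'stop') break
    const budget = Math.min(move.branches, opts.maxIterations - attempts.length)
    for (let i = 0; i < budget; i++) {
      const output = opts.attempt(attempts.length)
      attempts.push({ output, verdict: opts.validator(output) })
    }
  }
  // Winner: the highest-scoring valid attempt, if there is one.
  const winner = attempts
    .filter((a) => a.verdict.valid)
    .sort((a, b) => b.verdict.score - a.verdict.score)[0]
  return { attempts, winner }
}
```

Because the driver only sees the attempt history and returns a move, swapping refine for fanout-vote changes the loop shape without touching the kernel or the validator.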
The third driver, createDynamicDriver, lets the agent author the loop topology at runtime: refine, fan out, or stop, decided per round by an injected planner. Topology is orthogonal to harness: the planner never names a backend; the kernel's agentRuns round-robin decides which harness runs each branch.
import { runLoop, createDynamicDriver, createSandboxPlanner } from '@tangle-network/agent-runtime/loops'

const planner = createSandboxPlanner({
  client: sandbox,
  profile: { name: 'planner', metadata: { backendType: 'claude-code' } }, // cheap model is fine
  decodeTask: (raw) => raw as Task,
})

const result = await runLoop({
  driver: createDynamicDriver({ planner, maxIterations: 8 }),
  agentRuns: [claudeSpec, codexSpec], // the planner can fan a single round across both
  output, validator, task,
  ctx: { sandboxClient: sandbox },
})

The planner emits one TopologyMove per round (refine | fanout | stop) with a rationale. A malformed move throws PlannerError, so the loop never runs a topology nobody chose.
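The "malformed move throws" rule amounts to validating planner output before acting on it. Here is a minimal sketch of that idea, assuming a move is a plain object with a `kind` and a `rationale`. The `branches` field, `parseMove`, and `SketchPlannerError` are hypothetical; the real TopologyMove and PlannerError may be shaped differently.

```typescript
// Hypothetical shapes for illustration; not the library's actual types.
type TopologyMove =
  | { kind: 'refine'; rationale: string }
  | { kind: 'fanout'; branches: number; rationale: string }
  | { kind: 'stop'; rationale: string }

class SketchPlannerError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'SketchPlannerError'
  }
}

// Accept only well-formed moves. Anything else throws, so the loop cannot
// fall back to a topology the planner never chose.
function parseMove(raw: unknown): TopologyMove {
  if (typeof raw !== 'object' || raw === null) {
    throw new SketchPlannerError('move must be an object')
  }
  const move = raw as Record<string, unknown>
  const kind = move['kind']
  const rationale = move['rationale']
  const branches = move['branches']
  if (typeof rationale !== 'string' || rationale.trim() === '') {
    throw new SketchPlannerError('move needs a non-empty rationale')
  }
  switch (kind) {
    case 'refine':
      return { kind: 'refine', rationale }
    case 'stop':
      return { kind: 'stop', rationale }
    case 'fanout':
      if (typeof branches !== 'number' || !Number.isInteger(branches) || branches < 2) {
        throw new SketchPlannerError('fanout needs an integer branch count of at least 2')
      }
      return { kind: 'fanout', branches, rationale }
    default:
      throw new SketchPlannerError(`unknown move kind: ${String(kind)}`)
  }
}
```

Requiring the rationale at parse time also means every round in the trace has an explanation attached, which is what makes the resulting topology auditable.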
runDelegatedLoop is the one configured entrypoint that a worker agent (or a scheduled routine) calls to run a disciplined loop in a chosen mode, over the hardened engines below. It fails loudly on an unwired mode; an engine that throws is captured as { ok: false }, so unattended runs record the failure rather than crash.
import {
  runDelegatedLoop, coderLoopRunner, researchLoopRunner, type DelegatedLoopRegistry,
} from '@tangle-network/agent-runtime'

const registry: DelegatedLoopRegistry = {
  code: coderLoopRunner({
    sandboxClient,
    args: { goal: 'fix the flaky retry test', repoRoot: '/repo' },
    reviewer, // optional adversarial gate
    winnerSelection: 'smallest-diff',
  }),
  research: researchLoopRunner({ research, gate: { selfArtifactKinds: ['spec'] }, maxRounds: 3 }),
}

const result = await runDelegatedLoop('code', registry)
// → { mode: 'code', ok: true, output: CoderOutput, durationMs }

Modes: code · review · research · audit · self-improve · dynamic, each with a default factory (coderLoopRunner, reviewLoopRunner, researchLoopRunner, auditLoopRunner, selfImproveLoopRunner, dynamicLoopRunner).
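The two failure behaviours (throw on an unwired mode, record a thrown engine) are easy to confuse, so here is a small sketch that separates them. It is synchronous and self-contained, which the real runDelegatedLoop is not; `runDelegatedSketch`, `SketchRegistry`, and the `error` field on the failure result are hypothetical.

```typescript
type LoopMode = 'code' | 'review' | 'research' | 'audit' | 'self-improve' | 'dynamic'

// Hypothetical result shape: the README only shows { mode, ok, output, durationMs }
// for success and { ok: false } for failure; `error` is added for illustration.
type LoopResult =
  | { mode: LoopMode; ok: true; output: unknown; durationMs: number }
  | { mode: LoopMode; ok: false; error: string; durationMs: number }

type SketchRegistry = Partial<Record<LoopMode, () => unknown>>

function runDelegatedSketch(mode: LoopMode, registry: SketchRegistry): LoopResult {
  const runner = registry[mode]
  // Fail loud: an unwired mode is a configuration error, not a recorded failure.
  if (!runner) throw new Error(`no runner wired for mode "${mode}"`)

  const start = Date.now()
  try {
    const output = runner()
    return { mode, ok: true, output, durationMs: Date.now() - start }
  } catch (err) {
    // A thrown engine is captured so an unattended run records it instead of crashing.
    const error = err instanceof Error ? err.message : String(err)
    return { mode, ok: false, error, durationMs: Date.now() - start }
  }
}
```

The distinction matters for scheduling: a misconfigured registry should stop the job immediately, while an engine failure is data that the next run (or a human) can inspect.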
Schedulable: the agent-runtime-loop bin runs it from a cron/routine. The config module wires the registry (with full env/creds access):
agent-runtime-loop --mode research --config ./loops.config.js
# exits 0 (ok) · 1 (recorded failure) · 2 (usage/config error); prints the result as JSON

// loops.config.js: default-exports a DelegatedLoopRegistry (or a factory)
import { researchLoopRunner } from '@tangle-network/agent-runtime'
export default { research: researchLoopRunner({ research: myResearchEngine, maxRounds: 3 }) }

createDefaultCoderDelegate drives a coder loop with default-on safety gates so it never ships junk:
- no-op rejection: an empty patch can't "pass" trivially,
- secret-path floor: always-on, independent of forbiddenPaths (.env, keys, wallets, …),
- optional reviewer gate: a candidate must pass tests/typecheck and be approved to win,
- winnerSelection: highest-score (default) · smallest-diff · highest-readiness · first-approved.
import { createDefaultCoderDelegate } from '@tangle-network/agent-runtime/mcp'

const coder = createDefaultCoderDelegate({
  sandboxClient,
  fanoutHarnesses: ['claude-code', 'codex'],
  reviewer: async (output, task) => ({ approved: output.testResult.passed, recommendation: 'ship', readiness: 0.9 }),
  winnerSelection: 'highest-readiness',
})
const out = await coder({ goal: 'add a retry with backoff', repoRoot: '/repo', variants: 2 }, ctx)

See examples/coder-loop/ and examples/agent-into-reviewer/.
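As a rough illustration of how the gates and the winnerSelection strategies interact, the sketch below filters candidates through the gates first and only then picks a winner. Everything here is an assumption made for the example: the `SketchCandidate` fields, the `SECRET_PATH` patterns, and the reading of first-approved as "first eligible candidate in input order" are not taken from the library.

```typescript
// Hypothetical candidate shape for illustration.
interface SketchCandidate {
  id: string
  diffLines: number // 0 means an empty patch
  touchedPaths: string[]
  testsPassed: boolean
  approved?: boolean // undefined when no reviewer is configured
  score: number
  readiness: number
}

type WinnerSelection = 'highest-score' | 'smallest-diff' | 'highest-readiness' | 'first-approved'

// Hypothetical secret-path patterns; the real floor's list is not given in this README.
const SECRET_PATH = /(^|\/)\.env(\.|$)|\.pem$|\.key$|wallet/i

function passesGates(c: SketchCandidate): boolean {
  if (c.diffLines === 0) return false // no-op rejection
  if (c.touchedPaths.some((p) => SECRET_PATH.test(p))) return false // secret-path floor
  if (!c.testsPassed) return false
  return c.approved !== false // reviewer gate, only when a reviewer ran
}

// Gates run before selection, so a high score can never rescue a rejected candidate.
function pickWinner(
  candidates: SketchCandidate[],
  selection: WinnerSelection = 'highest-score',
): SketchCandidate | undefined {
  const pool = candidates.filter(passesGates)
  switch (selection) {
    case 'first-approved':
      return pool[0]
    case 'smallest-diff':
      return pool.sort((a, b) => a.diffLines - b.diffLines)[0]
    case 'highest-readiness':
      return pool.sort((a, b) => b.readiness - a.readiness)[0]
    default:
      return pool.sort((a, b) => b.score - a.score)[0]
  }
}
```

Returning `undefined` when nothing survives the gates mirrors the "never ships junk" goal: no winner is better than a bad one.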
createKbGate is a fail-closed gate so a knowledge base grows with only grounded facts. The always-on floor: a fact's verbatimPassage must literally appear in its sourceText (anti-hallucination), the asserted value must be in the passage, and citations can't point at self-generated artifacts (laundering). You can plug in your own judges. The gate is verdict-only: remediation is yours.
import { createKbGate } from '@tangle-network/agent-runtime/mcp'

const gate = createKbGate({ selfArtifactKinds: ['spec', 'cad_params'] })
const verdict = await gate({
  claim: 'revenue was $1.2B in 2025',
  value: 1_200_000_000,
  verbatimPassage: 'total revenue was $1,200,000,000 for the fiscal year',
  sourceText: rawSource,
})
if (verdict.accepted) writeToKb(fact)
else console.warn('vetoed by', verdict.vetoedBy, verdict.reason)

researchLoopRunner (mode research) wraps this with a correct-on-veto remediation loop: research → gate → re-research the vetoed gaps up to maxRounds, then return the unverified ones (escalate, never silently drop).
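The always-on floor described above can be approximated in a few lines. This is a sketch under stated assumptions, not the gate's implementation: `floorGate`, the `sourceKind` field, the veto labels, and the comma-stripping value check are all hypothetical. Only the 12-character minimum passage comes from the defaults listed in this README.

```typescript
// Hypothetical shapes for illustration.
interface SketchFact {
  claim: string
  value: number
  verbatimPassage: string
  sourceText: string
  sourceKind?: string // kind of artifact the citation points at, if known
}

interface SketchVerdict { accepted: boolean; vetoedBy?: string; reason?: string }

const veto = (vetoedBy: string, reason: string): SketchVerdict => ({ accepted: false, vetoedBy, reason })

// Fail closed: every check must pass, and the first failure vetoes the fact.
function floorGate(
  fact: SketchFact,
  opts: { selfArtifactKinds?: string[]; minPassageChars?: number } = {},
): SketchVerdict {
  const minChars = opts.minPassageChars ?? 12
  const passage = fact.verbatimPassage.trim()

  if (passage.length < minChars) return veto('min-passage', 'passage is too short to be evidence')
  // Anti-hallucination: the quoted passage must literally occur in the source.
  if (!fact.sourceText.includes(passage)) return veto('verbatim', 'passage not found in source text')
  // Simplistic value check (assumption): digits match once thousands separators are removed.
  if (!passage.replace(/,/g, '').includes(String(fact.value))) {
    return veto('value', 'asserted value does not appear in the passage')
  }
  // Anti-laundering: citations may not point at self-generated artifacts.
  if (fact.sourceKind !== undefined && (opts.selfArtifactKinds ?? []).includes(fact.sourceKind)) {
    return veto('self-artifact', 'citation points at a self-generated artifact')
  }
  return { accepted: true }
}
```

A real value check would need to handle units and formats such as "$1.2B"; the point of the sketch is only the ordering of cheap, deterministic checks ahead of any judge.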
Optimize any text prompt over agent-eval's runImprovementLoop, identity-gated by construction: it runs evals, proposes candidates (default gepaDriver), and the held-out gate compares candidate vs baseline. result.prompt is the baseline unless the gate decided ship — so registering a prompt for optimization can never regress it.
import { optimizePrompt } from '@tangle-network/agent-runtime/improvement'

const { prompt, improved, delta } = await optimizePrompt({
  baselinePrompt: CURRENT_SYSTEM_PROMPT,
  runWithPrompt: (candidate, scenario, ctx) => runYourThing(candidate, scenario),
  scenarios, holdoutScenarios, judges, runDir,
  reflection: { llm, model: 'claude-sonnet-4-6' },
})
// assign `prompt` unconditionally: it's the safe one

See examples/self-improving-loop/.
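The "identity-gated" property is simply that the function returns its input unless a gate says otherwise. A minimal sketch of that contract follows, assuming a gate that compares mean held-out scores against a threshold. `heldOutGateSketch`, `selectPrompt`, and the `minDelta` threshold are hypothetical; the real heldOutGate may use a different statistic.

```typescript
interface GateDecision { decision: 'ship' | 'hold'; delta: number }

// Hypothetical gate: ship only when the candidate beats the baseline on
// held-out scenarios by at least minDelta.
function heldOutGateSketch(
  baselineScores: number[],
  candidateScores: number[],
  minDelta = 0.02,
): GateDecision {
  const mean = (xs: number[]): number => xs.reduce((sum, x) => sum + x, 0) / xs.length
  const delta = mean(candidateScores) - mean(baselineScores)
  // With no scores, delta is NaN and the comparison is false, so the gate holds.
  return { decision: delta >= minDelta ? 'ship' : 'hold', delta }
}

// Identity by construction: without a ship decision the baseline comes back unchanged.
function selectPrompt(
  baseline: string,
  candidate: string,
  gate: GateDecision,
): { prompt: string; improved: boolean; delta: number } {
  const improved = gate.decision === 'ship'
  return { prompt: improved ? candidate : baseline, improved, delta: gate.delta }
}
```

This is why the caller can assign the returned prompt unconditionally: in the worst case the assignment is a no-op.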
runLoop emits a structured event stream; buildLoopOtelSpans turns it into a nested, real-duration span tree that any GenAI trace viewer (Phoenix, Langfuse, Grafana Tempo, Tangle Intelligence) renders natively. Attributes follow the current GenAI semantic conventions (gen_ai.operation.name, gen_ai.agent.name, gen_ai.usage.input_tokens/output_tokens) plus a tangle.loop.* extension for the topology (move kind/rationale, edge lineage, verdict, placement, cost).
import { buildLoopOtelSpans, createOtelExporter } from '@tangle-network/agent-runtime'
const exporter = createOtelExporter() // reads OTEL_EXPORTER_OTLP_ENDPOINT
for (const span of buildLoopOtelSpans(loopEvents, traceId)) exporter?.exportSpan(span)
await exporter?.flush()

The shape: loop → loop.round (move + rationale) → loop.iteration (agent, usage, verdict, cost, parent edge). See examples/with-intelligence-export/.
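To show how a flat event list becomes the three-level tree described above, here is a self-contained sketch. It is not buildLoopOtelSpans: the `SketchLoopEvent` and `SketchSpan` shapes, the span ids, and the `tangle.loop.move.kind` key are hypothetical. The `gen_ai.*` keys are the ones named in this README, and `invoke_agent` is assumed as the operation name.

```typescript
// Hypothetical flat event, one per loop iteration.
interface SketchLoopEvent {
  round: number
  iteration: number
  agent: string
  move: 'refine' | 'fanout' | 'stop'
  startMs: number
  endMs: number
  inputTokens: number
  outputTokens: number
}

interface SketchSpan {
  spanId: string
  parentSpanId?: string
  name: string
  startMs: number
  endMs: number
  attributes: Record<string, string | number>
}

// Build loop -> loop.round -> loop.iteration, with parent spans covering
// the real time range of their children.
function buildSketchSpans(events: SketchLoopEvent[]): SketchSpan[] {
  if (events.length === 0) return []
  const root: SketchSpan = {
    spanId: 'loop',
    name: 'loop',
    startMs: Math.min(...events.map((e) => e.startMs)),
    endMs: Math.max(...events.map((e) => e.endMs)),
    attributes: {},
  }
  const rounds = new Map<number, SketchSpan>()
  const iterations: SketchSpan[] = []
  for (const e of events) {
    const roundId = `round-${e.round}`
    const existing = rounds.get(e.round)
    if (existing) {
      // Widen the round span so it covers every branch in the round.
      existing.startMs = Math.min(existing.startMs, e.startMs)
      existing.endMs = Math.max(existing.endMs, e.endMs)
    } else {
      rounds.set(e.round, {
        spanId: roundId,
        parentSpanId: root.spanId,
        name: 'loop.round',
        startMs: e.startMs,
        endMs: e.endMs,
        attributes: { 'tangle.loop.move.kind': e.move },
      })
    }
    iterations.push({
      spanId: `iteration-${e.iteration}`,
      parentSpanId: roundId,
      name: 'loop.iteration',
      startMs: e.startMs,
      endMs: e.endMs,
      attributes: {
        'gen_ai.operation.name': 'invoke_agent',
        'gen_ai.agent.name': e.agent,
        'gen_ai.usage.input_tokens': e.inputTokens,
        'gen_ai.usage.output_tokens': e.outputTokens,
      },
    })
  }
  return [root, ...Array.from(rounds.values()), ...iterations]
}
```

Parent/child links are carried by `parentSpanId`, so a viewer can nest parallel fanout branches under the round that spawned them.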
Expose the five delegation tools (delegate_code, delegate_research, delegate_feedback, delegation_status, delegation_history) to a sandbox coding-harness agent — mount the canonical server, don't fork delegation logic.
import { createMcpServer, createDefaultCoderDelegate } from '@tangle-network/agent-runtime/mcp'
const server = createMcpServer({
coderDelegate: createDefaultCoderDelegate({ sandboxClient }),
researcherDelegate, // wire your KB-backed researcher
})

Or mount the agent-runtime-mcp stdio bin on a production AgentProfile.mcp. See examples/mcp-delegation/ and examples/fleet-delegation/.
When nothing is specified:
| Knob | Default | Override |
|---|---|---|
| Backend model | gpt-4o-mini (via createOpenAICompatibleBackend) | model option / MODEL_NAME env |
| Backend provider | openai-compat when TANGLE_API_KEY, else openai if OPENAI_API_KEY | MODEL_PROVIDER env |
| Router base URL | https://router.tangle.tools/v1 | TANGLE_ROUTER_BASE_URL env |
| Sandbox base URL | https://sandbox.tangle.tools | SANDBOX_API_URL env |
| Loop iteration cap | 10 (runLoop); dynamic driver 8 | runLoop({ maxIterations }) |
| Driver | none (required by runLoop) | createRefineDriver / createFanoutVoteDriver / createDynamicDriver |
| Winner selection (coder delegate) | highest-score | winnerSelection option |
| KB gate min passage | 12 chars | createKbGate({ minPassageChars }) |
| optimizePrompt gate | heldOutGate | defaultProductionGate for red-team hardening |
| OTEL export | off | set OTEL_EXPORTER_OTLP_ENDPOINT |
| Loop-runner mode failure | recorded as { ok: false } | runDelegatedLoop never crashes on a thrown engine |
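The backend and endpoint rows of the table can be read as a small resolution function. The sketch below encodes them under two assumptions that the table does not state: an explicit `model` option takes precedence over MODEL_NAME, and MODEL_PROVIDER overrides the key-based provider choice. `resolveBackendDefaults` and both interface names are hypothetical.

```typescript
// Only the environment variables named in the defaults table.
interface SketchEnv {
  MODEL_NAME?: string
  MODEL_PROVIDER?: string
  TANGLE_API_KEY?: string
  OPENAI_API_KEY?: string
  TANGLE_ROUTER_BASE_URL?: string
  SANDBOX_API_URL?: string
  OTEL_EXPORTER_OTLP_ENDPOINT?: string
}

interface ResolvedDefaults {
  provider: string | undefined
  model: string
  routerBaseUrl: string
  sandboxBaseUrl: string
  otelEnabled: boolean
}

function resolveBackendDefaults(env: SketchEnv, opts: { model?: string } = {}): ResolvedDefaults {
  // Assumed precedence: explicit env override, then key-based detection.
  const detected = env.TANGLE_API_KEY ? 'openai-compat' : env.OPENAI_API_KEY ? 'openai' : undefined
  return {
    provider: env.MODEL_PROVIDER ?? detected,
    // Assumed precedence: option, then env, then the documented default.
    model: opts.model ?? env.MODEL_NAME ?? 'gpt-4o-mini',
    routerBaseUrl: env.TANGLE_ROUTER_BASE_URL ?? 'https://router.tangle.tools/v1',
    sandboxBaseUrl: env.SANDBOX_API_URL ?? 'https://sandbox.tangle.tools',
    // OTEL export stays off until an endpoint is configured.
    otelEnabled: Boolean(env.OTEL_EXPORTER_OTLP_ENDPOINT),
  }
}
```

In a Node process this could be called as `resolveBackendDefaults(process.env)`; passing the environment in explicitly keeps the function easy to test.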
agent-runtime ── handleChatTurn · runLoop + drivers · runDelegatedLoop · createMcpServer
optimizePrompt · createKbGate · buildLoopOtelSpans · defineAgent
agent-eval ── runEvalCampaign · runImprovementLoop (gepaDriver) · heldOutGate · runAgentMatrix
(consumes runtime traces, scores, gates promotion)
agent-knowledge ─ proposeKnowledgeWrites / applyKnowledgeWriteBlocks
(analyst-loop produces these; runtime + createKbGate consume them)
sandbox ── AgentProfile · Sandbox.create · streamPrompt · exportTraceBundle
(the harness execution surface every loop runs on)
| Import | Owns |
|---|---|
| @tangle-network/agent-runtime | chat turns, delegated loop-runner, OTEL export, errors, model resolution |
| …/agent | defineAgent + surfaces / outcome adapters |
| …/loops | runLoop kernel + refine / fanout-vote / dynamic drivers + loopDispatch |
| …/profiles | coderProfile, researcherProfile presets |
| …/mcp | createMcpServer, createDefaultCoderDelegate, createKbGate, agent-runtime-mcp bin |
| …/improvement | optimizePrompt (text) + improvementDriver (code/worktree) |
| …/analyst-loop | runAnalystLoop: analyst registry driver |
| …/platform | cross-site SSO + integrations hub |
Bins: agent-runtime-mcp (delegation MCP server) · agent-runtime-loop (schedulable delegated loop-runner).
This package ships a self-contained adoption skill at skills/agent-runtime-adoption/SKILL.md — driven loops, topology drivers, the loopDispatch campaign bridge, MCP delegation, and identity-gated optimizePrompt. It needs only this package + @tangle-network/agent-eval, so external consumers need nothing private. For the full self-improving pipeline (trace sink → analyst loop → scorecard → production loop → CI), see the agent-eval-adoption / agent-stack-adoption skills.
Every public export is annotated @stable or @experimental. @stable exports don't change shape inside a minor; @experimental ones may and require a deliberate consumer bump.
pnpm test # full suite across the kernel, drivers, MCP, delegate hardening, kb-gate, loop-runner, backends
pnpm typecheck
pnpm build

Deeper docs: docs/concepts.md (mental model) · docs/agent-bus-protocol.md (cross-gateway header contract) · docs/conversation-economics.md (who pays: authSource) · docs/durability-adapters.md (SQL-backed ConversationJournal).