feat(loops): emit topology + planner-move telemetry for visualization #80

@drewstone

Summary

Emit the topology structure + planner reasoning on the loop kernel's trace events so consumers can faithfully reconstruct and visualize a dynamic loop run. Today the events are structurally flat: the kernel computes the topology (the TopologyMove per round — refine | fanout | stop + rationale) and the iteration lineage, then drops both before they reach a trace.

This is a forcing requirement from a concrete consumer: the Tangle Intelligence product now ships a loop topology visualization (general-graph IR + React Flow renderer) that turns a loop run into an interactive graph — each node an iteration, each edge a planner move, the planner's reasoning + verdict/cost behind a drawer. It works today only by inferring edges for the two built-in drivers (refine → chain, fanout-vote → star) and cannot render createDynamicDriver topologies faithfully, show the planner's reasoning, or replay historical runs from the trace lake — because the data isn't emitted.

Filing so the schema below can be reviewed against the GenAI + loop-trace work already in flight, and land as one coherent convention.

What's missing (verified against 0.33.0)

  • LoopDecisionPayload = { decision, historyLength } — the TopologyMove.kind and rationale the planner returned are not on it.
  • LoopIterationDispatchPayload / LoopIterationEndedPayload carry iterationIndex + agentRunName but no parentIndex / groupId — so there are no edges, only an ordered list.
  • The in-process LoopTraceEmitter callback is not exported as OTEL spans, so an ingested trace carries no loop structure at all (no historical feed).

Net: from the emitted telemetry you can reconstruct a flat iteration list + a generic decision string. Everything that makes the topology a topology (edges, the move that created each branch, the reasoning) is computed and discarded.

Proposed enrichment (additive, backward-compatible)

1. A loop.move trace event (or enrich loop.decision)

Carry the planner's chosen TopologyMove verbatim:

interface LoopMovePayload {
  round: number;                       // planner round (0-based)
  kind: 'refine' | 'fanout' | 'stop';
  rationale?: string;                  // the planner's WHY — the high-value signal
  parentIndex?: number;                // iteration this move was planned from (omit ⇒ root)
  childIndices: number[];              // iterations this move dispatched
}
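As a rough illustration of how a consumer might turn these payloads into graph edges, here is a minimal sketch. The `MoveEdge` type, the `edgesFromMoves` helper, and the sample moves are hypothetical and not part of the kernel; `from: null` is an assumed way to represent the root when `parentIndex` is omitted.

```typescript
// Payload shape copied from the proposal above.
interface LoopMovePayload {
  round: number;
  kind: 'refine' | 'fanout' | 'stop';
  rationale?: string;
  parentIndex?: number;
  childIndices: number[];
}

// Hypothetical consumer-side edge type; `from: null` stands for the loop root.
interface MoveEdge {
  from: number | null;
  to: number;
  kind: LoopMovePayload['kind'];
  round: number;
}

// One edge per dispatched child; a `stop` move has no children, so it adds no edges.
function edgesFromMoves(moves: LoopMovePayload[]): MoveEdge[] {
  const edges: MoveEdge[] = [];
  for (const move of moves) {
    for (const child of move.childIndices) {
      edges.push({
        from: move.parentIndex ?? null,
        to: child,
        kind: move.kind,
        round: move.round,
      });
    }
  }
  return edges;
}

// Illustrative data only: fan out from the root, refine branch 1, then stop.
const exampleMoves: LoopMovePayload[] = [
  { round: 0, kind: 'fanout', rationale: 'explore three approaches', childIndices: [0, 1, 2] },
  { round: 1, kind: 'refine', rationale: 'branch 1 scored highest', parentIndex: 1, childIndices: [3] },
  { round: 2, kind: 'stop', rationale: 'verdict passed', parentIndex: 3, childIndices: [] },
];

const exampleEdges = edgesFromMoves(exampleMoves);
```

With this shape, the rationale stays attached to the move rather than to individual edges, which matches the "reasoning behind a drawer" presentation described in the summary.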

2. Edge lineage on the iteration events

Add to LoopIterationDispatchPayload (and mirror on ended):

parentIndex?: number;   // the iteration this one was planned from
groupId?: string;       // the fanout batch / planner round this belongs to
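A sketch of how these two fields alone could recover structure from iteration events, even if a `loop.move` event is missed. The `IterationLineage` type is a hypothetical reduction of the dispatch payload to the fields discussed here, and the `'ungrouped'` fallback key and the `groupId` values are assumptions for illustration.

```typescript
// Hypothetical subset of LoopIterationDispatchPayload with the proposed fields added.
interface IterationLineage {
  iterationIndex: number;
  agentRunName: string;
  parentIndex?: number; // omitted => planned from the root
  groupId?: string;     // fanout batch / planner round
}

// Bucket iteration indices by batch so siblings of one fanout can be laid out together.
function groupByBatch(events: IterationLineage[]): Map<string, number[]> {
  const batches = new Map<string, number[]>();
  for (const ev of events) {
    const key = ev.groupId ?? 'ungrouped'; // assumed fallback key
    const members = batches.get(key) ?? [];
    members.push(ev.iterationIndex);
    batches.set(key, members);
  }
  return batches;
}

// Map each iteration to its parent; null marks an iteration planned from the root.
function parentLookup(events: IterationLineage[]): Map<number, number | null> {
  const parents = new Map<number, number | null>();
  for (const ev of events) {
    parents.set(ev.iterationIndex, ev.parentIndex ?? null);
  }
  return parents;
}

// Illustrative events only.
const lineageEvents: IterationLineage[] = [
  { iterationIndex: 0, agentRunName: 'a', groupId: 'round-0' },
  { iterationIndex: 1, agentRunName: 'b', groupId: 'round-0' },
  { iterationIndex: 2, agentRunName: 'c', groupId: 'round-0' },
  { iterationIndex: 3, agentRunName: 'd', groupId: 'round-1', parentIndex: 1 },
];

const lineageBatches = groupByBatch(lineageEvents);
const lineageParents = parentLookup(lineageEvents);
```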

3. loop.ended rollups (mostly present)

winnerIterationIndex (have it) + totalCostUsd, durationMs, iterations.

4. OTEL span convention tangle.loop.* — the historical feed

Export the emitter's events as spans so an ingested trace (not just the in-process callback) carries the topology. The intelligence consumer already reads this exact key set (spansToLoopEvents); aligning means ingested loop runs light up with zero extra work on our side:

| span | attributes |
| --- | --- |
| loop (root) | tangle.loop.driver, tangle.loop.winner_index, tangle.loop.total_cost_usd, tangle.loop.duration_ms, tangle.loop.iterations |
| loop.move | tangle.loop.move.kind, tangle.loop.move.round, tangle.loop.move.rationale, tangle.loop.move.parent_index, tangle.loop.move.child_indices |
| loop.iteration | tangle.loop.iteration.index, .agent_run, .placement, .group_id, .parent_index, .verdict.valid, .verdict.score, .cost_usd, .duration_ms, .tokens_in, .tokens_out, .error, .output_preview |
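To make the loop.move row concrete, the following sketch flattens a move payload into the proposed attribute keys. It deliberately uses a plain record rather than any OTEL SDK call; `moveToSpanAttributes` and `AttributeValue` are hypothetical names, and omitting absent optional fields (rather than writing empty values) is an assumption about how the exporter would behave.

```typescript
// Payload shape copied from the proposal above.
interface LoopMovePayload {
  round: number;
  kind: 'refine' | 'fanout' | 'stop';
  rationale?: string;
  parentIndex?: number;
  childIndices: number[];
}

// Span attributes are limited to primitives and homogeneous arrays of primitives.
type AttributeValue = string | number | boolean | number[];

// Flatten a move into tangle.loop.move.* keys; optional fields are omitted when absent.
function moveToSpanAttributes(move: LoopMovePayload): Record<string, AttributeValue> {
  const attrs: Record<string, AttributeValue> = {
    'tangle.loop.move.kind': move.kind,
    'tangle.loop.move.round': move.round,
    'tangle.loop.move.child_indices': move.childIndices,
  };
  if (move.rationale !== undefined) {
    attrs['tangle.loop.move.rationale'] = move.rationale;
  }
  if (move.parentIndex !== undefined) {
    attrs['tangle.loop.move.parent_index'] = move.parentIndex;
  }
  return attrs;
}

// Illustrative moves only.
const refineAttrs = moveToSpanAttributes({
  round: 1,
  kind: 'refine',
  rationale: 'branch 1 scored highest',
  parentIndex: 1,
  childIndices: [3],
});
const rootFanoutAttrs = moveToSpanAttributes({ round: 0, kind: 'fanout', childIndices: [0, 1, 2] });
```

Leaving `parent_index` off a root move keeps "no parent" distinguishable from "parent is iteration 0" on the reader side.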

(Co-locate with the GenAI semconv landing now — the iteration span is the natural parent of the gen_ai.* LLM spans it produced.)

Why this matters

  • Faithful arbitrary topologies. createDynamicDriver interleaves refine + fanout however the planner reasons. Without edges we guess; with them we render the actual shape — and the consumer IR is already a general directed graph (cycles allowed), so a future revisit/goto move renders with no further change.
  • The planner's reasoning is the WOW. "The planner fanned out into 3 because X, then refined branch B because it was closest" — that story is computed today and thrown away. Surfacing rationale per move turns the viz from a diagram into an explanation.
  • Historical replay. The OTEL convention means any ingested loop run (ours or a customer's agent using these loops) reconstructs from the trace lake, not just live.

Compatibility

Fully additive. Consumers degrade: ours already infers refine/fanout edges when loop.move is absent and flags the run as "inferred edges". No breaking change to runLoop / driver APIs.
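For reference, the degraded path could look roughly like the sketch below, which infers a chain for refine and a star for fanout-vote when no loop.move data is available. This is an illustration of the described behavior, not the consumer's actual code; `inferEdges`, `InferredEdge`, and the use of `null` for the root are hypothetical.

```typescript
// The two built-in drivers whose shape can be guessed without loop.move events.
type BuiltInDriver = 'refine' | 'fanout-vote';

// Hypothetical edge type; `inferred: true` lets the UI flag the run as "inferred edges".
interface InferredEdge {
  from: number | null; // null stands for the loop root
  to: number;
  inferred: true;
}

// refine => chain (root -> 0 -> 1 -> ...); fanout-vote => star (root -> every iteration).
function inferEdges(driver: BuiltInDriver, iterationCount: number): InferredEdge[] {
  const edges: InferredEdge[] = [];
  for (let i = 0; i < iterationCount; i++) {
    const from = driver === 'refine' && i > 0 ? i - 1 : null;
    edges.push({ from, to: i, inferred: true });
  }
  return edges;
}

const inferredChain = inferEdges('refine', 3);
const inferredStar = inferEdges('fanout-vote', 3);
```

A createDynamicDriver run has no fixed shape to guess, so this fallback cannot cover it; that gap is what the emitted lineage closes.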

Consumer reference

The intelligence-side IR + adapter that consumes this (so the field names match a real reader): products/intelligence/web/src/lib/loop-graph.ts (LoopEvent, buildLoopGraph, spansToLoopEvents), in the agent-dev-container repo (branch feat/intelligence-autonomous-se).
