OpenTelemetry integration: spans, metrics, and trace export #38

@dgenio

Milestone: v0.2.0 | Tier: Medium | Effort: Medium

Problem

TraceStore is in-memory and has no export path. The kernel produces ActionTrace records, but they are accessible only via trace_store.get(): there are no spans, no metrics, and no integration with the OpenTelemetry ecosystem.

Production users need:

  • Distributed tracing (each invoke() as a span with attributes)
  • Metrics (invocation counts, latency histograms, policy denial rates, budget consumption)
  • Standard export to Jaeger, Grafana, Datadog, etc. via OTLP

Proposed Change

1. OTel span instrumentation (src/agent_kernel/otel.py)

Wrap key kernel methods with OTel spans:

invoke() → span "agent_kernel.invoke"
  ├── attributes: principal_id, capability_id, safety_class, sensitivity, response_mode
  ├── child span "agent_kernel.policy.evaluate"
  ├── child span "agent_kernel.driver.execute"  
  └── child span "agent_kernel.firewall.apply"
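
As a rough illustration of the span tree above, the nesting could be produced with the OpenTelemetry tracing API along these lines. This is a minimal sketch, not the proposed implementation: the `span` helper, the standalone `invoke` function, and its placeholder steps are hypothetical, and only two of the listed attributes are set.

```python
from contextlib import contextmanager

try:
    from opentelemetry import trace
    _tracer = trace.get_tracer("agent_kernel")
except ImportError:  # opentelemetry-api is not installed
    _tracer = None


@contextmanager
def span(name, **attributes):
    """Open a span called `name`; yields None when OTel is unavailable."""
    if _tracer is None:
        yield None
        return
    with _tracer.start_as_current_span(name) as current:
        for key, value in attributes.items():
            current.set_attribute(key, value)
        yield current


def invoke(principal_id, capability_id, execute):
    """Hypothetical stand-in for Kernel.invoke(); each step is a placeholder."""
    with span("agent_kernel.invoke",
              principal_id=principal_id,
              capability_id=capability_id):
        with span("agent_kernel.policy.evaluate"):
            pass  # the policy decision would happen here
        with span("agent_kernel.driver.execute"):
            result = execute()
        with span("agent_kernel.firewall.apply"):
            return result  # the firewall would filter the result here
```

Because each child span is opened while the "agent_kernel.invoke" span is current, the OTel context makes it the parent without any explicit wiring.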

2. Metrics

Emit the following counters/histograms:

  • agent_kernel.invocations (counter) — labels: capability_id, safety_class, status (success/error)
  • agent_kernel.invocation_duration (histogram) — latency in milliseconds
  • agent_kernel.policy_denials (counter) — labels: capability_id, reason_category
  • agent_kernel.budget_consumed (gauge) — current budget usage per session
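
The instruments listed above might be created with the OpenTelemetry metrics API roughly as follows. This is a sketch under assumptions: `measured_invoke`, `record_denial`, the `budget_usage` dict, and the `session_id` label are hypothetical names, and the gauge is modeled as an observable gauge that reads the current usage on each collection.

```python
import time

try:
    from opentelemetry import metrics
    from opentelemetry.metrics import Observation
except ImportError:  # opentelemetry-api is not installed
    metrics = None

# Hypothetical per-session budget usage, read by the gauge callback below.
budget_usage = {}

if metrics is not None:
    meter = metrics.get_meter("agent_kernel")
    invocations = meter.create_counter("agent_kernel.invocations")
    duration = meter.create_histogram("agent_kernel.invocation_duration", unit="ms")
    denials = meter.create_counter("agent_kernel.policy_denials")

    def _observe_budget(options):
        # Called by the SDK at collection time; one observation per session.
        return [Observation(used, {"session_id": sid})
                for sid, used in budget_usage.items()]

    meter.create_observable_gauge("agent_kernel.budget_consumed",
                                  callbacks=[_observe_budget])


def record_denial(capability_id, reason_category):
    """Count one policy denial; does nothing when OTel is unavailable."""
    if metrics is not None:
        denials.add(1, {"capability_id": capability_id,
                        "reason_category": reason_category})


def measured_invoke(execute, capability_id, safety_class):
    """Run execute(), recording one invocation and its latency in ms."""
    labels = {"capability_id": capability_id, "safety_class": safety_class}
    start = time.perf_counter()
    status = "success"
    try:
        return execute()
    except Exception:
        status = "error"
        raise
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if metrics is not None:
            invocations.add(1, {**labels, "status": status})
            duration.record(elapsed_ms, labels)
```

With only opentelemetry-api installed, these instruments are no-ops; values are exported once an SDK meter provider with an OTLP exporter is configured.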

3. TraceStore bridge

  • OTelTraceExporter converts ActionTrace records into OTel spans, for legacy compatibility.
  • Correlation is bidirectional: an OTel span can be matched to its ActionTrace via ActionTrace.trace_id, and vice versa.
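
One way the bridge could work is to replay each finished ActionTrace as a span with explicit start and end timestamps. The sketch below is illustrative only: the `ActionTrace` fields shown here, the `to_span_fields` and `export_trace` helpers, and the `agent_kernel.trace_id` attribute name are assumptions, since the issue does not spell out the record's shape.

```python
from dataclasses import dataclass

try:
    from opentelemetry import trace
except ImportError:  # opentelemetry-api is not installed
    trace = None


@dataclass
class ActionTrace:
    """Hypothetical shape; the real ActionTrace fields may differ."""
    trace_id: str
    principal_id: str
    capability_id: str
    started_at: float  # Unix time in seconds
    ended_at: float    # Unix time in seconds


def to_span_fields(record):
    """Map an ActionTrace onto the arguments an OTel span needs."""
    return {
        "name": "agent_kernel.invoke",
        # OTel timestamps are integer nanoseconds since the Unix epoch.
        "start_time": int(record.started_at * 1e9),
        "end_time": int(record.ended_at * 1e9),
        "attributes": {
            # Stored on the span so it can be matched back to the ActionTrace.
            "agent_kernel.trace_id": record.trace_id,
            "principal_id": record.principal_id,
            "capability_id": record.capability_id,
        },
    }


def export_trace(record):
    """Replay one ActionTrace as a span; returns False when OTel is absent."""
    if trace is None:
        return False
    fields = to_span_fields(record)
    tracer = trace.get_tracer("agent_kernel")
    span = tracer.start_span(fields["name"],
                             start_time=fields["start_time"],
                             attributes=fields["attributes"])
    span.end(end_time=fields["end_time"])
    return True
```

Keeping the ActionTrace.trace_id as a span attribute is what allows lookups in either direction: from a span in the tracing backend to the stored record, or from a record to its span via an attribute query.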

4. Integration pattern

from agent_kernel.otel import instrument_kernel

kernel = Kernel(...)
instrument_kernel(kernel)  # Wraps methods with OTel instrumentation

Gate this behind the optional agent-kernel[otel] extra. When opentelemetry-api is not installed, all instrumentation is a no-op.
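
The no-op behavior could be achieved with a guarded import, as in the sketch below. It is an assumption-laden illustration: the `Kernel` class here is a minimal hypothetical stand-in, only `invoke()` is wrapped, and returning the kernel from `instrument_kernel` is a convenience not stated in the proposal.

```python
import functools

try:
    from opentelemetry import trace
except ImportError:  # the otel extra is not installed
    trace = None


class Kernel:
    """Hypothetical minimal stand-in for the real Kernel."""

    def invoke(self, capability_id, **kwargs):
        return {"capability_id": capability_id, "ok": True}


def instrument_kernel(kernel):
    """Wrap kernel.invoke in a span; leave the kernel untouched without OTel."""
    if trace is None:
        return kernel  # no-op: the original method stays in place

    tracer = trace.get_tracer("agent_kernel")
    original = kernel.invoke

    @functools.wraps(original)
    def traced_invoke(capability_id, **kwargs):
        with tracer.start_as_current_span("agent_kernel.invoke") as span:
            span.set_attribute("capability_id", capability_id)
            return original(capability_id, **kwargs)

    # Shadow the bound method on this instance only.
    kernel.invoke = traced_invoke
    return kernel
```

When the import fails, `invoke()` is never replaced, so an uninstrumented kernel runs exactly the code path it ran before.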

Acceptance Criteria

  • invoke() produces OTel spans with correct parent-child relationships
  • Span attributes include principal_id, capability_id, safety_class, sensitivity, response_mode
  • Invocation counter and latency histogram exported via OTLP
  • Policy denial counter incremented on each denial
  • instrument_kernel() is a no-op when opentelemetry-api is not installed
  • No measurable performance impact when instrumentation is disabled
  • Integration test verifies spans appear in an in-memory OTel exporter
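
The parent-child criterion could be checked with the SDK's in-memory exporter roughly as follows. This sketch assumes opentelemetry-sdk is available in the test environment and returns None otherwise; `collect_span_parents` is a hypothetical helper, and the spans are opened by hand where a real test would call an instrumented `invoke()`.

```python
def collect_span_parents():
    """Return {span name: parent span name or None}; None if the SDK is missing."""
    try:
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import SimpleSpanProcessor
        from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
            InMemorySpanExporter,
        )
    except ImportError:
        return None

    exporter = InMemorySpanExporter()
    provider = TracerProvider()
    # SimpleSpanProcessor exports each span synchronously when it ends.
    provider.add_span_processor(SimpleSpanProcessor(exporter))
    tracer = provider.get_tracer("agent_kernel.tests")

    with tracer.start_as_current_span("agent_kernel.invoke"):
        for child in ("agent_kernel.policy.evaluate",
                      "agent_kernel.driver.execute",
                      "agent_kernel.firewall.apply"):
            with tracer.start_as_current_span(child):
                pass

    spans = exporter.get_finished_spans()
    names_by_id = {s.context.span_id: s.name for s in spans}
    return {
        s.name: names_by_id.get(s.parent.span_id) if s.parent else None
        for s in spans
    }
```

Using a local TracerProvider instead of the global one keeps the test isolated from any provider configured elsewhere in the process.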

Affected Files

  • src/agent_kernel/otel.py (new — instrumentation module)
  • src/agent_kernel/trace.py (OTel bridge)
  • pyproject.toml (optional opentelemetry-api dependency)
  • tests/test_trace.py (OTel integration tests)
  • docs/integrations.md (OTel usage documentation)

Metadata

Assignees

No one assigned

    Labels

      • complexity:average (Moderate effort, some design needed)
      • phase:storage (Handle store, trace store)
      • priority:high (Core functionality)
      • size:M (Medium change, 50 to 200 lines)
      • type:feature (New functionality)
