Add framework-level OpenTelemetry tracing#310

Open
cpsievert wants to merge 4 commits into main from worktree-feat+otel-framework-spans

@cpsievert
Collaborator

Summary

Chatlas now emits OpenTelemetry spans that capture the full structure of multi-turn conversations and tool execution — without requiring any provider-specific instrumentor libraries. When a TracerProvider is configured, every chat()/stream() call automatically produces a 3-level span hierarchy:

invoke_agent                      # wraps the full chat loop
├── chat gpt-4o                   # each model API call
├── execute_tool get_weather      # each tool invocation
├── chat gpt-4o                   # follow-up model call
└── ...

Users opt in with pip install "chatlas[otel]" and a standard TracerProvider setup (console exporter, Logfire, or any OTLP-compatible backend). The approach is consistent with Shiny for Python's OTel story — same [otel] extra pattern, same recommended tools, same config-module pattern.

Spans follow the GenAI semantic conventions and record token usage, response model/ID, and optionally full message content (gated by OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT). These framework spans complement (not replace) provider-specific SDK instrumentors like opentelemetry-instrumentation-openai-v2.
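The content-capture gate can be pictured with a small standard-library sketch. The helper names `capture_content_enabled` and `message_attributes` are hypothetical and are not taken from `chatlas/_otel.py`. The `gen_ai.input.messages` key is used purely for illustration, and treating only the value `true` (case-insensitive) as enabled is an assumption.

```python
import os

ENV_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def capture_content_enabled(environ=None) -> bool:
    """Return True only when the env var is set to 'true' (assumed convention)."""
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "false").strip().lower() == "true"


def message_attributes(prompt: str, environ=None) -> dict:
    """Build span attributes, adding message content only when opted in."""
    attrs = {"gen_ai.operation.name": "chat"}
    if capture_content_enabled(environ):
        # Illustrative key; the real module may record content differently.
        attrs["gen_ai.input.messages"] = prompt
    return attrs
```

Keeping the check in one helper means usage metadata is always recorded while potentially sensitive message text stays out of traces unless explicitly enabled.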

  • New chatlas/_otel.py module with span lifecycle functions
  • Hooks into all 6 core Chat methods (sync + async for agent/chat/tool spans)
  • 7 tests with VCR cassettes covering span hierarchy, token usage, content capture, tool errors, streaming, and no-op behavior
  • Updated docs/get-started/monitor.qmd with framework-level tracing docs

Test plan

  • pytest tests/test_otel.py — 7/7 passing
  • pyright chatlas/_otel.py — 0 errors
  • ruff check and ruff format — clean
  • Manual verification with a real exporter (Logfire or console) and live API key

cpsievert added 4 commits May 12, 2026 15:39
Adds a new chatlas/_otel.py module that emits OpenTelemetry spans for
the chat lifecycle: invoke_agent (top-level), chat (per model call),
and execute_tool (per tool invocation). Spans follow the GenAI semantic
conventions with attributes like gen_ai.usage.input_tokens,
gen_ai.response.model, and optional message content capture controlled
by the OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT env var.

Also adds an `otel` optional dependency extra (`pip install chatlas[otel]`).

Wires the _otel span functions into the six core Chat methods:
_chat_impl/_chat_impl_async (agent spans), _submit_turns/
_submit_turns_async (chat spans), and _invoke_tool/
_invoke_tool_async (tool spans). Parent context is passed
explicitly via _otel_parent to avoid async context hazards.

Adds 7 tests covering span hierarchy, token usage, content capture
(on/off), tool error recording, streaming lifecycle, and no-op
behavior. Includes VCR cassettes for replay without live API keys.

Updates docs/get-started/monitor.qmd with a new framework-level
tracing section (console quickstart, Logfire production path,
config-module pattern) before the existing provider-specific content.

The OTel API is ~212KB with no heavy transitive deps, and its
default ProxyTracer already no-ops when no SDK is configured.
Making it a hard dep lets us drop the lazy initialization
(cache_tracer/initialized/is_tracing guards) and always create
spans, relying on the no-op machinery for zero overhead when
nobody is collecting.

Removes the `chatlas[otel]` extra — the API is now always available.
Users still opt in to collection by installing opentelemetry-sdk
and configuring a TracerProvider.