feat(llm): expose cached_tokens from Vertex AI context caching#538

Draft: breakanalysis wants to merge 1 commit into main from feat/cached-token-traces

@breakanalysis

Summary

  • Add cached_tokens: Optional[int] = None to LLMUsage. It holds the number of prompt tokens served from Vertex AI's implicit context cache, read from cached_content_token_count in UsageMetadata.
  • Update VertexAILLM._parse_content_response to populate cached_tokens and, when the count is non-zero, set the llm.token_count.prompt_details.cache_read attribute on the active OpenTelemetry span.
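
As a rough illustration of the first change, the sketch below maps a UsageMetadata-like object to an LLMUsage with the new field. Only `cached_tokens` and `cached_content_token_count` come from this PR; the `prompt_tokens` and `completion_tokens` field names and the `usage_from_metadata` helper are assumptions made for the example and may not match the repository.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class LLMUsage:
    # Assumed field names, for illustration only.
    prompt_tokens: int = 0
    completion_tokens: int = 0
    # New in this PR: prompt tokens served from the implicit context cache.
    cached_tokens: Optional[int] = None


def usage_from_metadata(usage_metadata: Any) -> LLMUsage:
    """Hypothetical helper: copy token counts from a UsageMetadata-like object."""
    return LLMUsage(
        prompt_tokens=getattr(usage_metadata, "prompt_token_count", 0) or 0,
        completion_tokens=getattr(usage_metadata, "candidates_token_count", 0) or 0,
        # Stays None when the response carries no cached-token count at all.
        cached_tokens=getattr(usage_metadata, "cached_content_token_count", None),
    )


# Stand-in for a real response's usage metadata.
fake_metadata = SimpleNamespace(
    prompt_token_count=1200,
    candidates_token_count=80,
    cached_content_token_count=1024,
)
usage = usage_from_metadata(fake_metadata)
```

Defaulting the field to None keeps existing callers that construct LLMUsage without it working unchanged.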

Context

Vertex AI's implicit caching transparently caches repeated context (such as the system prompt and schema) and reports the cached token count in UsageMetadata.cached_content_token_count. The openinference-instrumentation-vertexai library does not currently expose this count. Setting the attribute in _parse_content_response works because that method runs while the openinference instrumentation span is still active, so the attribute lands on the correct LLM span.
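
A minimal sketch of the span-attribute step, under stated assumptions: `record_cached_tokens`, `SpanLike`, and `RecordingSpan` are hypothetical names, not code from this PR. In the real method the span would presumably come from `opentelemetry.trace.get_current_span()`; an in-memory stand-in is used here so the example runs without the OpenTelemetry SDK.

```python
from typing import Dict, Optional, Protocol

CACHE_READ_ATTR = "llm.token_count.prompt_details.cache_read"


class SpanLike(Protocol):
    """The two OpenTelemetry Span methods this sketch relies on."""

    def is_recording(self) -> bool: ...
    def set_attribute(self, key: str, value: int) -> None: ...


def record_cached_tokens(span: SpanLike, cached_tokens: Optional[int]) -> bool:
    """Set the cache-read attribute only for a non-zero count on a live span."""
    if not cached_tokens:
        # None or 0: nothing was served from the cache, so add no attribute.
        return False
    if not span.is_recording():
        # No active (recording) span, e.g. tracing is disabled.
        return False
    span.set_attribute(CACHE_READ_ATTR, cached_tokens)
    return True


class RecordingSpan:
    """In-memory stand-in for an active span (illustration only)."""

    def __init__(self, recording: bool = True) -> None:
        self.recording = recording
        self.attributes: Dict[str, int] = {}

    def is_recording(self) -> bool:
        return self.recording

    def set_attribute(self, key: str, value: int) -> None:
        self.attributes[key] = value


span = RecordingSpan()
record_cached_tokens(span, 1024)
```

Checking `is_recording()` first makes the call a no-op when no instrumentation span is active, so the parsing path behaves the same with tracing disabled.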

Test plan

  • Run pytest tests/unit/llm/test_vertexai_llm.py to verify existing tests pass
  • Manually verify cached_tokens appears in LLMResponse.usage for a real VertexAI call with context caching enabled

🤖 Generated with Claude Code

…Usage and OTel traces

- Add `cached_tokens` field to `LLMUsage` for prompt tokens served from
  Vertex AI's implicit context cache (cached_content_token_count).
- Update `VertexAILLM._parse_content_response` to populate `cached_tokens`
  and, when non-zero, set the `llm.token_count.prompt_details.cache_read`
  span attribute on the currently active OpenTelemetry span.

The OTel attribute is set inside `_parse_content_response`, which is called
while the openinference-instrumentation-vertexai span is still active, so the
attribute appears on the correct LLM span in Cloud Trace.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>