
fix(openai, anthropic): fix finish_reason OTel semconv compliance gaps #3916

Open
max-deygin-traceloop wants to merge 11 commits into main from
max/fix-finish-reason-fallback

Conversation

max-deygin-traceloop (Contributor) commented Mar 31, 2026

OpenAI, Anthropic, LangChain

Aligns all packages with the Bedrock convention, following the OTel GenAI semantic conventions: when finish_reason is unknown/None, use "" (empty string) instead of fabricating "stop" (OpenAI) or omitting the key entirely (Anthropic, LangChain).

  • OpenAI chat_wrappers: `or "stop"` → `or ""`
  • OpenAI completion_wrappers: `or "stop"` → `or ""`
  • Anthropic span_utils: conditional if mapped → always set with "" fallback (3 spots)
  • LangChain span_utils: conditional if fr → always include finish_reason with "" fallback
  • Updated tests in all three packages to verify the new behavior
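The fallback change across the packages boils down to the pattern below. This is a minimal sketch with illustrative helper names, not the actual wrapper or span_utils code listed above:

```python
def map_finish_reason(raw_reason):
    """Map a provider finish reason to its OTel GenAI value.

    Unknown/None becomes "" instead of a fabricated "stop"
    (the old OpenAI behavior) or an omitted key (Anthropic, LangChain).
    """
    if not raw_reason:
        return ""  # empty-string fallback, i.e. `or ""` rather than `or "stop"`
    return raw_reason


def build_output_message(content, raw_reason=None):
    # finish_reason is always present now, never conditionally added
    return {
        "role": "assistant",
        "content": content,
        "finish_reason": map_finish_reason(raw_reason),
    }
```

With this in place, a message built from a response that carries no finish reason still includes `"finish_reason": ""`, so consumers always see the same message schema.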

Summary by CodeRabbit

  • Bug Fixes

    • Output messages now always include a finish_reason field; when absent it is set to an empty string instead of being omitted or defaulted to "stop", ensuring a consistent message schema across instrumentations.
    • Instrumentation now aggregates unique non-empty finish reasons into a top-level span attribute when present.
  • Tests

    • Added and updated tests to assert finish_reason is present with an empty-string fallback and to verify finish-reason mapping and deduplication.
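As a rough, self-contained illustration of what such tests assert (FakeSpan and this use of the attribute key are illustrative stand-ins, not the actual test fixtures), the empty-string fallback check looks along these lines:

```python
import json


class FakeSpan:
    """Minimal stand-in for a recorded span (illustrative only)."""

    def __init__(self, attributes):
        self.attributes = attributes


def assert_empty_string_fallback(span):
    raw = span.attributes.get("gen_ai.output.messages")
    # Assert presence first so the check fails when the attribute is
    # missing, instead of silently passing under an `if raw:` guard.
    assert raw is not None, "gen_ai.output.messages must be set"
    output = json.loads(raw)
    assert output[0]["finish_reason"] == "", (
        f"Expected '' fallback, got {output[0]['finish_reason']!r}"
    )


span = FakeSpan({
    "gen_ai.output.messages": json.dumps(
        [{"role": "assistant", "content": "hi", "finish_reason": ""}]
    )
})
assert_empty_string_fallback(span)  # passes: key present with "" fallback
```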


coderabbitai bot commented Mar 31, 2026

Important

Review skipped

Too many files!

This PR contains 164 files, which is 14 over the limit of 150.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a74447a-f239-42b0-b49d-ca096a6a5d88

📥 Commits

Reviewing files that changed from the base of the PR and between 596b30f and 1f1c5ca.

📒 Files selected for processing (164)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/metrics/cassettes/test_openai_metrics/test_chat_completion_metrics.yaml
  • packages/opentelemetry-instrumentation-openai/tests/metrics/cassettes/test_openai_metrics/test_chat_parsed_completion_metrics.yaml
  • packages/opentelemetry-instrumentation-openai/tests/metrics/cassettes/test_openai_metrics/test_embeddings_metrics.yaml
  • packages/opentelemetry-instrumentation-openai/tests/metrics/cassettes/test_openai_metrics/test_image_gen_metrics.yaml
  • packages/opentelemetry-instrumentation-openai/tests/metrics/test_openai_metrics.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_existing_assistant.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_existing_assistant_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_existing_assistant_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_new_assistant.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_new_assistant_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_new_assistant_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_new_assistant_with_polling.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_new_assistant_with_polling_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_new_assistant_with_polling_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_streaming_existing_assistant.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_streaming_existing_assistant_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_streaming_existing_assistant_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_streaming_new_assistant.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_streaming_new_assistant_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_assistant/test_streaming_new_assistant_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_async_streaming.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_async_streaming_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_async_streaming_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_content_filtering.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_content_filtering_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_content_filtering_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_reasoning.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_streaming.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_streaming_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_streaming_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_chat_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_prompt_content_filtering.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_prompt_content_filtering_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_azure/test_prompt_content_filtering_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_async_streaming.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_async_streaming_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_async_streaming_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_history_message_dict.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_history_message_pydantic.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_pydantic_based_tool_calls.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_pydantic_based_tool_calls_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_pydantic_based_tool_calls_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_reasoning.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_exception_during_consumption.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_memory_leak_prevention.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_not_consumed.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_partial_consumption.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_streaming_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_tool_calls.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_tool_calls_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_tool_calls_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_service_tier.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_with_asyncio_run.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_with_asyncio_run_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_with_asyncio_run_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_async_parsed_completion.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_async_parsed_completion_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_async_parsed_completion_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_async_parsed_refused_completion.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_async_parsed_refused_completion_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_async_parsed_refused_completion_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_parsed_completion.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_parsed_completion_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_parsed_completion_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_parsed_refused_completion.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_parsed_refused_completion_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_parse/test_parsed_refused_completion_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_response_format/test_async_chat_response_format.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat_response_format/test_chat_response_format.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_async_completion.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_async_completion_streaming.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_async_completion_streaming_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_async_completion_streaming_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_async_completion_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_async_completion_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_langchain_style.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_langchain_style_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_langchain_style_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_streaming_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_streaming_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_embeddings/test_azure_openai_embeddings.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_embeddings/test_azure_openai_embeddings_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_embeddings/test_azure_openai_embeddings_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_embeddings/test_embeddings.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_embeddings/test_embeddings_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_embeddings/test_embeddings_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_embeddings/test_embeddings_with_raw_response.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_embeddings/test_embeddings_with_raw_response_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_embeddings/test_embeddings_with_raw_response_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_exceptions/test_exception_in_instrumentation_suppressed.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestChatCompletionsFinishReasons.test_async_chat_sets_top_level_finish_reasons.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestChatCompletionsFinishReasons.test_async_chat_streaming_sets_top_level_finish_reasons.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestChatCompletionsFinishReasons.test_chat_completions_finish_reason_in_output_messages.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestChatCompletionsFinishReasons.test_chat_completions_sets_top_level_finish_reasons.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestChatCompletionsFinishReasons.test_chat_streaming_sets_top_level_finish_reasons.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestCompletionsFinishReasons.test_async_completions_sets_top_level_finish_reasons.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestCompletionsFinishReasons.test_async_completions_streaming_sets_top_level_finish_reasons.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestCompletionsFinishReasons.test_completions_finish_reason_in_output_messages.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestCompletionsFinishReasons.test_completions_sets_top_level_finish_reasons.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestCompletionsFinishReasons.test_completions_streaming_sets_top_level_finish_reasons.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestFinishReasonDeduplication.test_streaming_deduplicates_finish_reasons.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestFinishReasonMapping.test_finish_reason_defaults_to_empty_string_when_missing.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestFinishReasonMapping.test_finish_reason_mapped_from_provider_not_derived.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestFinishReasonMapping.test_finish_reason_tool_calls_mapped_correctly.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestResponsesAPIFinishReasons.test_responses_api_stop_finish_reason.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/TestResponsesAPIFinishReasons.test_responses_api_tool_call_finish_reason.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools_parallel.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools_parallel_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools_parallel_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools_streaming.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools_streaming_parallel.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools_streaming_parallel_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools_streaming_parallel_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools_streaming_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools_streaming_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_tools_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_functions/test_open_ai_function_calls_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_prompt_caching/test_openai_prompt_caching.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_prompt_caching/test_openai_prompt_caching_async.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_prompt_caching/test_openai_prompt_caching_async_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_prompt_caching/test_openai_prompt_caching_async_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_prompt_caching/test_openai_prompt_caching_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_prompt_caching/test_openai_prompt_caching_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_reasoning_dict_issue.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_context_manager.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_async_with_parent_span.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_context_manager.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_streaming_with_parent_span.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_tool_calls.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_with_input_history.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_with_request_params.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/test_responses_with_service_tier.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_vision/test_vision.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_vision/test_vision_base64.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_vision/test_vision_base64_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_vision/test_vision_base64_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_vision/test_vision_with_events_with_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_vision/test_vision_with_events_with_no_content.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_semconv_compliance.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_vision.py

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Standardizes finish_reason handling across Anthropic, LangChain, and OpenAI instrumentation: output message objects always include a finish_reason field, using an empty string ("") when no finish reason is available instead of omitting the key or supplying a default like "stop".

Changes

Cohort / File(s) Summary
Anthropic Instrumentation
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py, packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
_map_finish_reason returns "" for falsy inputs; output messages and streaming attributes always include finish_reason (empty-string fallback). Tests added/updated to expect finish_reason present as "" when missing.
LangChain Instrumentation
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py, packages/opentelemetry-instrumentation-langchain/tests/test_finish_reasons.py
set_chat_response always sets finish_reason to `fr if fr else ""`. Test updated to expect finish_reason present as "" when none provided.
OpenAI Instrumentation (wrappers & mapping)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py, packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
_map_finish_reason maps falsy inputs to "". Chat/completion wrappers set per-message finish_reason to the mapped value (no fabricated default); they aggregate unique non-empty mapped finish reasons and set the GEN_AI_RESPONSE_FINISH_REASONS span attribute when present.
OpenAI Tests
packages/opentelemetry-instrumentation-openai/tests/traces/test_semconv_compliance.py, packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py
New and updated tests asserting empty-string fallback for missing finish reasons, mapping rules (e.g., `tool_calls` → `tool_call`), deduplication of top-level finish reasons, and presence of per-message finish_reason as "" when missing.
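The top-level aggregation described for the OpenAI wrappers can be sketched as follows. The attribute key matches the OTel GenAI semconv name, but the helper itself is illustrative rather than the actual wrapper code:

```python
GEN_AI_RESPONSE_FINISH_REASONS = "gen_ai.response.finish_reasons"


def set_finish_reason_attributes(span_attributes, choices):
    """Collect unique, non-empty mapped finish reasons across choices and
    set the top-level span attribute only when at least one exists."""
    seen = []
    for choice in choices:
        reason = choice.get("finish_reason") or ""  # empty-string fallback
        if reason and reason not in seen:           # skip empties, dedupe
            seen.append(reason)
    if seen:
        span_attributes[GEN_AI_RESPONSE_FINISH_REASONS] = tuple(seen)


attrs = {}
set_finish_reason_attributes(attrs, [
    {"finish_reason": "stop"},
    {"finish_reason": "stop"},
    {"finish_reason": None},
])
# attrs == {"gen_ai.response.finish_reasons": ("stop",)}
```

Note that per-message finish_reason keeps the "" fallback, while the aggregated span attribute deliberately excludes empty strings and is omitted entirely when no reasons were reported.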

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through spans and JSON trees today,
I nudged each finish_reason so none stray away,
When reasons vanish, I leave a small trace — "",
A tidy field in every message's place,
Hooray for steady telemetry play! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 53.06%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly identifies the primary change: fixing OpenTelemetry semantic convention compliance for finish_reason handling across the OpenAI and Anthropic packages.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py`:
- Around line 398-404: The test currently uses "if raw:" which silently passes
when GenAIAttributes.GEN_AI_OUTPUT_MESSAGES is missing; change it to assert the
attribute exists (e.g., assert raw is not None or assert
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES in span.attributes) before loading JSON
so the test fails on missing output messages, then keep the json.loads(raw) and
the existing finish_reason assertion; refer to the variable raw and the constant
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES in test_semconv_span_attrs.py to locate
where to add the presence assertion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ab06105-7ac8-4624-b334-bb0b979285e1

📥 Commits

Reviewing files that changed from the base of the PR and between 0a25803 and 23fa4d9.

📒 Files selected for processing (7)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
  • packages/opentelemetry-instrumentation-langchain/tests/test_finish_reasons.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_semconv_compliance.py

Comment on lines +398 to +404
    raw = span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES)
    if raw:
        output = json.loads(raw)
        assert output[0]["finish_reason"] == "", (
            f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
        )



⚠️ Potential issue | 🟡 Minor

Make the streaming fallback test fail when output messages are missing.

At Line 399, if raw: makes the test pass even if GEN_AI_OUTPUT_MESSAGES is never written. Assert presence first so regressions are caught.

✅ Suggested test hardening
-    raw = span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES)
-    if raw:
-        output = json.loads(raw)
-        assert output[0]["finish_reason"] == "", (
-            f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
-        )
+    raw = span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES)
+    assert raw is not None, "GEN_AI_OUTPUT_MESSAGES must be set for streaming text events"
+    output = json.loads(raw)
+    assert output[0]["finish_reason"] == "", (
+        f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

    raw = span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES)
    if raw:
        output = json.loads(raw)
        assert output[0]["finish_reason"] == "", (
            f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
        )

After:

    raw = span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES)
    assert raw is not None, "GEN_AI_OUTPUT_MESSAGES must be set for streaming text events"
    output = json.loads(raw)
    assert output[0]["finish_reason"] == "", (
        f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py`
around lines 398 - 404, The test currently uses "if raw:" which silently passes
when GenAIAttributes.GEN_AI_OUTPUT_MESSAGES is missing; change it to assert the
attribute exists (e.g., assert raw is not None or assert
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES in span.attributes) before loading JSON
so the test fails on missing output messages, then keep the json.loads(raw) and
the existing finish_reason assertion; refer to the variable raw and the constant
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES in test_semconv_span_attrs.py to locate
where to add the presence assertion.
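The failure mode described here can be reduced to a minimal, self-contained sketch (the string key stands in for the real `GenAIAttributes.GEN_AI_OUTPUT_MESSAGES` constant and is illustrative only):

```python
# Minimal sketch of why an `if raw:` guard lets a regression slip through.
attributes = {}  # simulate a span whose output-messages attribute was never set

raw = attributes.get("gen_ai.output.messages")
ran_assertions = False
if raw:  # falsy when the attribute is missing, so the body is skipped entirely
    ran_assertions = True

# The guarded "test" passed without checking anything:
assert not ran_assertions

# Asserting presence first turns the missing attribute into a visible failure:
try:
    assert raw is not None, "GEN_AI_OUTPUT_MESSAGES must be set"
except AssertionError as e:
    print(e)  # the regression now surfaces instead of passing silently
```

The same pattern applies to any optional span attribute a test means to require.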


@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py (1)

398-404: ⚠️ Potential issue | 🟡 Minor

Test silently passes if output messages are missing.

The if raw: guard means the test will pass even if GEN_AI_OUTPUT_MESSAGES is never written. Assert presence first to catch regressions.

✅ Suggested fix
     raw = span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES)
-    if raw:
-        output = json.loads(raw)
-        assert output[0]["finish_reason"] == "", (
-            f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
-        )
+    assert raw is not None, "GEN_AI_OUTPUT_MESSAGES must be set for streaming text events"
+    output = json.loads(raw)
+    assert output[0]["finish_reason"] == "", (
+        f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py`
around lines 398 - 404, The test currently guards with "if raw:" so it silently
passes when GenAIAttributes.GEN_AI_OUTPUT_MESSAGES is missing; change it to
explicitly assert presence before parsing: assert
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES in span.attributes (or assert raw is not
None) then call json.loads(raw) and proceed to assert output[0]["finish_reason"]
== ""; update the check around
span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES) and the subsequent
json.loads usage to ensure the attribute exists and is parsed only after the
assertion.
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py (1)

1574-1595: Good coverage of the _map_finish_reason helper.

The tests comprehensively cover:

  • All falsy inputs returning ""
  • All known Anthropic reason mappings
  • Unknown reason passthrough

One minor style note: the class-level import pattern is functional but unconventional. Consider importing at the top of the file alongside other imports from span_utils (lines 35-39) for consistency.

♻️ Optional: move import to top of file

At top of file (around line 39):

 from opentelemetry.instrumentation.anthropic.span_utils import (
     aset_input_attributes,
     set_response_attributes,
     set_streaming_response_attributes,
+    _map_finish_reason,
 )

Then simplify the class:

 class TestMapFinishReason:
-    from opentelemetry.instrumentation.anthropic.span_utils import _map_finish_reason
-    _map_finish_reason = staticmethod(_map_finish_reason)
-
     @pytest.mark.parametrize("falsy_input", [None, "", 0, False])
     def test_returns_empty_string_for_falsy(self, falsy_input):
-        assert self._map_finish_reason(falsy_input) == ""
+        assert _map_finish_reason(falsy_input) == ""
 
     def test_maps_end_turn_to_stop(self):
-        assert self._map_finish_reason("end_turn") == "stop"
+        assert _map_finish_reason("end_turn") == "stop"
     # ... similar changes for other test methods
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py`
around lines 1574 - 1595, Move the class-level import of _map_finish_reason out
of TestMapFinishReason and place a standard import at the top of the test file
alongside other span_utils imports (import _map_finish_reason from
opentelemetry.instrumentation.anthropic.span_utils), then update
TestMapFinishReason to reference the top-level symbol (remove the class-level
import and staticmethod assignment) so the tests use the module-level
_map_finish_reason directly.
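For readers without the source open, the mapping these tests exercise can be sketched roughly as below. The mapping table is an assumption reconstructed from the test names; the real `_map_finish_reason` in span_utils.py may differ:

```python
# Hypothetical reconstruction of an Anthropic -> OTel finish_reason mapper.
_ANTHROPIC_TO_OTEL = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_call",
}


def map_finish_reason(reason):
    if not reason:  # None, "", 0, False -> empty-string fallback per this PR
        return ""
    return _ANTHROPIC_TO_OTEL.get(reason, reason)  # unknown values pass through


assert map_finish_reason(None) == ""
assert map_finish_reason("end_turn") == "stop"
assert map_finish_reason("max_tokens") == "length"
assert map_finish_reason("mystery_reason") == "mystery_reason"
```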
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py`:
- Around line 398-404: The test currently guards with "if raw:" so it silently
passes when GenAIAttributes.GEN_AI_OUTPUT_MESSAGES is missing; change it to
explicitly assert presence before parsing: assert
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES in span.attributes (or assert raw is not
None) then call json.loads(raw) and proceed to assert output[0]["finish_reason"]
== ""; update the check around
span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES) and the subsequent
json.loads usage to ensure the attribute exists and is parsed only after the
assertion.

---

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py`:
- Around line 1574-1595: Move the class-level import of _map_finish_reason out
of TestMapFinishReason and place a standard import at the top of the test file
alongside other span_utils imports (import _map_finish_reason from
opentelemetry.instrumentation.anthropic.span_utils), then update
TestMapFinishReason to reference the top-level symbol (remove the class-level
import and staticmethod assignment) so the tests use the module-level
_map_finish_reason directly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a3b8a16f-3479-4146-99b8-e8e75b0c55a8

📥 Commits

Reviewing files that changed from the base of the PR and between 23fa4d9 and 1810943.

📒 Files selected for processing (5)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_semconv_compliance.py
✅ Files skipped from review due to trivial changes (1)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_semconv_compliance.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py

@max-deygin-traceloop max-deygin-traceloop changed the title fix: standardize finish_reason fallback to empty string fix(openai, Anthropic) fixed finish_reasons Otel semconv compliance gaps. Mar 31, 2026

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py (1)

381-423: Replace placeholder pass tests with explicit skip/xfail markers.

Silent passes make CI look green without coverage for these behaviors.

Proposed refactor
 class TestFinishReasonOmission:
@@
+    @pytest.mark.skip(reason="Needs mocked scenario with all provider finish_reason values = None")
     def test_finish_reasons_omitted_when_empty(
@@
-        pass
+        ...
@@
 class TestResponsesAPIFinishReasons:
@@
+    @pytest.mark.skip(reason="Enable when Responses API fixtures/cassettes are available")
     @pytest.mark.vcr
     def test_responses_api_extracts_finish_reason_from_provider(
@@
-        pass
+        ...
@@
+    @pytest.mark.skip(reason="Enable when Responses API fixtures/cassettes are available")
     @pytest.mark.vcr
     def test_responses_api_sets_top_level_finish_reasons_from_response(
@@
-        pass
+        ...

If you want, I can draft concrete mocked fixtures for these three tests so they can be fully asserted instead of skipped.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py`
around lines 381 - 423, Replace the placeholder pass in the three tests so they
don't silently succeed: update
TestFinishReasonOmission.test_finish_reasons_omitted_when_empty,
TestResponsesAPIFinishReasons.test_responses_api_extracts_finish_reason_from_provider,
and
TestResponsesAPIFinishReasons.test_responses_api_sets_top_level_finish_reasons_from_response
to use explicit pytest markers (e.g., @pytest.mark.skip(reason="placeholder:
Responses API not available / requires mocked provider") or
@pytest.mark.xfail(reason="placeholder: requires mocked Responses API",
strict=False)) instead of pass; add an appropriate import for pytest if missing
and attach the marker directly above each test function so CI reports them as
skipped/expected-fail rather than passing silently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py`:
- Around line 359-368: The test test_streaming_deduplicates_finish_reasons is
currently invoking a non-streaming call; change the invocation of
openai_client.chat.completions.create in that test to use stream=True and
consume the returned stream (iterator/generator) so the streaming
accumulation/deduplication path is exercised (i.e., call
openai_client.chat.completions.create(..., stream=True) and iterate through the
stream to build/collect the final response), ensuring the assertions remain
against the streamed result.
- Around line 335-353: The test
test_finish_reason_defaults_to_empty_string_when_missing currently only checks
that finish_reason is a string; update it to mock a provider response where
finish_reason is None (when calling openai_client.chat.completions.create) and
then assert the traced output message's finish_reason equals the empty string
""; locate this test and adjust the setup to inject a controlled response with
finish_reason: None, call get_output_messages(span) and replace the loose
isinstance/string assertion with assert output_messages[0]["finish_reason"] ==
"" to confirm the empty-string fallback in the tracing logic.
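One way to exercise the None case without a live call is a bare object with the chat-completion response shape; the object layout here is illustrative, not a repo helper:

```python
# Illustrative sketch: build a chat-completion-shaped response whose
# finish_reason is None, then apply the fallback this PR standardizes on.
from types import SimpleNamespace

choice = SimpleNamespace(
    finish_reason=None,
    message=SimpleNamespace(role="assistant", content="hi"),
)
response = SimpleNamespace(choices=[choice])

# The instrumentation-side fallback under test:
finish_reason = response.choices[0].finish_reason or ""
assert finish_reason == ""  # empty string, never a fabricated "stop"
```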

---

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py`:
- Around line 381-423: Replace the placeholder pass in the three tests so they
don't silently succeed: update
TestFinishReasonOmission.test_finish_reasons_omitted_when_empty,
TestResponsesAPIFinishReasons.test_responses_api_extracts_finish_reason_from_provider,
and
TestResponsesAPIFinishReasons.test_responses_api_sets_top_level_finish_reasons_from_response
to use explicit pytest markers (e.g., @pytest.mark.skip(reason="placeholder:
Responses API not available / requires mocked provider") or
@pytest.mark.xfail(reason="placeholder: requires mocked Responses API",
strict=False)) instead of pass; add an appropriate import for pytest if missing
and attach the marker directly above each test function so CI reports them as
skipped/expected-fail rather than passing silently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 150234c0-d877-4232-af77-e7a650c36543

📥 Commits

Reviewing files that changed from the base of the PR and between 1810943 and 596b30f.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py

Comment on lines +24 to +423
    @pytest.mark.vcr
    def test_completions_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Non-streaming completions must set gen_ai.response.finish_reasons."""
        openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a joke about opentelemetry",
        )

        spans = span_exporter.get_finished_spans()
        assert len(spans) == 1
        span = spans[0]

        # Verify top-level finish_reasons attribute is set
        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None, "gen_ai.response.finish_reasons must be set"
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0
        # Should contain mapped finish_reason from response
        assert "length" in finish_reasons or "stop" in finish_reasons

    @pytest.mark.vcr
    def test_completions_finish_reason_in_output_messages(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Output messages must have finish_reason field."""
        openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a joke about opentelemetry",
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]
        output_messages = get_output_messages(span)

        assert len(output_messages) > 0
        for msg in output_messages:
            assert "finish_reason" in msg
            # finish_reason should be a string (empty string if missing from provider)
            assert isinstance(msg["finish_reason"], str)

    @pytest.mark.vcr
    def test_completions_streaming_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, mock_openai_client
    ):
        """Streaming completions must accumulate and set finish_reasons."""
        response = mock_openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a joke about opentelemetry",
            stream=True,
        )

        for _ in response:
            pass

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        # Verify top-level finish_reasons attribute is set
        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None, "Streaming must set finish_reasons"
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0

    @pytest.mark.vcr
    @pytest.mark.asyncio
    async def test_async_completions_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, async_openai_client
    ):
        """Async completions must set gen_ai.response.finish_reasons."""
        await async_openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a joke about opentelemetry",
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0

    @pytest.mark.vcr
    @pytest.mark.asyncio
    async def test_async_completions_streaming_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, async_openai_client
    ):
        """Async streaming completions must set finish_reasons."""
        response = await async_openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a joke about opentelemetry",
            stream=True,
        )

        async for _ in response:
            pass

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0


class TestChatCompletionsFinishReasons:
    """Test finish_reason handling in chat completions API (P1-3)"""

    @pytest.mark.vcr
    def test_chat_completions_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Non-streaming chat must set gen_ai.response.finish_reasons."""
        openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        # Verify top-level finish_reasons attribute is set
        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None, "gen_ai.response.finish_reasons must be set"
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0
        # Should contain "stop" for normal completion
        assert "stop" in finish_reasons

    @pytest.mark.vcr
    def test_chat_completions_finish_reason_in_output_messages(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Output messages must have finish_reason field."""
        openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]
        output_messages = get_output_messages(span)

        assert len(output_messages) > 0
        for msg in output_messages:
            assert "finish_reason" in msg
            assert msg["finish_reason"] == "stop"

    @pytest.mark.vcr
    def test_chat_streaming_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Streaming chat must accumulate and set finish_reasons."""
        response = openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
            stream=True,
        )

        for _ in response:
            pass

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        # Verify top-level finish_reasons attribute is set
        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None, "Streaming must set finish_reasons"
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0

    @pytest.mark.vcr
    @pytest.mark.asyncio
    async def test_async_chat_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, async_openai_client
    ):
        """Async chat must set gen_ai.response.finish_reasons."""
        await async_openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0

    @pytest.mark.vcr
    @pytest.mark.asyncio
    async def test_async_chat_streaming_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, async_openai_client
    ):
        """Async streaming chat must set finish_reasons."""
        response = await async_openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
            stream=True,
        )

        async for _ in response:
            pass

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0


class TestFinishReasonMapping:
    """Test finish_reason mapping from provider values"""

    @pytest.mark.vcr
    def test_finish_reason_mapped_from_provider_not_derived(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """finish_reason must come from response, not inferred from parts.

        This test validates that we use the actual finish_reason from the
        provider response (e.g., "length", "content_filter") rather than
        deriving it from the presence of tool_calls or other content.
        """
        # Use a completion that hits max_tokens to get finish_reason="length"
        openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a very long story about opentelemetry",
            max_tokens=10,  # Force length finish_reason
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]
        output_messages = get_output_messages(span)

        # Verify finish_reason is "length" from provider, not "stop"
        assert len(output_messages) > 0
        assert output_messages[0]["finish_reason"] == "length"

        # Verify top-level attribute also has "length"
        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert "length" in finish_reasons

    @pytest.mark.vcr
    def test_finish_reason_tool_calls_mapped_correctly(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """finish_reason "tool_calls" must be mapped to "tool_call"."""
        response = openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "What's the weather in Boston?"}],
            tools=[
                {
                    "type": "function",
                    "function": {
                        "name": "get_weather",
                        "description": "Get the current weather",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "location": {"type": "string"},
                            },
                        },
                    },
                }
            ],
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]
        output_messages = get_output_messages(span)

        # OpenAI returns "tool_calls" but we should map to "tool_call"
        if output_messages and output_messages[0].get("parts"):
            has_tool_call = any(
                p.get("type") == "tool_call" for p in output_messages[0]["parts"]
            )
            if has_tool_call:
                # Should be mapped to "tool_call" (singular)
                assert output_messages[0]["finish_reason"] == "tool_call"

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert "tool_call" in finish_reasons

    @pytest.mark.vcr
    def test_finish_reason_defaults_to_empty_string_when_missing(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """When provider doesn't return finish_reason, default to empty string."""
        # This test may need a mocked response where finish_reason is None
        openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Hello"}],
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]
        output_messages = get_output_messages(span)

        # Even if provider returns None, output message should have ""
        assert len(output_messages) > 0
        assert "finish_reason" in output_messages[0]
        assert isinstance(output_messages[0]["finish_reason"], str)


class TestFinishReasonDeduplication:
    """Test finish_reason deduplication in streaming scenarios"""

    @pytest.mark.vcr
    def test_streaming_deduplicates_finish_reasons(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Streaming with multiple choices should deduplicate finish_reasons."""
        response = openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke"}],
            n=2,  # Request 2 completions
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )

        if finish_reasons:
            # Should be deduplicated (no duplicate "stop" entries)
            assert len(finish_reasons) == len(set(finish_reasons))


class TestFinishReasonOmission:
    """Test that top-level finish_reasons is omitted when no meaningful values"""

    def test_finish_reasons_omitted_when_empty(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """When no finish_reasons available, attribute should be omitted entirely.

        This test validates that we don't set an empty array when there are
        no meaningful finish_reasons (e.g., all None values filtered out).
        """
        # This would require a mocked scenario where all finish_reasons are None
        # For now, this is a placeholder test that documents the expected behavior
        pass


class TestResponsesAPIFinishReasons:
    """Test finish_reason handling in Responses API (P1-1, P1-2)"""

    @pytest.mark.vcr
    def test_responses_api_extracts_finish_reason_from_provider(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Responses API must extract finish_reason from provider response.

        This test validates that finish_reason comes from the actual response
        object, not derived from the presence of tool_calls or message types.
        """
        # This test requires the Responses API which may not be in all OpenAI versions
        # Placeholder for when Responses API tests are available
        pass

    @pytest.mark.vcr
    def test_responses_api_sets_top_level_finish_reasons_from_response(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Responses API must extract top-level finish_reasons from response choices.

        This test validates that finish_reasons are read from response.choices[]
        rather than fabricated from output block types.
        """
        # Placeholder for Responses API test
        pass
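The deduplication behavior asserted in TestFinishReasonDeduplication above can be sketched independently of the instrumentation; the function name is illustrative, not a repo helper:

```python
# Illustrative order-preserving dedupe of finish_reasons that also drops
# empty values, matching what the streaming tests assert.
def dedupe_finish_reasons(reasons):
    seen = set()
    result = []
    for reason in reasons:
        if reason and reason not in seen:  # skip ""/None and repeats
            seen.add(reason)
            result.append(reason)
    return result


assert dedupe_finish_reasons(["stop", "stop", "", "length"]) == ["stop", "length"]
assert dedupe_finish_reasons([None, ""]) == []
```

A plain `set()` would also deduplicate, but loses ordering; preserving first-seen order keeps the span attribute stable across runs.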

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Locate likely cassette files
echo "== Candidate VCR cassette files =="
fd -i 'cassette|cassettes|vcr' .

# 2) Scan YAML/JSON artifacts for likely secrets/PII markers
echo
echo "== Potential secret/PII markers in yaml/json artifacts =="
fd -e yml -e yaml -e json . | rg -n --iglob '*cassette*' \
  -e 'Authorization' \
  -e 'Bearer ' \
  -e 'api[_-]?key' \
  -e 'x-api-key' \
  -e 'sk-[A-Za-z0-9]+' || true

# 3) Show references to filters that scrub headers/body in VCR setup
echo
echo "== VCR scrub/filter configuration references =="
rg -n -C2 'filter_headers|before_record|before_record_request|before_record_response'

Repository: traceloop/openllmetry

Length of output: 13550


🏁 Script executed:

# Check if cassette files exist for test_finish_reasons.py
ls -la ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ | head -20

echo "---"
echo "Count of cassette files:"
find ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ -type f | wc -l

echo "---"
echo "Check for test_finish_reasons cassettes specifically:"
find ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ -name "*finish_reason*" | head -10

Repository: traceloop/openllmetry

Length of output: 1122


🏁 Script executed:

# Verify the conftest.py has proper filter configuration
cat -n ./packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py | head -30

Repository: traceloop/openllmetry

Length of output: 535


🏁 Script executed:

# Double-check: is there a test_finish_reasons directory?
ls -la ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ | grep -i finish

# Also check if cassettes directory exists in the test directory itself
find ./packages/opentelemetry-instrumentation-openai/tests/traces -type d -name "cassettes" -o -type d -name "*finish*"

# Count total test methods in test_finish_reasons.py
grep -c "def test_" ./packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py || echo "File not found"

# Check if the test file exists
ls -la ./packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py 2>/dev/null || echo "test_finish_reasons.py not found"

Repository: traceloop/openllmetry

Length of output: 272


🏁 Script executed:

# Check the VCR record mode configuration in conftest.py (if set)
grep -n "record_mode\|vcr_config" ./packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py

# Check if there's pytest.ini or pyproject.toml VCR settings
cat ./packages/opentelemetry-instrumentation-openai/tests/traces/pytest.ini 2>/dev/null || echo "No pytest.ini"

Repository: traceloop/openllmetry

Length of output: 98


🏁 Script executed:

# List all cassette subdirectories to see structure
ls -1 ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/

echo "---"

# Check if cassettes were recorded for any test in test_finish_reasons
find ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ -type f -name "*.yaml" -o -name "*.json" | xargs grep -l "finish.reason\|finish_reason" | head -5 || echo "No cassettes found with finish_reason"

echo "---"

# Check when cassettes were last modified to see if they're recent
find ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ -type f -printf '%T@ %Tc %p\n' | sort -n | tail -5

Repository: traceloop/openllmetry

Length of output: 1811


🏁 Script executed:

# Check test_finish_reasons.py to see what the test decorator says and cassette strategy
head -50 ./packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py

echo "---"

# Check if there's a test_finish_reasons cassette directory at all
find ./packages/opentelemetry-instrumentation-openai/tests/traces -name "*finish_reasons*" -type d

Repository: traceloop/openllmetry

Length of output: 1943


🏁 Script executed:

# Check the cassettes directory - does test_finish_reasons have any cassettes?
test -d ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons && echo "Directory exists" || echo "test_finish_reasons cassette directory DOES NOT EXIST"

# Count VCR-marked tests in test_finish_reasons.py
grep -c "@pytest.mark.vcr" ./packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py

# Check git status to see if cassettes were added
git -C ./packages/opentelemetry-instrumentation-openai status --short tests/traces/cassettes/ 2>/dev/null | head -20 || echo "Git status unavailable"

Repository: traceloop/openllmetry

Length of output: 121


🏁 Script executed:

# Check if there's a record_mode setting anywhere that might affect VCR behavior
grep -r "record_mode\|vcr.*once\|vcr.*new_episodes" ./packages/opentelemetry-instrumentation-openai/tests/ 2>/dev/null | head -10 || echo "No record_mode found"

# Verify the conftest fixtures are being used properly
grep -A5 "def openai_client" ./packages/opentelemetry-instrumentation-openai/tests/conftest.py | head -15

Repository: traceloop/openllmetry

Length of output: 149


🏁 Script executed:

# Verify the conftest filter configuration is complete
cat ./packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py

echo "---"

# Check if cassettes need to be recorded by looking at git diff or status
git -C ./packages/opentelemetry-instrumentation-openai diff --name-only 2>/dev/null | grep -i cassette || echo "No cassette changes in diff"

echo "---"

# Check if test_finish_reasons.py was just added
git -C ./packages/opentelemetry-instrumentation-openai status --short 2>/dev/null | grep finish_reason || echo "File status unclear"

Repository: traceloop/openllmetry

Length of output: 479


🏁 Script executed:

# Confirm what happens when pytest runs - check if there's a record_mode default
python3 -c "
import os
os.chdir('./packages/opentelemetry-instrumentation-openai/tests/traces')
# Check if pytest-vcr is configured anywhere
with open('conftest.py', 'r') as f:
    content = f.read()
    if 'record_mode' in content:
        print('record_mode found in conftest')
    else:
        print('No explicit record_mode - will use default (usually once)')
" 2>/dev/null || echo "Cannot check record_mode"

Repository: traceloop/openllmetry

Length of output: 122


Cassettes for test_finish_reasons.py must be recorded before merging.

The 16 new VCR-marked tests in test_finish_reasons.py have no corresponding cassette files in ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/. With the default record_mode=once, these tests cannot playback in CI and will fail.

The VCR filter configuration in conftest.py is properly set up with filter_headers for authorization, api-key, openai-organization, openai-project, set-cookie, and x-request-id. However, the cassettes themselves still need to be recorded. Record them locally with a valid OpenAI API key set in an environment variable, verify the resulting cassettes contain no secrets and are properly scrubbed, then commit them.
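With pytest-recording, the record mode and header scrubbing typically live in a `vcr_config` fixture; a minimal sketch matching the headers listed above (not the repo's actual conftest, which may differ):

```python
import pytest


@pytest.fixture(scope="module")
def vcr_config():
    # "once" plays back existing cassettes and records only when one is missing,
    # so a local run with a real API key creates the cassettes CI will replay.
    return {
        "record_mode": "once",
        "filter_headers": [
            "authorization",
            "api-key",
            "openai-organization",
            "openai-project",
            "set-cookie",
            "x-request-id",
        ],
    }
```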

Comment on lines +335 to +353
def test_finish_reason_defaults_to_empty_string_when_missing(
    self, instrument_legacy, span_exporter, openai_client
):
    """When provider doesn't return finish_reason, default to empty string."""
    # This test may need a mocked response where finish_reason is None
    openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
    )

    spans = span_exporter.get_finished_spans()
    span = spans[0]
    output_messages = get_output_messages(span)

    # Even if provider returns None, output message should have ""
    assert len(output_messages) > 0
    assert "finish_reason" in output_messages[0]
    assert isinstance(output_messages[0]["finish_reason"], str)


⚠️ Potential issue | 🟡 Minor

This test does not assert the empty-string fallback it describes.

Right now it passes for any string value; it should assert "" against a controlled response where the provider's finish_reason is None.

Proposed assertion tightening
         assert len(output_messages) > 0
         assert "finish_reason" in output_messages[0]
-        assert isinstance(output_messages[0]["finish_reason"], str)
+        assert output_messages[0]["finish_reason"] == ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py`
around lines 335 - 353, The test
test_finish_reason_defaults_to_empty_string_when_missing currently only checks
that finish_reason is a string; update it to mock a provider response where
finish_reason is None (when calling openai_client.chat.completions.create) and
then assert the traced output message's finish_reason equals the empty string
""; locate this test and adjust the setup to inject a controlled response with
finish_reason: None, call get_output_messages(span) and replace the loose
isinstance/string assertion with assert output_messages[0]["finish_reason"] ==
"" to confirm the empty-string fallback in the tracing logic.
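The fallback under test can be illustrated with a minimal, self-contained sketch (the `extract_finish_reason` helper is hypothetical; the real instrumentation code differs):

```python
from unittest.mock import MagicMock


def extract_finish_reason(choice):
    # Empty-string fallback per the PR: never fabricate "stop",
    # never omit the value entirely.
    return getattr(choice, "finish_reason", None) or ""


# A controlled "choice" whose provider finish_reason is None
choice = MagicMock()
choice.finish_reason = None
assert extract_finish_reason(choice) == ""
```

Asserting `== ""` against a mocked choice like this pins down the fallback, where `isinstance(..., str)` would also pass for `"stop"`.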

Comment on lines +359 to +368
def test_streaming_deduplicates_finish_reasons(
    self, instrument_legacy, span_exporter, openai_client
):
    """Streaming with multiple choices should deduplicate finish_reasons."""
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Tell me a joke"}],
        n=2,  # Request 2 completions
    )


⚠️ Potential issue | 🟠 Major

test_streaming_deduplicates_finish_reasons is not testing streaming code.

The test name and docstring say streaming, but the call on lines 363-367 is non-streaming (`stream=True` is missing), so it misses the streaming accumulation path.

Proposed fix
     response = openai_client.chat.completions.create(
         model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "Tell me a joke"}],
         n=2,  # Request 2 completions
+        stream=True,
     )
 
+    for _ in response:
+        pass
+
     spans = span_exporter.get_finished_spans()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py`
around lines 359 - 368, The test test_streaming_deduplicates_finish_reasons is
currently invoking a non-streaming call; change the invocation of
openai_client.chat.completions.create in that test to use stream=True and
consume the returned stream (iterator/generator) so the streaming
accumulation/deduplication path is exercised (i.e., call
openai_client.chat.completions.create(..., stream=True) and iterate through the
stream to build/collect the final response), ensuring the assertions remain
against the streamed result.
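The deduplication this test is meant to exercise can be sketched independently of VCR (the `aggregate_finish_reasons` helper is hypothetical; the real accumulation lives in the streaming wrappers):

```python
def aggregate_finish_reasons(chunks):
    # Collect unique, non-empty finish_reasons across all streamed chunks,
    # preserving first-seen order. None/missing values fall back to "".
    reasons = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            fr = choice.get("finish_reason") or ""
            if fr and fr not in reasons:
                reasons.append(fr)
    return reasons


# With n=2, intermediate chunks carry no finish_reason and both final
# chunks report "stop" -> deduplicated to a single value.
chunks = [
    {"choices": [{"finish_reason": None}, {"finish_reason": None}]},
    {"choices": [{"finish_reason": "stop"}, {"finish_reason": "stop"}]},
]
assert aggregate_finish_reasons(chunks) == ["stop"]
```

Only a `stream=True` call (with the iterator fully consumed) routes the spans through logic of this shape; the non-streaming path sees complete choices and never accumulates across chunks.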

@OzBenSimhonTraceloop OzBenSimhonTraceloop self-requested a review March 31, 2026 13:10
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/fix-finish-reason-fallback branch from b4198cc to 4f5a25e Compare March 31, 2026 14:28