fix(openai, Anthropic) fixed finish_reasons OTel semconv compliance gaps #3916
max-deygin-traceloop wants to merge 11 commits into main
Conversation
…, Anthropic, LangChain

Aligns all packages with the Bedrock convention: when finish_reason is unknown/None, use "" (empty string) instead of fabricating "stop" (OpenAI) or omitting the key entirely (Anthropic, LangChain).

- OpenAI chat_wrappers: `or "stop"` → `or ""`
- OpenAI completion_wrappers: `or "stop"` → `or ""`
- Anthropic span_utils: conditional `if mapped` → always set with `""` fallback (3 spots)
- LangChain span_utils: conditional `if fr` → always include `finish_reason` with `""` fallback
- Updated tests in all three packages to verify the new behavior
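The whole diff reduces to one fallback rule. A minimal sketch of the convention follows; the function name and mapping table here are illustrative stand-ins, not the exact code in `span_utils.py`:

```python
# Illustrative subset of an Anthropic -> OTel GenAI semconv mapping;
# the real table lives inside the instrumentation packages.
ANTHROPIC_TO_SEMCONV = {
    "end_turn": "stop",
    "max_tokens": "length",
    "tool_use": "tool_call",
}


def map_finish_reason(raw_reason):
    """Bedrock-style convention: unknown/None becomes "" instead of a
    fabricated "stop" (old OpenAI behavior) or a missing key (old
    Anthropic/LangChain behavior)."""
    if not raw_reason:
        return ""
    # Known reasons are mapped; unknown reasons pass through unchanged.
    return ANTHROPIC_TO_SEMCONV.get(raw_reason, raw_reason)


assert map_finish_reason(None) == ""
assert map_finish_reason("end_turn") == "stop"
assert map_finish_reason("brand_new_reason") == "brand_new_reason"
```

The empty-string fallback lets consumers distinguish "provider said stop" from "provider said nothing", which a fabricated `"stop"` value cannot.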
Important: Review skipped. Too many files! This PR contains 164 files, which is 14 over the limit of 150.

⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (164)
📝 Walkthrough

Standardizes finish_reason handling across Anthropic, LangChain, and OpenAI instrumentation: output message objects always include a `finish_reason` field, defaulting to "" when the provider does not supply one.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py`:
- Around line 398-404: The test currently uses "if raw:" which silently passes
when GenAIAttributes.GEN_AI_OUTPUT_MESSAGES is missing; change it to assert the
attribute exists (e.g., assert raw is not None or assert
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES in span.attributes) before loading JSON
so the test fails on missing output messages, then keep the json.loads(raw) and
the existing finish_reason assertion; refer to the variable raw and the constant
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES in test_semconv_span_attrs.py to locate
where to add the presence assertion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9ab06105-7ac8-4624-b334-bb0b979285e1
📒 Files selected for processing (7)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
- packages/opentelemetry-instrumentation-langchain/tests/test_finish_reasons.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_semconv_compliance.py
```python
raw = span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES)
if raw:
    output = json.loads(raw)
    assert output[0]["finish_reason"] == "", (
        f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
    )
```
Make the streaming fallback test fail when output messages are missing.
At line 399, `if raw:` makes the test pass even if `GEN_AI_OUTPUT_MESSAGES` is never written. Assert presence first so regressions are caught.
✅ Suggested test hardening

```diff
-raw = span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES)
-if raw:
-    output = json.loads(raw)
-    assert output[0]["finish_reason"] == "", (
-        f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
-    )
+raw = span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES)
+assert raw is not None, "GEN_AI_OUTPUT_MESSAGES must be set for streaming text events"
+output = json.loads(raw)
+assert output[0]["finish_reason"] == "", (
+    f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
+)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
raw = span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES)
assert raw is not None, "GEN_AI_OUTPUT_MESSAGES must be set for streaming text events"
output = json.loads(raw)
assert output[0]["finish_reason"] == "", (
    f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
)
```
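To see the hazard the reviewer is pointing at in isolation: `span.attributes.get` returns `None` when the attribute was never written, so a guarded body is skipped entirely and no assertion runs. A standalone sketch, with a plain dict standing in for `span.attributes`:

```python
import json

# Stand-in for span.attributes when GEN_AI_OUTPUT_MESSAGES was never written.
attributes = {}

raw = attributes.get("gen_ai.output.messages")

# Guarded version: raw is None, so the whole block is skipped and the
# "test" passes vacuously.
checked = False
if raw:
    output = json.loads(raw)
    checked = output[0]["finish_reason"] == ""
assert checked is False  # nothing was actually verified

# Hardened version: fails loudly the moment the attribute goes missing.
try:
    assert raw is not None, "gen_ai.output.messages must be set"
except AssertionError as exc:
    print(f"hardened check fails loudly: {exc}")
```

The attribute key string here is illustrative; the real tests use the `GenAIAttributes.GEN_AI_OUTPUT_MESSAGES` constant.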
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py (1)
398-404: ⚠️ Potential issue | 🟡 Minor: Test silently passes if output messages are missing.

The `if raw:` guard means the test will pass even if `GEN_AI_OUTPUT_MESSAGES` is never written. Assert presence first to catch regressions.

✅ Suggested fix

```diff
 raw = span.attributes.get(GenAIAttributes.GEN_AI_OUTPUT_MESSAGES)
-if raw:
-    output = json.loads(raw)
-    assert output[0]["finish_reason"] == "", (
-        f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
-    )
+assert raw is not None, "GEN_AI_OUTPUT_MESSAGES must be set for streaming text events"
+output = json.loads(raw)
+assert output[0]["finish_reason"] == "", (
+    f"Expected '' for missing streaming finish_reason, got '{output[0]['finish_reason']}'"
+)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py` around lines 398-404: the test currently guards with `if raw:` so it silently passes when GenAIAttributes.GEN_AI_OUTPUT_MESSAGES is missing; change it to explicitly assert presence before parsing (assert `GenAIAttributes.GEN_AI_OUTPUT_MESSAGES in span.attributes` or `raw is not None`), then call `json.loads(raw)` and assert `output[0]["finish_reason"] == ""`, ensuring the attribute exists and is parsed only after the assertion.
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py (1)
1574-1595: Good coverage of the `_map_finish_reason` helper.

The tests comprehensively cover:
- All falsy inputs returning `""`
- All known Anthropic reason mappings
- Unknown reason passthrough

One minor style note: the class-level import pattern is functional but unconventional. Consider importing at the top of the file alongside other imports from `span_utils` (lines 35-39) for consistency.

♻️ Optional: move import to top of file

At top of file (around line 39):

```diff
 from opentelemetry.instrumentation.anthropic.span_utils import (
     aset_input_attributes,
     set_response_attributes,
     set_streaming_response_attributes,
+    _map_finish_reason,
 )
```

Then simplify the class:

```diff
 class TestMapFinishReason:
-    from opentelemetry.instrumentation.anthropic.span_utils import _map_finish_reason
-    _map_finish_reason = staticmethod(_map_finish_reason)
-
     @pytest.mark.parametrize("falsy_input", [None, "", 0, False])
     def test_returns_empty_string_for_falsy(self, falsy_input):
-        assert self._map_finish_reason(falsy_input) == ""
+        assert _map_finish_reason(falsy_input) == ""

     def test_maps_end_turn_to_stop(self):
-        assert self._map_finish_reason("end_turn") == "stop"
+        assert _map_finish_reason("end_turn") == "stop"

     # ... similar changes for other test methods
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py` around lines 1574-1595: move the class-level import of `_map_finish_reason` out of `TestMapFinishReason` and place a standard import at the top of the test file alongside the other `span_utils` imports, then update `TestMapFinishReason` to reference the top-level symbol (remove the class-level import and `staticmethod` assignment) so the tests use the module-level `_map_finish_reason` directly.
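For context on why the class wraps the import in `staticmethod(...)`: a plain function stored as a class attribute is turned into a bound method on instance access, so `self` gets passed as the first argument. A toy illustration (names invented for the example):

```python
def shout(text):
    # Module-level helper, analogous to _map_finish_reason.
    return text.upper()


class WithoutStatic:
    # Instance access wraps this into a bound method: self becomes `text`'s slot.
    shout = shout


class WithStatic:
    # staticmethod() suppresses the binding; attribute access returns the plain function.
    shout = staticmethod(shout)


assert WithStatic().shout("hi") == "HI"

try:
    WithoutStatic().shout("hi")  # actually calls shout(instance, "hi")
except TypeError:
    pass  # too many positional arguments, as expected
else:
    raise AssertionError("expected TypeError from bound-method wrapping")
```

This is why the reviewed test class needs the `staticmethod` assignment as long as the import stays inside the class body, and why moving the import to module level lets it disappear.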
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a3b8a16f-3479-4146-99b8-e8e75b0c55a8
📒 Files selected for processing (5)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_semconv_compliance.py
✅ Files skipped from review due to trivial changes (1)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_semconv_compliance.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
Actionable comments posted: 3
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py (1)
381-423: Replace placeholder `pass` tests with explicit skip/xfail markers. Silent passes make CI look green without coverage for these behaviors.
Proposed refactor
```diff
 class TestFinishReasonOmission:
     @@
+    @pytest.mark.skip(reason="Needs mocked scenario with all provider finish_reason values = None")
     def test_finish_reasons_omitted_when_empty(
     @@
-        pass
+        ...
 @@
 class TestResponsesAPIFinishReasons:
     @@
+    @pytest.mark.skip(reason="Enable when Responses API fixtures/cassettes are available")
     @pytest.mark.vcr
     def test_responses_api_extracts_finish_reason_from_provider(
     @@
-        pass
+        ...
 @@
+    @pytest.mark.skip(reason="Enable when Responses API fixtures/cassettes are available")
     @pytest.mark.vcr
     def test_responses_api_sets_top_level_finish_reasons_from_response(
     @@
-        pass
+        ...
```

If you want, I can draft concrete mocked fixtures for these three tests so they can be fully asserted instead of skipped.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py` around lines 381 - 423, Replace the placeholder pass in the three tests so they don't silently succeed: update TestFinishReasonOmission.test_finish_reasons_omitted_when_empty, TestResponsesAPIFinishReasons.test_responses_api_extracts_finish_reason_from_provider, and TestResponsesAPIFinishReasons.test_responses_api_sets_top_level_finish_reasons_from_response to use explicit pytest markers (e.g., `@pytest.mark.skip`(reason="placeholder: Responses API not available / requires mocked provider") or `@pytest.mark.xfail`(reason="placeholder: requires mocked Responses API", strict=False)) instead of pass; add an appropriate import for pytest if missing and attach the marker directly above each test function so CI reports them as skipped/expected-fail rather than passing silently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py`:
- Around line 359-368: The test test_streaming_deduplicates_finish_reasons is
currently invoking a non-streaming call; change the invocation of
openai_client.chat.completions.create in that test to use stream=True and
consume the returned stream (iterator/generator) so the streaming
accumulation/deduplication path is exercised (i.e., call
openai_client.chat.completions.create(..., stream=True) and iterate through the
stream to build/collect the final response), ensuring the assertions remain
against the streamed result.
- Around line 335-353: The test
test_finish_reason_defaults_to_empty_string_when_missing currently only checks
that finish_reason is a string; update it to mock a provider response where
finish_reason is None (when calling openai_client.chat.completions.create) and
then assert the traced output message's finish_reason equals the empty string
""; locate this test and adjust the setup to inject a controlled response with
finish_reason: None, call get_output_messages(span) and replace the loose
isinstance/string assertion with assert output_messages[0]["finish_reason"] ==
"" to confirm the empty-string fallback in the tracing logic.
---
Nitpick comments:
In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py`:
- Around line 381-423: Replace the placeholder pass in the three tests so they
don't silently succeed: update
TestFinishReasonOmission.test_finish_reasons_omitted_when_empty,
TestResponsesAPIFinishReasons.test_responses_api_extracts_finish_reason_from_provider,
and
TestResponsesAPIFinishReasons.test_responses_api_sets_top_level_finish_reasons_from_response
to use explicit pytest markers (e.g., `@pytest.mark.skip`(reason="placeholder:
Responses API not available / requires mocked provider") or
`@pytest.mark.xfail`(reason="placeholder: requires mocked Responses API",
strict=False)) instead of pass; add an appropriate import for pytest if missing
and attach the marker directly above each test function so CI reports them as
skipped/expected-fail rather than passing silently.
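For the mocked-None scenario the review asks for, a stub response object can stand in for a live API call. This is a sketch only; a real test would patch the instrumented OpenAI client rather than build the namespace by hand:

```python
from types import SimpleNamespace

# Stub shaped like an OpenAI chat completion whose provider omitted finish_reason.
response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason=None,
            message=SimpleNamespace(content="Hello"),
        )
    ]
)

# The instrumentation-side fallback the hardened test should pin down:
finish_reason = response.choices[0].finish_reason or ""
assert finish_reason == ""  # empty-string fallback, not "stop", not omitted
```

With a stub like this injected, the loose `isinstance(..., str)` assertion can be tightened to `== ""` as the review suggests.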
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 150234c0-d877-4232-af77-e7a650c36543
📒 Files selected for processing (3)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py
```python
    @pytest.mark.vcr
    def test_completions_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Non-streaming completions must set gen_ai.response.finish_reasons."""
        openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a joke about opentelemetry",
        )

        spans = span_exporter.get_finished_spans()
        assert len(spans) == 1
        span = spans[0]

        # Verify top-level finish_reasons attribute is set
        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None, "gen_ai.response.finish_reasons must be set"
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0
        # Should contain mapped finish_reason from response
        assert "length" in finish_reasons or "stop" in finish_reasons

    @pytest.mark.vcr
    def test_completions_finish_reason_in_output_messages(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Output messages must have finish_reason field."""
        openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a joke about opentelemetry",
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]
        output_messages = get_output_messages(span)

        assert len(output_messages) > 0
        for msg in output_messages:
            assert "finish_reason" in msg
            # finish_reason should be a string (empty string if missing from provider)
            assert isinstance(msg["finish_reason"], str)

    @pytest.mark.vcr
    def test_completions_streaming_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, mock_openai_client
    ):
        """Streaming completions must accumulate and set finish_reasons."""
        response = mock_openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a joke about opentelemetry",
            stream=True,
        )

        for _ in response:
            pass

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        # Verify top-level finish_reasons attribute is set
        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None, "Streaming must set finish_reasons"
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0

    @pytest.mark.vcr
    @pytest.mark.asyncio
    async def test_async_completions_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, async_openai_client
    ):
        """Async completions must set gen_ai.response.finish_reasons."""
        await async_openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a joke about opentelemetry",
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0

    @pytest.mark.vcr
    @pytest.mark.asyncio
    async def test_async_completions_streaming_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, async_openai_client
    ):
        """Async streaming completions must set finish_reasons."""
        response = await async_openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a joke about opentelemetry",
            stream=True,
        )

        async for _ in response:
            pass

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0


class TestChatCompletionsFinishReasons:
    """Test finish_reason handling in chat completions API (P1-3)"""

    @pytest.mark.vcr
    def test_chat_completions_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Non-streaming chat must set gen_ai.response.finish_reasons."""
        openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        # Verify top-level finish_reasons attribute is set
        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None, "gen_ai.response.finish_reasons must be set"
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0
        # Should contain "stop" for normal completion
        assert "stop" in finish_reasons

    @pytest.mark.vcr
    def test_chat_completions_finish_reason_in_output_messages(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Output messages must have finish_reason field."""
        openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]
        output_messages = get_output_messages(span)

        assert len(output_messages) > 0
        for msg in output_messages:
            assert "finish_reason" in msg
            assert msg["finish_reason"] == "stop"

    @pytest.mark.vcr
    def test_chat_streaming_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Streaming chat must accumulate and set finish_reasons."""
        response = openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
            stream=True,
        )

        for _ in response:
            pass

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        # Verify top-level finish_reasons attribute is set
        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None, "Streaming must set finish_reasons"
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0

    @pytest.mark.vcr
    @pytest.mark.asyncio
    async def test_async_chat_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, async_openai_client
    ):
        """Async chat must set gen_ai.response.finish_reasons."""
        await async_openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0

    @pytest.mark.vcr
    @pytest.mark.asyncio
    async def test_async_chat_streaming_sets_top_level_finish_reasons(
        self, instrument_legacy, span_exporter, async_openai_client
    ):
        """Async streaming chat must set finish_reasons."""
        response = await async_openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
            stream=True,
        )

        async for _ in response:
            pass

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert finish_reasons is not None
        assert isinstance(finish_reasons, (list, tuple))
        assert len(finish_reasons) > 0


class TestFinishReasonMapping:
    """Test finish_reason mapping from provider values"""

    @pytest.mark.vcr
    def test_finish_reason_mapped_from_provider_not_derived(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """finish_reason must come from response, not inferred from parts.

        This test validates that we use the actual finish_reason from the
        provider response (e.g., "length", "content_filter") rather than
        deriving it from the presence of tool_calls or other content.
        """
        # Use a completion that hits max_tokens to get finish_reason="length"
        openai_client.completions.create(
            model="davinci-002",
            prompt="Tell me a very long story about opentelemetry",
            max_tokens=10,  # Force length finish_reason
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]
        output_messages = get_output_messages(span)

        # Verify finish_reason is "length" from provider, not "stop"
        assert len(output_messages) > 0
        assert output_messages[0]["finish_reason"] == "length"

        # Verify top-level attribute also has "length"
        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert "length" in finish_reasons

    @pytest.mark.vcr
    def test_finish_reason_tool_calls_mapped_correctly(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """finish_reason "tool_calls" must be mapped to "tool_call"."""
        response = openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "What's the weather in Boston?"}],
            tools=[
                {
                    "type": "function",
                    "function": {
                        "name": "get_weather",
                        "description": "Get the current weather",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "location": {"type": "string"},
                            },
                        },
                    },
                }
            ],
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]
        output_messages = get_output_messages(span)

        # OpenAI returns "tool_calls" but we should map to "tool_call"
        if output_messages and output_messages[0].get("parts"):
            has_tool_call = any(
                p.get("type") == "tool_call" for p in output_messages[0]["parts"]
            )
            if has_tool_call:
                # Should be mapped to "tool_call" (singular)
                assert output_messages[0]["finish_reason"] == "tool_call"

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )
        assert "tool_call" in finish_reasons

    @pytest.mark.vcr
    def test_finish_reason_defaults_to_empty_string_when_missing(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """When provider doesn't return finish_reason, default to empty string."""
        # This test may need a mocked response where finish_reason is None
        openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Hello"}],
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]
        output_messages = get_output_messages(span)

        # Even if provider returns None, output message should have ""
        assert len(output_messages) > 0
        assert "finish_reason" in output_messages[0]
        assert isinstance(output_messages[0]["finish_reason"], str)


class TestFinishReasonDeduplication:
    """Test finish_reason deduplication in streaming scenarios"""

    @pytest.mark.vcr
    def test_streaming_deduplicates_finish_reasons(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Streaming with multiple choices should deduplicate finish_reasons."""
        response = openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke"}],
            n=2,  # Request 2 completions
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]

        finish_reasons = span.attributes.get(
            GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS
        )

        if finish_reasons:
            # Should be deduplicated (no duplicate "stop" entries)
            assert len(finish_reasons) == len(set(finish_reasons))


class TestFinishReasonOmission:
    """Test that top-level finish_reasons is omitted when no meaningful values"""

    def test_finish_reasons_omitted_when_empty(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """When no finish_reasons available, attribute should be omitted entirely.

        This test validates that we don't set an empty array when there are
        no meaningful finish_reasons (e.g., all None values filtered out).
        """
        # This would require a mocked scenario where all finish_reasons are None
        # For now, this is a placeholder test that documents the expected behavior
        pass


class TestResponsesAPIFinishReasons:
    """Test finish_reason handling in Responses API (P1-1, P1-2)"""

    @pytest.mark.vcr
    def test_responses_api_extracts_finish_reason_from_provider(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Responses API must extract finish_reason from provider response.

        This test validates that finish_reason comes from the actual response
        object, not derived from the presence of tool_calls or message types.
        """
        # This test requires the Responses API which may not be in all OpenAI versions
        # Placeholder for when Responses API tests are available
        pass

    @pytest.mark.vcr
    def test_responses_api_sets_top_level_finish_reasons_from_response(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Responses API must extract top-level finish_reasons from response choices.

        This test validates that finish_reasons are read from response.choices[]
        rather than fabricated from output block types.
        """
        # Placeholder for Responses API test
        pass
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail

# 1) Locate likely cassette files
echo "== Candidate VCR cassette files =="
fd -i 'cassette|cassettes|vcr' .

# 2) Scan YAML/JSON artifacts for likely secrets/PII markers
echo
echo "== Potential secret/PII markers in yaml/json artifacts =="
fd -e yml -e yaml -e json . | rg -n --iglob '*cassette*' \
  -e 'Authorization' \
  -e 'Bearer ' \
  -e 'api[_-]?key' \
  -e 'x-api-key' \
  -e 'sk-[A-Za-z0-9]+' || true

# 3) Show references to filters that scrub headers/body in VCR setup
echo
echo "== VCR scrub/filter configuration references =="
rg -n -C2 'filter_headers|before_record|before_record_request|before_record_response'
```

Repository: traceloop/openllmetry
Length of output: 13550

🏁 Script executed:

```shell
# Check if cassette files exist for test_finish_reasons.py
ls -la ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ | head -20
echo "---"
echo "Count of cassette files:"
find ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ -type f | wc -l
echo "---"
echo "Check for test_finish_reasons cassettes specifically:"
find ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ -name "*finish_reason*" | head -10
```

Repository: traceloop/openllmetry
Length of output: 1122

🏁 Script executed:

```shell
# Verify the conftest.py has proper filter configuration
cat -n ./packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py | head -30
```

Repository: traceloop/openllmetry
Length of output: 535

🏁 Script executed:

```shell
# Double-check: is there a test_finish_reasons directory?
ls -la ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ | grep -i finish
# Also check if cassettes directory exists in the test directory itself
find ./packages/opentelemetry-instrumentation-openai/tests/traces -type d -name "cassettes" -o -type d -name "*finish*"
# Count total test methods in test_finish_reasons.py
grep -c "def test_" ./packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py || echo "File not found"
# Check if the test file exists
ls -la ./packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py 2>/dev/null || echo "test_finish_reasons.py not found"
```

Repository: traceloop/openllmetry
Length of output: 272

🏁 Script executed:

```shell
# Check the VCR record mode configuration in conftest.py (if set)
grep -n "record_mode\|vcr_config" ./packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py
# Check if there's pytest.ini or pyproject.toml VCR settings
cat ./packages/opentelemetry-instrumentation-openai/tests/traces/pytest.ini 2>/dev/null || echo "No pytest.ini"
```

Repository: traceloop/openllmetry
Length of output: 98

🏁 Script executed:

```shell
# List all cassette subdirectories to see structure
ls -1 ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/
echo "---"
# Check if cassettes were recorded for any test in test_finish_reasons
find ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ -type f -name "*.yaml" -o -name "*.json" | xargs grep -l "finish.reason\|finish_reason" | head -5 || echo "No cassettes found with finish_reason"
echo "---"
# Check when cassettes were last modified to see if they're recent
find ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/ -type f -printf '%T@ %Tc %p\n' | sort -n | tail -5
```

Repository: traceloop/openllmetry
Length of output: 1811

🏁 Script executed:

```shell
# Check test_finish_reasons.py to see what the test decorator says and cassette strategy
head -50 ./packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py
echo "---"
# Check if there's a test_finish_reasons cassette directory at all
find ./packages/opentelemetry-instrumentation-openai/tests/traces -name "*finish_reasons*" -type d
```

Repository: traceloop/openllmetry
Length of output: 1943

🏁 Script executed:

```shell
# Check the cassettes directory - does test_finish_reasons have any cassettes?
test -d ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons && echo "Directory exists" || echo "test_finish_reasons cassette directory DOES NOT EXIST"
# Count VCR-marked tests in test_finish_reasons.py
grep -c "@pytest.mark.vcr" ./packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py
# Check git status to see if cassettes were added
git -C ./packages/opentelemetry-instrumentation-openai status --short tests/traces/cassettes/ 2>/dev/null | head -20 || echo "Git status unavailable"
```

Repository: traceloop/openllmetry
Length of output: 121

🏁 Script executed:

```shell
# Check if there's a record_mode setting anywhere that might affect VCR behavior
grep -r "record_mode\|vcr.*once\|vcr.*new_episodes" ./packages/opentelemetry-instrumentation-openai/tests/ 2>/dev/null | head -10 || echo "No record_mode found"
# Verify the conftest fixtures are being used properly
grep -A5 "def openai_client" ./packages/opentelemetry-instrumentation-openai/tests/conftest.py | head -15
```

Repository: traceloop/openllmetry
Length of output: 149

🏁 Script executed:

```shell
# Verify the conftest filter configuration is complete
cat ./packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py
echo "---"
# Check if cassettes need to be recorded by looking at git diff or status
git -C ./packages/opentelemetry-instrumentation-openai diff --name-only 2>/dev/null | grep -i cassette || echo "No cassette changes in diff"
echo "---"
# Check if test_finish_reasons.py was just added
git -C ./packages/opentelemetry-instrumentation-openai status --short 2>/dev/null | grep finish_reason || echo "File status unclear"
```

Repository: traceloop/openllmetry
Length of output: 479

🏁 Script executed:

```shell
# Confirm what happens when pytest runs - check if there's a record_mode default
python3 -c "
import os
os.chdir('./packages/opentelemetry-instrumentation-openai/tests/traces')
# Check if pytest-vcr is configured anywhere
with open('conftest.py', 'r') as f:
    content = f.read()
if 'record_mode' in content:
    print('record_mode found in conftest')
else:
    print('No explicit record_mode - will use default (usually once)')
" 2>/dev/null || echo "Cannot check record_mode"
```

Repository: traceloop/openllmetry
Length of output: 122
Cassettes for test_finish_reasons.py must be recorded before merging.

The 16 new VCR-marked tests in test_finish_reasons.py have no corresponding cassette files under ./packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_finish_reasons/. With the default record_mode=once, these tests cannot play back in CI and will fail.

The VCR filter configuration in conftest.py is properly set up, with filter_headers covering authorization, api-key, openai-organization, openai-project, set-cookie, and x-request-id. The cassettes themselves still need to be recorded: record them locally with a valid OpenAI API key in an environment variable, verify they contain no secrets and are properly scrubbed, and commit them.
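For context, the scrubbing the review found can be sketched as a config dict; the exact fixture wiring (pytest-recording and pytest-vcr both read a `vcr_config` fixture returning kwargs like these) is assumed here, not copied from the repository:

```python
# Sketch of the header scrubbing described above; pytest-recording /
# pytest-vcr would receive this dict from a `vcr_config` fixture.
VCR_CONFIG = {
    "filter_headers": [
        "authorization",
        "api-key",
        "openai-organization",
        "openai-project",
        "set-cookie",
        "x-request-id",
    ],
}

# Any cassette recorded under this config should have these header
# values redacted before being written to disk.
assert "authorization" in VCR_CONFIG["filter_headers"]
```
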
```python
    def test_finish_reason_defaults_to_empty_string_when_missing(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """When provider doesn't return finish_reason, default to empty string."""
        # This test may need a mocked response where finish_reason is None
        openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Hello"}],
        )

        spans = span_exporter.get_finished_spans()
        span = spans[0]
        output_messages = get_output_messages(span)

        # Even if provider returns None, output message should have ""
        assert len(output_messages) > 0
        assert "finish_reason" in output_messages[0]
        assert isinstance(output_messages[0]["finish_reason"], str)
```
This test does not assert the empty-string fallback it describes. As written it passes for any string value; it should assert `""` against a controlled response in which the provider's finish_reason is None.

Proposed assertion tightening:

```diff
 assert len(output_messages) > 0
 assert "finish_reason" in output_messages[0]
-assert isinstance(output_messages[0]["finish_reason"], str)
+assert output_messages[0]["finish_reason"] == ""
```
+ assert output_messages[0]["finish_reason"] == ""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py`
around lines 335 - 353, The test
test_finish_reason_defaults_to_empty_string_when_missing currently only checks
that finish_reason is a string; update it to mock a provider response where
finish_reason is None (when calling openai_client.chat.completions.create) and
then assert the traced output message's finish_reason equals the empty string
""; locate this test and adjust the setup to inject a controlled response with
finish_reason: None, call get_output_messages(span) and replace the loose
isinstance/string assertion with assert output_messages[0]["finish_reason"] ==
"" to confirm the empty-string fallback in the tracing logic.
```python
    def test_streaming_deduplicates_finish_reasons(
        self, instrument_legacy, span_exporter, openai_client
    ):
        """Streaming with multiple choices should deduplicate finish_reasons."""
        response = openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Tell me a joke"}],
            n=2,  # Request 2 completions
        )
```
test_streaming_deduplicates_finish_reasons is not testing streaming code. The test name and docstring say streaming, but the call at lines 363-367 is non-streaming (`stream=True` is missing), so it never exercises the streaming accumulation path.

Proposed fix:

```diff
 response = openai_client.chat.completions.create(
     model="gpt-3.5-turbo",
     messages=[{"role": "user", "content": "Tell me a joke"}],
     n=2,  # Request 2 completions
+    stream=True,
 )
+for _ in response:
+    pass
+
 spans = span_exporter.get_finished_spans()
```
Verify each finding against the current code and only fix it if needed.
In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_finish_reasons.py`
around lines 359 - 368, The test test_streaming_deduplicates_finish_reasons is
currently invoking a non-streaming call; change the invocation of
openai_client.chat.completions.create in that test to use stream=True and
consume the returned stream (iterator/generator) so the streaming
accumulation/deduplication path is exercised (i.e., call
openai_client.chat.completions.create(..., stream=True) and iterate through the
stream to build/collect the final response), ensuring the assertions remain
against the streamed result.
Force-pushed: b4198cc to 4f5a25e (Compare)
OpenAI, Anthropic, LangChain

Aligns all packages with the Bedrock convention, following the OTel GenAI semantic conventions: when finish_reason is unknown/None, use "" (empty string) instead of fabricating "stop" (OpenAI) or omitting the key entirely (Anthropic, LangChain).
or "stop"→or ""or "stop"→or ""if mapped→ always set with""fallback (3 spots)if fr→ always includefinish_reasonwith""fallbackSummary by CodeRabbit
Bug Fixes
Tests
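The Anthropic change above ("always set with `""` fallback") can be illustrated with a small mapping sketch. The stop_reason values are Anthropic's documented ones, but the table and `map_stop_reason` helper are hypothetical; the real mapping lives in the package's span_utils:

```python
# Hypothetical stop_reason -> OTel finish_reason table for illustration.
_STOP_REASON_MAP = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def map_stop_reason(stop_reason):
    # Per this PR: always set the key, using "" for unknown/None
    # values instead of omitting the key or guessing "stop".
    if not stop_reason:
        return ""
    return _STOP_REASON_MAP.get(stop_reason, "")


assert map_stop_reason("max_tokens") == "length"
assert map_stop_reason(None) == ""
assert map_stop_reason("some_future_reason") == ""
```
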