OpenAICompletionsProvider.stream_content() returns ContentThinking instead of ContentThinkingDelta, breaking streaming thinking for DeepSeek/Groq/OpenRouter #300

@Damonsoul

Description:

When using a provider based on OpenAICompletionsProvider (e.g., ChatDeepSeek, ChatGroq, ChatOpenRouter) with stream(..., content="all"), the model's reasoning/thinking content is emitted as a ContentThinking object rather than the expected ContentThinkingDelta. As a result, thinking boundaries cannot be handled properly: the streaming iterator never sees ContentThinkingDelta objects, the start/end phases are never triggered, and the reasoning ends up as plain text.

Steps to reproduce:

import os
from chatlas import ChatDeepSeek
from chatlas.types import ContentThinkingDelta

chat = ChatDeepSeek(api_key=os.getenv("DEEPSEEK_API_KEY"))

stream = chat.stream(
    "Explain the concept of recursion in detail, step by step.",
    content="all",
    echo="none"
)

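for chunk in stream:
    # Reasoning chunks are expected to be ContentThinkingDelta instances
    print(type(chunk), isinstance(chunk, ContentThinkingDelta), chunk)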

Expected behavior:

The generator should yield ContentThinkingDelta objects (with thinking text and phase attributes) for reasoning chunks, and str for regular text.

Actual behavior:

The generator yields ContentThinking objects (not deltas), and only once, after the entire thinking block finishes. The _chat_impl loop never enters the is_thinking_delta branch, so no phase="start"/"end" signals are emitted and no special display occurs.

Root cause:

In _provider_openai_completions.py, the stream_content method returns ContentThinking when reasoning_content is present in the delta:

reasoning = getattr(delta, "reasoning_content", None)
if reasoning is not None:
    return ContentThinking(thinking=reasoning)   # ❌ should be ContentThinkingDelta

The _chat.py streaming loop only treats objects of type ContentThinkingDelta specially (e.g., to start/end thinking blocks). ContentThinking is not a delta, so it bypasses that logic and is eventually stringified.
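
To illustrate the type-check problem in isolation, here is a minimal sketch that does not import chatlas. FakeThinking, FakeThinkingDelta, and render are hypothetical stand-ins written for this example; they only mimic the shape described above and are not the actual chatlas classes or the real _chat.py loop.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple, Union


@dataclass
class FakeThinking:
    """Hypothetical stand-in for a complete (non-delta) thinking object."""
    thinking: str

    def __str__(self) -> str:
        return self.thinking


@dataclass
class FakeThinkingDelta:
    """Hypothetical stand-in for a streamed thinking delta."""
    thinking: str
    phase: str = "body"


Chunk = Union[str, FakeThinking, FakeThinkingDelta]


def render(chunks: Iterable[Chunk]) -> Iterator[Tuple[str, str]]:
    """Label each chunk the way a delta-only type check would."""
    for chunk in chunks:
        if isinstance(chunk, FakeThinkingDelta):
            # Only delta objects reach the special thinking branch.
            yield ("thinking", chunk.thinking)
        else:
            # Everything else, including FakeThinking, is stringified.
            yield ("text", str(chunk))


labels = list(render([FakeThinking("step 1"), FakeThinkingDelta("step 2"), "answer"]))
print(labels)
```

In this sketch the FakeThinking chunk is labeled "text" even though it carries reasoning, which mirrors the behavior reported here.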

For comparison, the correct behavior is implemented in the Responses API provider (_provider_openai.py):

if chunk.type == "response.reasoning_summary_text.delta":
    return ContentThinkingDelta(thinking=chunk.delta)

Proposed fix:

Change OpenAICompletionsProvider.stream_content() to return ContentThinkingDelta (with phase "body") for reasoning deltas. To fully match Responses behavior, the provider could also synthesize "start" and "end" phases, but even the minimal change (returning ContentThinkingDelta) would allow downstream consumers to detect thinking content.

from ._content import ContentThinkingDelta

# ...
reasoning = getattr(delta, "reasoning_content", None)
if reasoning is not None:
    return ContentThinkingDelta(thinking=reasoning, phase="body")
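
If synthesizing the "start" and "end" phases is also desired, one possible shape is a small wrapper around the chunk stream. The sketch below is only an illustration under assumptions: FakeThinkingDelta and with_phases are hypothetical names, the class is a stand-in rather than the chatlas type, and emitting an empty thinking string for the boundary chunks is my assumption, not confirmed Responses provider behavior.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class FakeThinkingDelta:
    """Hypothetical stand-in for a thinking delta with a phase attribute."""
    thinking: str
    phase: str = "body"


Chunk = Union[str, FakeThinkingDelta]


def with_phases(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Wrap each run of thinking deltas in synthesized start/end markers."""
    in_thinking = False
    for chunk in chunks:
        is_thinking = isinstance(chunk, FakeThinkingDelta)
        if is_thinking and not in_thinking:
            # First reasoning chunk of a run: open the thinking block.
            yield FakeThinkingDelta("", phase="start")
        elif in_thinking and not is_thinking:
            # First non-reasoning chunk after a run: close the block.
            yield FakeThinkingDelta("", phase="end")
        in_thinking = is_thinking
        yield chunk
    if in_thinking:
        # The stream ended while still thinking: close the block.
        yield FakeThinkingDelta("", phase="end")


stream = [FakeThinkingDelta("a"), FakeThinkingDelta("b"), "hi"]
phases = [
    c.phase if isinstance(c, FakeThinkingDelta) else "text"
    for c in with_phases(stream)
]
print(phases)
```

Tracking a single boolean keeps the wrapper stateless between streams, but it assumes reasoning and regular text never interleave within one chunk.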

Affected providers:

  • ChatDeepSeek
  • ChatGroq
  • ChatOpenRouter
  • ChatHuggingFace (via OpenAICompletionsProvider)
  • ChatLMStudio
  • ChatMistral
  • ChatOllama
  • Any other provider that inherits from OpenAICompletionsProvider

Environment:

  • chatlas version: 0.17.0
  • Python: 3.12
  • Provider: DeepSeek (but same issue for any completions‑based provider with reasoning)

Additional context:

Related issues/PRs: #294, #297, #299 (discussions about thinking/streaming). This specific bug is not yet addressed.

Thank you for maintaining chatlas! I can help test a patch if needed.
