Description:
When using a provider based on OpenAICompletionsProvider (e.g., ChatDeepSeek, ChatGroq, ChatOpenRouter) with stream(..., content="all"), the model's reasoning/thinking content is emitted as a ContentThinking object rather than the expected ContentThinkingDelta. This prevents proper handling of thinking boundaries: the streaming iterator never sees ContentThinkingDelta objects, so the start/end phases are never triggered and the reasoning ends up as plain text.
Steps to reproduce:
import os

from chatlas import ChatDeepSeek
from chatlas.types import ContentThinkingDelta

chat = ChatDeepSeek(api_key=os.getenv("DEEPSEEK_API_KEY"))
stream = chat.stream(
    "Explain the concept of recursion in detail, step by step.",
    content="all",
    echo="none",
)
# Print the type of every chunk to see what the stream actually yields
for chunk in stream:
    print(type(chunk), chunk)
Expected behavior:
The generator should yield ContentThinkingDelta objects (with thinking text and phase attributes) for reasoning chunks, and str for regular text.
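To illustrate why the chunk type matters to downstream code, here is a minimal sketch of a consumer that separates reasoning from answer text. ThinkingDelta is a hypothetical stand-in defined locally (it is not chatlas's ContentThinkingDelta, only modelled on the thinking and phase attributes described above), and split_stream is an illustrative helper name, not part of chatlas.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass
class ThinkingDelta:
    """Hypothetical stand-in for ContentThinkingDelta (assumed attributes)."""

    thinking: str
    phase: str = "body"


def split_stream(chunks: Iterable[Union[ThinkingDelta, str]]) -> Tuple[str, str]:
    """Collect reasoning text and regular text from a mixed stream."""
    thinking_parts = []
    text_parts = []
    for chunk in chunks:
        if isinstance(chunk, ThinkingDelta):
            # Reasoning chunk: keep it apart from the visible answer
            thinking_parts.append(chunk.thinking)
        else:
            # Anything else is treated as regular text
            text_parts.append(str(chunk))
    return "".join(thinking_parts), "".join(text_parts)


# Simulated stream: two reasoning deltas followed by two text chunks
demo = [ThinkingDelta("Recursion means "), ThinkingDelta("self-reference."), "A function ", "calls itself."]
thinking, text = split_stream(demo)
```

A consumer written this way only works if reasoning arrives as delta objects; if the stream yields a different type, the isinstance check fails and the reasoning falls into the text branch.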
Actual behavior:
The generator yields ContentThinking objects (not deltas), and only once, after the entire thinking block finishes. The _chat_impl loop never enters the is_thinking_delta branch, so no phase="start"/"end" signals are emitted and no special display occurs.
Root cause:
In _provider_openai_completions.py, the stream_content method returns ContentThinking when reasoning_content is present in the delta:
reasoning = getattr(delta, "reasoning_content", None)
if reasoning is not None:
    return ContentThinking(thinking=reasoning)  # ❌ should be ContentThinkingDelta
The _chat.py streaming loop only treats objects of type ContentThinkingDelta specially (e.g., to start/end thinking blocks). ContentThinking is not a delta, so it bypasses that logic and is eventually stringified.
For comparison, the correct behavior is implemented in the Responses API provider (_provider_openai.py):
if chunk.type == "response.reasoning_summary_text.delta":
    return ContentThinkingDelta(thinking=chunk.delta)
Proposed fix:
Change OpenAICompletionsProvider.stream_content() to return ContentThinkingDelta (with phase "body") for reasoning deltas. To fully match Responses behavior, the provider could also synthesize "start" and "end" phases, but even the minimal change (returning ContentThinkingDelta) would allow downstream consumers to detect thinking content.
from ._content import ContentThinkingDelta

# ...
reasoning = getattr(delta, "reasoning_content", None)
if reasoning is not None:
    return ContentThinkingDelta(thinking=reasoning, phase="body")
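For the optional part of the fix (synthesizing "start" and "end" phases), one possible approach is a small stateful wrapper around the raw deltas. The sketch below is self-contained and does not use chatlas internals: ThinkingDelta is a hypothetical stand-in for ContentThinkingDelta, with_thinking_phases is an illustrative name, and representing the boundaries as deltas with empty thinking text is an assumption, not confirmed chatlas behavior.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple, Union


@dataclass
class ThinkingDelta:
    """Hypothetical stand-in for ContentThinkingDelta (assumed attributes)."""

    thinking: str
    phase: str = "body"


def with_thinking_phases(
    deltas: Iterable[Tuple[Optional[str], Optional[str]]],
) -> Iterator[Union[ThinkingDelta, str]]:
    """Turn (reasoning_content, content) pairs into phased events.

    Emits a "start" marker before the first reasoning delta and an "end"
    marker once regular text begins or the stream finishes.
    """
    in_thinking = False
    for reasoning, text in deltas:
        if reasoning is not None:
            if not in_thinking:
                in_thinking = True
                yield ThinkingDelta(thinking="", phase="start")
            yield ThinkingDelta(thinking=reasoning, phase="body")
        if text:
            if in_thinking:
                in_thinking = False
                yield ThinkingDelta(thinking="", phase="end")
            yield text
    # Close the block if the stream ended while still thinking
    if in_thinking:
        yield ThinkingDelta(thinking="", phase="end")


# Simulated completions deltas: two reasoning chunks, then two text chunks
raw = [("Let me", None), (" think.", None), (None, "Answer"), (None, "!")]
events = list(with_thinking_phases(raw))
phases = [e.phase if isinstance(e, ThinkingDelta) else "text" for e in events]
```

Keeping the state in a wrapper like this would leave the per-chunk conversion stateless, though whether the boundaries belong in the provider or in the _chat.py loop is a design choice for the maintainers.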
Affected providers:
- ChatDeepSeek
- ChatGroq
- ChatOpenRouter
- ChatHuggingFace (via OpenAICompletionsProvider)
- ChatLMStudio
- ChatMistral
- ChatOllama
- Any other provider that inherits from OpenAICompletionsProvider
Environment:
- chatlas version: 0.17.0
- Python: 3.12
- Provider: DeepSeek (but same issue for any completions‑based provider with reasoning)
Additional context:
Related issues/PRs: #294, #297, #299 (discussions about thinking/streaming). This specific bug is not yet addressed.
Thank you for maintaining chatlas! I can help test a patch if needed.