feat: add thinking/reasoning display UI#208
Conversation
Add thinking_start, thinking, and thinking_end ChatAction variants for streaming LLM reasoning tokens. Extend ChatMessageData with role "thinking", durationMs, topic, and startedAt fields. Handle all three actions in the reducer following the chunk_start/chunk/chunk_end pattern.
Create pkg-r/R/thinking.R with ThinkingAccumulator pattern: - new_thinking_state() for mutable state tracking - handle_thinking_chunk() sends thinking_start/thinking actions - end_thinking() computes duration and sends thinking_end - extract_topics() strips <topic> tags and buffers partial tags across chunk boundaries
Add contents_shinychat method for ellmer::ContentThinking that returns a sentinel object. Modify chat_append_stream_impl to detect thinking content, route to handle_thinking_chunk(), and call end_thinking() when transitioning to non-thinking content or at stream end.
Collapsible panel showing LLM reasoning with:
- Animated dots during streaming, auto-collapse on completion
- Topic label display in header during streaming
- Duration display ("Thought for Xs") when complete
- Accessible markup with aria-expanded and role="region"
- CSS custom properties for theming (--shinychat-thinking-*)
- prefers-reduced-motion support
Route messages with role "thinking" to ThinkingDisplay in both the finalized message list (ChatMessages) and the streaming message slot (ChatContainer).
…integration Add ThinkingStartAction, ThinkingAction, ThinkingEndAction to chat types. Create _thinking.py with ThinkingAccumulator (topic tag extraction with chunk boundary buffering) and ThinkingState (timing). Integrate into _append_message_stream to intercept ContentThinking objects, send thinking actions directly, and transition cleanly to content streaming.
On bookmark restore, thinking blocks are replayed instantly so duration_ms will be 0. Show "Thinking" instead of "Thought for less than a second" in that case.
ThinkingDisplay.css was imported via JS but never reached the SCSS build output. Moved styles into chat.scss, replaced text chevron with SVG, and refined hover/spacing for the thinking header button.
The stream loop was sending chunk_start before any content arrived. When thinking_start replaced the streaming message, subsequent chunk actions were dropped because streamingMessage was null after thinking_end. Now chunk_start is deferred until the first non-thinking content, and if no content follows thinking, a remove_loading action re-enables input. Also fixes named vector warning from proc.time() in thinking duration.
Remove server-side thinking state management (ThinkingState, ThinkingAccumulator, topic extraction) from both R and Python backends. Thinking content now flows through the existing chunk pipeline with content_type="thinking". The JS reducer handles state transitions (thinking↔markdown), topic tag extraction with cross-chunk buffering, and duration computation — one implementation instead of three. Removes pkg-r/R/thinking.R and pkg-py/src/shinychat/_thinking.py.
Add smooth expand/collapse animation using CSS grid transitions, debounce topic text updates with a minimum display time to prevent rapid flickering, and respect prefers-reduced-motion.
Fade out the old topic text and fade in the new one using a 200ms opacity transition, preventing jarring text swaps during streaming.
Replace <topic> tags with a styled div instead of stripping them, so topics appear as section markers in the expanded thinking trace. Default style is bold; users can override .shinychat-thinking-topic.
Instead of thinking being a separate message with role "thinking", thinking data is now stored as a field on the assistant message. ThinkingDisplay renders inside ChatMessage above the response content, so it participates in scroll management and is visually part of the assistant response.
Support interleaved thinking blocks by using a single `blocks` array
on ChatMessageData. Each block is either `{ type: "content" }` or
`{ type: "thinking" }`, rendered in order. This allows multiple
thinking-then-content cycles within a single assistant message.
Bump dot size from 4px to 5px and add aspect-ratio: 1 to prevent subpixel rendering from distorting the circles.
Use a single SVG circle with scale+opacity pulse animation (matching the streaming dot style) instead of three staggered opacity dots. Includes a 1s animation delay so the dot sits still briefly before pulsing.
Append remaining topicBuffer content when finalizing thinking blocks and when transitioning from thinking to non-thinking content, preventing silent text loss when a stream ends with a partial topic tag.
Expand PendingMessage tuple and _flush_pending_messages to carry content_type_override through queued chunks, ensuring thinking content renders correctly under concurrent stream conditions.
- Use proper aria-hidden="true" string value instead of boolean - Respect prefers-reduced-motion in useFadingText hook by skipping the fade delay entirely, preventing a confusing 200ms pause - Fix auto-collapse effect to only update prevStreamingRef on actual transitions
…ience) When the connection drops mid-thinking-stream, remove_loading now finalizes the in-flight streaming message rather than discarding it, preserving accumulated thinking content for the user.
Prevents very long thinking content from pushing the rest of the conversation off-screen.
Cover: thinking chunk creation, appending, content type transitions, topic extraction, cross-chunk topic buffering, topicBuffer flush on finalization and transition, multiple thinking/content cycles, remove_loading disconnect resilience, empty chunks, missing startedAt.
Use dvh units for better mobile viewport handling and cap expanded thinking content at 33dvh instead of 60vh to keep response text visible.
|
|
||
| # Chunked messages get accumulated (using this property) before changing state | ||
| self._current_stream_message: str = "" | ||
| self._current_stream_thinking: str = "" |
There was a problem hiding this comment.
I would like to avoid adding this piece of state -- #211 proposes a way of doing that.
Admittedly, the state management stuff in Python has always been problematic from the standpoint of complexity with somewhat questionable upside. By adding more state, we're compounding that problem and also contributing to drift between R and Python.
It's also worth calling out that for nested streams we track a separate piece of "checkpoint" state. I'm pretty sure if we were to keep this approach, we would also need a piece of checkpoint state for thinking as well.
There was a problem hiding this comment.
Another problem I just realized with the current implementation: thinking state never gets reset when the stream is finished.
There was a problem hiding this comment.
Yeah most of this is about me not being sure about the machinery and usage behind _current_stream_message. My more direct take is that I'd rather consider dropping _current_stream_thinking entirely unless it's absolutely required. So my first question in response is really: do we even need it in the first place?
There was a problem hiding this comment.
Okay, I took another look and understand how _current_stream_message operates now. As an aside, I think we should revisit this at some point to move away from accumulating strings into this field, but that's for another time.
In 1b2a514, I updated the internal logic to buffer thinking tokens until thinking is complete and then append them to _current_stream_message when other content is received (or in a finally block if interrupted). This way, we stream tokens to the UI as they arrive, but also keep the current semantics of _current_stream_message.
There was a problem hiding this comment.
So my first question in response is really: do we even need it in the first place?
If we want to properly restore streams that are getting transformed as they're sent out, then yes. I know we'll probably disagree on whether that is an important use case, but I can tell you right now there are real apps doing useful things by transforming the stream (shiny assistant, etc).
Can I follow up by asking what your objection to heading in the direction of #211 would be?
There was a problem hiding this comment.
Do you mean _current_stream_thinking?
Not specifically. I mean more generally any state that is tracking outgoing messages with the intention of restoring them on bookmark.
Does the new approach that appends them to _current_stream_message work?
Mostly yes, but not if the content type is changing mid-stream. Doing that actually wasn't supported before the React migration. Now that we more officially support it, I think we should update the Python state management to mirror how the JS side works.
solve problems locally in shinychat before we need to make changes in chatlas/ellmer.
I think I agree, but can we also agree that the chatlas/ellmer changes are an improvement regardless of shinychat? If so, I think we'll want to wait for those changes to land so that we can detect the right delta class.
There was a problem hiding this comment.
Mostly yes, but not if the content type is changing mid-stream.
That part of this PR does work, though, because we're storing the thinking as <thinking>...</thinking> in _current_stream_message, and those are handled client side. I think this an acceptable trade-off for now. I think in the long run we should stop storing or passing state via strings, but that's not something we should try to thread through this PR.
I think I agree, but can we also agree that the chatlas/ellmer changes are an improvement regardless of shinychat? If so, I think we'll want to wait for those changes to land so that we can detect the right delta class.
I think ContentDelta types would be useful, but they're a big enough feature that they should be developed carefully and independently. I think they'll be an improvement, but from ellmer and chatlas' perspectives the thinking deltas are just one type of delta that we'd want to support. I don't think it's a good idea to commit to a design now that without taking into consideration the other ways that delta types would be used.
Also, because ellmer just released, it will likely be more than a month before delta content classes will land in a usable way in ellmer. So I don't think we should wait for a ContentDeltaThinking class to land upstream.
There was a problem hiding this comment.
I think in the long run we should stop storing or passing state via strings, but that's not something we should try to thread through this PR.
Yeah, agreed. If we do that, it would open a path where the python thinking logic becomes very minimal, while also more generally fixing the "incorrect restoration of mixed content types" problem.
BTW, I appreciate your pragmatic approach, but it would also bother me to keep kicking this pre-existing (& self-inflicted) problem down the road, making it harder to undo. It doesn't necessarily have to block this PR, but I'd at least like to take a look first.
There was a problem hiding this comment.
it would also bother me to keep kicking this pre-existing (& self-inflicted) problem down the road, making it harder to undo. It doesn't necessarily have to block this PR, but I'd at least like to take a look first.
I agree, let's tackle this head-on. But I do think we should thread it separately from this PR. AFAIU, we could merge this PR and then refactor server-side state tracking, so I'd prefer to take that approach. It would also let me release this work, since the refactor would touch only the Python package internals.
There was a problem hiding this comment.
Yep, agreed. I have a very half-baked attempt that needs a bunch more attention over at #213
I was hoping that to find a decently simple way of addressing it, but to do it properly feels like more of a 2-3 day effort rather than a half day effort, so I might drop it momentarily.
FWIW, I think it's fine to punt on this for now, but if and when we get to branching conversations, we'll definitely want to think more deeply about properly storing/restoring.
Replace instance-level `_current_stream_thinking` state with a local `thinking_buffer` in `_append_message_stream`. Thinking chunks now flow through `_append_message_chunk` for UI streaming (skipping accumulation into `_current_stream_message` via a `content_type == "thinking"` guard), and are flushed as a single wrapped `<thinking>...</thinking>` block at the transition to non-thinking content or at stream end. Also adds `content_type` to `ChatMessage` so the wire-format content type is carried on the message itself, removing the need for the `content_type_override` escape hatch on `_append_message_chunk` and `_send_append_message`.
Thinking chunks were being sent to the UI via _append_message_chunk during streaming, then sent a second time wrapped in <thinking> tags when flush_thinking fired at the transition point. flush_thinking now only updates _current_stream_message directly for bookmark/restore without triggering another client-side send.
| content: string | ||
| contentType: ContentType | ||
| streaming: boolean | ||
| /** True for the empty placeholder message shown while waiting for the assistant to respond. */ | ||
| isPlaceholder?: boolean | ||
| icon?: string | ||
| segments?: ContentSegment[] | ||
| blocks: MessageBlock[] |
There was a problem hiding this comment.
Here's another area where I'm now realizing that I introduced a problem that's getting more exposed. That is, there are two levels of content and content type: top level content as well as block level content. I think it'd be worth simplifying this to have just one (block) level of content and building that in at the protocol level. That way we can restore properly and have less confusion around the source of truth.
There was a problem hiding this comment.
Agree this is the right direction. I don't think we should take this on in this PR though — it touches the wire protocol and the restore path, so I'd like to do it as a follow-up where we can think through migration for existing bookmarks.
There was a problem hiding this comment.
Yeah, totally agreed -- this is really just a note to self. Feel free to resolve.
Annotate `_is_content_thinking` as `TypeGuard[chatlas.ContentThinking]` so
type checkers narrow `msg` inside the guard, and drop the now-unneeded
`hasattr("thinking")` fallback. Also fall back to `chatlas._content` when
`chatlas.types` doesn't re-export `ContentThinking` (older chatlas), and
stop wrapping `msg` in a redundant `message_content_chunk()` call before
passing it to `_append_message_chunk` — which already normalizes its
input.
The thinking block's `streaming` flag could fall out of sync with the outer message's `streaming` state in two ways: - `chunk_start` set the message to streaming but left any initial thinking block (created from the start payload) with `streaming: false`. - The explicit-thinking chunk path reused a trailing thinking block unconditionally, even after it had been finalized (`streaming: false`, `durationMs` set) by an intervening content segment. Mark initial thinking blocks as streaming on `chunk_start`, and mirror the tag-detection path by only extending a trailing thinking block when it is still streaming. A new thinking burst after a content segment now starts a fresh block, preserving the prior block's duration.
Newer Shiny reactive `Value`s register an `on_destroy` callback on the active session, which broke tests that use the minimal `_MockSession` helper. Add a no-op stub so `Chat(id=...)` can be constructed inside `session_context(test_session)`.
cpsievert
left a comment
There was a problem hiding this comment.
I just released chatlas v0.17.0 with a new ContentThinkingDelta type since IMO being able to distinguish between partial thought and full thought now is more important than implementing delta classes more generally.
…king Address PR feedback: bump chatlas to >=0.17.0, replace _is_content_thinking with is_thinking_delta using the new ContentThinkingDelta type, and register singledispatch handlers for both ContentThinking (stored turns) and ContentThinkingDelta (stream chunks) in _chat_normalize.
Summary
content_type: "thinking"transport (R/ellmer, Python/chatlas) and client-side<thinking>tag detection (local models, Python bookmark restore)<thinking>tags inside fenced code blocks (preserved as content), empty thinking blocks (hidden), history replay (suppresses flash/duration artifacts), and multiple thinking-response cycles within a single turnExamples
Local thinking models
Models that emit
<thinking>...</thinking>tags (DeepSeek, QwQ, Gemma, etc.) get thinking UI automatically with no server-side changes:Python:
R:
Structured thinking APIs
Models with dedicated reasoning APIs (Claude with extended thinking, OpenAI with reasoning) emit
ContentThinkingobjects that flow through thecontent_type: "thinking"transport:Full demo app (Python) — mock thinking stream for testing
Full demo app (R) — mock thinking stream for testing
Test plan
cd js && npm run lint— TypeScript + ESLint passnpx vitest run— 54 state reducer tests pass<thinking>tags inside fenced code blocks are NOT promoted to thinking UI<thinking>tag wrapper)