Skip to content

feat(pkg-py): segment-based message storage for mixed content type streaming#213

Draft
cpsievert wants to merge 10 commits into
mainfrom
segment-based-message-storage
Draft

feat(pkg-py): segment-based message storage for mixed content type streaming#213
cpsievert wants to merge 10 commits into
mainfrom
segment-based-message-storage

Conversation

@cpsievert
Copy link
Copy Markdown
Collaborator

Summary

  • Replaces flat string accumulation (_current_stream_message + _current_stream_deps) with a segment list that preserves per-chunk content type boundaries during streaming
  • Adds segments to the wire protocol (MessagePayload) so the client can faithfully restore mixed-content messages from bookmarks
  • Extracts segment logic into _chat_segments.py with dedicated helpers for append, copy, serialize, and content/dep extraction

Motivation

When an assistant response streams a mix of content types, the Python server previously concatenated everything into a single flat string tagged with one content_type. This worked fine for live streaming (the client tracked segments internally), but broke on bookmark restore: the server had lost the content type boundaries, so it sent the entire message back tagged as the first content type.

The bug specifically surfaces when a message starts with HTML content and then switches to markdown. On restore, the whole message gets tagged as HTML (the first content type), causing the trailing markdown to be rendered as raw HTML instead of being markdown-processed. (The reverse — markdown first followed by HTML — is less visibly broken because HTML passed through a markdown renderer is often passable.)

The deeper issue is that the server's internal state didn't reflect what the client already knew: a message is an ordered list of typed segments, not a flat string with a single type. This PR aligns the two, making bookmark save/restore correct by construction.

This also lays groundwork for an upcoming PR that introduces an additional content type, which will make mixed-type messages more common and the restore bug more noticeable.

Test plan

  • New unit tests for segment accumulation, serialization, bookmark round-trip, nested stream checkpoints, and per-chunk content type on the wire
  • Manual test: stream a mixed markdown+HTML response, bookmark it, restore it, verify rendering matches the original

cpsievert added 3 commits May 8, 2026 14:47
Replace separate _current_stream_message/deps tracking with a unified
list of ContentSegment objects that preserve per-chunk content type
(markdown vs html) and HTML dependencies. This enables correct
round-tripping of mixed-content streams through bookmark save/restore.

Key changes:
- ContentSegment dataclass and StoredContentSegment TypedDict for
  runtime and serialized segment representations
- BookmarkMessageDict includes optional segments for bookmark state
- _restore_bookmark_message replays multi-segment messages as streaming
  sequences to preserve content type boundaries on restore
- _send_append_message accepts explicit content_type to override
  inference from the message object
Add optional `segments` field to MessagePayload on both JS and Python
sides. The JS client uses segments directly when present, falling back
to synthesizing a single segment from content + content_type.

On the Python side, _restore_bookmark_message collapses from a streaming
replay (chunk_start/chunk/chunk_end) to a single message send. Segment
html_deps are hoisted to the envelope. Redundant top-level deps storage
on StoredMessage is removed when segments carry them.
… types

Replace deepcopy with a targeted copy_segments that only copies what's
needed (segment dataclass fields, not HTMLDependency objects). Also break
BookmarkMessageDict inheritance from ChatMessageDict since they have
different dep semantics, and fix the append_to_segments early-return guard.
@cpsievert cpsievert changed the title feat: segment-based message storage for mixed content type streaming feat(pkg-py): segment-based message storage for mixed content type streaming May 8, 2026
@cpsievert cpsievert marked this pull request as draft May 8, 2026 21:48
@cpsievert cpsievert force-pushed the segment-based-message-storage branch from 09bf542 to edd8836 Compare May 8, 2026 21:49
cpsievert added 3 commits May 8, 2026 17:22
StoredMessage now stores only role + segments. Content and html_deps
are computed properties. Wire protocol MessagePayload carries segments
exclusively (no top-level content/content_type). Bookmark format uses
segments with a legacy shim for old bookmarks missing the key.
Remove the `deps` parameter from both methods — deps are now always
attached directly to segments at the point they're created, rather than
threaded through storage helpers.
Update R's chat_append_message to send segments array in chunk_start
and complete message payloads, matching the Python wire format.
Rebuild JS with simplified content model (no top-level contentType).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant