canvas: authoritative full-state snapshot sync #52

Merged
Anton-Horn merged 1 commit into main from canvas-snapshot-sync on Jun 9, 2026

@Anton-Horn

Problem

Canvas real-time collaboration was unreliable — users drifted out of sync and stayed that way, and the agent looked out of sync too. Root causes, all in the op-stream design:

  • The client hydrated once from REST, then trusted a lossy op stream: outbound sends are silently dropped while the socket isn't OPEN, and nothing re-synced on reconnect (unlike chat, handleJoin never pushed canvas state). Any disconnect blip left a client permanently diverged.
  • The remote-add playback queue silently dropped follow-up ops: an update/delete targeting a shape whose add was still queued hit applyOpToShapes as a no-op and was lost. A committed text edit therefore never propagated, and the shape was revealed with its stale "Text" placeholder. The same hole affected moving, recoloring, or deleting just-created shapes, as well as the agent's create-then-set sequences.
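To make the lost-update failure concrete, here is a minimal sketch of the old playback path. It is a simplified, hypothetical model: the `Shape`/`Op` types and the `pendingAdds` array are illustrative assumptions, and `applyOpToShapes` here only mimics the "no-op on a missing id" behavior described above; it is not the component's real code.

```typescript
// Hypothetical, simplified model of the old op-stream playback (not the real CanvasPage code).
type Shape = { id: string; text: string };
type Op =
  | { kind: "add"; shape: Shape }
  | { kind: "update"; id: string; patch: Partial<Shape> }
  | { kind: "delete"; id: string };

// Applies one remote op; an update for an id that isn't rendered yet is silently ignored.
function applyOpToShapes(shapes: Map<string, Shape>, op: Op): void {
  switch (op.kind) {
    case "add":
      shapes.set(op.shape.id, op.shape);
      break;
    case "update": {
      const current = shapes.get(op.id);
      if (current) shapes.set(op.id, { ...current, ...op.patch }); // missing id: no-op, op is lost
      break;
    }
    case "delete":
      shapes.delete(op.id);
      break;
  }
}

const shapes = new Map<string, Shape>();
// The remote add is still waiting in the playback queue...
const pendingAdds: Op[] = [{ kind: "add", shape: { id: "s1", text: "Text" } }];
// ...when the follow-up text edit arrives and is applied immediately.
applyOpToShapes(shapes, { kind: "update", id: "s1", patch: { text: "Hello" } });
// Draining the queue later reveals the shape with its stale placeholder.
for (const op of pendingAdds.splice(0)) applyOpToShapes(shapes, op);
console.log(shapes.get("s1")?.text); // "Text", not "Hello"
```

Any ordering scheme that lets an op race ahead of the shape it targets has this hole; a full snapshot sidesteps it because there is no per-op ordering left to get wrong.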

Fix — commands in, snapshots out

Replace the op-broadcast model with an authoritative server that broadcasts the entire board as a coalesced canvas.state snapshot on every change and on join. Clients render the snapshot directly, so a missed message, reconnect, or race can never leave anyone diverged — the next snapshot is the truth and overwrites them.

Server (ws.ts)

  • scheduleCanvasBroadcast: coalesced (40 ms trailing) full-state snapshot to the project room; reads the latest doc at fire time, so a burst collapses into one broadcast.
  • applyCanvasOpAndBroadcast applies and persists the op, then schedules a snapshot. The agent path is unchanged (it calls the same function).
  • Join pushes the full snapshot immediately → self-healing reconnect.
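A minimal sketch of the three server pieces, under stated assumptions: shapes are flat records, the doc lives in an in-memory map standing in for persistence, and `sendToRoom`/`sendToClient` stand in for the WebSocket room and the joining socket. The names `scheduleCanvasBroadcast`, `applyCanvasOpAndBroadcast`, `handleJoin`, and `canvas.state` come from the PR; `createCanvasServer`, `CanvasOp`, and the message shape are hypothetical. It also assumes the 40 ms window starts at the first change in a burst and is not reset by later ones (a resetting debounce could starve broadcasts during a continuous drag).

```typescript
// Hypothetical sketch of the server flow; only scheduleCanvasBroadcast,
// applyCanvasOpAndBroadcast, handleJoin and canvas.state come from the PR.
type Shape = { id: string; x: number; y: number; text: string };
type CanvasDoc = { shapes: Shape[] };
type CanvasOp = { kind: "upsert"; shape: Shape } | { kind: "delete"; id: string };

type SendToRoom = (projectId: string, payload: string) => void;
type SetTimer = (fn: () => void, ms: number) => unknown;

function createCanvasServer(
  sendToRoom: SendToRoom,
  setTimer: SetTimer = (fn, ms) => setTimeout(fn, ms),
  delayMs = 40,
) {
  const docs = new Map<string, CanvasDoc>(); // stands in for persistence
  const scheduled = new Set<string>(); // projects with a broadcast already armed

  const getDoc = (projectId: string): CanvasDoc => {
    let doc = docs.get(projectId);
    if (!doc) {
      doc = { shapes: [] };
      docs.set(projectId, doc);
    }
    return doc;
  };

  // Serializes the doc as it is *now*, so every payload is the latest full state.
  const snapshot = (projectId: string): string =>
    JSON.stringify({ type: "canvas.state", doc: getDoc(projectId) });

  // Trailing coalescer: the first change arms one timer; changes inside the
  // window ride along, and the doc is read only when the timer fires.
  function scheduleCanvasBroadcast(projectId: string): void {
    if (scheduled.has(projectId)) return;
    scheduled.add(projectId);
    setTimer(() => {
      scheduled.delete(projectId);
      sendToRoom(projectId, snapshot(projectId));
    }, delayMs);
  }

  // Human and agent edits both go through here: apply, (persist), schedule.
  function applyCanvasOpAndBroadcast(projectId: string, op: CanvasOp): void {
    const doc = getDoc(projectId);
    if (op.kind === "upsert") {
      const shape = op.shape;
      const i = doc.shapes.findIndex((s) => s.id === shape.id);
      if (i >= 0) doc.shapes[i] = shape;
      else doc.shapes.push(shape);
    } else {
      const id = op.id;
      doc.shapes = doc.shapes.filter((s) => s.id !== id);
    }
    scheduleCanvasBroadcast(projectId);
  }

  // A (re)joining client gets the full board immediately, not after the next edit.
  function handleJoin(projectId: string, sendToClient: (payload: string) => void): void {
    sendToClient(snapshot(projectId));
  }

  return { applyCanvasOpAndBroadcast, handleJoin };
}
```

The timer is injectable only so the coalescing can be observed deterministically; with the default `setTimeout`, two ops applied back to back yield a single `canvas.state` message about 40 ms later that carries both changes.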

Client (CanvasPage.tsx)

  • Render = serverShapes (latest snapshot) + a thin local override layer for the shape you're actively manipulating or just committed (keeps your own drag/type instant).
  • Overrides retire once a snapshot confirms them (field-equal), with an 800 ms TTL backstop so a concurrent same-shape edit can't wedge one.
  • Deletes the fragile stack: pending-add queue + drain timer, applyOpToShapes-for-remote, echo-origin filtering, and the reveal/flash/entering animations.
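A minimal sketch of that render path, assuming a flat `Shape` record and treating "actively manipulating" and "just committed" as two override states. `CanvasView`, `editLocal`, `commitLocal`, and `fieldEqual` are hypothetical names, and in the real component this state would live in React rather than a class. The TTL here is measured from commit time and is not applied while a shape is still being edited; that split is an assumption based on the description. The clock is injectable so the 800 ms backstop can be exercised deterministically.

```typescript
// Hypothetical model of the client render path: serverShapes + local overrides.
type Shape = { id: string; x: number; y: number; text: string };
// committedAt === null means the user is still dragging/typing this shape.
type Override = { shape: Shape; committedAt: number | null };

const OVERRIDE_TTL_MS = 800;

// "Field-equal": the snapshot carries exactly the values we committed.
function fieldEqual(a: Shape | undefined, b: Shape): boolean {
  return a !== undefined && a.x === b.x && a.y === b.y && a.text === b.text;
}

class CanvasView {
  private serverShapes: Shape[] = [];
  private readonly overrides = new Map<string, Override>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // While manipulating: the override wins regardless of incoming snapshots.
  editLocal(shape: Shape): void {
    this.overrides.set(shape.id, { shape, committedAt: null });
  }

  // On commit (the op is also sent to the server): keep showing it until confirmed.
  commitLocal(shape: Shape): void {
    this.overrides.set(shape.id, { shape, committedAt: this.now() });
  }

  private expired(o: Override): boolean {
    return o.committedAt !== null && this.now() - o.committedAt > OVERRIDE_TTL_MS;
  }

  // canvas.state: take the server board wholesale, then retire committed
  // overrides that the snapshot confirms or that have outlived the TTL.
  onSnapshot(shapes: Shape[]): void {
    this.serverShapes = shapes;
    const byId = new Map(shapes.map((s) => [s.id, s] as const));
    for (const [id, o] of this.overrides) {
      if (o.committedAt === null) continue;
      if (fieldEqual(byId.get(id), o.shape) || this.expired(o)) this.overrides.delete(id);
    }
  }

  // Server shapes, with any live override substituted; locally created shapes
  // the server hasn't echoed yet are appended.
  render(): Shape[] {
    const out = this.serverShapes.map((s) => {
      const o = this.overrides.get(s.id);
      return o && !this.expired(o) ? o.shape : s;
    });
    const onServer = new Set(this.serverShapes.map((s) => s.id));
    for (const o of this.overrides.values()) {
      if (!onServer.has(o.shape.id) && !this.expired(o)) out.push(o.shape);
    }
    return out;
  }

  get overrideCount(): number {
    return this.overrides.size;
  }
}
```

Because every snapshot replaces `serverShapes` wholesale, the override map is the only place local state can differ from the server, and each committed entry is bounded by confirmation or the TTL.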

Tradeoffs / notes

  • Divergence is now structurally impossible rather than patched. Net −176 lines in the component.
  • Intentional UX losses (these also match the no-animations preference): per-shape reveal playback, the change-flash ring, and the agent's shape-to-shape cursor hops. Human presence cursors still work.
  • Single-server assumption (in-memory projectRooms) unchanged — fine for one process; would need pub/sub to scale horizontally.

Verification

tsc --noEmit clean and bun build.ts green. Not yet exercised with two live clients.

Real-time collaboration was unreliable — users drifted out of sync and
stayed that way, and the agent looked out of sync too. Root causes were
all in the op-stream design: the client hydrated once then trusted a
lossy op stream (sends drop while the socket is down, no re-sync on
reconnect), and the remote-add playback queue silently dropped any
update/delete targeting a shape still queued (so committed text never
propagated and only the "Text" placeholder showed).

Replace the op-broadcast model with an authoritative server that
broadcasts the entire board as a coalesced `canvas.state` snapshot on
every change and on join. Clients render the snapshot directly, so a
missed message, reconnect, or race can never leave anyone diverged —
the next snapshot is the truth and overwrites them.

Server:
- scheduleCanvasBroadcast: coalesced (40ms trailing) full-state
  snapshot to the project room; reads the latest doc at fire time.
- applyCanvasOpAndBroadcast applies+persists then schedules a snapshot
  (agent path unchanged — same function).
- join pushes the full snapshot immediately (self-healing reconnect).

Client:
- render = serverShapes (latest snapshot) + a thin local override layer
  for the shape you're actively manipulating or just committed.
- overrides retire once a snapshot confirms them (field-equal), with an
  800ms TTL backstop so a concurrent same-shape edit can't wedge one.
- deletes the pending-add queue, applyOpToShapes-for-remote, echo-origin
  filtering, and the reveal/flash/entering animations.
Anton-Horn merged commit eeb3d52 into main on Jun 9, 2026
1 check passed
Anton-Horn deleted the canvas-snapshot-sync branch on June 9, 2026 at 09:06