
fix(server): resync a lagged consumer in-band instead of silently dropping VT #134

Open
phall1 wants to merge 1 commit into main from phux-resync-on-lag

phall1 (Owner) commented Jun 16, 2026

The last known corruption path from the TUI diagnosis (Rank 1). Closes the "stays mangled under heavy output" case.

Root cause

Each attached client has a per-pane output pump that forwards the actor's broadcast of VT chunks. The broadcast buffer is bounded (DEFAULT_OUTPUT_BROADCAST = 256). Under a sustained output burst (a full-screen TUI repainting on attach/split, a log flood), a pump that falls behind gets RecvError::Lagged(n), meaning those n chunks are gone. The Lagged arm only logged a warn!, so the client's libghostty mirror stayed permanently diverged until some unrelated resize/reattach happened to resync it.
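To make the failure mode concrete, here is a minimal std-only model of a bounded broadcast with one slow reader. It is a sketch, not the server's code: `Ring`, `Recv`, and the cursor-based `recv` are hypothetical names invented for illustration, and the real pump presumably reads from an async broadcast receiver rather than this toy ring.

```rust
use std::collections::VecDeque;

/// Toy bounded broadcast: retains only the newest `cap` chunks.
/// (Hypothetical model; not the actual channel used by the server.)
struct Ring {
    cap: usize,
    buf: VecDeque<(u64, String)>, // (sequence number, chunk)
    next_seq: u64,
}

#[derive(Debug, PartialEq)]
enum Recv {
    Item(String),
    Lagged(u64), // number of chunks overwritten before this reader saw them
    Empty,
}

impl Ring {
    fn new(cap: usize) -> Self {
        Ring { cap, buf: VecDeque::new(), next_seq: 0 }
    }

    /// Append a chunk, evicting the oldest one when the buffer is full.
    fn send(&mut self, chunk: &str) {
        if self.buf.len() == self.cap {
            self.buf.pop_front();
        }
        self.buf.push_back((self.next_seq, chunk.to_string()));
        self.next_seq += 1;
    }

    /// `cursor` is the next sequence number this reader expects.
    fn recv(&self, cursor: &mut u64) -> Recv {
        let oldest = match self.buf.front() {
            Some((seq, _)) => *seq,
            None => return Recv::Empty,
        };
        if *cursor < oldest {
            // The reader fell behind: skip to the oldest retained chunk
            // and report how many were lost.
            let missed = oldest - *cursor;
            *cursor = oldest;
            return Recv::Lagged(missed);
        }
        match self.buf.get((*cursor - oldest) as usize) {
            Some((_, chunk)) => {
                *cursor += 1;
                Recv::Item(chunk.clone())
            }
            None => Recv::Empty,
        }
    }
}
```

With a capacity of 4 and ten chunks sent before the reader polls, the first `recv` reports `Lagged(6)` and the reader resumes at chunk 6. The six skipped chunks are never delivered, which is the divergence described above.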

Fix

On Lagged, the pump asks the actor to broadcast an in-band resync — a full grid snapshot (PaneOutput::Resync) on the same ordered broadcast channel. Because it rides the same channel it lands after the post-lag tail and cleanly supersedes the gap:

  • no double-apply — the snapshot is a full-grid reset (DECSTR + ED2 + replay), not an additive delta;
  • no lost output — unlike a point-to-point snapshot request, which would race the still-buffered deltas and either double-apply or drop them.
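The ordering argument can be illustrated with a small std-only sketch. `PaneOutput::Resync` is named in this PR, but the `Delta` variant, the `String` payloads, and the `Mirror` type below are assumptions made for illustration; the real mirror is a libghostty terminal, not a string.

```rust
/// Messages on the single ordered output channel.
/// (Payload types are hypothetical; only the `Resync` name comes from the PR.)
#[derive(Clone, Debug)]
enum PaneOutput {
    Delta(String),  // additive VT chunk
    Resync(String), // full-grid snapshot that replaces the mirror state
}

/// Stand-in for the client's terminal mirror.
#[derive(Default)]
struct Mirror {
    screen: String,
}

impl Mirror {
    fn apply(&mut self, msg: &PaneOutput) {
        match msg {
            // Deltas build on whatever state the mirror already has.
            PaneOutput::Delta(chunk) => self.screen.push_str(chunk),
            // A resync discards prior state, so applying it is idempotent.
            PaneOutput::Resync(snapshot) => self.screen = snapshot.clone(),
        }
    }
}

/// Apply messages in channel order and return the final mirror contents.
fn replay(msgs: &[PaneOutput]) -> String {
    let mut mirror = Mirror::default();
    for msg in msgs {
        mirror.apply(msg);
    }
    mirror.screen
}
```

Suppose the actor emitted `a` through `f`, and the client saw `a`, lost `b`, `c`, `d`, then received the tail `e`, `f`. Replaying `a, e, f, Resync("abcdef"), g` yields `abcdefg`: the snapshot overwrites the diverged state, and later deltas apply on top of it. Because the snapshot follows the tail on the same channel, no delta can arrive out of order relative to it.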

Mechanism: a new ResizeRequest::resync_only flag tells the actor to skip the resize and only run its existing debounced resync broadcast (broadcast_resync, renamed from broadcast_resync_after_resize since it now serves both resize and lag recovery). The debounce coalesces a burst of lag events into one snapshot. try_send failure is benign (a resync is already queued, or the actor is gone).
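A rough std-only sketch of the request path follows. Only `ResizeRequest` and `resync_only` are names from this PR; the `cols`/`rows` fields, the `Actor` struct, the `request_resync` helper, the capacity-1 channel, and the drain-based coalescing are assumptions. In particular, the real debounce is presumably time-based, whereas this sketch coalesces whatever is queued when the actor next runs.

```rust
use std::sync::mpsc::{Receiver, SyncSender, TrySendError};

/// Field layout is hypothetical; only `resync_only` is named in the PR.
#[derive(Debug, Clone, Copy)]
struct ResizeRequest {
    cols: u16,
    rows: u16,
    resync_only: bool,
}

/// Stand-in for the pane actor.
struct Actor {
    cols: u16,
    rows: u16,
    resyncs_broadcast: u32,
}

impl Actor {
    /// Drain every queued request, then broadcast at most one snapshot.
    fn drain(&mut self, rx: &Receiver<ResizeRequest>) {
        let mut pending = false;
        while let Ok(req) = rx.try_recv() {
            if !req.resync_only {
                // A real resize: adopt the requested geometry.
                self.cols = req.cols;
                self.rows = req.rows;
            }
            // Either kind of request schedules a resync.
            pending = true;
        }
        if pending {
            self.resyncs_broadcast += 1; // stands in for broadcast_resync
        }
    }
}

/// Pump side, called on a lag event. Returns true if the request was queued.
fn request_resync(tx: &SyncSender<ResizeRequest>) -> bool {
    // The geometry is a placeholder that the actor ignores for resync_only.
    let req = ResizeRequest { cols: 0, rows: 0, resync_only: true };
    match tx.try_send(req) {
        Ok(()) => true,
        // Full: a request is already queued. Disconnected: the actor is gone.
        // Both are benign, so neither is treated as an error.
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}
```

With a channel created by `std::sync::mpsc::sync_channel(1)`, two back-to-back lag events queue a single request, the actor broadcasts one snapshot, and an actor that started at 80×24 stays at 80×24 because the 0×0 placeholder is skipped.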

Relationship to #132

#132 made a dropped-byte divergence self-heal on the next resync; this triggers that resync immediately on the drop instead of waiting for an unrelated event. Together they close the path.

Testing

  • New resync_only_request_rebroadcasts_snapshot_without_resizing: the snapshot is re-broadcast and the grid size stays unchanged (the ignored 0×0 geometry must not take effect).
  • Existing resize-resync + coalescing tests still pass.

just ci is green.

🤖 Generated with Claude Code

…pping VT

The last known corruption path from the TUI diagnosis (Rank 1). Each attached
client has a per-pane output pump that forwards the actor's `broadcast` of VT
chunks. The broadcast buffer is bounded (DEFAULT_OUTPUT_BROADCAST = 256); under
a sustained output burst (a full-screen TUI repainting on attach/split, a
flood of logs) a pump that falls behind gets `RecvError::Lagged(n)` — the n
chunks are gone. The arm only `warn!`ed, so the client's libghostty mirror
stayed permanently diverged until some unrelated resize/reattach happened to
resync. Silent, unrecoverable screen corruption on exactly the bursty events
users hit.

Fix: on `Lagged`, the pump asks the actor to broadcast an in-band resync — a
full grid snapshot (`PaneOutput::Resync`) on the *same* ordered broadcast
channel. Because it rides the same channel, it lands in the pump's receiver
after the post-lag tail and cleanly supersedes the gap: no double-apply (the
snapshot is a full-grid reset, not an additive delta) and no lost output
(unlike a point-to-point snapshot, which would race the buffered deltas).

Mechanism: a new `ResizeRequest::resync_only` flag tells the actor to skip the
resize and only run its existing debounced resync broadcast
(`broadcast_resync`, renamed from `broadcast_resync_after_resize` since it now
serves both callers). The debounce coalesces a burst of lag events into one
snapshot. `try_send` failure is benign (a resync is already queued, or the
actor is gone).

Tested: new `resync_only_request_rebroadcasts_snapshot_without_resizing`
asserts the snapshot is re-broadcast and the grid size is unchanged (the
ignored 0x0 geometry must not take effect).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>