
fix: cancellation-safe message reading in stream_loop (#5) #6

Merged: vnvo merged 2 commits into main from fix/issue-5-cancel-safe-loop on May 8, 2026

vnvo (Owner) commented May 8, 2026

WorkerState::stream_loop ran read_backend_message_into inside a tokio::select! racing stop_rx.changed() and tokio::time::timeout. read_backend_message_into uses read_exact internally, which is not cancellation-safe: when another select! arm won and the read future was dropped mid-message, the header/payload bytes it had already consumed were lost. The next iteration then mis-parsed the wire stream, typically decoding a bogus payload_len and hanging while waiting for bytes that would never arrive (or surfacing a Protocol error).
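
A std-only sketch of the failure mode, under stated assumptions: the helpers parse_header and wire_bytes below are hypothetical and not code from this PR, and the framing assumed is the Postgres backend layout (a 1-byte tag, then a 4-byte big-endian length that counts itself but not the tag, then the payload). It shows what the next loop iteration would decode if a dropped read_exact had already consumed the first three header bytes.

```rust
/// Hypothetical helper: decode a Postgres backend message header
/// (1-byte tag + 4-byte big-endian length that includes its own 4 bytes).
/// Returns the tag and the payload length the header claims.
fn parse_header(buf: &[u8]) -> Option<(u8, usize)> {
    if buf.len() < 5 {
        return None;
    }
    let len = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
    // A valid length is at least 4, since it counts itself.
    len.checked_sub(4).map(|payload_len| (buf[0], payload_len))
}

/// Hypothetical helper: two back-to-back CopyData ('d') messages,
/// each carrying a 3-byte payload.
fn wire_bytes() -> Vec<u8> {
    let mut wire = Vec::new();
    for payload in [b"abc", b"xyz"] {
        wire.push(b'd');
        wire.extend_from_slice(&(4 + payload.len() as u32).to_be_bytes());
        wire.extend_from_slice(payload);
    }
    wire
}
```

With the first three bytes gone, the tag decodes as 0 and the length field is assembled from the low length byte (7) plus the first three payload bytes, so a read_exact-based reader would wait for a payload of roughly 118 MiB that never arrives.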

Add MessageReader, which keeps partial-read state in the caller-owned reader rather than inside the read future, and uses one-shot AsyncReadExt::read, which is cancellation-safe. stream_loop owns a single MessageReader that is reused across the drain and wait phases.
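
A minimal synchronous sketch of the same idea, assuming Postgres backend framing (1-byte tag, 4-byte big-endian length counting itself, payload). ResumableReader and OneByteAtATime are hypothetical names, not the PR's MessageReader: each step issues at most one read() and records progress in the struct, so a caller that stops between steps (as a losing select! arm would) can resume later without losing bytes.

```rust
use std::io::{self, Read};

/// Hypothetical sketch, not the PR's MessageReader: all partial-read progress
/// lives in this struct, so abandoning the caller between steps loses nothing.
#[derive(Default)]
pub struct ResumableReader {
    header: [u8; 5], // tag + 4-byte big-endian length
    header_filled: usize,
    payload: Vec<u8>,
    payload_filled: usize,
}

impl ResumableReader {
    /// Issues at most one `read()` on `src`, mirroring one call to the
    /// cancellation-safe `AsyncReadExt::read`. Returns `Ok(Some(..))` once a
    /// whole message is assembled and `Ok(None)` if more bytes are needed.
    pub fn step<R: Read>(&mut self, src: &mut R) -> io::Result<Option<(u8, Vec<u8>)>> {
        if self.header_filled < self.header.len() {
            let n = src.read(&mut self.header[self.header_filled..])?;
            if n == 0 {
                return Err(io::ErrorKind::UnexpectedEof.into());
            }
            self.header_filled += n;
            if self.header_filled < self.header.len() {
                return Ok(None); // header still incomplete; progress is kept
            }
            let h = &self.header;
            let len = u32::from_be_bytes([h[1], h[2], h[3], h[4]]) as usize;
            let payload_len = len.checked_sub(4).ok_or_else(|| {
                io::Error::new(io::ErrorKind::InvalidData, "length field < 4")
            })?;
            self.payload = vec![0; payload_len];
            self.payload_filled = 0;
            if payload_len > 0 {
                return Ok(None); // the payload is read on later steps
            }
        } else {
            let n = src.read(&mut self.payload[self.payload_filled..])?;
            if n == 0 {
                return Err(io::ErrorKind::UnexpectedEof.into());
            }
            self.payload_filled += n;
            if self.payload_filled < self.payload.len() {
                return Ok(None); // payload still incomplete; progress is kept
            }
        }
        // A full message is assembled: hand it out and reset for the next one.
        let tag = self.header[0];
        self.header_filled = 0;
        self.payload_filled = 0;
        Ok(Some((tag, std::mem::take(&mut self.payload))))
    }
}

/// Hypothetical test helper: returns at most one byte per `read()`,
/// modelling a socket that delivers a message in many small pieces.
pub struct OneByteAtATime<'a>(pub &'a [u8]);

impl Read for OneByteAtATime<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() || self.0.is_empty() {
            return Ok(0);
        }
        buf[0] = self.0[0];
        self.0 = &self.0[1..];
        Ok(1)
    }
}
```

The design choice is that the buffers are fields rather than locals of the read call; that is what lets a single reader be reused across loop iterations, the way the PR reuses one MessageReader across the drain and wait phases.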

read_backend_message_into is retained for non-select! callers (startup, auth, replication-start) and documented as not cancellation-safe.
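
For contrast, a sketch in std::io terms of the straightforward read_exact-based framing that remains appropriate for sequential phases. read_message is a hypothetical name, not the crate's read_backend_message_into, and the Postgres backend framing is again assumed; the doc comment shows one way to record the cancellation caveat, which applies to the async equivalent built on tokio's AsyncReadExt::read_exact.

```rust
use std::io::{self, Read};

/// Hypothetical sketch of a `read_exact`-based framer, suitable for
/// sequential phases (startup, auth) that always run it to completion.
///
/// # Cancel safety
/// The async equivalent is not cancellation-safe: `read_exact` may consume
/// part of the header or payload before completing, and if the future is
/// dropped at that point those bytes are lost and the stream is left
/// mid-message.
pub fn read_message<R: Read>(src: &mut R) -> io::Result<(u8, Vec<u8>)> {
    // Tag byte + 4-byte big-endian length (which counts itself).
    let mut header = [0u8; 5];
    src.read_exact(&mut header)?;
    let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize;
    let payload_len = len
        .checked_sub(4)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "length field < 4"))?;
    let mut payload = vec![0u8; payload_len];
    src.read_exact(&mut payload)?;
    Ok((header[0], payload))
}
```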

Tests:

  • 2 framing-level regression tests drive MessageReader through tokio::time::timeout cancellation mid-header and mid-payload, asserting the resumed read returns the original message intact
  • 86 unit + 9 integration + 16 doctests all pass; clippy clean

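The in-memory Cursor benchmarks in the commit message below (a worst case with no I/O wait) show throughput changes between -11% and +5%. On the real BufReader-backed socket path, each read() returns the full requested slice in one poll, so that gap collapses; WAL streaming throughput remains bounded by Postgres and TCP, not by framing.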

vnvo added 2 commits May 8, 2026 22:48

Benchmarks (in-memory Cursor, worst case with no I/O wait; throughput before -> after):
  64 B:   2.41 -> 2.13 GiB/s (-11%)
  256 B:  6.44 -> 5.82 GiB/s (-10%)
  1024 B: 17.5 -> 18.3 GiB/s ( +5%)
  4096 B: 27.9 -> 26.3 GiB/s ( -5%)
vnvo force-pushed the fix/issue-5-cancel-safe-loop branch from 878a1d3 to b4b898e on May 8, 2026 20:02
vnvo merged commit 47fb52e into main on May 8, 2026
4 checks passed