Write Slic Pong frames in the background and cap queued Pongs#4764
Merged
Conversation
Since icerpc#4578, the read frames loop awaits the Pong frame write that responds to a Ping frame. This write acquires _writeSemaphore, which another writer can hold for an arbitrarily long time while parked on FlushAsync when the outbound pipe reaches PauseWriterThreshold (peer not reading). While the loop is blocked on the semaphore, no further frames are read and the idle timeout cannot fire, since the idle abort is only armed during ReadAsync: the connection hangs until it's disposed. The Pong frame is now written by a background task, so the read frames loop keeps processing incoming frames. To prevent a peer from queuing Pong writes without limit by flooding the connection with Ping frames, the new SlicTransportOptions.MaxOutstandingPongs option (default: 100) bounds the number of Pong frames queued for sending; the connection is aborted when a Ping frame is received while this limit is reached. Fixes icerpc/icerpc-csharp-audit#17.
There was a problem hiding this comment.
Pull request overview
This PR addresses a Slic transport hang where the read-frames loop could block behind a Pong write (contending on _writeSemaphore while another write is stalled on FlushAsync due to PauseWriterThreshold). It moves Pong writes off the read loop into a background task and adds a configurable bound on queued Pong responses to prevent unbounded Ping→Pong backpressure.
Changes:
- Write Pong frames in the background so the read loop (and idle timeout) can continue even when writes are stalled.
- Add
SlicTransportOptions.MaxOutstandingPongs(default 100, min 1) to cap queued Pong responses; exceeding the cap aborts the connection with a protocol error. - Add new tests covering the non-blocking read loop behavior, abort-on-cap behavior, and option default/validation.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/IceRpc.Tests/Transports/Slic/SlicTransportTests.cs | Adds option tests and end-to-end regression tests reproducing the stalled-write scenario and verifying capped Pong queuing behavior. |
| src/IceRpc/Transports/Slic/SlicTransportOptions.cs | Introduces MaxOutstandingPongs with default/validation and API docs. |
| src/IceRpc/Transports/Slic/Internal/SlicConnection.cs | Implements background Pong writing and enforces the max-outstanding-pongs limit. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pepone
commented
Jun 12, 2026
…irst A 10 s WaitAsync would fail the test before CI's --blame-hang-timeout (60 s) detects the hang and collects dumps, losing the hang state. With a 2 min margin the hang detection fires first; the WaitAsync timeout remains as a fallback for runs without hang detection.
- Throw IceRpcException (IceRpcError.IceRpcError) instead of InvalidDataException when MaxOutstandingPongs is reached: exceeding this limit is a resource-limit violation like exceeding the flow control window, not malformed data. - Check the outstanding pong count before incrementing it so the count stays exact on the abort path. - Rethrow unexpected exceptions from the background pong write so they are not swallowed in release builds. - Reword the MaxOutstandingPongs doc: the option protects against a peer that doesn't read but keeps sending Ping frames.
Writing the Pong frame in the background (with connection-closed exceptions swallowed) also fixes icerpc/icerpc-csharp-audit#29: a Ping frame received after a local CloseAsync no longer faults the read frames loop, which previously made the graceful close fail with ConnectionAborted.
pepone
commented
Jun 17, 2026
pepone
commented
Jun 17, 2026
pepone
commented
Jun 17, 2026
Comment on lines
+1320
to
+1325
| if (Interlocked.Increment(ref _outstandingPongCount) > _maxOutstandingPongs) | ||
| { | ||
| throw new IceRpcException( | ||
| IceRpcError.IceRpcError, | ||
| $"Received a {nameof(FrameType.Ping)} frame while {_maxOutstandingPongs} {nameof(FrameType.Pong)} frames are already queued for sending."); | ||
| } |
Member
There was a problem hiding this comment.
And this all fine. We never read again _outstandingPongCount in the abort path.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes icerpc/icerpc-csharp-audit#17.
Fixes icerpc/icerpc-csharp-audit#29.
Since #4578, the Slic read frames loop awaits the Pong frame write that responds to a Ping frame. This write acquires
_writeSemaphore, which another writer can hold for an arbitrarily long time while parked onFlushAsyncwhen the connection's outbound pipe reachesPauseWriterThreshold(peer not reading). While the loop is blocked on the semaphore, no further frames are read and the idle timeout cannot fire, since the idle abort is only armed duringReadAsync: the connection hangs until it's disposed.Changes
SlicTransportOptions.MaxOutstandingPongsoption (default: 100, minimum: 1) that bounds the number of Pong frames queued for sending. The connection is aborted with a protocol error when a Ping frame is received while this limit is reached, so a peer can't queue Pong writes without limit by flooding the connection with Ping frames.Tests
Read_frames_loop_is_not_blocked_by_pong_write_when_a_write_is_parked_on_pipe_threshold: parks a server stream write on the connection-level pipe pause (smallPauseWriterThresholdplus a held duplex write) and verifies, using a raw duplex client, that the server still reads frames and accepts a new stream after receiving a Ping frame. Verified that this test fails (the accept times out) without theSlicConnectionfix.Connection_is_aborted_when_max_outstanding_pongs_is_exceeded: withMaxOutstandingPongs = 3and a parked write, a fourth Ping frame aborts the connection withReceived a Ping frame while 3 Pong frames are already queued for sending.Default value and validation tests for the new option.
Ping_received_after_local_close_does_not_fail_graceful_close: writing the Pong frame in the background (with connection-closed exceptions swallowed) also fixes audit C# Ice/ami test failure on master (4.0) #29 — a Ping frame received after a local CloseAsync no longer faults the read frames loop. Verified that this test fails without the fix with the exact audit C# Ice/ami test failure on master (4.0) #29 failure mode (the graceful close throwsIceRpcExceptionfrom the Pong write).