Skip to content

Write Slic Pong frames in the background and cap queued Pongs#4764

Merged
pepone merged 8 commits into
icerpc:mainfrom
pepone:fix-audit-17
Jun 17, 2026
Merged

Write Slic Pong frames in the background and cap queued Pongs#4764
pepone merged 8 commits into
icerpc:mainfrom
pepone:fix-audit-17

Conversation

@pepone

@pepone pepone commented Jun 12, 2026

Copy link
Copy Markdown
Member

Fixes icerpc/icerpc-csharp-audit#17.
Fixes icerpc/icerpc-csharp-audit#29.

Since #4578, the Slic read frames loop awaits the Pong frame write that responds to a Ping frame. This write acquires _writeSemaphore, which another writer can hold for an arbitrarily long time while parked on FlushAsync when the connection's outbound pipe reaches PauseWriterThreshold (peer not reading). While the loop is blocked on the semaphore, no further frames are read and the idle timeout cannot fire, since the idle abort is only armed during ReadAsync: the connection hangs until it's disposed.

Changes

  • The Pong frame is now written by a background task, so the read frames loop keeps processing incoming frames (and the idle timeout keeps working) when writes are slow or stuck.
  • New SlicTransportOptions.MaxOutstandingPongs option (default: 100, minimum: 1) that bounds the number of Pong frames queued for sending. The connection is aborted with a protocol error when a Ping frame is received while this limit is reached, so a peer can't queue Pong writes without limit by flooding the connection with Ping frames.

Tests

  • Read_frames_loop_is_not_blocked_by_pong_write_when_a_write_is_parked_on_pipe_threshold: parks a server stream write on the connection-level pipe pause (small PauseWriterThreshold plus a held duplex write) and verifies, using a raw duplex client, that the server still reads frames and accepts a new stream after receiving a Ping frame. Verified that this test fails (the accept times out) without the SlicConnection fix.

  • Connection_is_aborted_when_max_outstanding_pongs_is_exceeded: with MaxOutstandingPongs = 3 and a parked write, a fourth Ping frame aborts the connection with Received a Ping frame while 3 Pong frames are already queued for sending.

  • Default value and validation tests for the new option.

  • Ping_received_after_local_close_does_not_fail_graceful_close: writing the Pong frame in the background (with connection-closed exceptions swallowed) also fixes audit C# Ice/ami test failure on master (4.0) #29 — a Ping frame received after a local CloseAsync no longer faults the read frames loop. Verified that this test fails without the fix with the exact audit C# Ice/ami test failure on master (4.0) #29 failure mode (the graceful close throws IceRpcException from the Pong write).

Since icerpc#4578, the read frames loop awaits the Pong frame write that
responds to a Ping frame. This write acquires _writeSemaphore, which
another writer can hold for an arbitrarily long time while parked on
FlushAsync when the outbound pipe reaches PauseWriterThreshold (peer not
reading). While the loop is blocked on the semaphore, no further frames
are read and the idle timeout cannot fire, since the idle abort is only
armed during ReadAsync: the connection hangs until it's disposed.

The Pong frame is now written by a background task, so the read frames
loop keeps processing incoming frames. To prevent a peer from queuing
Pong writes without limit by flooding the connection with Ping frames,
the new SlicTransportOptions.MaxOutstandingPongs option (default: 100)
bounds the number of Pong frames queued for sending; the connection is
aborted when a Ping frame is received while this limit is reached.

Fixes icerpc/icerpc-csharp-audit#17.
Copilot AI review requested due to automatic review settings June 12, 2026 09:54

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a Slic transport hang where the read-frames loop could block behind a Pong write (contending on _writeSemaphore while another write is stalled on FlushAsync due to PauseWriterThreshold). It moves Pong writes off the read loop into a background task and adds a configurable bound on queued Pong responses to prevent unbounded Ping→Pong backpressure.

Changes:

  • Write Pong frames in the background so the read loop (and idle timeout) can continue even when writes are stalled.
  • Add SlicTransportOptions.MaxOutstandingPongs (default 100, min 1) to cap queued Pong responses; exceeding the cap aborts the connection with a protocol error.
  • Add new tests covering the non-blocking read loop behavior, abort-on-cap behavior, and option default/validation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tests/IceRpc.Tests/Transports/Slic/SlicTransportTests.cs Adds option tests and end-to-end regression tests reproducing the stalled-write scenario and verifying capped Pong queuing behavior.
src/IceRpc/Transports/Slic/SlicTransportOptions.cs Introduces MaxOutstandingPongs with default/validation and API docs.
src/IceRpc/Transports/Slic/Internal/SlicConnection.cs Implements background Pong writing and enforces the max-outstanding-pongs limit.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/IceRpc/Transports/Slic/Internal/SlicConnection.cs
Comment thread src/IceRpc/Transports/Slic/Internal/SlicConnection.cs
Comment thread src/IceRpc/Transports/Slic/Internal/SlicConnection.cs Outdated
Comment thread src/IceRpc/Transports/Slic/Internal/SlicConnection.cs
Comment thread src/IceRpc/Transports/Slic/SlicTransportOptions.cs Outdated
pepone added 3 commits June 12, 2026 12:18
…irst

A 10 s WaitAsync would fail the test before CI's --blame-hang-timeout
(60 s) detects the hang and collects dumps, losing the hang state. With
a 2 min margin the hang detection fires first; the WaitAsync timeout
remains as a fallback for runs without hang detection.
- Throw IceRpcException (IceRpcError.IceRpcError) instead of
  InvalidDataException when MaxOutstandingPongs is reached: exceeding
  this limit is a resource-limit violation like exceeding the flow
  control window, not malformed data.
- Check the outstanding pong count before incrementing it so the count
  stays exact on the abort path.
- Rethrow unexpected exceptions from the background pong write so they
  are not swallowed in release builds.
- Reword the MaxOutstandingPongs doc: the option protects against a
  peer that doesn't read but keeps sending Ping frames.
Writing the Pong frame in the background (with connection-closed
exceptions swallowed) also fixes icerpc/icerpc-csharp-audit#29: a Ping
frame received after a local CloseAsync no longer faults the read frames
loop, which previously made the graceful close fail with
ConnectionAborted.
Comment thread src/IceRpc/Transports/Slic/Internal/SlicConnection.cs Outdated
Comment thread tests/IceRpc.Tests/Transports/Slic/SlicTransportTests.cs Outdated
Comment thread tests/IceRpc.Tests/Transports/Slic/SlicTransportTests.cs
Comment thread src/IceRpc/Transports/Slic/Internal/SlicConnection.cs Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment on lines +1320 to +1325
if (Interlocked.Increment(ref _outstandingPongCount) > _maxOutstandingPongs)
{
throw new IceRpcException(
IceRpcError.IceRpcError,
$"Received a {nameof(FrameType.Ping)} frame while {_maxOutstandingPongs} {nameof(FrameType.Pong)} frames are already queued for sending.");
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this all fine. We never read again _outstandingPongCount in the abort path.

@bernardnormier bernardnormier left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@pepone pepone merged commit 41a193a into icerpc:main Jun 17, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants