Skip to content

Fix Slic stream lifecycle bugs triggered by canceled writes#4765

Open
pepone wants to merge 3 commits into
icerpc:mainfrom
pepone:fix-audit-25-26
Open

Fix Slic stream lifecycle bugs triggered by canceled writes#4765
pepone wants to merge 3 commits into
icerpc:mainfrom
pepone:fix-audit-25-26

Conversation

@pepone

@pepone pepone commented Jun 12, 2026

Copy link
Copy Markdown
Member

Fixes icerpc/icerpc-csharp-audit#25.
Fixes icerpc/icerpc-csharp-audit#26.

Two bugs left a Slic connection degraded when an invocation was canceled at the wrong time:

  • Stream-count permit leak (audit Encoding not checked when resolving a well-known proxy from the locator cache #25): a stream created by CreateStreamAsync acquires a stream-count semaphore permit that was only released for started streams (TrySetStateReleaseStream, gated on IsStarted). A stream abandoned before its first write — e.g. invocation canceled while waiting on payload data or send credit — burned its permit forever; after MaxBidirectionalStreams/MaxUnidirectionalStreams such leaks, every new CreateStreamAsync on an otherwise healthy connection blocked forever. The permit is now released when an unstarted local stream reaches its terminal state (new SlicConnection.ReleaseUnstartedStream).

  • _writesClosePending never rolled back (audit idempotent and tuple return type #26): SlicStream.WriteStreamFrameAsync set _writesClosePending before the endStream write and never rolled it back when that write failed before the StreamLast frame was queued (cancellation while blocked on send credit or the write semaphore). The later Complete(exception) on the stream output was then suppressed by the CloseWrites guard: neither StreamLast nor StreamWritesClosed was sent, and the peer hung reading the payload until connection closure. _writesClosePending is now rolled back when the endStream write fails before the StreamLast frame was queued — detected via _writesClosedTcs, which WroteLastStreamFrame completes exactly when the frame is queued — so CloseWrites sends the StreamWritesClosed frame and the peer observes TruncatedData. The bundled _closeReadsOnWritesClosure is preserved as well since it is never cleared on this path.

Tests

  • Completing_unstarted_stream_releases_the_stream_count_permit (bidirectional and unidirectional): with max streams = 1, completes an unstarted stream and verifies a new stream can still be created.
  • Canceled_end_stream_write_does_not_lose_the_writes_closed_frame: cancels an endStream write blocked on send credit, completes the output with an exception, and verifies the peer observes TruncatedData instead of waiting for data forever.
  • End_stream_write_canceled_after_frame_queued_preserves_graceful_peer_reads: pins the opposite case — when the StreamLast frame was already queued (write parked on the connection-level pipe pause), no StreamWritesClosed frame is sent after it and the peer reads all the data with a graceful end of stream. This guards against an over-eager rollback; it passes before and after the fix.

Verified the first three fail without the fix (all time out on their 2 min fallback guards); the WaitAsync margins follow the convention that CI's --blame-hang-timeout (60 s) detects and dumps a hang first.

Full IceRpc.Tests suite passes (991 passed / 0 failed), dotnet build and dotnet format --verify-no-changes clean.

Copilot AI review requested due to automatic review settings June 12, 2026 13:52

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses two Slic transport lifecycle bugs that could leave an otherwise healthy multiplexed connection in a degraded state when an invocation is canceled at specific points (permit leak for unstarted streams; suppressed writes-closure notification when an endStream write is canceled before StreamLast is queued).

Changes:

  • Release stream-count semaphore permits for local streams that are created but never started (unstarted streams) once they reach the terminal state.
  • Roll back _writesClosePending when an endStream write fails before the StreamLast frame is queued, allowing CloseWrites to still send StreamWritesClosed.
  • Add Slic transport tests covering the above cancellation/cleanup scenarios.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
tests/IceRpc.Tests/Transports/Slic/SlicTransportTests.cs Adds regression tests for unstarted-stream permit release and canceled endStream write closure signaling.
src/IceRpc/Transports/Slic/Internal/SlicStream.cs Rolls back _writesClosePending on failed endStream writes; releases permits for unstarted local streams once fully closed.
src/IceRpc/Transports/Slic/Internal/SlicConnection.cs Adds ReleaseUnstartedStream to return the stream-count semaphore permit for unstarted local streams.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/IceRpc.Tests/Transports/Slic/SlicTransportTests.cs
@bernardnormier bernardnormier self-requested a review June 16, 2026 18:17
Comment thread src/IceRpc/Transports/Slic/Internal/SlicConnection.cs Outdated
Comment thread src/IceRpc/Transports/Slic/Internal/SlicStream.cs Outdated
@bernardnormier bernardnormier self-requested a review June 17, 2026 14:31
pepone added 3 commits June 17, 2026 16:48
Fixes icerpc/icerpc-csharp-audit#25.
Fixes icerpc/icerpc-csharp-audit#26.

Two bugs left a Slic connection degraded when an invocation was canceled
at the wrong time:

- A stream created by CreateStreamAsync acquires a stream-count
  semaphore permit that was only released for started streams. A stream
  abandoned before its first write (e.g. invocation canceled while
  waiting on payload data or send credit) burned its permit forever;
  after MaxBidirectionalStreams or MaxUnidirectionalStreams such leaks,
  every new CreateStreamAsync on the connection blocked forever. The
  permit is now released when an unstarted local stream reaches its
  terminal state.

- SlicStream.WriteStreamFrameAsync set _writesClosePending before
  writing the endStream frame and never rolled it back when that write
  failed before the StreamLast frame was queued (cancellation while
  blocked on send credit or the write semaphore). The later
  Complete(exception) on the stream output was then suppressed by the
  CloseWrites guard: neither StreamLast nor StreamWritesClosed was sent
  and the peer hung reading the payload until connection closure.
  _writesClosePending is now rolled back when the endStream write fails
  before the StreamLast frame was queued, so CloseWrites sends the
  StreamWritesClosed frame and the peer observes TruncatedData.
@pepone pepone force-pushed the fix-audit-25-26 branch from 093708a to b44e847 Compare June 17, 2026 14:50
Comment on lines +706 to 708
// A stream that was created but never started is always local; only started streams are registered in _streams.
Debug.Assert(stream.IsStarted || !stream.IsRemote);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReleaseStream already handles unstarted streams safely — _streams.Remove is guarded by IsStarted — so we can drop the assert and let the single caller invoke it unconditionally:

Suggested change
// A stream that was created but never started is always local; only started streams are registered in _streams.
Debug.Assert(stream.IsStarted || !stream.IsRemote);
// Only a started stream has an Id and is registered in _streams.

The unconditional call now also runs for an unstarted remote stream (the brief state when AddStream throws at line 1686, before it assigns the Id). That reaches the if (stream.IsRemote) decrement below, which is correct — the remote count is incremented at line 1659/1677 before AddStream — and effectively a no-op: AddStream only throws when _isClosed (line 1025), and once the connection is closed the stream counts are never read again (they only gate accepting new remote streams, line 1653). So the decrement has no observable effect; it just keeps the bookkeeping balanced.

Comment on lines 454 to 461
// The stream reads and writes are closed, it's time to release the stream to either allow creating or
// accepting a new stream.
if (IsStarted)
// accepting a new stream. We don't release an unstarted remote stream (which can briefly exist when
// AddStream fails on a closing connection) since it never acquired any resource. Releasing an unstarted
// local stream returns the stream-count semaphore permit acquired by CreateStreamAsync.
if (IsStarted || !IsRemote)
{
_connection.ReleaseStream(this);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the call site collapses to an unconditional call (safe for the unstarted-remote case — see the ReleaseStream comment):

Suggested change
// The stream reads and writes are closed, it's time to release the stream to either allow creating or
// accepting a new stream.
if (IsStarted)
// accepting a new stream. We don't release an unstarted remote stream (which can briefly exist when
// AddStream fails on a closing connection) since it never acquired any resource. Releasing an unstarted
// local stream returns the stream-count semaphore permit acquired by CreateStreamAsync.
if (IsStarted || !IsRemote)
{
_connection.ReleaseStream(this);
}
// The stream reads and writes are closed, it's time to release the stream to either allow creating or
// accepting a new stream.
_connection.ReleaseStream(this);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants