fix(audience-http): HTTP transport and flush hardening (SDK-234) #701

Merged
ImmutableJeffrey merged 11 commits into main from feat/audience-http-hardening on Apr 24, 2026
@ImmutableJeffrey ImmutableJeffrey commented Apr 23, 2026

Summary

  • Locks HttpTransport backoff state behind _backoffLock. Readers (IsInBackoffWindow, NextAttemptAt, BackoffMs, RescheduleSendTimer) and writers (RecordFailure, ResetBackoff) all go through it.
  • Stops HttpTransport.HttpClient from disposing a consumer-supplied handler (disposeHandler: false). Matches the existing _controlClient convention.
  • Surfaces partial rejection: SendBatchAsync parses {accepted, rejected} from the 2xx body and fires OnError(ValidationRejected, ...) with the count when rejected > 0. Malformed body falls through as zero-rejected so a diagnostic parse error never blocks the success path.
  • Gates FlushAsync on the existing _sendInFlight flag (the same gate the timer tick uses). Two parallel callers no longer double-POST the same batch; the second polls via Task.Yield until the first releases.
  • Adds optional CancellationToken cancellationToken = default to FlushAsync (no behaviour change for existing callers) and catches ObjectDisposedException thrown when a concurrent Shutdown disposes the transport mid-flush.
  • Adds 4 tests — concurrent-flush serialisation, partial-rejection warning on 200 with rejected > 0, zero-rejected silence, malformed-body fall-through.

Linear: SDK-234


Note

Medium Risk
Touches the SDK’s event flushing/network send path, adding new concurrency and cancellation behavior plus backoff locking; regressions could affect delivery, retries, or shutdown behavior under load.

Overview
Hardens event flushing and HTTP transport behavior to avoid duplicate sends and improve observability.

ImmutableAudience.FlushAsync now accepts an optional CancellationToken, serializes concurrent flushes using the existing _sendInFlight gate (preventing double-POST of the same on-disk batch), and exits cleanly if a concurrent Shutdown disposes the transport.

HttpTransport now (1) locks all backoff state reads/writes, (2) avoids disposing a consumer-provided HttpMessageHandler, (3) surfaces partial 2xx acceptance by parsing rejected from the response body and firing OnError(ValidationRejected, ...), and (4) rethrows caller-initiated cancellation so flush loops don’t hot-spin. Tests were added to cover concurrent flush serialization, cancellation propagation, and the new 200-with-rejected response handling.

Reviewed by Cursor Bugbot for commit b588bdf. Bugbot is set up for automated code reviews on this repo.

@ImmutableJeffrey ImmutableJeffrey requested review from a team as code owners April 23, 2026 04:54
@ImmutableJeffrey ImmutableJeffrey force-pushed the feat/audience-http-hardening branch 3 times, most recently from 1a20626 to 05e9f33 Compare April 23, 2026 05:22
Comment thread src/Packages/Audience/Runtime/ImmutableAudience.cs
ImmutableJeffrey added a commit that referenced this pull request Apr 23, 2026
Addresses PR #701 review from @nattb8.

The catch-filter branch for caller cancellation used to silently swallow
the exception and let the method return `true` — its normal "batch sent,
ask me again" signal. FlushAsync's send loop takes that return value at
face value and immediately re-enters on the same cancelled token.
HttpClient throws on entry (token still cancelled), the same branch
swallows it, `true` is returned again. The batch is never deleted on
this path, so ReadBatch keeps handing back the same events — a tight
infinite loop.

Rethrow instead. The caller's send loop exits via the exception; no
behaviour change for HttpClient's internal timeout path (still filtered
out by the `when (ct.IsCancellationRequested)` guard) so timeouts still
trigger backoff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
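
The rethrow described in the commit message above can be sketched with a C# exception filter. This is a minimal illustration, not the SDK's code: `CancellationSketch`, `SendOnceAsync`, the `post` delegate, and the `recordFailure` callback are hypothetical names, and returning `false` on the timeout path is an assumption made only to keep the sketch small.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationSketch
{
    // Hypothetical stand-in for a send method that returns true on "batch sent".
    public static async Task<bool> SendOnceAsync(
        Func<CancellationToken, Task> post,
        Action recordFailure,
        CancellationToken ct)
    {
        try
        {
            await post(ct).ConfigureAwait(false);
            return true;
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // The caller cancelled: rethrow so the caller's loop exits
            // instead of re-entering on the same cancelled token.
            throw;
        }
        catch (OperationCanceledException)
        {
            // The caller's token is not cancelled, so this is treated as an
            // internal timeout and counted as a failure (assumed behaviour).
            recordFailure();
            return false;
        }
    }
}
```

Because the filter checks the caller's token rather than the exception type, a timeout raised while the caller's token is still live skips the first `catch` and reaches the failure path.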
ImmutableJeffrey added a commit that referenced this pull request Apr 23, 2026
Two regression guards for PR #701 review from @nattb8.

SendBatchAsync_CallerCancelled_Throws: pre-cancel the token, confirm
the method throws OperationCanceledException, confirm the batch stays
on disk, confirm no backoff and no onError. Sabotage: re-add the empty
catch body and this fails because SendBatchAsync returns true silently.

FlushAsync_CancelledToken_Terminates_DoesNotHotLoop: pre-cancel the
token, start FlushAsync, race against a 2s timeout. With the fix the
task faults quickly; without it the task never completes. Also flips
the handler to 200 and runs a follow-up FlushAsync to prove
_sendInFlight was released (the finally block didn't get stranded).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nattb8 previously approved these changes Apr 23, 2026
ImmutableJeffrey and others added 11 commits April 24, 2026 08:35
_consecutiveFailures and _nextAttemptAt had no synchronisation.
Readers (RescheduleSendTimer, FlushAsync.IsInBackoffWindow) could
observe torn state while SendBatchAsync wrote them.

Add a dedicated _backoffLock. All reads and writes go through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
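
A minimal sketch of the locking pattern the commit above describes, assuming a plain `lock` around both fields. The field and member names `_backoffLock`, `_consecutiveFailures`, `_nextAttemptAt`, `RecordFailure`, `ResetBackoff`, `IsInBackoffWindow`, and `NextAttemptAt` come from the PR text; the class name `BackoffState`, the `now` parameter, and the doubling schedule capped at 60 s are assumptions for illustration only.

```csharp
using System;

public sealed class BackoffState
{
    private readonly object _backoffLock = new object();
    private int _consecutiveFailures;
    private DateTime _nextAttemptAt = DateTime.MinValue;

    public void RecordFailure(DateTime now)
    {
        lock (_backoffLock)
        {
            _consecutiveFailures++;
            // Assumed schedule: 1 s, 2 s, 4 s, ... capped at 60 s.
            int exponent = Math.Min(_consecutiveFailures - 1, 6);
            double delayMs = Math.Min(1000.0 * (1 << exponent), 60000.0);
            _nextAttemptAt = now.AddMilliseconds(delayMs);
        }
    }

    public void ResetBackoff()
    {
        lock (_backoffLock)
        {
            _consecutiveFailures = 0;
            _nextAttemptAt = DateTime.MinValue;
        }
    }

    // Readers take the same lock, so they never see the failure count
    // updated without the matching next-attempt time.
    public bool IsInBackoffWindow(DateTime now)
    {
        lock (_backoffLock) { return now < _nextAttemptAt; }
    }

    public DateTime NextAttemptAt
    {
        get { lock (_backoffLock) { return _nextAttemptAt; } }
    }
}
```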
HttpTransport's HttpClient constructor used the default disposeHandler:true,
which meant Shutdown disposed the consumer's HttpMessageHandler. A caller
who shared the handler across Init cycles (tests, proxy config, connection
pooling) saw it fail on the second Init.

Matches _controlClient which already used disposeHandler:false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
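
The handler-ownership change above relies on the `HttpClient(HttpMessageHandler, bool disposeHandler)` constructor. The sketch below shows the effect under stated assumptions: `TransportSketch` and `TrackingHandler` are hypothetical types written for this example and are not part of the SDK.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler that records whether it was disposed.
public sealed class TrackingHandler : HttpMessageHandler
{
    public bool Disposed { get; private set; }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        return Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK));
    }

    protected override void Dispose(bool disposing)
    {
        Disposed = true;
        base.Dispose(disposing);
    }
}

// Hypothetical transport that borrows, but does not own, the handler.
public sealed class TransportSketch : IDisposable
{
    private readonly HttpClient _client;

    public TransportSketch(HttpMessageHandler handler)
    {
        // disposeHandler: false leaves the handler's lifetime with the caller.
        _client = new HttpClient(handler, disposeHandler: false);
    }

    public void Dispose() => _client.Dispose();
}
```

With this arrangement a caller can dispose one transport and pass the same handler to the next one, which is the shared-handler scenario the commit message mentions.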
Per the Unity Implementation Plan §4.6, the backend may return
{accepted, rejected} on a 2xx to signal per-message validation errors.
The old code deleted the batch silently on any 2xx — studios had no
way to see that some events were being dropped.

Parse the response body, surface via OnError(ValidationRejected, ...)
when rejected > 0. The batch is still deleted (rejections are
validation errors; retries would not help). Body parse failures fall
through as zero-rejected — a malformed diagnostic must not block the
success path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
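
A sketch of the "malformed body counts as zero rejected" rule from the commit above. It assumes `System.Text.Json` is available, which may not match the parser the SDK actually uses; `RejectionParser` and `ParseRejectedCount` are hypothetical names, and only the `rejected` field name comes from the PR text.

```csharp
using System;
using System.Text.Json;

public static class RejectionParser
{
    // Returns the rejected count from a 2xx body, or 0 when the body is
    // empty, malformed, or carries no positive integer "rejected" field.
    public static int ParseRejectedCount(string body)
    {
        if (string.IsNullOrWhiteSpace(body)) return 0;
        try
        {
            using (JsonDocument doc = JsonDocument.Parse(body))
            {
                JsonElement root = doc.RootElement;
                if (root.ValueKind == JsonValueKind.Object
                    && root.TryGetProperty("rejected", out JsonElement rejected)
                    && rejected.ValueKind == JsonValueKind.Number
                    && rejected.TryGetInt32(out int count)
                    && count > 0)
                {
                    return count;
                }
            }
        }
        catch (JsonException)
        {
            // A parse failure is diagnostic only; the send still succeeded.
        }
        return 0;
    }
}
```

A caller would delete the batch regardless of the result and raise the `ValidationRejected` error only when the returned count is greater than zero.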
Pin the I9 fix (commit 09b23cdf):
- 200 with rejected>0: batch deleted, ValidationRejected surfaced via
  onError with the rejected count.
- 200 with rejected=0: onError silent.
- 200 with malformed body: falls through as zero-rejected, batch still
  deleted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two concurrent FlushAsync callers would both call ReadBatch with the
same paths and double-POST. Reuse the timer-tick gate so at most one
SendBatchAsync runs at a time; other callers poll cheaply until the
gate clears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
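
One way to picture the gate described above is an integer flag claimed with `Interlocked.CompareExchange`, with waiters polling through `Task.Yield`. This is a sketch under assumptions: the PR names `_sendInFlight` and `Task.Yield`, but the flag's type, the `FlushGate` class, and `RunExclusiveAsync` are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class FlushGate
{
    private int _sendInFlight; // 0 = free, 1 = a send is running

    public async Task RunExclusiveAsync(
        Func<CancellationToken, Task> send,
        CancellationToken cancellationToken = default)
    {
        // Poll until this caller wins the gate; cancellation ends the wait.
        while (Interlocked.CompareExchange(ref _sendInFlight, 1, 0) != 0)
        {
            cancellationToken.ThrowIfCancellationRequested();
            await Task.Yield();
        }

        try
        {
            await send(cancellationToken).ConfigureAwait(false);
        }
        finally
        {
            // Always release, even if the send throws or is cancelled.
            Volatile.Write(ref _sendInFlight, 0);
        }
    }
}
```

Releasing in `finally` is what lets a follow-up flush proceed after a cancelled or failed one, the property the later regression test checks.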
…isposedException

- Add optional CancellationToken. Caller can cancel the gate-wait and
  the in-flight HTTP send (default is CancellationToken.None — no
  behaviour change for existing callers).
- Catch ObjectDisposedException thrown when a concurrent Shutdown
  disposed the transport mid-flush. Previously, the exception
  propagated to the awaiter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pin the H2-A fix (commit f58260d5): two parallel FlushAsync calls must
not both issue HTTP POSTs against the same on-disk batch.

BlockingHandler blocks in SendAsync until released. First FlushAsync
enters; second starts and must wait on the gate; RequestCount stays at 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrites jargon (2xx server acked, serialise, diagnostic must not
block the success path) into plain language.

No code changes. All 184 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses PR #701 review from @nattb8.

The catch-filter branch for caller cancellation used to silently swallow
the exception and let the method return `true` — its normal "batch sent,
ask me again" signal. FlushAsync's send loop takes that return value at
face value and immediately re-enters on the same cancelled token.
HttpClient throws on entry (token still cancelled), the same branch
swallows it, `true` is returned again. The batch is never deleted on
this path, so ReadBatch keeps handing back the same events — a tight
infinite loop.

Rethrow instead. The caller's send loop exits via the exception; no
behaviour change for HttpClient's internal timeout path (still filtered
out by the `when (ct.IsCancellationRequested)` guard) so timeouts still
trigger backoff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two regression guards for PR #701 review from @nattb8.

SendBatchAsync_CallerCancelled_Throws: pre-cancel the token, confirm
the method throws OperationCanceledException, confirm the batch stays
on disk, confirm no backoff and no onError. Sabotage: re-add the empty
catch body and this fails because SendBatchAsync returns true silently.

FlushAsync_CancelledToken_Terminates_DoesNotHotLoop: pre-cancel the
token, start FlushAsync, race against a 2s timeout. With the fix the
task faults quickly; without it the task never completes. Also flips
the handler to 200 and runs a follow-up FlushAsync to prove
_sendInFlight was released (the finally block didn't get stranded).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SendBatchAsync_CallerCancelled_Throws was asserting the exact type
`OperationCanceledException` via `Assert.ThrowsAsync`. HttpClient
internally catches the OperationCanceledException our mock throws and
rethrows it as `TaskCanceledException` (a subclass), so `ThrowsAsync`,
which requires an exact type match, failed.

Switch to `CatchAsync<OperationCanceledException>`, which accepts the
whole cancellation family. This is what we actually want to assert:
"cancellation propagated", not "HttpClient happens to throw this exact
subclass".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
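
The distinction in the commit above comes down to an exact-type check versus an assignability check. The sketch below reproduces that difference in plain C# without a test framework; `CancellationFamily` and its two helpers are hypothetical names that only mirror the two assertion styles.

```csharp
using System;
using System.Threading.Tasks;

public static class CancellationFamily
{
    // Mirrors an exact-type assertion: derived types do not match.
    public static bool IsExactly<T>(Exception ex) where T : Exception
    {
        return ex.GetType() == typeof(T);
    }

    // Mirrors a "this type or any subclass" assertion.
    public static bool IsOrDerivesFrom<T>(Exception ex) where T : Exception
    {
        return ex is T;
    }
}
```

For a `TaskCanceledException` instance, the first helper rejects `OperationCanceledException` while the second accepts it, which is why the looser assertion fits a test that only cares that cancellation propagated.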
@ImmutableJeffrey ImmutableJeffrey merged commit 6ba9bb9 into main Apr 24, 2026
20 checks passed
ImmutableJeffrey added a commit that referenced this pull request Apr 24, 2026
ImmutableJeffrey added a commit that referenced this pull request Apr 24, 2026
@ImmutableJeffrey ImmutableJeffrey deleted the feat/audience-http-hardening branch April 24, 2026 01:00