fix(audience-http): HTTP transport and flush hardening (SDK-234)#701
Merged
Conversation
1a20626 to
05e9f33
Compare
nattb8
reviewed
Apr 23, 2026
ImmutableJeffrey
added a commit
that referenced
this pull request
Apr 23, 2026
Addresses PR #701 review from @nattb8. The catch-filter branch for caller cancellation used to silently swallow the exception and let the method return `true` — its normal "batch sent, ask me again" signal. FlushAsync's send loop takes that return value at face value and immediately re-enters on the same cancelled token. HttpClient throws on entry (token still cancelled), the same branch swallows it, `true` is returned again. The batch is never deleted on this path, so ReadBatch keeps handing back the same events — a tight infinite loop. Rethrow instead. The caller's send loop exits via the exception; no behaviour change for HttpClient's internal timeout path (still filtered out by the `when (ct.IsCancellationRequested)` guard) so timeouts still trigger backoff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ImmutableJeffrey
added a commit
that referenced
this pull request
Apr 23, 2026
Two regression guards for PR #701 review from @nattb8. SendBatchAsync_CallerCancelled_Throws: pre-cancel the token, confirm the method throws OperationCanceledException, confirm the batch stays on disk, confirm no backoff and no onError. Sabotage: re-add the empty catch body and this fails because SendBatchAsync returns true silently. FlushAsync_CancelledToken_Terminates_DoesNotHotLoop: pre-cancel the token, start FlushAsync, race against a 2s timeout. With the fix the task faults quickly; without it the task never completes. Also flips the handler to 200 and runs a follow-up FlushAsync to prove _sendInFlight was released (the finally block didn't get stranded). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nattb8
previously approved these changes
Apr 23, 2026
_consecutiveFailures and _nextAttemptAt had no synchronisation. Readers (RescheduleSendTimer, FlushAsync.IsInBackoffWindow) could observe torn state while SendBatchAsync wrote them. Add a dedicated _backoffLock. All reads and writes go through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HttpTransport's HttpClient constructor used the default disposeHandler:true, which meant Shutdown disposed the consumer's HttpMessageHandler. A caller who shared the handler across Init cycles (tests, proxy config, connection pooling) saw it fail on the second Init. Matches _controlClient which already used disposeHandler:false. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the Unity Implementation Plan §4.6, the backend may return
{accepted, rejected} on a 2xx to signal per-message validation errors.
The old code deleted the batch silently on any 2xx — studios had no
way to see that some events were being dropped.
Parse the response body, surface via OnError(ValidationRejected, ...)
when rejected > 0. The batch is still deleted (rejections are
validation errors; retries would not help). Body parse failures fall
through as zero-rejected — a malformed diagnostic must not block the
success path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pin the I9 fix (commit 09b23cdf): - 200 with rejected>0: batch deleted, ValidationRejected surfaced via onError with the rejected count. - 200 with rejected=0: onError silent. - 200 with malformed body: falls through as zero-rejected, batch still deleted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two concurrent FlushAsync callers would both call ReadBatch with the same paths and double-POST. Reuse the timer-tick gate so at most one SendBatchAsync runs at a time; other callers poll cheaply until the gate clears. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…isposedException - Add optional CancellationToken. Caller can cancel the gate-wait and the in-flight HTTP send (default is CancellationToken.None — no behaviour change for existing callers). - Catch ObjectDisposedException thrown when a concurrent Shutdown disposed the transport mid-flush. Previously, the exception propagated to the awaiter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pin the H2-A fix (commit f58260d5): two parallel FlushAsync calls must not both issue HTTP POSTs against the same on-disk batch. BlockingHandler blocks in SendAsync until released. First FlushAsync enters; second starts and must wait on the gate; RequestCount stays at 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrites jargon (2xx server acked, serialise, diagnostic must not block the success path) into plain language. No code changes. All 184 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses PR #701 review from @nattb8. The catch-filter branch for caller cancellation used to silently swallow the exception and let the method return `true` — its normal "batch sent, ask me again" signal. FlushAsync's send loop takes that return value at face value and immediately re-enters on the same cancelled token. HttpClient throws on entry (token still cancelled), the same branch swallows it, `true` is returned again. The batch is never deleted on this path, so ReadBatch keeps handing back the same events — a tight infinite loop. Rethrow instead. The caller's send loop exits via the exception; no behaviour change for HttpClient's internal timeout path (still filtered out by the `when (ct.IsCancellationRequested)` guard) so timeouts still trigger backoff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two regression guards for PR #701 review from @nattb8. SendBatchAsync_CallerCancelled_Throws: pre-cancel the token, confirm the method throws OperationCanceledException, confirm the batch stays on disk, confirm no backoff and no onError. Sabotage: re-add the empty catch body and this fails because SendBatchAsync returns true silently. FlushAsync_CancelledToken_Terminates_DoesNotHotLoop: pre-cancel the token, start FlushAsync, race against a 2s timeout. With the fix the task faults quickly; without it the task never completes. Also flips the handler to 200 and runs a follow-up FlushAsync to prove _sendInFlight was released (the finally block didn't get stranded). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SendBatchAsync_CallerCancelled_Throws was asserting the exact type `OperationCanceledException` via `Assert.ThrowsAsync`. HttpClient internally catches the OCE our mock throws and rethrows it as `TaskCanceledException` (a subclass), so `ThrowsAsync` — which is exact-type — missed. Switch to `CatchAsync<OperationCanceledException>`, which accepts the whole cancellation family. This is what we actually want to assert: "cancellation propagated", not "HttpClient happens to throw this exact subclass". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8b8e33e to
b588bdf
Compare
nattb8
approved these changes
Apr 23, 2026
ImmutableJeffrey
added a commit
that referenced
this pull request
Apr 24, 2026
Addresses PR #701 review from @nattb8. The catch-filter branch for caller cancellation used to silently swallow the exception and let the method return `true` — its normal "batch sent, ask me again" signal. FlushAsync's send loop takes that return value at face value and immediately re-enters on the same cancelled token. HttpClient throws on entry (token still cancelled), the same branch swallows it, `true` is returned again. The batch is never deleted on this path, so ReadBatch keeps handing back the same events — a tight infinite loop. Rethrow instead. The caller's send loop exits via the exception; no behaviour change for HttpClient's internal timeout path (still filtered out by the `when (ct.IsCancellationRequested)` guard) so timeouts still trigger backoff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ImmutableJeffrey
added a commit
that referenced
this pull request
Apr 24, 2026
Two regression guards for PR #701 review from @nattb8. SendBatchAsync_CallerCancelled_Throws: pre-cancel the token, confirm the method throws OperationCanceledException, confirm the batch stays on disk, confirm no backoff and no onError. Sabotage: re-add the empty catch body and this fails because SendBatchAsync returns true silently. FlushAsync_CancelledToken_Terminates_DoesNotHotLoop: pre-cancel the token, start FlushAsync, race against a 2s timeout. With the fix the task faults quickly; without it the task never completes. Also flips the handler to 200 and runs a follow-up FlushAsync to prove _sendInFlight was released (the finally block didn't get stranded). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
HttpTransportbackoff state behind_backoffLock. Readers (IsInBackoffWindow,NextAttemptAt,BackoffMs,RescheduleSendTimer) and writers (RecordFailure,ResetBackoff) all go through it.HttpTransport.HttpClientfrom disposing a consumer-supplied handler (disposeHandler: false). Matches the existing_controlClientconvention.SendBatchAsyncparses{accepted, rejected}from the 2xx body and firesOnError(ValidationRejected, ...)with the count whenrejected > 0. Malformed body falls through as zero-rejected so a diagnostic parse error never blocks the success path.FlushAsyncon the existing_sendInFlightflag (the same gate the timer tick uses). Two parallel callers no longer double-POST the same batch; the second polls viaTask.Yielduntil the first releases.CancellationToken cancellationToken = defaulttoFlushAsync(no behaviour change for existing callers) and catchesObjectDisposedExceptionthrown when a concurrentShutdowndisposes the transport mid-flush.rejected > 0, zero-rejected silence, malformed-body fall-through.Linear: SDK-234
Note
Medium Risk
Touches the SDK’s event flushing/network send path, adding new concurrency and cancellation behavior plus backoff locking; regressions could affect delivery, retries, or shutdown behavior under load.
Overview
Hardens event flushing and HTTP transport behavior to avoid duplicate sends and improve observability.
ImmutableAudience.FlushAsyncnow accepts an optionalCancellationToken, serializes concurrent flushes using the existing_sendInFlightgate (preventing double-POST of the same on-disk batch), and exits cleanly if a concurrentShutdowndisposes the transport.HttpTransportnow (1) locks all backoff state reads/writes, (2) avoids disposing a consumer-providedHttpMessageHandler, (3) surfaces partial 2xx acceptance by parsingrejectedfrom the response body and firingOnError(ValidationRejected, ...), and (4) rethrows caller-initiated cancellation so flush loops don’t hot-spin. Tests were added to cover concurrent flush serialization, cancellation propagation, and the new 200-with-rejected response handling.Reviewed by Cursor Bugbot for commit b588bdf. Bugbot is set up for automated code reviews on this repo. Configure here.