Return 408 for cancelled bundle transaction cleanup#5553

Draft
mikaelweave wants to merge 3 commits into main from agents/incident-repair-unassigned-sprint
@mikaelweave mikaelweave commented May 5, 2026

Summary

Fixes Bug #188321 for the BundleHandler response-mapping failure mode: transaction bundle processing can return HTTP 500 when the client disconnects and cleanup fails with "SqlTransaction has completed; it is no longer usable."

This PR is intentionally limited to the BundleHandler cancellation response mapping and its targeted regression test. The SqlServerFhirDataStore cleanup-token change was split into a separate PR.

Changes

  • BundleHandler.cs: when IsCompletedTransactionException() is caught and the request token is cancelled, throw FhirTransactionCancelledException so the response is HTTP 408 instead of HTTP 500.
  • BundleHandlerTests.cs: add GivenATransaction_WithCancellationAndSqlTransactionZombied_ReturnsCancelledException to cover the cancelled-token + zombied-transaction path.
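
The mapping described in the first bullet can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual BundleHandler code: the two exception classes are simplified stand-ins for the real FHIR server types, the message-based `IsCompletedTransactionException` check is an assumption about how the real helper might work, and `CancellationMapping.Map` is a hypothetical name introduced only for this example.

```csharp
using System;
using System.Threading;

// Simplified stand-ins for the real exception types (assumption: the real
// types carry more context and are mapped to HTTP statuses elsewhere).
public sealed class FhirTransactionCancelledException : Exception
{
    public FhirTransactionCancelledException(Exception inner)
        : base("Transaction cancelled by the client.", inner) { }
}

public sealed class FhirTransactionFailedException : Exception
{
    public FhirTransactionFailedException(Exception inner)
        : base("Transaction failed.", inner) { }
}

public static class CancellationMapping
{
    // Hypothetical check; the real IsCompletedTransactionException() may differ.
    public static bool IsCompletedTransactionException(Exception ex)
    {
        return ex is InvalidOperationException
            && ex.Message.IndexOf("SqlTransaction has completed", StringComparison.Ordinal) >= 0;
    }

    // Picks the exception to surface when cleanup hits a zombied transaction.
    public static Exception Map(Exception ex, CancellationToken requestToken)
    {
        if (!IsCompletedTransactionException(ex))
        {
            return ex; // unrelated failure: leave it untouched
        }

        if (requestToken.IsCancellationRequested)
        {
            // Client went away: intended to surface as HTTP 408.
            return new FhirTransactionCancelledException(ex);
        }

        // Zombied transaction without cancellation: still HTTP 500.
        return new FhirTransactionFailedException(ex);
    }
}
```

In a catch block this would be used as `throw CancellationMapping.Map(ex, cancellationToken);`. The point of the sketch is only the ordering: the cancellation check runs after the zombied-transaction check, so unrelated failures keep their original type.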

Non-goals

  • Does not change SqlServerFhirDataStore.cs; that cleanup-token change is split to a separate PR.
  • Does not change broader transaction abort behavior for response statuses without OperationOutcome.
  • Does not change the parallel bundle path.
  • Does not handle SQL conflict retry behavior; that is covered separately by PR #5541 ("[BugFix] Handle SQL Conflicts during C# transactions as HTTP412") / AB#188318.

Testing

  • dotnet test src\Microsoft.Health.Fhir.Api.UnitTests\Microsoft.Health.Fhir.R4.Api.UnitTests.csproj --filter "FullyQualifiedName~GivenATransaction_WithCancellationAndSqlTransactionZombied_ReturnsCancelledException" --verbosity minimal

Related

ADO Bug #188321

… bundles

Bug #188321: Batch/Transaction bundles fail with 'SqlTransaction has
completed; it is no longer usable' race condition.

Root cause is two interacting bugs:

Bug A - SqlServerFhirDataStore.MergeInternalAsync passes an already-
cancelled CancellationToken to MergeResourcesCommitTransactionAsync in
the error catch block. The cancelled token causes OpenAsync to throw
OperationCanceledException immediately, masking the original exception
and leaving server-side transaction state unclean.

Fix A: Pass CancellationToken.None so the abort call always completes.
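
To illustrate why Fix A matters, here is a small sketch of the masking effect. It does not use the real SqlServerFhirDataStore code: `AbortTransactionAsync` is a hypothetical stand-in that, like the OpenAsync behavior described above, fails immediately when given an already-cancelled token, and `CleanupAsync` is an invented helper that simulates the error catch block.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CleanupTokenSketch
{
    // Hypothetical stand-in for the abort call. It refuses to start when the
    // token is already cancelled, mirroring the behavior described for OpenAsync.
    public static Task<string> AbortTransactionAsync(CancellationToken token)
    {
        token.ThrowIfCancellationRequested();
        return Task.FromResult("aborted");
    }

    // Simulates the error path of a merge after the client has disconnected.
    public static async Task<string> CleanupAsync(bool useRequestToken)
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.Cancel(); // simulate the client disconnect

            try
            {
                throw new InvalidOperationException("original merge failure");
            }
            catch (InvalidOperationException original)
            {
                // Bug A passes the request token; Fix A passes CancellationToken.None.
                CancellationToken token = useRequestToken ? cts.Token : CancellationToken.None;
                try
                {
                    return await AbortTransactionAsync(token);
                }
                catch (OperationCanceledException)
                {
                    // Cleanup never ran, and the original failure is hidden.
                    return "cleanup skipped; masked: " + original.Message;
                }
            }
        }
    }
}
```

Calling `CleanupAsync(true)` takes the masked path, while `CleanupAsync(false)` lets the abort complete. Using `CancellationToken.None` for cleanup is a common pattern when the work must finish regardless of whether the caller is still connected.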

Bug B - SqlTransactionScope.Dispose() (healthcare-shared-components)
disposes SqlConnection before SqlTransaction. This zombies the
transaction, causing SqlTransaction.Dispose()'s internal Rollback() to
throw InvalidOperationException('This SqlTransaction has completed').
During C# using-block unwinding this replaces the active exception
(FhirTransactionCancelledException), which BundleHandler then catches
and returns HTTP 500 instead of HTTP 408.
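
The exception replacement described for Bug B is general C# behavior and can be reproduced without SQL Server. In the sketch below, `ZombieTransaction` is a hypothetical stand-in whose Dispose() always throws, and OperationCanceledException stands in for FhirTransactionCancelledException; neither is taken from the real code.

```csharp
using System;

// Hypothetical stand-in for a transaction whose connection was disposed first,
// so that its own Dispose() throws.
public sealed class ZombieTransaction : IDisposable
{
    public void Dispose()
    {
        throw new InvalidOperationException(
            "This SqlTransaction has completed; it is no longer usable.");
    }
}

public static class DisposeMaskingSketch
{
    // Returns the exception that actually reaches the outer catch block.
    public static Exception Run()
    {
        try
        {
            using (new ZombieTransaction())
            {
                // Stands in for the cancellation exception that is in flight.
                throw new OperationCanceledException("client disconnected");
            }
        }
        catch (Exception ex)
        {
            // The using statement compiles to try/finally, and an exception
            // thrown from the finally block replaces the one being propagated.
            return ex;
        }
    }
}
```

`DisposeMaskingSketch.Run()` returns the InvalidOperationException thrown by Dispose(), not the cancellation exception. That is why the handler sees only the zombie error and needs the cancellation token to recover the original intent.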

Fix B: Check cancellationToken.IsCancellationRequested in the
IsCompletedTransactionException catch block and throw
FhirTransactionCancelledException (408) instead of
FhirTransactionFailedException (500) when the client cancelled.

Fix C (defense-in-depth): Expand the transaction abort guard in both
sequential (BundleHandler) and parallel (BundleHandlerParallelOperations)
paths to abort on any 4xx/5xx status code, not only when
entry.Response.Outcome != null (outcome is null when client disconnects
mid-stream and the response body is empty).

Also adds a unit test covering the full failure chain:
2-entry transaction bundle + cancelled token + SqlTransaction zombie
exception -> asserts FhirTransactionCancelledException (408).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mikaelweave mikaelweave requested a review from a team as a code owner May 5, 2026 22:07

codecov-commenter commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.33%. Comparing base (a894a3b) to head (e6e7fdb).
⚠️ Report is 18 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5553      +/-   ##
==========================================
+ Coverage   77.02%   77.33%   +0.30%     
==========================================
  Files         983      993      +10     
  Lines       36007    36421     +414     
  Branches     5469     5519      +50     
==========================================
+ Hits        27736    28167     +431     
+ Misses       6927     6892      -35     
- Partials     1344     1362      +18     

see 54 files with indirect coverage changes

@mikaelweave mikaelweave marked this pull request as draft May 7, 2026 04:41
Remove broader transaction abort-guard changes so PR #5553 only keeps the cancellation mapping, cleanup token handling, and targeted regression test for bug 188321.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mikaelweave mikaelweave changed the title from "Fix SqlTransaction race condition causing 500 on client disconnect in bundles" to "Return 408 for cancelled bundle transaction cleanup" May 14, 2026
Keep PR #5553 scoped to BundleHandler cancellation response mapping and its regression test. The SqlServerFhirDataStore cleanup token change is split to a separate branch for independent review.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>