Reverted unnecessary reindex test changes #5563
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## main #5563 +/- ##
==========================================
+ Coverage 77.36% 77.42% +0.05%
==========================================
Files 993 993
Lines 36406 36418 +12
Branches 5515 5518 +3
==========================================
+ Hits 28167 28197 +30
+ Misses 6879 6860 -19
- Partials 1360 1361 +1
// Maximum time to wait for a reindex job to reach a terminal state. Set high enough to accommodate
// multi-replica search-parameter cache convergence in CI (poll interval up to 30s, conformance refresh
// up to 60s, plus reindex worker queue scheduling and retry backoffs).
private static readonly TimeSpan ReindexJobCompletionTimeout = TimeSpan.FromMinutes(20);
Removing this will cause issues in PaaS as well as CI, as we have seen cases where reindex tests take longer than 5 minutes. If you have already solved this, great; otherwise you will recreate the failure where the test expects Completed but gets Running.
All reindex tests are passing in CI
For example, in your current run it passed the next time because it got lucky on retry with no other load.
There should not be any good reason for a single-resource test to run longer than 5 minutes. We need to look at why it is happening. Increasing the timeout is not the correct approach because it might hide the root cause.
Based on log analytics, it was a phantom host that had scaled down but still fell into the lookback period set in the convergence logic. You can take a look at this job in CI log analytics.
Finding: Of 11 reindex orchestrator jobs in this build, only Job 1571 (the first) exceeded 5 minutes — it ran 330 seconds (17:33:39 → 17:39:09 UTC) before being cancelled by the test client. Jobs 1572–1644 all completed in 81–94 seconds.
Root cause: ReindexOrchestratorJob.WaitForAllInstancesCacheSyncAsync got stuck waiting at 3/4 hosts synced. The 4th "active" host was a phantom — a replica (xhgdw or lqlp8) that ACA had scaled down ~2 min before the test started, but whose last heartbeat still fell inside the orchestrator's 180 s active-hosts window (ActiveHostsEventsMultiplier=9 × SearchParameterCacheRefreshIntervalSeconds=20s). The dead replica could never refresh its search-parameter cache hash, so the orchestrator polled until the E2E client cancelled it first.
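The timing described above can be sketched numerically. This is a minimal illustration (in Python, not the actual C# orchestrator code) of why a replica that scaled down ~2 minutes earlier still counts as "active": the constant names mirror the settings quoted in the comment, but the function and variable names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values quoted in the discussion above; names are illustrative,
# not the actual FHIR server configuration keys.
ACTIVE_HOSTS_EVENTS_MULTIPLIER = 9
CACHE_REFRESH_INTERVAL = timedelta(seconds=20)

# Active-hosts lookback window: 9 * 20 s = 180 s.
ACTIVE_WINDOW = ACTIVE_HOSTS_EVENTS_MULTIPLIER * CACHE_REFRESH_INTERVAL

def is_considered_active(last_heartbeat: datetime, now: datetime) -> bool:
    """A host counts as active if its last heartbeat falls inside the window."""
    return now - last_heartbeat <= ACTIVE_WINDOW

now = datetime(2024, 1, 1, 17, 33, 39, tzinfo=timezone.utc)
# Replica scaled down ~2 minutes (120 s) before the test started:
phantom_heartbeat = now - timedelta(seconds=120)

# 120 s < 180 s, so the dead replica is still counted as active and the
# orchestrator keeps waiting for a cache refresh that can never arrive.
print(is_considered_active(phantom_heartbeat, now))  # True
```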
Good info. If we had waited a little longer, the orchestrator would have failed anyway. Therefore, increasing the total reindex wait time is not a solution for this problem. I think it is acceptable to have intermittent test failures because of this.
This is something that could happen in production as well during scaling. If nothing else, we should at least have a follow-up work item to harden the convergence logic to self-heal this case, e.g. for hosts that haven't converged, check whether they were still active in the last x minutes rather than relying only on the start-of-job lookback.
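The hardening suggested here can be sketched as a recency filter on the hosts the orchestrator waits for. This is an illustrative Python sketch under stated assumptions (hypothetical heartbeat records and a hypothetical liveness threshold), not the server's actual implementation:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical recency threshold for "still alive"; deliberately shorter
# than the 180 s start-of-job lookback discussed above.
LIVENESS_WINDOW = timedelta(seconds=60)

@dataclass
class Host:
    name: str
    last_heartbeat: datetime

def hosts_to_wait_for(hosts: list[Host], now: datetime) -> list[Host]:
    """Drop hosts with stale heartbeats: they have likely scaled down and
    will never refresh their search-parameter cache hash."""
    return [h for h in hosts if now - h.last_heartbeat <= LIVENESS_WINDOW]

now = datetime(2024, 1, 1, 17, 35, 0, tzinfo=timezone.utc)
hosts = [
    Host("live-1", now - timedelta(seconds=10)),
    Host("phantom", now - timedelta(seconds=120)),  # scaled down ~2 min ago
]
print([h.name for h in hosts_to_wait_for(hosts, now)])  # ['live-1']
```

With a filter like this, convergence would only block on hosts that have heartbeated recently, instead of on every host seen since the job's lookback start.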
I am not following. First, reindex will fail with "unable to update cache, please retry" or similar. When the customer retries, old pods should not be considered, as there are no messages in the interval the orchestrator looks at. It looks like we do not need to do anything. Am I missing something?
Yes, it's a bad customer experience for something we can solve.
We are discussing current functionality that exists in PROD, and it is not related to this PR.
I don't see indications that we have PROD problems in this functionality, and therefore I am not comfortable adding any work items. If you think it is justified, please go ahead.
Reverts all recent changes related to the CI pipeline not working in the e2e ReindexTests class.
Restores the parallel update test in the e2e ReindexTests class to spread the update load.
Adds deletes of resources before reindex for the count-sensitive test.
Changes resource deletes on cleanup to hard deletes.
Adjusts CI settings to match the PR ones (except CPU and RAM on replicas).