Skip to content

test(spirit): stabilize volume-progress test against the restart transient#352

Merged
aparajon merged 3 commits into
mainfrom
aparajon/fix-volume-progress-flake
Jun 15, 2026
Merged

test(spirit): stabilize volume-progress test against the restart transient#352
aparajon merged 3 commits into
mainfrom
aparajon/fix-volume-progress-flake

Conversation

@aparajon

@aparajon aparajon commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Why

TestEngine_Volume_PreservesProgress needs to prove that a volume change preserves in-flight copy progress, but the previous fixture was small enough that Spirit could finish copying before the volume change meaningfully interrupted the apply. That let the test pass via completion without exercising the resume path.

There is also a brief SchemaBot-side restart/setup window during Volume(): the new runner has been scheduled, but it has not necessarily exposed per-table Spirit progress yet. A progress poll in that window can see SchemaBot's zero-valued fallback row. The test should ignore only that setup sample, then assert on the first real resumed table-progress sample.

What

Seed the volume-progress test with the existing 500k-row in-flight-copy fixture so the volume change lands during copy work. After the volume change, skip only samples that do not yet contain real table progress (rows_total == 0), then assert that the first resumed table-progress value has not materially moved backward. Terminal failure states still fail fast, while successful completion remains valid.

Test-only change; no engine behavior is modified.

🤖 Generated with Amp

…sient

A volume change is a Stop (force checkpoint) + Start (resume with new settings).
Right after the restart the resumed runner can briefly report rows_copied=0
before the checkpoint watermark is reflected in its progress. The post-volume
check broke on the first progress reading, so it could snapshot that transient
zero and assert it as a progress reset.

Poll until the resumed progress settles instead: track the highest rows_copied
seen and exit early once it climbs back past the threshold or the change
completes. A transient restart reading no longer fails the test, while a genuine
reset (progress never climbing back) still fails as before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 15, 2026 14:56

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Stabilizes the TestEngine_Volume_PreservesProgress integration test by avoiding a flaky assertion caused by a transient rows_copied=0 progress report immediately after a volume-triggered Stop+Start restart.

Changes:

  • Adds detailed rationale/commentary documenting the transient progress behavior after volume changes.
  • Reworks the post-volume progress check to poll longer and track the maximum rows_copied observed, exiting once progress recovers past a threshold or the migration completes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/engine/spirit/spirit_integration_test.go
The post-volume poll only treated StateCompleted specially, so an apply driven
to a terminal failure/cancelled/reverted state by the volume change could still
pass (if rows_copied happened to be above the threshold) or take the full poll
window before being mis-reported as a progress reset. A volume change resumes
the copy and must never terminate it as anything but completed, so fail fast on
any other terminal state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@aparajon aparajon marked this pull request as ready for review June 15, 2026 15:33
@aparajon aparajon requested review from Kiran01bm and morgo as code owners June 15, 2026 15:33
@aparajon aparajon merged commit f041713 into main Jun 15, 2026
25 checks passed
@aparajon aparajon deleted the aparajon/fix-volume-progress-flake branch June 15, 2026 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants