test(spirit): stabilize volume-progress test against the restart transient #352
Merged
…sient

A volume change is a Stop (force checkpoint) + Start (resume with new settings). Right after the restart, the resumed runner can briefly report rows_copied=0 before the checkpoint watermark is reflected in its progress. The post-volume check broke on the first progress reading, so it could snapshot that transient zero and assert it as a progress reset.

Poll until the resumed progress settles instead: track the highest rows_copied seen and exit early once it climbs back past the threshold or the change completes. A transient restart reading no longer fails the test, while a genuine reset (progress never climbing back) still fails as before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
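The polling approach described in this commit message could be sketched roughly as follows. This is not the test's actual code: the `Progress` type, the `waitForSettledProgress` helper, and its parameters are hypothetical names chosen for illustration, and the sketch treats reaching the threshold as "climbing back past" it.

```go
package main

import "time"

// Progress is a hypothetical snapshot of an apply's copy progress.
type Progress struct {
	RowsCopied int64
	Completed  bool
}

// waitForSettledProgress polls read up to attempts times and remembers the
// highest RowsCopied value seen. It returns early once that maximum reaches
// threshold or the change reports completion. settled is false when the poll
// window runs out first, which the caller can treat as a genuine reset.
func waitForSettledProgress(read func() Progress, threshold int64, attempts int, interval time.Duration) (maxSeen int64, settled bool) {
	for i := 0; i < attempts; i++ {
		p := read()
		if p.RowsCopied > maxSeen {
			maxSeen = p.RowsCopied
		}
		if p.Completed || maxSeen >= threshold {
			return maxSeen, true
		}
		time.Sleep(interval)
	}
	return maxSeen, false
}
```

Tracking the maximum rather than the latest reading means a single transient zero cannot mask progress that has already been observed, while a run that never recovers still exhausts the window and fails.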
Pull request overview
Stabilizes the TestEngine_Volume_PreservesProgress integration test by avoiding a flaky assertion caused by a transient rows_copied=0 progress report immediately after a volume-triggered Stop+Start restart.
Changes:
- Adds detailed rationale/commentary documenting the transient progress behavior after volume changes.
- Reworks the post-volume progress check to poll longer and track the maximum rows_copied observed, exiting once progress recovers past a threshold or the migration completes.
The post-volume poll only treated StateCompleted specially, so an apply driven to a terminal failure/cancelled/reverted state by the volume change could still pass (if rows_copied happened to be above the threshold) or take the full poll window before being mis-reported as a progress reset. A volume change resumes the copy and must never terminate it as anything but completed, so fail fast on any other terminal state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
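A minimal sketch of the fail-fast rule from this commit message, under stated assumptions: only StateCompleted is named in the source, so the `State` type, the other constants, their string values, and the `checkPostVolumeState` helper are hypothetical stand-ins for whatever the engine actually defines.

```go
package main

import "fmt"

// State is a hypothetical apply state; only StateCompleted is named in the PR.
type State string

const (
	StateRunning   State = "running"   // assumed non-terminal state
	StateCompleted State = "completed" // the only acceptable terminal state
	StateFailed    State = "failed"    // assumed name
	StateCancelled State = "cancelled" // assumed name
	StateReverted  State = "reverted"  // assumed name
)

// checkPostVolumeState reports done=true when the apply completed, and returns
// an error for any other terminal state so the poll loop can stop immediately
// instead of waiting out the window. Non-terminal states return (false, nil).
func checkPostVolumeState(s State) (done bool, err error) {
	switch s {
	case StateCompleted:
		return true, nil
	case StateFailed, StateCancelled, StateReverted:
		return false, fmt.Errorf("apply reached terminal state %q after volume change", s)
	default:
		return false, nil
	}
}
```

Checking the state before looking at rows_copied keeps a failed apply from passing just because its last progress reading happened to sit above the threshold.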
morgo
approved these changes
Jun 15, 2026
Amp-Thread-ID: https://ampcode.com/threads/T-019eb357-321a-7760-b4fe-e3c606cba251 Co-authored-by: Amp <amp@ampcode.com>
Why
TestEngine_Volume_PreservesProgress needs to prove that a volume change preserves in-flight copy progress, but the previous fixture was small enough that Spirit could finish copying before the volume change meaningfully interrupted the apply. That let the test pass via completion without exercising the resume path.

There is also a brief SchemaBot-side restart/setup window during Volume(): the new runner has been scheduled, but it has not necessarily exposed per-table Spirit progress yet. A progress poll in that window can see SchemaBot's zero-valued fallback row. The test should ignore only that setup sample, then assert on the first real resumed table-progress sample.

What

Seed the volume-progress test with the existing 500k-row in-flight-copy fixture so the volume change lands during copy work. After the volume change, skip only samples that do not yet contain real table progress (rows_total == 0), then assert that the first resumed table-progress value has not materially moved backward. Terminal failure states still fail fast, while successful completion remains valid.

Test-only change; no engine behavior is modified.
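The sample-skipping check described above might be sketched as below. The `TableProgress` type and both helpers are hypothetical, and the `tolerance` parameter is an assumption standing in for "materially", which the description does not quantify.

```go
package main

// TableProgress is a hypothetical per-table progress sample.
type TableProgress struct {
	RowsCopied int64
	RowsTotal  int64
}

// firstRealSample returns the first sample that carries real table progress.
// Samples with RowsTotal == 0 are treated as the zero-valued fallback row seen
// during the restart/setup window and are skipped.
func firstRealSample(samples []TableProgress) (TableProgress, bool) {
	for _, s := range samples {
		if s.RowsTotal == 0 {
			continue
		}
		return s, true
	}
	return TableProgress{}, false
}

// movedBackward reports whether the resumed RowsCopied value fell more than
// tolerance rows below the value recorded before the volume change.
// tolerance is an assumed knob, not something the PR specifies.
func movedBackward(before, resumed, tolerance int64) bool {
	return resumed < before-tolerance
}
```

Keying the skip on rows_total rather than rows_copied keeps the filter narrow: a real sample that reports rows_copied=0 with a non-zero total is still asserted on, so a genuine reset is not hidden.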
🤖 Generated with Amp