Skip to content

test(spanner): handle AlreadyExists idempotently on restore test retries#17174

Merged
sakthivelmanii merged 1 commit into
mainfrom
fix-spanner-backup-test-retries
May 19, 2026
Merged

test(spanner): handle AlreadyExists idempotently on restore test retries#17174
sakthivelmanii merged 1 commit into
mainfrom
fix-spanner-backup-test-retries

Conversation

@sakthivelmanii
Copy link
Copy Markdown
Contributor

When a restore database test times out on the client-side, Spanner continues the restore operation in the background. When the test runner retries the test, the call would fail with AlreadyExists.

This change catches the AlreadyExists exception, retrieves the database metadata, and asserts that it is restoring from the correct source backup and achieves a READY or READY_OPTIMIZING state. This allows the test to execute idempotently without namespace collisions.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@sakthivelmanii sakthivelmanii requested a review from a team as a code owner May 19, 2026 10:22
@sakthivelmanii sakthivelmanii force-pushed the fix-spanner-backup-test-retries branch from 88b7fe6 to 19cb8be Compare May 19, 2026 10:24
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the Spanner backup and restore sample tests by adding handling for 'AlreadyExists' exceptions, allowing tests to verify existing resources instead of failing. A new helper function, '_verify_restored_database', was introduced to validate the state and encryption configuration of restored databases. Feedback from the review identifies a potential resource leak in 'test_create_database_with_retention_period' where the database is not dropped if successfully created, and suggests improving the helper function to verify all KMS keys instead of just the first one.

I am having trouble creating individual review comments. Click here to see my feedback.

packages/google-cloud-spanner/samples/samples/backup_sample_test.py (332-337)

high

The database.drop() call is currently nested within the except AlreadyExists block. This means the database will not be cleaned up if the creation succeeds on the first attempt, leading to a resource leak in the test environment. The cleanup should be moved outside the except block to ensure it runs in both the success and AlreadyExists scenarios.

    except AlreadyExists:
        # Verify version retention period options match
        database.reload()
        assert database.version_retention_period is not None

    # Clean up
    database.drop()

packages/google-cloud-spanner/samples/samples/backup_sample_test.py (62-63)

medium

The current implementation only verifies the first KMS key when multiple keys are provided. This reduces test coverage compared to the original assertions which checked all keys. It is better to verify that all expected keys are present in the database's encryption configuration.

    if kms_key_names:
        assert len(db.encryption_config.kms_key_names) == len(kms_key_names)
        for expected, actual in zip(kms_key_names, db.encryption_config.kms_key_names):
            assert expected in actual

@sakthivelmanii sakthivelmanii force-pushed the fix-spanner-backup-test-retries branch 2 times, most recently from adccc36 to dcc0af4 Compare May 19, 2026 10:27
When a restore database test times out on the client-side, Spanner
continues the restore operation in the background. When the test runner
retries the test, the call would fail with AlreadyExists.

This change catches the AlreadyExists exception, retrieves the database
metadata, and asserts that it is restoring from the correct source
backup and achieves a READY or READY_OPTIMIZING state. This allows the
test to execute idempotently without namespace collisions.
@sakthivelmanii sakthivelmanii force-pushed the fix-spanner-backup-test-retries branch from dcc0af4 to e959f9e Compare May 19, 2026 10:28
@olavloite
Copy link
Copy Markdown
Contributor

I don't quite understand this comment:

When the test runner retries the test, the call would fail with AlreadyExists.

When would the test runner retry the test? Or do you mean when a new test run of all the integration tests are executed? If it is the latter, isn't this rather an indication that there is a problem with this test re-using the same database name when restoring a database? And that the actual fix would be to use a dynamically generated database name?

(Note: I would also be open to just remove/skip these backup tests. I don't think they add much value, as the client code that is being tested is minimal.)

@sakthivelmanii
Copy link
Copy Markdown
Contributor Author

For backup tests, we do test retry errors like below

@RetryErrors(exception=(DeadlineExceeded, TimeoutError), max_tries=2).

when we take a backup A and tried to restore to db A, we will wait for more than 20m to restore. if it fails due to timeout, we retry the entire test again. retry means, we will try to restore backup A to db A again and it will fail because backup is already in progress and db A was already created which throws AlreadyExists error. To avoid this, we are further waiting the database to be ready rather than restoring it.

@sakthivelmanii sakthivelmanii merged commit 87cc707 into main May 19, 2026
31 checks passed
@sakthivelmanii sakthivelmanii deleted the fix-spanner-backup-test-retries branch May 19, 2026 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants