fix(conformance): make OOB plugin RPC timeout configurable, default to 30m #435
Merged
…rridable The conformance framework's retryOnRecoverable helper was hard-coding a 10-minute deadline on the single-attempt wait for OOB plugin operations. Resources that legitimately take longer to reach a terminal state — most notably AWS::EKS::Cluster, which typically needs 10–15 min to become ACTIVE — fail the Discovery test's CreateOOB step purely because the deadline fires before AWS finishes provisioning. The outer retry loop then burns through its budget on subsequent attempts until the matrix job's 2h cap cancels the run. Raise the default to 30 min, which covers the cloud resources we've actually tested. Plugin authors with even slower resources can override via FORMAE_CONFORMANCE_OOB_TIMEOUT (any Go duration string, e.g. "45m" or "1h").
Align with the existing FORMAE_TEST_TIMEOUT and FORMAE_TEST_DISCOVERY_TIMEOUT convention: bare integer minutes, not Go duration strings. Rename to FORMAE_TEST_OOB_TIMEOUT to match the FORMAE_TEST_* prefix used by siblings.
Make the docstring on oobOperationTimeout explicit that it bounds the plugin-side Create *and* Delete RPC waits via retryOnRecoverable, and explicitly distinguish it from FORMAE_TEST_OOB_DELETE_TIMEOUT (#436), which bounds the post-sync inventory-tombstone wait — a different, much shorter wait handled in runner.go Step 24. Also surface FORMAE_TEST_OOB_TIMEOUT in the RunCRUDTests env-var docblock so it appears alongside FORMAE_TEST_TIMEOUT / FORMAE_TEST_DISCOVERY_TIMEOUT when readers look up the convention.
Summary
The conformance framework's `retryOnRecoverable` helper hard-codes a 10-minute deadline on the single-attempt wait for an OOB Create or Delete RPC to the plugin (`h.waitForOperationProgress`). Resources that legitimately take longer to reach a terminal state (most notably `AWS::EKS::Cluster`, which typically needs 10–15 min to become `ACTIVE`, and similarly slow managed-Kubernetes clusters on other providers) fail the discovery test's `CreateOOB` step purely because the deadline fires before the cloud API finishes provisioning. The outer retry loop then burns through its budget on subsequent attempts until the matrix job's 2h cap cancels the run.
This PR:
- Raises the default to 30 min, which covers the cloud resources we've actually exercised.
- Makes the timeout overridable via `FORMAE_TEST_OOB_TIMEOUT` (integer minutes, matching the existing `FORMAE_TEST_TIMEOUT` and `FORMAE_TEST_DISCOVERY_TIMEOUT` convention) so plugin authors with even slower resources can extend it without code changes.
Scope-wise this bounds both the OOB Create and OOB Delete plugin RPCs (both flow through `retryOnRecoverable`). It is distinct from #436's `FORMAE_TEST_OOB_DELETE_TIMEOUT`, which bounds the post-sync inventory-tombstone wait (runner.go Step 24), a separate, much shorter wait that runs after the plugin-side Delete RPC has already returned. Both PRs can land independently.
Pairs with the AWS plugin fixes shipped in platform-engineering-labs/formae-plugin-aws#47. Once this lands and a pseudo-version or tagged release is available, `formae-plugin-aws` can bump its dependency to pick up the fix; that should get `eks-cluster` off the timeout list.