fix(conformance): make OOB plugin RPC timeout configurable, default to 30m #435

Merged
JeroenSoeters merged 3 commits into main from fix/conformance-oob-create-timeout on Apr 24, 2026

@JeroenSoeters JeroenSoeters commented Apr 24, 2026

Summary

The conformance framework's retryOnRecoverable helper hard-codes a
10-minute deadline on the single-attempt wait for an OOB Create or
Delete RPC to the plugin (h.waitForOperationProgress). Resources
that legitimately take longer to reach a terminal state — most
notably AWS::EKS::Cluster, which typically needs 10–15 min to
become ACTIVE, and similarly slow managed-Kubernetes clusters on
other providers — fail the discovery test's CreateOOB step purely
because the deadline fires before the cloud API finishes
provisioning. The outer retry loop then burns through its budget on
subsequent attempts until the matrix job's 2h cap cancels the run.

This PR:

  • Raises the default to 30 minutes, which covers the cloud
    resources we've actually exercised.
  • Makes it overridable via FORMAE_TEST_OOB_TIMEOUT (integer
    minutes — matches the existing FORMAE_TEST_TIMEOUT and
    FORMAE_TEST_DISCOVERY_TIMEOUT convention) so plugin authors with
    even slower resources can extend it without code changes.

Scope-wise this bounds both the OOB Create and OOB Delete plugin
RPCs (both flow through retryOnRecoverable). It is distinct from
#436's FORMAE_TEST_OOB_DELETE_TIMEOUT, which bounds the post-sync
inventory-tombstone wait (runner.go Step 24) — a separate, much
shorter wait that runs after the plugin-side Delete RPC has already
returned. Both PRs can land independently.

Pairs with the AWS plugin fixes shipped in
platform-engineering-labs/formae-plugin-aws#47.
Once this lands and a pseudo-version or tagged release is available,
formae-plugin-aws can bump its dependency to pick up the fix —
that should get eks-cluster off the timeout list.

@JeroenSoeters changed the title from "fix(conformance): raise OOB create/delete timeout to 30m" to "fix(conformance): make OOB plugin RPC timeout configurable, default to 30m" on Apr 24, 2026
…rridable

The conformance framework's retryOnRecoverable helper was hard-coding a
10-minute deadline on the single-attempt wait for OOB plugin operations.
Resources that legitimately take longer to reach a terminal state — most
notably AWS::EKS::Cluster, which typically needs 10–15 min to become
ACTIVE — fail the Discovery test's CreateOOB step purely because the
deadline fires before AWS finishes provisioning. The outer retry loop
then burns through its budget on subsequent attempts until the matrix
job's 2h cap cancels the run.

Raise the default to 30 min, which covers the cloud resources we've
actually tested. Plugin authors with even slower resources can override
via FORMAE_CONFORMANCE_OOB_TIMEOUT (any Go duration string, e.g. "45m"
or "1h").

Align with the existing FORMAE_TEST_TIMEOUT and FORMAE_TEST_DISCOVERY_TIMEOUT
convention: bare integer minutes, not Go duration strings. Rename to
FORMAE_TEST_OOB_TIMEOUT to match the FORMAE_TEST_* prefix used by
siblings.

Make the docstring on oobOperationTimeout explicit that it bounds the
plugin-side Create *and* Delete RPC waits via retryOnRecoverable, and
explicitly distinguish it from FORMAE_TEST_OOB_DELETE_TIMEOUT (#436),
which bounds the post-sync inventory-tombstone wait — a different,
much shorter wait handled in runner.go Step 24.

Also surface FORMAE_TEST_OOB_TIMEOUT in the RunCRUDTests env-var
docblock so it appears alongside FORMAE_TEST_TIMEOUT /
FORMAE_TEST_DISCOVERY_TIMEOUT when readers look up the convention.
@JeroenSoeters JeroenSoeters force-pushed the fix/conformance-oob-create-timeout branch from 6964bf6 to e694026 April 24, 2026 19:44
@JeroenSoeters JeroenSoeters merged commit d7edc93 into main Apr 24, 2026
25 checks passed
@JeroenSoeters JeroenSoeters deleted the fix/conformance-oob-create-timeout branch April 24, 2026 20:14