postgres: Support replace_existing on branches and endpoints #5264

Merged: pietern merged 8 commits into main from postgres-replace-existing on May 19, 2026

@pietern (Contributor) commented May 18, 2026

Summary

Adds support for replace_existing on Lakebase postgres_branches and postgres_endpoints so users can bring the implicitly-created production branch and primary read-write endpoint under bundle management — previously these always returned 409 ALREADY_EXISTS on deploy.

Also closes the destroy-side gap: the backend signals lifecycle-owned-by-parent via details[].ErrorInfo.metadata.declarative_context: MANAGED_BY_PARENT. The Terraform provider already honors this; this PR adds the equivalent suppression in the direct engine. The acceptance testserver is taught the same payload shape.

Two focused acceptance tests cover the override-implicit-resource flow on both engines:

  • postgres_branches/replace_existing: takes over the production branch, applies a non-default no_expiry: true.
  • postgres_endpoints/replace_existing: takes over the primary endpoint of a user-created branch, overrides the project-inherited suspend_timeout_duration (300s → 600s).

Depends on

Built on top of #5263 (testproxy: forward raw error body and headers from upstream) — without that fix, the acceptance test proxy strips details[] from upstream errors before they reach the engines, so the suppression in TF and the direct engine never fires when tests run against cloud.

Test plan

  • Manually ran both acceptance tests against a real workspace — full deploy + destroy on both engines.

This pull request and its description were written by Isaac.

pietern added 5 commits May 18, 2026 16:10
Lakebase auto-provisions a "production" branch on every project and a
"primary" read-write endpoint on every branch. Without replace_existing,
declaring these in a bundle returns 409 ALREADY_EXISTS and the user has
no way to bring them under management.

Wires the replace_existing field through PostgresBranchConfig /
PostgresEndpointConfig and the direct + terraform engines, marks it as
input-only (the GET API never returns it; it cannot be updated after
create), and adds two focused acceptance tests that each prove a default
setting is overridden:

  * postgres_branches/replace_existing: flip is_protected on the implicit
    production branch from false (default) to true.
  * postgres_endpoints/replace_existing: override suspend_timeout_duration
    on the implicit primary endpoint of a managed branch from the
    project-inherited 300s to 600s.

Aligns the in-memory testserver: every branch (default and user-created)
now implicitly provisions a primary RW endpoint, and replace_existing=true
on CreateBranch / CreateEndpoint updates in place instead of returning
409. Both tests pass on aws-prod-ucws and locally on both engines.

Co-authored-by: Isaac
Lakebase resources whose lifecycle is owned by a parent (the implicit
production branch on a project, the implicit primary read-write endpoint
on a branch) cannot be deleted independently. The backend signals this
by attaching declarative_context=MANAGED_BY_PARENT to ErrorInfo.metadata
on the 400 response; declarative tools are expected to disregard the
delete and rely on the parent to cascade-clean. The Terraform provider
v1.115.0 already implements this via declarative.IsDeleteError.

Mirrors the suppression in the direct engine: new isManagedByParent
helper in bundle/direct/util.go, used alongside isResourceGone in
DeploymentUnit.Delete. With this in place, bundle destroy now does the
expected cascade — leaf delete fails with the metadata marker, is
disregarded, parent delete proceeds and cleans the leaf along the way.

The acceptance testserver is taught the same payload (new
postgresManagedByParentErrorResponse) so the implicit-branch and
implicit-endpoint delete paths emit the cloud-shape error including
details[].ErrorInfo. Both replace_existing acceptance tests now exercise
bundle destroy end-to-end on both engines; outputs and request +
response traces are captured per-engine.

Note: an existing test-proxy bug (libs/testproxy/server.go re-marshals
APIError to only error_code+message, dropping details[]) means cloud
runs still see destroy fail; that is being addressed separately.

Co-authored-by: Isaac
The reverse proxy in libs/testproxy re-marshalled apierr.APIError into a
{error_code, message} envelope, dropping details[] and any other fields
the workspace returned. As a result, acceptance tests run against the
cloud could not observe error metadata that real CLI/TF invocations rely
on.

Forward apiErr.ResponseWrapper.DebugBytes verbatim with the original
status code so callers see exactly what the workspace sent. Also pass
through response headers in includeResponseHeaders on the error path;
WithResponseHeader visitors are not invoked when apiClient.Do returns
an error.

ResponseWrapper has been populated on every APIError since
databricks/databricks-sdk-go#1261 (v0.100.0); the CLI is on v0.132.0.
A panic guards the invariant in case the SDK ever changes shape.

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is May 18, 2026 19:00 — with GitHub Actions Inactive
…sing

The destroy-side suppression landed in the previous commit; the test
scripts still referred to bundle destroy as "expected to fail." Update
the inline comments and section titles to describe the actual flow
(backend signals MANAGED_BY_PARENT, both engines disregard, parent
delete cascades). No behavioural change.

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is May 18, 2026 19:07 — with GitHub Actions Inactive
@pietern changed the title from "postgres: Support replace_existing on branches and endpoints" to "postgres: Support replace_existing on branches and endpoints" on May 18, 2026
The bundle/refschema test catalogs every resource field. Adding
replace_existing to PostgresBranchConfig and PostgresEndpointConfig
shows up there and needed the snapshot update.

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is May 18, 2026 19:29 — with GitHub Actions Inactive
Recreate is a separate path from Delete in the direct engine and only
checked isResourceGone. That left a direct/terraform behavior gap: an
immutable-field change on a parent-managed resource (e.g. flipping
endpoint_type on an implicit primary endpoint) would hard-fail on
the delete-half of recreate in direct mode, while terraform routes
the same call through the provider's Delete with the suppression in
place.

Apply isManagedByParent here too so the subsequent Create with
replace_existing=true reconfigures the resource in place, matching
terraform.

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is May 18, 2026 20:04 — with GitHub Actions Inactive
Base automatically changed from testserver-proxy-fix to main May 19, 2026 06:23
resources.postgres_endpoints.*.name string REMOTE
resources.postgres_endpoints.*.no_suspension bool INPUT STATE
resources.postgres_endpoints.*.parent string ALL
resources.postgres_endpoints.*.replace_existing bool INPUT STATE
Contributor

This is a create-only field, right? In that case, why do we need to track it in STATE?

Contributor Author

There is no way to pass it into DoCreate without putting it in the state type.

I couldn't find precedent for this in other resources.

Hypothetically if we were to add a create-only side channel, I'm not sure the added complexity is worth it.

@pietern pietern requested a review from denik May 19, 2026 13:05
@pietern pietern added this pull request to the merge queue May 19, 2026
Merged via the queue into main with commit 75586fd May 19, 2026
28 checks passed
@pietern pietern deleted the postgres-replace-existing branch May 19, 2026 14:10