DNS-based deployments: vanity hostname claims (first-come-first-served) #148

@posix4e

Parked work from closed PR #145. Full design lives in the PR diff / code if we want to pick it up.

Problem

Every workload URL today is welded to an agent UUID (`-.devopsdefender.com`). When an agent STONITHs or relaunches, the URL breaks. There is no way to declare a stable short URL like `nvidia-smi.devopsdefender.com` that follows the workload around the fleet, and there is no automatic failover: if the agent serving a user-visible demo dies, the URL stays orphaned until someone redeploys manually.

Shape

  1. Schema: `expose:` gains a mutually-exclusive `claim_hostname` field alongside `hostname_label`.
  2. Wire: `DD_EXTRA_INGRESS` env extends to `@name:port` for claim entries (auto-label `label:port` stays).
  3. Arbitration: CP POSTs CNAMEs without upsert (`cf::try_claim_cname`). First call wins; later callers hit 409. DNS uniqueness is the lock.
  4. Teardown: collector's orphan-GC path releases the CNAME + CF Access app when the owning agent dies (ownership-checked so it doesn't stomp a legitimate takeover).
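
A minimal sketch of how steps 3 and 4 could fit together, for anyone picking this back up. It is not the code from #145: the `Zone` type, the `ClaimError` enum, `release_if_owned`, and the agent ids are hypothetical, and an in-memory map stands in for the DNS provider so the example runs on its own. Only the name `try_claim_cname` comes from the design above; its signature here is assumed.

```rust
use std::collections::HashMap;

/// Why a claim was rejected (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum ClaimError {
    /// A record already exists: the "later callers hit 409" case.
    Conflict { owner: String },
}

/// In-memory stand-in for the DNS zone: hostname -> owning agent id.
#[derive(Default)]
struct Zone {
    cnames: HashMap<String, String>,
}

impl Zone {
    /// Create-only write (no upsert): succeeds only if the hostname is free.
    fn try_claim_cname(&mut self, hostname: &str, agent: &str) -> Result<(), ClaimError> {
        if let Some(owner) = self.cnames.get(hostname) {
            return Err(ClaimError::Conflict { owner: owner.clone() });
        }
        self.cnames.insert(hostname.to_string(), agent.to_string());
        Ok(())
    }

    /// Ownership-checked release: removes the record only if `dead_agent`
    /// still owns it, so a late GC pass cannot stomp a takeover.
    fn release_if_owned(&mut self, hostname: &str, dead_agent: &str) -> bool {
        let owned = self.cnames.get(hostname).map(String::as_str) == Some(dead_agent);
        if owned {
            self.cnames.remove(hostname);
        }
        owned
    }
}

fn main() {
    let mut zone = Zone::default();
    let host = "nvidia-smi.devopsdefender.com";

    // First caller wins, second caller gets the conflict.
    println!("{:?}", zone.try_claim_cname(host, "agent-a"));
    println!("{:?}", zone.try_claim_cname(host, "agent-b"));

    // agent-a dies: GC releases its claim, then agent-b takes the name.
    println!("released: {}", zone.release_if_owned(host, "agent-a"));
    println!("{:?}", zone.try_claim_cname(host, "agent-b"));

    // A duplicate GC pass for agent-a must not remove agent-b's record.
    println!("released: {}", zone.release_if_owned(host, "agent-a"));
}
```

The point of the ownership check is the last call: without it, a delayed GC run for the dead agent would delete a record that now belongs to someone else.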

Phase 2 (not in #145)

Scraper-driven automatic relaunch: when an agent with active claims goes unhealthy, CP picks another eligible agent (capability match: `require_labels: ["gpu"]` etc.), posts the spec, repoints the CNAME.
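
A rough sketch of what the "pick another eligible agent" step might look like, assuming capability match means the candidate carries every label in `require_labels`. The `Agent` struct, its fields, the `agent` helper, and `pick_replacement` are all hypothetical; posting the spec and repointing the CNAME are left out.

```rust
use std::collections::HashSet;

/// Hypothetical view of an agent as seen by the CP.
struct Agent {
    id: String,
    healthy: bool,
    labels: HashSet<String>,
}

/// Convenience constructor for the example.
fn agent(id: &str, healthy: bool, labels: &[&str]) -> Agent {
    Agent {
        id: id.to_string(),
        healthy,
        labels: labels.iter().map(|l| l.to_string()).collect(),
    }
}

/// Returns the first healthy agent, other than the failed one, that has
/// every required label. Returns None if no agent qualifies.
fn pick_replacement<'a>(
    agents: &'a [Agent],
    failed_id: &str,
    require_labels: &[&str],
) -> Option<&'a Agent> {
    agents.iter().find(|a| {
        a.healthy
            && a.id != failed_id
            && require_labels.iter().all(|l| a.labels.contains(*l))
    })
}

fn main() {
    let fleet = vec![
        agent("agent-a", false, &["gpu"]), // the agent that went unhealthy
        agent("agent-b", true, &[]),       // healthy but no GPU
        agent("agent-c", true, &["gpu"]),  // eligible
    ];

    match pick_replacement(&fleet, "agent-a", &["gpu"]) {
        Some(a) => println!("relaunch on {}", a.id),
        None => println!("no eligible agent; claim stays orphaned"),
    }
}
```

Taking the first match keeps the sketch short; a real scheduler would likely want a tie-break such as load or locality, which this issue does not specify.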

Why we closed the PR

Pausing while we focus on smaller near-term fixes. Design is solid; re-open when we want DNS to be the deployment contract.
