Slice 3B: real docker pull/swap/healthcheck/rollback in the updater#131
Conversation
The Slice 3A sidecar (#128) shipped with stubbed docker operations that log + sleep so the state machine and HTTP API could be reviewed independently of real container surgery. This PR makes those stubs real. What's now real (was stub): Pull(ref) -> docker pull ref InspectAppImageDigest() -> docker inspect <container> -> .Image -> docker inspect <image-id> -> .RepoDigests -> sha256 part Falls back to image ID if no RepoDigest (locally-built images). RunMigration(image, cmd) -> docker run --rm --volumes-from <app> <image> <cmd> Migration runs against the same data volumes as the live app. Idempotent migrations are expected. SwapContainer(newImage) -> rewrite PROCESSGIT_VERSION in deploy/.env then `docker compose -f /deploy/docker-compose.yml up -d --no-deps processgit`. Returns the OLD container ID for the Job log. Healthcheck() -> HTTP GET against PROCESSGIT_UPDATER_HEALTH_URL (default http://processgit:3000/api/v1/version) polled every 3s up to a 2m deadline. Rollback(old) -> revert PROCESSGIT_VERSION to the previous value + the same `docker compose up`. What's still deferred (Slice 3C): Snapshot(dst) -> stubbed; now logs a warning and returns nil so updates aren't blocked. Slice 3C will implement `docker run --rm --volumes-from <app> -v <dst>:/snap alpine tar czf …`. Stub mode is preserved (PROCESSGIT_UPDATER_STUB=true) so: - Existing tests in orchestrator_test.go keep working without docker - Operators can validate the architecture end-to-end before flipping to real updates - The transition is per-deployment via an env var flip New configuration (with sensible defaults that match the compose integration PR): PROCESSGIT_UPDATER_COMPOSE_FILE default: /deploy/docker-compose.yml PROCESSGIT_UPDATER_ENV_FILE default: /deploy/.env PROCESSGIT_UPDATER_HEALTH_URL default: http://processgit:3000/api/v1/version New helper: env.go (.env file editor) SetEnvFileKey(path, key, value) -> (previous, hadKey, error) GetEnvFileKey(path, key) -> (value, ok, error) Atomic write-temp-then-rename. Preserves comments, blank lines, and other keys. Comments out duplicate occurrences of the target key. Tests (9 new + 7 existing = 16, all passing): TestSetEnvFileKey_UpdateExisting TestSetEnvFileKey_AppendsWhenMissing TestSetEnvFileKey_FileDoesNotExist TestSetEnvFileKey_QuotedValues TestSetEnvFileKey_DuplicatesCommented TestSetEnvFileKey_EmptyKeyRejected TestGetEnvFileKey TestImageVersion (covers tag/digest/port-in-registry parsing) TestLastLines (log-output truncation helper) Co-authored-by: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 83c9b2d375
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| newTag := imageVersion(newImage) | ||
| if newTag == "" { | ||
| return oldID, fmt.Errorf("could not derive version tag from image %q", newImage) |
There was a problem hiding this comment.
Derive swap version from manifest tag, not digest ref
SwapContainer extracts PROCESSGIT_VERSION from newImage, but the orchestrator passes manifest.Image.DigestRef() (digest form like repo@sha256:...), so imageVersion returns empty and this path errors before compose is run. In non-stub mode this makes normal updates fail at the swap step for digest-pinned releases.
Useful? React with 👍 / 👎.
| // reliable way to determine the previous PROCESSGIT_VERSION from a | ||
| // raw digest, so we fall back to "latest" — best-effort recovery. | ||
| d.Log.Warn("rollback target has no tag form; falling back to latest", "old_image", oldImage) | ||
| rollbackTag = "latest" |
There was a problem hiding this comment.
Roll back to previous version instead of forcing latest
On rollback, oldImage is typically a digest captured by InspectAppImageDigest, so imageVersion(oldImage) is empty and this branch rewrites PROCESSGIT_VERSION to latest. After a failed healthcheck, that can redeploy a different image than the one that was running before the update, defeating rollback correctness.
Useful? React with 👍 / 👎.
Slice 3B — real docker operations in the updater
Replaces the Slice 3A stubs (#128) with real
dockerCLI invocations. After this lands, withPROCESSGIT_UPDATER_STUB=false, the sidecar can actually pull images, run migrations, swap containers, healthcheck, and roll back.What's now real
Pull(ref)docker pull <ref>InspectAppImageDigest()docker inspect → .Image → .RepoDigests → sha256part. Falls back to image ID for locally-built images that have no registry digest.RunMigration(image, cmd)docker run --rm --volumes-from <app> <image> <cmd>— migration sees the same data dir as the live appSwapContainer(newImage)PROCESSGIT_VERSIONindeploy/.env→docker compose -f /deploy/docker-compose.yml up -d --no-deps processgit. Compose recreates only the app container, leaving the updater and bootstrap containers alone.Healthcheck()$PROCESSGIT_UPDATER_HEALTH_URLpolled every 3s up to a 2m deadlineRollback(old)Compose-driven swap design
The swap doesn't try to reconstruct full container config from
docker inspect. Instead it leans on compose:Two prerequisites, both satisfied by the compose-integration PR:
image: ghcr.io/algomation-ai/processgit:${PROCESSGIT_VERSION:-latest}/deploybind-mounted RWWhat's still deferred (Slice 3C)
Snapshot()now logs a warning and returns nil — updates proceed without an on-disk backup. Slice 3C will implement:This is a separate PR because backup/restore deserves its own design review (compression choice, retention policy, where snapshots live, restore command). Rolling back still works without snapshots — it just doesn't restore data if the migration mutated it. For Slice 3B's intended use case (additive schema migrations on the live app), that's acceptable.
Stub mode preserved
PROCESSGIT_UPDATER_STUB=true(the current default) keeps every method behaving like in Slice 3A: log + sleep, return synthetic values. This:orchestrator_test.goworking unchangedThe compose-integration PR threads
PROCESSGIT_UPDATER_STUBfromdeploy/.env, so flipping is justsed -i 's/STUB=true/STUB=false/' deploy/.env && docker compose up -d processgit-updater.New env file helper
env.gois a tiny .env editor:Atomic write-temp-then-rename. Preserves comments, blank lines, other keys. Duplicate occurrences of the target key are commented out with a
# duplicate of Xmarker for safety.New config
PROCESSGIT_UPDATER_COMPOSE_FILE/deploy/docker-compose.ymlPROCESSGIT_UPDATER_ENV_FILE/deploy/.envPROCESSGIT_UPDATER_HEALTH_URLhttp://processgit:3000/api/v1/versionDefaults align with the path conventions in the compose-integration PR.
Tests (16 total, all passing locally)
9 new:
TestSetEnvFileKey_UpdateExisting— preserves other keys and commentsTestSetEnvFileKey_AppendsWhenMissing— adds key if absentTestSetEnvFileKey_FileDoesNotExist— creates fileTestSetEnvFileKey_QuotedValues— strips quotes on read-backTestSetEnvFileKey_DuplicatesCommented— duplicate occurrences become# X=old # duplicate of XTestSetEnvFileKey_EmptyKeyRejectedTestGetEnvFileKey— exhaustive: present/empty/quoted/missingTestImageVersion— parses tag/digest/port-in-registry refs correctlyTestLastLines— log-tail helper edge cases7 existing (still pass):
TestStore_RoundTrip,TestState_IsTerminal,TestJob_TransitionFinishesOnTerminal,TestOrchestrator_HappyPath,TestOrchestrator_RejectsConcurrent,TestAPI_BearerAuth,TestNoExternalImports.Validation done
Sequencing
This PR is best merged AFTER the compose-integration PR (the new compose contract is what
SwapContainerrelies on). It's safe to merge before: until the compose hasPROCESSGIT_VERSION//deploymount, the real ops will error gracefully with a clear message — and stub mode (the default) keeps working regardless.