Skip to content

Slice 3B: real docker pull/swap/healthcheck/rollback in the updater#131

Merged
rg4444 merged 1 commit into
mainfrom
slice-3b/real-docker-ops
May 23, 2026
Merged

Slice 3B: real docker pull/swap/healthcheck/rollback in the updater#131
rg4444 merged 1 commit into
mainfrom
slice-3b/real-docker-ops

Conversation

@rg4444
Copy link
Copy Markdown
Contributor

@rg4444 rg4444 commented May 23, 2026

Slice 3B — real docker operations in the updater

Replaces the Slice 3A stubs (#128) with real docker CLI invocations. After this lands, with PROCESSGIT_UPDATER_STUB=false, the sidecar can actually pull images, run migrations, swap containers, healthcheck, and roll back.

What's now real

Op Implementation
Pull(ref) docker pull <ref>
InspectAppImageDigest() docker inspect → .Image → .RepoDigests → sha256 part. Falls back to image ID for locally-built images that have no registry digest.
RunMigration(image, cmd) docker run --rm --volumes-from <app> <image> <cmd> — migration sees the same data dir as the live app
SwapContainer(newImage) Rewrite PROCESSGIT_VERSION in deploy/.envdocker compose -f /deploy/docker-compose.yml up -d --no-deps processgit. Compose recreates only the app container, leaving the updater and bootstrap containers alone.
Healthcheck() HTTP GET against $PROCESSGIT_UPDATER_HEALTH_URL polled every 3s up to a 2m deadline
Rollback(old) Same as swap but with the previous PROCESSGIT_VERSION value.

Compose-driven swap design

The swap doesn't try to reconstruct full container config from docker inspect. Instead it leans on compose:

1. Read deploy/.env       → capture current PROCESSGIT_VERSION (for rollback)
2. Rewrite deploy/.env    → PROCESSGIT_VERSION=new-tag (atomic, preserves all other keys)
3. docker compose -f /deploy/docker-compose.yml up -d --no-deps processgit

Two prerequisites, both satisfied by the compose-integration PR:

  • The compose file uses image: ghcr.io/algomation-ai/processgit:${PROCESSGIT_VERSION:-latest}
  • The updater container has /deploy bind-mounted RW

What's still deferred (Slice 3C)

Snapshot() now logs a warning and returns nil — updates proceed without an on-disk backup. Slice 3C will implement:

docker run --rm --volumes-from <app> -v <dst>:/snap alpine \
  tar czf /snap/data-<jobid>.tgz /data

This is a separate PR because backup/restore deserves its own design review (compression choice, retention policy, where snapshots live, restore command). Rolling back still works without snapshots — it just doesn't restore data if the migration mutated it. For Slice 3B's intended use case (additive schema migrations on the live app), that's acceptable.

Stub mode preserved

PROCESSGIT_UPDATER_STUB=true (the current default) keeps every method behaving like in Slice 3A: log + sleep, return synthetic values. This:

  • Keeps existing tests in orchestrator_test.go working unchanged
  • Lets operators validate the architecture end-to-end before flipping to real updates
  • Makes the transition per-deployment via an env flip, not a code change

The compose-integration PR threads PROCESSGIT_UPDATER_STUB from deploy/.env, so flipping is just sed -i 's/STUB=true/STUB=false/' deploy/.env && docker compose up -d processgit-updater.

New env file helper

env.go is a tiny .env editor:

SetEnvFileKey(path, key, value) (prev string, hadKey bool, err error)
GetEnvFileKey(path, key)        (val string, ok bool, err error)

Atomic write-temp-then-rename. Preserves comments, blank lines, other keys. Duplicate occurrences of the target key are commented out with a # duplicate of X marker for safety.

New config

Env var Default Purpose
PROCESSGIT_UPDATER_COMPOSE_FILE /deploy/docker-compose.yml Compose file the swap operates on
PROCESSGIT_UPDATER_ENV_FILE /deploy/.env Where to persist the new PROCESSGIT_VERSION
PROCESSGIT_UPDATER_HEALTH_URL http://processgit:3000/api/v1/version What to GET during healthcheck

Defaults align with the path conventions in the compose-integration PR.

Tests (16 total, all passing locally)

9 new:

  • TestSetEnvFileKey_UpdateExisting — preserves other keys and comments
  • TestSetEnvFileKey_AppendsWhenMissing — adds key if absent
  • TestSetEnvFileKey_FileDoesNotExist — creates file
  • TestSetEnvFileKey_QuotedValues — strips quotes on read-back
  • TestSetEnvFileKey_DuplicatesCommented — duplicate occurrences become # X=old # duplicate of X
  • TestSetEnvFileKey_EmptyKeyRejected
  • TestGetEnvFileKey — exhaustive: present/empty/quoted/missing
  • TestImageVersion — parses tag/digest/port-in-registry refs correctly
  • TestLastLines — log-tail helper edge cases

7 existing (still pass): TestStore_RoundTrip, TestState_IsTerminal, TestJob_TransitionFinishesOnTerminal, TestOrchestrator_HappyPath, TestOrchestrator_RejectsConcurrent, TestAPI_BearerAuth, TestNoExternalImports.

PASS
ok    github.com/Algomation-AI/ProcessGit/updater   14.023s

Validation done

  • gofmt clean
  • go vet clean
  • Build succeeds (sandbox uses Go 1.22; module go.mod stays at 1.25 for production)
  • All 16 tests pass

Sequencing

This PR is best merged AFTER the compose-integration PR (the new compose contract is what SwapContainer relies on). It's safe to merge before: until the compose has PROCESSGIT_VERSION//deploy mount, the real ops will error gracefully with a clear message — and stub mode (the default) keeps working regardless.

The Slice 3A sidecar (#128) shipped with stubbed docker operations that
log + sleep so the state machine and HTTP API could be reviewed
independently of real container surgery. This PR makes those stubs real.

What's now real (was stub):

  Pull(ref)                  -> docker pull ref
  InspectAppImageDigest()    -> docker inspect <container>
                                -> .Image -> docker inspect <image-id>
                                -> .RepoDigests -> sha256 part
                                Falls back to image ID if no
                                RepoDigest (locally-built images).
  RunMigration(image, cmd)   -> docker run --rm --volumes-from <app> <image> <cmd>
                                Migration runs against the same data
                                volumes as the live app. Idempotent
                                migrations are expected.
  SwapContainer(newImage)    -> rewrite PROCESSGIT_VERSION in deploy/.env
                                then `docker compose -f /deploy/docker-compose.yml
                                up -d --no-deps processgit`.
                                Returns the OLD container ID for the
                                Job log.
  Healthcheck()              -> HTTP GET against PROCESSGIT_UPDATER_HEALTH_URL
                                (default http://processgit:3000/api/v1/version)
                                polled every 3s up to a 2m deadline.
  Rollback(old)              -> revert PROCESSGIT_VERSION to the previous
                                value + the same `docker compose up`.

What's still deferred (Slice 3C):

  Snapshot(dst)              -> stubbed; now logs a warning and returns
                                nil so updates aren't blocked. Slice 3C
                                will implement `docker run --rm --volumes-from
                                <app> -v <dst>:/snap alpine tar czf …`.

Stub mode is preserved (PROCESSGIT_UPDATER_STUB=true) so:

  - Existing tests in orchestrator_test.go keep working without docker
  - Operators can validate the architecture end-to-end before flipping
    to real updates
  - The transition is per-deployment via an env var flip

New configuration (with sensible defaults that match the compose
integration PR):

  PROCESSGIT_UPDATER_COMPOSE_FILE   default: /deploy/docker-compose.yml
  PROCESSGIT_UPDATER_ENV_FILE       default: /deploy/.env
  PROCESSGIT_UPDATER_HEALTH_URL     default: http://processgit:3000/api/v1/version

New helper: env.go (.env file editor)

  SetEnvFileKey(path, key, value) -> (previous, hadKey, error)
  GetEnvFileKey(path, key)        -> (value, ok, error)

Atomic write-temp-then-rename. Preserves comments, blank lines, and
other keys. Comments out duplicate occurrences of the target key.

Tests (9 new + 7 existing = 16, all passing):

  TestSetEnvFileKey_UpdateExisting
  TestSetEnvFileKey_AppendsWhenMissing
  TestSetEnvFileKey_FileDoesNotExist
  TestSetEnvFileKey_QuotedValues
  TestSetEnvFileKey_DuplicatesCommented
  TestSetEnvFileKey_EmptyKeyRejected
  TestGetEnvFileKey
  TestImageVersion       (covers tag/digest/port-in-registry parsing)
  TestLastLines          (log-output truncation helper)

Co-authored-by: Claude <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83c9b2d375

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread updater/docker.go
Comment on lines +197 to +199
newTag := imageVersion(newImage)
if newTag == "" {
return oldID, fmt.Errorf("could not derive version tag from image %q", newImage)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Derive swap version from manifest tag, not digest ref

SwapContainer extracts PROCESSGIT_VERSION from newImage, but the orchestrator passes manifest.Image.DigestRef() (digest form like repo@sha256:...), so imageVersion returns empty and this path errors before compose is run. In non-stub mode this makes normal updates fail at the swap step for digest-pinned releases.

Useful? React with 👍 / 👎.

Comment thread updater/docker.go
// reliable way to determine the previous PROCESSGIT_VERSION from a
// raw digest, so we fall back to "latest" — best-effort recovery.
d.Log.Warn("rollback target has no tag form; falling back to latest", "old_image", oldImage)
rollbackTag = "latest"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Roll back to previous version instead of forcing latest

On rollback, oldImage is typically a digest captured by InspectAppImageDigest, so imageVersion(oldImage) is empty and this branch rewrites PROCESSGIT_VERSION to latest. After a failed healthcheck, that can redeploy a different image than the one that was running before the update, defeating rollback correctness.

Useful? React with 👍 / 👎.

@rg4444 rg4444 merged commit c1fd5c3 into main May 23, 2026
12 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant