Skip to content

fix(crane): require up-to-date PR checks before completion#118

Merged
mrjf merged 2 commits into
mainfrom
codex/strict-crane-completion-gate
Jun 11, 2026
Merged

fix(crane): require up-to-date PR checks before completion#118
mrjf merged 2 commits into
mainfrom
codex/strict-crane-completion-gate

Conversation

@mrjf

@mrjf mrjf commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

fix(crane): require up-to-date PR checks before completion

TL;DR

This tightens Crane’s completion gate so green checks count only after the PR head contains the current base branch SHA. The scheduler now verifies base_sha...head_sha through GitHub’s compare API before it evaluates check-runs, and the Crane workflow prompt now teaches the same gate. That closes the stale-check path that let a migration be marked complete after stricter gates landed on main.

Important

This PR is about the completion mechanism, not claiming the Go migration itself is complete.

Problem (WHY)

  • Issue Crane Migration: Python to Go -- Full APM CLI Rewrite #78 says, “Crane may mark this migration complete only when every gate is true,” but a PR could still complete from checks that were green before the newest base-side gate existed.
  • get_pr_head_check_gate() treated “all checks on the PR head are success” as sufficient without proving that the PR head contained the current base SHA.
  • [!] .github/workflows/crane.md used the weaker pr-head-checks wording, so the agent prompt and the scheduler shared the same stale-base blind spot.

Why these matter: deterministic completion has to be evidence about the code that would actually be merged. A successful check set on an old PR head is not deletion-grade proof after main changes underneath it.

Approach (WHAT)

# Fix
1 Add an explicit base-containment check before PR-head checks can pass.
2 Return a hard stale-base:<head>:<status>:base:<base> failure when GitHub reports behind, diverged, unknown, or any non-current comparison status.
3 Update the Crane workflow instructions and state template from pr-head-checks to up-to-date-pr-head-checks.
4 Add regression tests for stale-base completion and prompt-level gate wording.
5 Recompile .github/workflows/crane.lock.yml with the same v0.74.4 gh-aw compiler used by CI’s lock check.

Implementation (HOW)

  • .github/workflows/scripts/crane_scheduler.pyget_pr_head_check_gate() now reads both PR head.sha and base.sha, calls /compare/{base_sha}...{head_sha}, and refuses to pass unless the comparison is ahead or identical.
  • .github/workflows/crane.md — the halting condition now says completion requires an up-to-date PR head, explicitly calls out the compare API, and forbids completion from checks produced before the current base SHA was included.
  • tests/unit/test_crane_scheduler.py — existing check-gate tests now provide a base SHA and compare response; a new regression test proves a diverged PR fails with stale-base.
  • tests/unit/test_crane_workflow_prompt.py — prompt assertions now require up-to-date-pr-head-checks and the “current base branch SHA” language.
  • .github/workflows/crane.lock.yml / .github/aw/actions-lock.json — regenerated with gh-aw v0.74.4 after editing the workflow source.

Diagrams

Legend: the highlighted region is the new freshness check Crane must pass before check-runs can certify completion.

sequenceDiagram
    participant C as Crane scheduler
    participant P as GitHub PR API
    participant X as GitHub compare API
    participant R as Check-runs API

    C->>P: fetch PR head and base
    rect rgb(255, 247, 200)
        C->>X: compare base_sha...head_sha
        X-->>C: ahead or identical
        Note over C,X: NEW: stale, behind, diverged, or unavailable fails gate
    end
    C->>R: fetch check-runs for head_sha
    R-->>C: check conclusions
    C-->>C: complete only when all checks are success
Loading

Trade-offs

  • Use compare status instead of timestamps. Chose GitHub’s merge-base comparison; rejected check timestamp heuristics because a later check is still not proof that the branch contains the current base.
  • Treat unavailable compare data as not complete. Chose conservative None/failure behavior; rejected optimistic completion because this path controls issue labels and migration finality.
  • Match CI’s compiler instead of the newest local compiler. Chose gh-aw v0.74.4 because .github/workflows/gh-aw-lock-check.yml pins that version; rejected newer local output because CI would mark it stale.

Benefits

  1. A stale Crane PR can no longer add crane-completed just because old head checks passed.
  2. Scheduler behavior and workflow prompt guidance now describe the same deterministic gate.
  3. The stale-base regression has a focused unit test that would have caught the failure mode.
  4. The workflow lock is regenerated from source with the same compiler CI uses, so CI’s lock check has the updated Crane prompt.

Validation

/tmp/gh-aw-v0.74.4/darwin-arm64 --version && /tmp/gh-aw-v0.74.4/darwin-arm64 compile --no-check-update:

gh aw version v0.74.4
✓ .github/workflows/crane.md (122.1 KB)
✓ Compiled 1 workflow(s): 0 error(s), 0 warning(s)

uv run pytest tests/unit/test_crane_scheduler.py tests/unit/test_crane_workflow_prompt.py -q:

..................                                                       [100%]
18 passed in 0.40s

uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/ && uv run --extra dev python -m pylint --disable=all --enable=R0801 --min-similarity-lines=10 --fail-on=R0801 src/apm_cli/ && bash scripts/lint-auth-signals.sh:

All checks passed!
1042 files already formatted

------------------------------------
Your code has been rated at 10.00/10

[*] Rule A: get_bearer_provider boundary (any reference)
[*] Rule B: git ls-remote auth-delegated annotation
[+] auth-signal lint clean

uv run python -m py_compile .github/workflows/scripts/crane_scheduler.py:

# no output

git diff --check:

# no output

npx --yes -p @mermaid-js/mermaid-cli mmdc -i /tmp/apm-mermaid-check/diag1.mmd -o /tmp/apm-mermaid-check/diag1.svg --quiet:

# no output

Scenario Evidence

# Scenario (user promise) Principle(s) Test(s) proving it Type
1 A Crane completion candidate on a PR that does not contain the current base must stay active instead of being marked complete. Governed by policy, DevX tests/unit/test_crane_scheduler.py::test_pr_head_gate_fails_when_pr_head_does_not_contain_current_base unit
2 A Crane completion candidate whose PR head is current and whose checks all passed can still complete. Governed by policy, DevX tests/unit/test_crane_scheduler.py::test_pr_head_gate_passes_only_when_all_checks_succeed unit
3 Crane’s prompt must instruct agents to use the up-to-date PR-head gate, not the old PR-head-only gate. Governed by policy, OSS / community-driven tests/unit/test_crane_workflow_prompt.py::test_crane_completion_is_two_phase_and_pr_head_gated unit

How to test

  • Run uv run pytest tests/unit/test_crane_scheduler.py tests/unit/test_crane_workflow_prompt.py -q and observe 18 passed.
  • Run the gh-aw v0.74.4 compiler with compile --no-check-update and observe .github/workflows/crane.md compiles with zero errors.
  • Run the repo lint mirror from .apm/instructions/linting.instructions.md and observe ruff, format, duplication, and auth-signal checks pass.
  • Review test_pr_head_gate_fails_when_pr_head_does_not_contain_current_base and confirm a diverged compare response returns stale-base.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

@mrjf mrjf merged commit 9686d17 into main Jun 11, 2026
14 checks passed
@mrjf mrjf deleted the codex/strict-crane-completion-gate branch June 11, 2026 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant