feat: human-in-the-loop review gate (hax approval before posting) #38
Merged
The human-in-the-loop review gate uses the hax-sdk form-builder and agentfield's app.pause() primitive. Add hax-sdk>=0.2.4 and pin agentfield>=0.1.84 (the version that provides pause()/ApprovalResult). Mirror the change in the Dockerfile's explicit pip list: the Dockerfile installs the package with --no-deps, so that list, not pyproject, is what the runtime image actually resolves.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ports SWE-AF's plan-phase approval pattern for PR reviews:
- client.py: hax client builder (HAX_API_KEY/HAX_SDK_URL on/off switch), control-plane webhook resolver, watchdog-safe create_request wrapper (asyncio.wait_for over a worker thread, per SWE-AF's pause-watchdog lesson), and the ApprovalResult value extractor.
- review_gate.py: builds a hax form-builder request (PR-intent blurb + one checkbox per finding + action radio + instructions textarea), pauses the workflow via app.pause(), and parses the response into a ReviewDecision (post_selected / rerun / reject). Terminal outcomes (expired/error, or a failed create/pause) default to reject so an unreviewed review is never posted. Reuses hax's generic form-builder type, so no external hax frontend template is needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Enables the review gate when HAX_API_KEY is set (same trigger as SWE-AF). Reads only the env vars SWE-AF already uses (HAX_API_KEY, HAX_SDK_URL, AGENTFIELD_SERVER, AGENTFIELD_APPROVAL_USER_ID); expiry (72h) and max re-review revisions (2) are plain config defaults matching SWE-AF's BuildConfig, not new PR-AF-specific env var names.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
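A rough sketch of the on/off switch and defaults described in this commit follows. HITLConfig is the class name the PR mentions, but its field names here (enabled, expires_in_hours, max_review_revisions) and the load_hitl_config helper are assumptions for illustration only.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class HITLConfig:
    # Field names are hypothetical; only the defaults (72h, 2) come from the PR.
    enabled: bool = False
    expires_in_hours: int = 72
    max_review_revisions: int = 2


def load_hitl_config(env=None):
    """Enable the gate only when HAX_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    api_key = (env.get("HAX_API_KEY") or "").strip()
    return HITLConfig(enabled=bool(api_key))
```

Passing a plain dict as env keeps the helper easy to test without touching the process environment.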
Restructure run() so intake + anatomy run once, then wrap the finding-producing phases (meta-selectors -> synthesis) in a revision loop. When HITL is enabled (HAX_API_KEY set + a real PR), the gate runs after synthesis:
- post_selected -> filter findings to the approved subset and post;
- rerun -> re-run the review phases with the reviewer's instructions threaded in, capped at max_review_revisions;
- reject / expire / error / revision-cap -> post nothing (public-repo safety).

_generate_output gains a post_to_github flag so the no-post path still builds a full ReviewResult for observability. When HITL is off, behavior is unchanged. Reviewer feedback is threaded into dimension selection (the three meta lenses, via _build_meta_context) and the reviewer prompt (review_dimension), so a re-review actually respects guidance like "tone it down".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
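The control flow of the revision loop in this commit can be sketched roughly as below. This is not the orchestrator's actual code: run_review, produce_findings, gate, and post are hypothetical callables, and the exact cap semantics (here, a rerun requested after max_review_revisions re-runs posts nothing) are an assumption.

```python
def run_review(produce_findings, gate, post, max_review_revisions=2, hitl_enabled=True):
    """Produce findings, ask the gate, and post only what was approved.

    produce_findings(feedback) -> list of findings
    gate(findings) -> (action, selected_findings, instructions)
    post(findings) -> posts to the PR
    Returns the list of findings that were posted (empty if nothing was).
    """
    feedback = ""
    revisions = 0
    while True:
        findings = produce_findings(feedback)
        if not hitl_enabled:
            # HITL off: behavior is unchanged, post directly.
            post(findings)
            return findings
        action, selected, instructions = gate(findings)
        if action == "post_selected":
            approved = [f for f in findings if f in selected]
            post(approved)
            return approved
        if action == "rerun" and revisions < max_review_revisions:
            # Thread the reviewer's instructions into the next pass.
            revisions += 1
            feedback = instructions
            continue
        # reject, expired, error, or revision cap reached: post nothing.
        return []
```

Returning an empty list on every non-approval path keeps the "never post an unreviewed review" rule in one place.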
Maps to the validation contract: hax client on/off switch, form shape (one checkbox per finding + action + textarea), decision parsing for every action and terminal outcome, watchdog fast-fail, and the end-to-end run() control flow (HITL off posts directly; approve-subset posts only selected; rerun re-runs with feedback then posts; reject/cap post nothing).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Adds an optional human-in-the-loop (HITL) approval gate before PR-AF posts a review, mirroring the plan-phase approval SWE-AF already uses. When the hax env vars are configured, PR-AF no longer posts directly: it summarizes the review, sends a hax form to a workspace member, and pauses until they decide. With the env vars unset, behavior is unchanged (direct post).
Motivation: running PR-AF on a public repo where every PR triggers a review risks spammy/over-aggressive output. The gate lets a maintainer curate before anything lands on the PR.
Reviewer experience
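The hax form shows a short blurb of the PR's intent plus every finding as a checkbox, along with an action radio and an instructions textarea. The reviewer can:
- post only the selected findings (post_selected);
- request a re-review with instructions (rerun);
- reject the review so nothing is posted (reject).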
First responder wins (a single hax callback resolves the pause).
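As a rough illustration of how a form response could be turned into a decision, the sketch below maps submitted values to one of the three actions and falls back to reject for terminal outcomes. ReviewDecision is the name used in the PR, but its fields, the form field names ("action", "instructions", "finding_<id>"), and the outcome strings are assumptions, not the real hax or agentfield schema.

```python
from dataclasses import dataclass, field

VALID_ACTIONS = {"post_selected", "rerun", "reject"}
TERMINAL_OUTCOMES = {"expired", "error"}


@dataclass
class ReviewDecision:
    # Defaults encode the safe choice: reject and post nothing.
    action: str = "reject"
    selected_ids: list = field(default_factory=list)
    instructions: str = ""


def parse_decision(outcome, values, finding_ids):
    """Map a gate outcome plus form values to a ReviewDecision."""
    if outcome in TERMINAL_OUTCOMES or not isinstance(values, dict):
        return ReviewDecision()
    action = values.get("action")
    if action not in VALID_ACTIONS:
        return ReviewDecision()
    instructions = str(values.get("instructions") or "").strip()
    if action == "post_selected":
        # Keep only findings whose checkbox was explicitly ticked.
        selected = [fid for fid in finding_ids if values.get(f"finding_{fid}") is True]
        return ReviewDecision("post_selected", selected, instructions)
    return ReviewDecision(action, [], instructions)
```

Anything unexpected (a missing payload, an unknown action, an expired request) resolves to reject, which matches the rule that an unreviewed review is never posted.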
How it works (reuses existing primitives)
- app.pause() / ApprovalResult from the agentfield SDK (same primitive SWE-AF's plan gate uses).
- form-builder request type + FormBuilder: no new hax frontend template needed.
- create_request wrapper (asyncio.wait_for over a worker thread), porting SWE-AF's pause-watchdog lesson so a wedged hax-sdk fails fast instead of burning the reasoner budget.

Env vars (consistent with SWE-AF, no new names)
Reads only what SWE-AF's HITL path already reads:
HAX_API_KEY (on/off switch), HAX_SDK_URL, AGENTFIELD_SERVER, AGENTFIELD_APPROVAL_USER_ID. Expiry (72h) and max re-review revisions (2) are plain config defaults matching SWE-AF's BuildConfig, not new env vars.

Changes
- src/pr_af/hitl/: new module (hax client, watchdog-safe create wrapper, form builder, decision parsing, the gate).
- src/pr_af/config.py: HITLConfig.
- src/pr_af/orchestrator.py: run() restructured into a revision loop; gate before posting; _generate_output gains a post_to_github flag.
- src/pr_af/reasoners/harnesses.py: reviewer_feedback threaded into the 3 meta lenses + review_dimension so re-runs respect the guidance.
- Dockerfile / pyproject.toml: add hax-sdk>=0.2.4, pin agentfield>=0.1.84 (also in the Dockerfile's explicit pip list, which uses --no-deps).

Test plan
- ruff check src/ scripts/ clean
- docker build succeeds; image ships hax 0.2.4 + agentfield, pr_af.hitl imports
- FormBuilder + real agentfield ApprovalResult for all five decision paths
- waiting)
- pr-af.review does not trip the caller's parent agent_call_timeout (same cross-reasoner pause propagation that already covers SWE-AF's plan gate)

🤖 Generated with Claude Code