Add goal-based validation gate to AgenticPhase tasks #23840
Codecov Report: ✅ All modified and coverable lines are covered by tests.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 956672a0ae
Validation Report: All 21 validations passed.
* Add goal option to phase task
* Little nits
* Write goal_attempt_log in checkpoints even if the validation failed
* Count tokens when GoalParseError is raised
* Fix token counting
* Clear the try/catch in run_goal_loop
* Hoist imports in agent/build
* Improve logging test
* Add pattern validation to task name
* Introduce FlowServices to group shared phase dependencies (#23858)
* Add PipelineContext to simplify args
* Rename to FlowServices
What does this PR do?
Adds an optional, non-deterministic validation gate to `AgenticPhase` tasks. When a task declares a `goal` (or `goal_path`), a fresh independent reviewer agent runs after the worker finishes and checks whether the goal was met. If the check fails, the worker gets one retry; this repeats up to `max_goal_attempts` total reviewer runs (default: 5). On exhaustion the phase raises `GoalAttemptsExhausted`, which flows through the existing `Phase.on_error` path. Tasks without a `goal` are unaffected.

Key design decisions:
- … `read_only=True` subset of the parent agent's tools. `ToolSpec` gains a `read_only` flag for this, along with a `filter_read_only()` helper.
- … `AgentConfig` are intentionally not forwarded.
- `spawn_subagent` is not a read-only tool, so it is filtered out automatically.

Files changed:
- `phases/config.py`: `TaskConfig` gains `goal`, `goal_path`, and `max_goal_attempts` fields with validators.
- `tools/registry.py`: `ToolSpec` gains `read_only: bool`; all manifest entries are explicitly annotated; new `filter_read_only()` helper.
- `phases/goal.py` (new): Reviewer system prompt, exceptions, helper functions, and `run_goal_loop()`.
- `agent/build.py`: `build_goal_agent()` and `make_goal_agent_builder()`.
- `phases/agentic_phase.py`: `goal_agent_builder` param, `_compact_if_needed()` helper, goal loop integration in `run_tasks()`, `goal_validations` surfaced in the success checkpoint.
- `callbacks/callbacks.py`: `OnBeforeGoalCheckCallback` and `OnAfterGoalCheckCallback` with matching `fire_*` methods on `CallbackSet` and `Callbacks`.

Motivation
Agentic pipelines produce output that is difficult to verify deterministically. A lightweight, independent reviewer pass — run against the same files the worker produced — catches systematic gaps (missing tests, incomplete implementations, wrong output format) before the phase is considered complete, without requiring the worker to self-evaluate against the goal criterion.
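
For reviewers who want a concrete picture of the retry behaviour and the read-only tool filtering described in this PR, here is a minimal sketch. The names `ToolSpec`, `read_only`, `filter_read_only`, `run_goal_loop`, `GoalAttemptsExhausted`, and `max_goal_attempts` come from the description; the signatures, the `run_worker` / `run_reviewer` callables, the `read_file` tool name, and the shape of the returned log are assumptions made for illustration and may not match the code in this PR.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical stand-in for a registry entry; only the fields used here."""
    name: str
    read_only: bool = False


def filter_read_only(tools: Iterable[ToolSpec]) -> List[ToolSpec]:
    """Keep only tools flagged read_only (assumed semantics of the helper)."""
    return [tool for tool in tools if tool.read_only]


class GoalAttemptsExhausted(Exception):
    """Raised when every reviewer run rejected the worker's output."""


def run_goal_loop(
    run_worker: Callable[[], None],
    run_reviewer: Callable[[], bool],
    max_goal_attempts: int = 5,
) -> List[bool]:
    """Run the worker, then the reviewer; retry while the reviewer fails.

    Both callables are placeholders for the real worker and reviewer agents.
    Returns one verdict per reviewer run, ending with True on success.
    """
    attempt_log: List[bool] = []
    for _ in range(max_goal_attempts):
        run_worker()
        passed = run_reviewer()
        attempt_log.append(passed)
        if passed:
            return attempt_log
    raise GoalAttemptsExhausted(
        f"goal not met after {max_goal_attempts} reviewer runs"
    )


# Usage: the reviewer only sees read-only tools, so spawn_subagent drops out.
parent_tools = [ToolSpec("read_file", read_only=True), ToolSpec("spawn_subagent")]
reviewer_tools = filter_read_only(parent_tools)

# Simulated reviewer that rejects twice, then accepts.
verdicts = iter([False, False, True])
log = run_goal_loop(lambda: None, lambda: next(verdicts))
```

In this sketch the attempt budget counts reviewer runs, matching the "total reviewer runs" wording above, and exhaustion surfaces as an exception so that an outer error handler (such as the existing `Phase.on_error` path) can deal with it.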
Review checklist (to be filled by reviewers)
- … `qa/required` if this PR needs QA validation, or `qa/skip-qa` if it does not. Exactly one of the two is required.
- … `backport/<branch-name>` label to the PR and it will automatically open a backport PR once this one is merged.