
Add goal-based validation gate to AgenticPhase tasks #23840

Merged
luisorofino merged 10 commits into loa/openmetrics-ai-gen from loa/goal-task on Jun 1, 2026

@luisorofino luisorofino commented May 26, 2026

What does this PR do?

Adds an optional, non-deterministic validation gate to AgenticPhase tasks. When a task declares a goal (or goal_path), a fresh, independent reviewer agent runs after the worker finishes and checks whether the goal was met. If the check fails, the worker retries and the reviewer runs again; this repeats up to max_goal_attempts total reviewer runs (default: 5). On exhaustion the phase raises GoalAttemptsExhausted, which flows through the existing Phase.on_error path. Tasks without a goal are unaffected.
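
The control flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in phases/goal.py: the `Verdict` type, the `run_worker` and `run_reviewer` callables, and the function name `run_goal_loop_sketch` are hypothetical, and only `GoalAttemptsExhausted`, `max_goal_attempts`, and its default of 5 come from the description.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class GoalAttemptsExhausted(Exception):
    """Raised when every reviewer run has rejected the worker's output."""


@dataclass
class Verdict:
    # Hypothetical shape of a reviewer result, for illustration only.
    passed: bool
    reason: str = ""


def run_goal_loop_sketch(
    run_worker: Callable[[Optional[str]], None],
    run_reviewer: Callable[[], Verdict],
    max_goal_attempts: int = 5,
) -> int:
    """Return the number of reviewer runs it took for the goal to pass."""
    run_worker(None)  # first pass: the worker has seen no reviewer feedback
    for attempt in range(1, max_goal_attempts + 1):
        verdict = run_reviewer()  # assumed to build a fresh reviewer each call
        if verdict.passed:
            return attempt
        if attempt < max_goal_attempts:
            run_worker(verdict.reason)  # the retry sees only the reviewer's reason
    raise GoalAttemptsExhausted(
        f"goal not met after {max_goal_attempts} reviewer runs"
    )


# Usage: the first check fails, the worker retries, the second check passes.
verdicts = iter([Verdict(False, "tests are missing"), Verdict(True)])
feedback_seen = []
attempts = run_goal_loop_sketch(feedback_seen.append, lambda: next(verdicts))
print(attempts, feedback_seen)  # prints: 2 [None, 'tests are missing']
```

Counting reviewer runs rather than worker runs matches the wording "max_goal_attempts total reviewer runs": in this sketch the worker is not invoked again after the final failed check.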

Key design decisions:

  • Worker is blind to the goal. It only receives a generic suffix ("your work will be checked by an independent reviewer") appended to its task prompt. It learns the specific criterion only when a check fails, and only sees the reviewer's reason.
  • Reviewer is fresh every attempt. History is reset between attempts; it never sees prior reviewer turns.
  • Reviewer is read-only. It gets only the read_only=True subset of the parent agent's tools. ToolSpec gains a read_only flag for this, along with a filter_read_only() helper.
  • Reviewer uses the parent's provider with default model/max_tokens. No user-visible knobs; overrides declared on the parent AgentConfig are intentionally not forwarded.
  • Reviewer cannot spawn subagents. spawn_subagent is not a read-only tool, so it is filtered out automatically.
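
As a rough illustration of the read-only filtering above, here is a minimal sketch. `ToolSpec`, `read_only`, `filter_read_only()`, and `spawn_subagent` are names from this PR, but the field layout, the `False` default, the helper's signature, and the `read_file` / `write_file` tool names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    # Simplified stand-in for the real ToolSpec; only the flag matters here.
    name: str
    read_only: bool = False  # the default value is an assumption


def filter_read_only(specs: list[ToolSpec]) -> list[ToolSpec]:
    """Keep only the tools marked read_only=True (hypothetical signature)."""
    return [spec for spec in specs if spec.read_only]


# Illustrative manifest; read_file and write_file are made-up tool names.
parent_tools = [
    ToolSpec("read_file", read_only=True),
    ToolSpec("write_file", read_only=False),
    ToolSpec("spawn_subagent", read_only=False),
]

reviewer_tools = filter_read_only(parent_tools)
print([spec.name for spec in reviewer_tools])  # prints: ['read_file']
```

Because the reviewer's tool set is derived by filtering on the flag, spawn_subagent drops out without any special case, which is the behaviour the last bullet describes.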

Files changed:

  • phases/config.py: TaskConfig gains goal, goal_path, and max_goal_attempts fields with validators.
  • tools/registry.py: ToolSpec gains read_only: bool; all manifest entries are explicitly annotated; new filter_read_only() helper.
  • phases/goal.py (new): Reviewer system prompt, exceptions, helper functions, and run_goal_loop().
  • agent/build.py: build_goal_agent() and make_goal_agent_builder().
  • phases/agentic_phase.py: goal_agent_builder param, _compact_if_needed() helper, goal loop integration in run_tasks(), goal_validations surfaced in the success checkpoint.
  • callbacks/callbacks.py: OnBeforeGoalCheckCallback and OnAfterGoalCheckCallback with matching fire_* methods on CallbackSet and Callbacks.
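
To make the new task fields concrete, here is one way such a config could be validated. This is a sketch only: the PR does not spell out the validation rules, so the mutual exclusivity of goal and goal_path, the lower bound on max_goal_attempts, the `has_goal` property, and the class name `TaskConfigSketch` are all assumptions, and a plain dataclass stands in for whatever model class the real TaskConfig uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskConfigSketch:
    name: str
    goal: Optional[str] = None
    goal_path: Optional[str] = None
    max_goal_attempts: int = 5  # default stated in the PR description

    def __post_init__(self) -> None:
        # Assumed rule: an inline goal and a goal file are alternatives.
        if self.goal is not None and self.goal_path is not None:
            raise ValueError("declare either goal or goal_path, not both")
        # Assumed rule: at least one reviewer run must be allowed.
        if self.max_goal_attempts < 1:
            raise ValueError("max_goal_attempts must be >= 1")

    @property
    def has_goal(self) -> bool:
        """Hypothetical helper: tasks without a goal skip the gate."""
        return self.goal is not None or self.goal_path is not None


plain = TaskConfigSketch(name="generate")
gated = TaskConfigSketch(name="write_tests", goal="Every metric has a test")
print(plain.has_goal, gated.has_goal, gated.max_goal_attempts)
```

Keeping the goal fields optional with no default goal is what lets existing tasks stay unaffected, as the description states.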

Motivation

Agentic pipelines produce output that is difficult to verify deterministically. A lightweight, independent reviewer pass — run against the same files the worker produced — catches systematic gaps (missing tests, incomplete implementations, wrong output format) before the phase is considered complete, without requiring the worker to self-evaluate against the goal criterion.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@luisorofino luisorofino added the qa/skip-qa Automatically skip this PR for the next QA label May 26, 2026
@dd-octo-sts dd-octo-sts Bot added the ddev label May 26, 2026
datadog-datadog-prod-us1 Bot commented May 26, 2026

⚠️ Warnings

🚦 2 Pipeline jobs failed

PR All | test / j46da136 / JBoss_WildFly

🔄 Retry job. This looks flaky and may succeed on retry. Could not resolve: ddintegrations.blob.core.windows.net:443

PR All Windows | test / j662406b / IBM MQ on Windows

🔄 Retry job. This looks flaky and may succeed on retry. Failed to resolve hostname 'ddintegrations.blob.core.windows.net'.

🧪 20 Tests failed in 1 job

PR All | run

test_bulk_table from test_check.py
HTTPSConnectionPool(host='ddintegrations.blob.core.windows.net', port=443): Max retries exceeded with url: /snmp/cisco-3850.snmprec (Caused by NameResolutionError("HTTPSConnection(host='ddintegrations.blob.core.windows.net', port=443): Failed to resolve 'ddintegrations.blob.core.windows.net' ([Errno -2] Name or service not known)"))
test_cast_metrics from test_check.py
HTTPSConnectionPool(host='ddintegrations.blob.core.windows.net', port=443): Max retries exceeded with url: /snmp/cisco-3850.snmprec (Caused by NameResolutionError("HTTPSConnection(host='ddintegrations.blob.core.windows.net', port=443): Failed to resolve 'ddintegrations.blob.core.windows.net' ([Errno -2] Name or service not known)"))

🔗 Commit SHA: 1fa7c91

@luisorofino luisorofino changed the title from "Add goal option to phase task" to "Add goal-based validation gate to AgenticPhase tasks" May 26, 2026
codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.04%. Comparing base (c6bddc5) to head (70da5e0).


@luisorofino
Contributor Author

@codex


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 956672a0ae

Comment thread ddev/src/ddev/ai/phases/goal.py
@luisorofino luisorofino marked this pull request as ready for review May 27, 2026 13:18
@luisorofino luisorofino requested a review from a team as a code owner May 27, 2026 13:18
Base automatically changed from loa/subagent-tool to loa/openmetrics-ai-gen June 1, 2026 08:31
dd-octo-sts Bot commented Jun 1, 2026

Validation Report

All 21 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | passed |
| ci | Validate CI configuration and code coverage settings | passed |
| codeowners | Validate every integration has a CODEOWNERS entry | passed |
| config | Validate default configuration files against spec.yaml | passed |
| dep | Verify dependency pins are consistent and Agent-compatible | passed |
| http | Validate integrations use the HTTP wrapper correctly | passed |
| imports | Validate check imports do not use deprecated modules | passed |
| integration-style | Validate check code style conventions | passed |
| jmx-metrics | Validate JMX metrics definition files and config | passed |
| labeler | Validate PR labeler config matches integration directories | passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | passed |
| license-headers | Validate Python files have proper license headers | passed |
| licenses | Validate third-party license attribution list | passed |
| metadata | Validate metadata.csv metric definitions | passed |
| models | Validate configuration data models match spec.yaml | passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | passed |
| package | Validate Python package metadata and naming | passed |
| qa-label | Validate the pull request declares whether it needs QA for the next Agent release | passed |
| readmes | Validate README files have required sections | passed |
| saved-views | Validate saved view JSON file structure and fields | passed |
| version | Validate version consistency between package and changelog | passed |


@luisorofino luisorofino merged commit a0405e1 into loa/openmetrics-ai-gen Jun 1, 2026
323 of 328 checks passed
@luisorofino luisorofino deleted the loa/goal-task branch June 1, 2026 08:45
luisorofino added a commit that referenced this pull request Jun 1, 2026
* Add goal option to phase task

* Little nits

* Write goal_attempt_log in checkpoints even if the validation failed

* Count tokens when GoalParseError is raised

* Fix token counting

* Clear the try/catch in run_goal_loop

* Hoist imports in agent/build

* Improve logging test

* Add pattern validation to task name

* Introduce FlowServices to group shared phase dependencies (#23858)

* Add PipelineContext to simplify args

* Rename to FlowServices

Labels

ddev, qa/skip-qa (Automatically skip this PR for the next QA), team/agent-integrations


2 participants