Add goal-based validation gate to AgenticPhase tasks by luisorofino · Pull Request #23840 · DataDog/integrations-core

luisorofino · 2026-05-26T15:35:43Z

What does this PR do?

Adds an optional, non-deterministic validation gate to AgenticPhase tasks. When a task declares a goal (or goal_path), a fresh independent reviewer agent runs after the worker finishes and checks whether the goal was met. If the check fails, the worker retries once and a new check follows; this cycle repeats up to max_goal_attempts total reviewer runs (default: 5). On exhaustion the phase raises GoalAttemptsExhausted, which flows through the existing Phase.on_error path. Tasks without a goal are unaffected.
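For readers skimming the description, the check/retry cycle can be sketched roughly as below. This is an illustrative, synchronous approximation and not the code in phases/goal.py: the Verdict type and the run_worker / run_reviewer callables are hypothetical stand-ins, and skipping the worker retry after the final failed check is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class GoalAttemptsExhausted(Exception):
    """Raised when no reviewer run passes within the attempt budget."""


@dataclass
class Verdict:
    # Hypothetical shape of a reviewer result; the real one may differ.
    passed: bool
    reason: str


def run_goal_loop(
    run_worker: Callable[[Optional[str]], None],
    run_reviewer: Callable[[], Verdict],
    max_goal_attempts: int = 5,
) -> list:
    """Alternate reviewer checks and worker retries until a check passes."""
    attempts = []
    run_worker(None)  # initial worker run, no reviewer feedback yet
    for attempt in range(1, max_goal_attempts + 1):
        verdict = run_reviewer()  # a fresh reviewer on every attempt
        attempts.append(verdict)
        if verdict.passed:
            return attempts
        # Assumption: no worker retry after the last failed check.
        if attempt < max_goal_attempts:
            run_worker(verdict.reason)  # worker sees only the reviewer's reason
    raise GoalAttemptsExhausted(
        f"goal not met after {max_goal_attempts} reviewer runs"
    )
```

The budget counts reviewer runs, not worker runs, which matches the "total reviewer runs" wording above.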

Key design decisions:

Worker is blind to the goal. It only receives a generic suffix ("your work will be checked by an independent reviewer") appended to its task prompt. It learns the specific criterion only when a check fails, and only sees the reviewer's reason.
Reviewer is fresh every attempt. History is reset between attempts; it never sees prior reviewer turns.
Reviewer is read-only. It gets only the read_only=True subset of the parent agent's tools. ToolSpec gains a read_only flag for this, along with a filter_read_only() helper.
Reviewer uses the parent's provider with default model/max_tokens. No user-visible knobs; overrides declared on the parent AgentConfig are intentionally not forwarded.
Reviewer cannot spawn subagents. spawn_subagent is not a read-only tool, so it is filtered out automatically.
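The read-only restriction boils down to a flag plus a filter. A minimal sketch, assuming ToolSpec is a simple record with a name and a required read_only flag; the real ToolSpec in tools/registry.py has more fields, and the read_file / write_file entries below are hypothetical manifest entries used only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    read_only: bool  # required, so every manifest entry is annotated explicitly


def filter_read_only(specs):
    """Keep only the tools flagged read_only=True."""
    return [spec for spec in specs if spec.read_only]


# Hypothetical manifest; only spawn_subagent is a tool name taken from this PR.
manifest = [
    ToolSpec("read_file", read_only=True),
    ToolSpec("write_file", read_only=False),
    ToolSpec("spawn_subagent", read_only=False),
]

reviewer_tools = filter_read_only(manifest)
print([spec.name for spec in reviewer_tools])  # ['read_file']
```

Because spawn_subagent is not flagged read-only, it drops out of the reviewer's tool set without any special-casing.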

Files changed:

phases/config.py — TaskConfig gains goal, goal_path, and max_goal_attempts fields with validators.
tools/registry.py — ToolSpec gains read_only: bool; all manifest entries are explicitly annotated; new filter_read_only() helper.
phases/goal.py (new) — Reviewer system prompt, exceptions, helper functions, and run_goal_loop().
agent/build.py — build_goal_agent() and make_goal_agent_builder().
phases/agentic_phase.py — goal_agent_builder param, _compact_if_needed() helper, goal loop integration in run_tasks(), goal_validations surfaced in the success checkpoint.
callbacks/callbacks.py — OnBeforeGoalCheckCallback and OnAfterGoalCheckCallback with matching fire_* methods on CallbackSet and Callbacks.
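As a rough picture of the new TaskConfig fields, the sketch below uses a plain dataclass to stay dependency-free; the real model in phases/config.py may be built differently. Both validation rules (goal and goal_path being mutually exclusive, and max_goal_attempts being at least 1) and the has_goal property are assumptions for illustration, not taken from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskConfig:
    name: str
    goal: Optional[str] = None
    goal_path: Optional[str] = None
    max_goal_attempts: int = 5  # default stated in the PR description

    def __post_init__(self) -> None:
        # Assumed rule: an inline goal and a goal file are alternatives.
        if self.goal is not None and self.goal_path is not None:
            raise ValueError("set either goal or goal_path, not both")
        # Assumed rule: at least one reviewer run must be allowed.
        if self.max_goal_attempts < 1:
            raise ValueError("max_goal_attempts must be >= 1")

    @property
    def has_goal(self) -> bool:
        """Hypothetical helper: tasks without a goal skip the gate."""
        return self.goal is not None or self.goal_path is not None
```

A task declared without goal or goal_path reports has_goal as False and would run exactly as before.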

Motivation

Agentic pipelines produce output that is difficult to verify deterministically. A lightweight, independent reviewer pass — run against the same files the worker produced — catches systematic gaps (missing tests, incomplete implementations, wrong output format) before the phase is considered complete, without requiring the worker to self-evaluate against the goal criterion.

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

datadog-datadog-prod-us1 · 2026-05-26T15:36:54Z

⚠️ Warnings

🚦 2 Pipeline jobs failed

PR All | test / j46da136 / JBoss_WildFly

🔄 Retry job. This looks flaky and may succeed on retry.
Could not resolve: ddintegrations.blob.core.windows.net:443

PR All Windows | test / j662406b / IBM MQ on Windows

🔄 Retry job. This looks flaky and may succeed on retry.
Failed to resolve hostname 'ddintegrations.blob.core.windows.net'.

🧪 20 Tests failed in 1 job

PR All | run

test_bulk_table from test_check.py

HTTPSConnectionPool(host='ddintegrations.blob.core.windows.net', port=443): Max retries exceeded with url: /snmp/cisco-3850.snmprec (Caused by NameResolutionError("HTTPSConnection(host='ddintegrations.blob.core.windows.net', port=443): Failed to resolve 'ddintegrations.blob.core.windows.net' ([Errno -2] Name or service not known)"))

test_cast_metrics from test_check.py

HTTPSConnectionPool(host='ddintegrations.blob.core.windows.net', port=443): Max retries exceeded with url: /snmp/cisco-3850.snmprec (Caused by NameResolutionError("HTTPSConnection(host='ddintegrations.blob.core.windows.net', port=443): Failed to resolve 'ddintegrations.blob.core.windows.net' ([Errno -2] Name or service not known)"))

🔗 Commit SHA: 1fa7c91

codecov · 2026-05-26T16:45:32Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.04%. Comparing base (c6bddc5) to head (70da5e0).


luisorofino · 2026-05-27T08:41:21Z

@codex

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 956672a0ae


dd-octo-sts · 2026-06-01T08:44:57Z

Validation Report

All 21 validations passed.

Validation	Description	Status
`agent-reqs`	Verify check versions match the Agent requirements file	✅
`ci`	Validate CI configuration and code coverage settings	✅
`codeowners`	Validate every integration has a CODEOWNERS entry	✅
`config`	Validate default configuration files against spec.yaml	✅
`dep`	Verify dependency pins are consistent and Agent-compatible	✅
`http`	Validate integrations use the HTTP wrapper correctly	✅
`imports`	Validate check imports do not use deprecated modules	✅
`integration-style`	Validate check code style conventions	✅
`jmx-metrics`	Validate JMX metrics definition files and config	✅
`labeler`	Validate PR labeler config matches integration directories	✅
`legacy-signature`	Validate no integration uses the legacy Agent check signature	✅
`license-headers`	Validate Python files have proper license headers	✅
`licenses`	Validate third-party license attribution list	✅
`metadata`	Validate metadata.csv metric definitions	✅
`models`	Validate configuration data models match spec.yaml	✅
`openmetrics`	Validate OpenMetrics integrations disable the metric limit	✅
`package`	Validate Python package metadata and naming	✅
`qa-label`	Validate the pull request declares whether it needs QA for the next Agent release	✅
`readmes`	Validate README files have required sections	✅
`saved-views`	Validate saved view JSON file structure and fields	✅
`version`	Validate version consistency between package and changelog	✅

luisorofino added the qa/skip-qa label May 26, 2026

dd-octo-sts Bot added the ddev label May 26, 2026

luisorofino changed the title ~~Add goal option to phase task~~ Add goal-based validation gate to AgenticPhase tasks May 26, 2026

luisorofino force-pushed the loa/goal-task branch from eab3f8d to 11dba42 May 26, 2026 15:48

chatgpt-codex-connector Bot reviewed May 27, 2026

Comment thread ddev/src/ddev/ai/phases/goal.py

luisorofino marked this pull request as ready for review May 27, 2026 13:18

luisorofino requested a review from a team as a code owner May 27, 2026 13:18

dd-octo-sts Bot added the team/agent-integrations label May 27, 2026

luisorofino force-pushed the loa/subagent-tool branch from c6bddc5 to a5757e5 May 29, 2026 08:27

luisorofino force-pushed the loa/goal-task branch from 70da5e0 to a4543d3 May 29, 2026 08:27

This was referenced May 29, 2026

Add spawn_subagent tool #23795

Closed

Phase 0 for OpenMetrics flow: Inspect Endpoint #23848

Merged

lucia-sb approved these changes Jun 1, 2026

Base automatically changed from loa/subagent-tool to loa/openmetrics-ai-gen June 1, 2026 08:31

luisorofino added 10 commits June 1, 2026 10:42

Add goal option to phase task

f613897

Little nits

776f710

Write goal_attempt_log in checkpoints even if the validation failed

9e508fb

Count tokens when GoalParseError is raised

f321c46

Fix token counting

59a5b0d

Clear the try/catch in run_goal_loop

6e27a89

Hoist imports in agent/build

3e710c3

Improve logging test

09325f2

Add pattern validation to task name

895ed9d

Introduce FlowServices to group shared phase dependencies (#23858)

1fa7c91

* Add PipelineContext to simplify args * Rename to FlowServices

luisorofino force-pushed the loa/goal-task branch from a4543d3 to 1fa7c91 June 1, 2026 08:43

luisorofino merged commit a0405e1 into loa/openmetrics-ai-gen Jun 1, 2026
323 of 328 checks passed

luisorofino deleted the loa/goal-task branch June 1, 2026 08:45
