
Add spawn_subagent tool#23795

Closed
luisorofino wants to merge 44 commits into loa/openmetrics-ai-gen from loa/subagent-tool


@luisorofino
Contributor

@luisorofino luisorofino commented May 21, 2026

What does this PR do?

Adds a spawn_subagent tool to the AI framework that lets a parent agent delegate a self-contained subtask to a fresh subagent. The parent supplies a system prompt, a user prompt, and an optional subset of tools. The subagent runs one ReAct loop and returns only its final assistant message — no history, no file-read state, and no metadata bleed through.

Supporting changes:

  • callbacks/file_logger.py — new FileLogger: an append-mode JSONL writer that records every ReAct event (agent responses, tool calls, compaction, errors) for a subagent run. Opening the same file on a phase rerun appends a new run's events, with the start event as the separator. Log directory is created lazily on first tool invocation so phases that declare spawn_subagent but never call it leave no empty folders.

  • agent/build.py — adds SubagentBuilder and updates AgentBuilder to a 4-arg callable (system_prompt, owner_id, subagent_builder, log_dir) so the subagent deps are passed at call time rather than baked into the closure. Adds build_subagent and make_subagent_builder. Extracts _build_agent_and_registry as a shared private helper used by both build_agent and build_subagent.

  • tools/registry.py — extends ToolContext with allowed_tool_names, subagent_builder, and log_dir; adds _spawn_subagent_factory and registers spawn_subagent in TOOL_MANIFEST.

  • phases/checkpoint.py — adds CheckpointManager.root as the canonical reference for the checkpoint directory; _ensure_dir and memory_path use it instead of self._path.parent.

  • phases/base.py — relaxes Phase.extra_init_kwargs to **kwargs so subclasses can declare only the deps they need without breaking the signature contract.

  • phases/agentic_phase.py: extra_init_kwargs constructs a SubagentBuilder closure when spawn_subagent is in the agent's tool list; _build_agent_and_process computes the per-phase log directory path at execute time and passes it through the AgentBuilder.
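
A minimal sketch of the append-mode JSONL logging with lazy directory creation described for FileLogger. The class name `JsonlEventLogger`, the `read_events` helper, and the event fields are hypothetical and chosen for illustration; the PR's actual FileLogger may differ.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


class JsonlEventLogger:
    """Hypothetical append-mode JSONL writer; not the PR's actual FileLogger."""

    def __init__(self, path: Path) -> None:
        # Nothing touches the filesystem here: the directory is created lazily.
        self._path = path

    def log(self, event: str, **fields: Any) -> None:
        # Create the parent directory on first write only.
        self._path.parent.mkdir(parents=True, exist_ok=True)
        with self._path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"event": event, **fields}) + "\n")


def read_events(path: Path) -> list[dict[str, Any]]:
    """Parse one JSON object per line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def demo() -> tuple[bool, list[str]]:
    """Return whether the log dir existed before the first write, plus event names."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "subagents" / "summarise.jsonl"
        first_run = JsonlEventLogger(log_path)
        existed_before_write = log_path.parent.exists()
        first_run.log("start", label="summarise")
        first_run.log("agent_response", text="done")
        # A rerun opens the same file; its "start" event separates the runs.
        rerun = JsonlEventLogger(log_path)
        rerun.log("start", label="summarise")
        return existed_before_write, [e["event"] for e in read_events(log_path)]
```

Because the directory is only created inside `log`, a logger that is constructed but never used leaves nothing on disk.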

Motivation

A single agent context window has hard limits — both in tokens and in the scope of a single reasoning thread. spawn_subagent lets the parent offload parallelisable or isolated subtasks (e.g. "read and summarise this file", "run this validation and return the result") to fresh agents that start from a clean slate. Operator visibility into subagent runs is provided entirely through the JSONL log; the parent only ever sees the final text.

Key design invariants:

  • No recursion: build_subagent never forwards subagent_builder/log_dir, so a subagent that lists spawn_subagent in its tool set raises an error at registry construction time.
  • Subset enforcement: the tool validates that requested tool names are a subset of the parent's allowed tools before opening any log file or advancing the call counter.
  • Fail loudly, not silently: a failed mkdir or builder error surfaces as a ToolResult(success=False) with a message that includes the subagent label, so the parent agent can reason about which child failed when spawning in parallel.
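
The subset-enforcement and fail-loudly invariants above can be sketched as follows. This is an illustration under assumptions: the `ToolResult` fields, the `spawn_subagent` signature, and the `run_subagent` callable are hypothetical stand-ins, not the PR's code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolResult:
    """Stand-in for the framework's result type; the fields are assumed."""

    success: bool
    message: str


def spawn_subagent(
    label: str,
    requested_tools: list[str],
    allowed_tools: frozenset[str],
    run_subagent: Callable[[list[str]], str],
) -> ToolResult:
    # The subset check comes first, before any log file or counter is touched.
    extra = sorted(set(requested_tools) - allowed_tools)
    if extra:
        return ToolResult(False, f"[{label}] tools not allowed: {', '.join(extra)}")
    try:
        final_message = run_subagent(requested_tools)
    except Exception as exc:  # builder or filesystem failure in the child
        # The label lets a parent tell parallel children apart.
        return ToolResult(False, f"[{label}] subagent failed: {exc}")
    # Only the final assistant message travels back to the parent.
    return ToolResult(True, final_message)
```

Returning a failed result instead of raising keeps the error inside the tool-call channel, where the parent agent can read it.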

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@dd-octo-sts dd-octo-sts Bot added the ddev label May 21, 2026
@luisorofino luisorofino changed the title from "Implement spawn subagent tool" to "ddev/ai: Add spawn_subagent tool" May 21, 2026
@luisorofino luisorofino changed the title from "ddev/ai: Add spawn_subagent tool" to "Add spawn_subagent tool" May 21, 2026

Base automatically changed from loa/agent-phase-refactor to loa/openmetrics-ai-gen May 21, 2026 15:46
@codecov

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 97.94521% with 12 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (loa/openmetrics-ai-gen@4844389). Learn more about missing BASE report.


@luisorofino luisorofino added the qa/skip-qa label (Automatically skip this PR for the next QA) May 22, 2026
@luisorofino luisorofino marked this pull request as ready for review May 22, 2026 12:51
@luisorofino luisorofino requested a review from a team as a code owner May 22, 2026 12:51

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4335328750

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread ddev/src/ddev/ai/agent/build.py Outdated
@luisorofino
Contributor Author

@codex


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a3636ddef


Comment thread ddev/src/ddev/ai/tools/agents/spawn_subagent.py Outdated
Comment thread ddev/src/ddev/ai/tools/agents/spawn_subagent.py
* (docs) Style fixes for n8n integration

* Fix typo

* tiny edits
@luisorofino luisorofino force-pushed the loa/openmetrics-ai-gen branch from 713d4f8 to 4844389 May 26, 2026 15:29
@dd-octo-sts
Contributor

dd-octo-sts Bot commented May 26, 2026

Validation Report

All 21 validations passed.

Validation | Description | Status
agent-reqs | Verify check versions match the Agent requirements file | passed
ci | Validate CI configuration and Codecov settings | passed
codeowners | Validate every integration has a CODEOWNERS entry | passed
config | Validate default configuration files against spec.yaml | passed
dep | Verify dependency pins are consistent and Agent-compatible | passed
http | Validate integrations use the HTTP wrapper correctly | passed
imports | Validate check imports do not use deprecated modules | passed
integration-style | Validate check code style conventions | passed
jmx-metrics | Validate JMX metrics definition files and config | passed
labeler | Validate PR labeler config matches integration directories | passed
legacy-signature | Validate no integration uses the legacy Agent check signature | passed
license-headers | Validate Python files have proper license headers | passed
licenses | Validate third-party license attribution list | passed
metadata | Validate metadata.csv metric definitions | passed
models | Validate configuration data models match spec.yaml | passed
openmetrics | Validate OpenMetrics integrations disable the metric limit | passed
package | Validate Python package metadata and naming | passed
qa-label | Validate the pull request declares whether it needs QA for the next Agent release | passed
readmes | Validate README files have required sections | passed
saved-views | Validate saved view JSON file structure and fields | passed
version | Validate version consistency between package and changelog | passed

View full run

AAraKKe and others added 12 commits May 26, 2026 17:51
…3813)

* Validate stale Agent release requirements

* Skip unreleased integrations in Agent release output

* Tighten unreleased-integrations parsing and broaden tests

- Validate that by-agent-version-range keys contain the '..' separator and raise a clear ValueError naming the offending key.
- Drop the redundant else branch in exclude_unreleased_integrations and move the historical folder-name comment back next to normalize_catalog.
- mkdir(exist_ok=True) in the write_repo_config helper for consistency with neighbours.
- Parametrize test_agent_version_in_range_is_inclusive (now covers below/above bounds and the malformed-range error path).
- Add a direct unit test on get_unreleased_integrations that exercises by-integration and by-agent-version-range together.
- Add a clean-pass test for validate agent-reqs and pull the set_root teardown into an isolated_root yield fixture.
- Add changelog entries for ddev and datadog_checks_dev.
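
A hedged sketch of the range check this commit describes: keys must contain the '..' separator, malformed keys raise a ValueError naming the key, and bounds are inclusive. The signature of `agent_version_in_range` and the `parse_version` helper are assumptions for illustration; the sketch handles plain dotted numeric versions only.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '7.60.0' (no pre-release support)."""
    return tuple(int(part) for part in text.strip().split("."))


def agent_version_in_range(version: str, range_key: str) -> bool:
    """Return True if version lies within 'low..high', bounds included."""
    if ".." not in range_key:
        # Name the offending key so config authors can find it quickly.
        raise ValueError(f"Malformed Agent version range {range_key!r}: expected 'low..high'")
    low, high = range_key.split("..", 1)
    return parse_version(low) <= parse_version(version) <= parse_version(high)
```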

* Surface malformed version ranges as a clean ddev abort

- Drop the redundant empty parent table header in .ddev/config.toml; the two sub-tables imply it.
- Catch ValueError from agent_version_in_range at every command entry point that triggers the lookup (integrations, changelog, integrations_changelog) and surface it via app.abort so config authors get a clean message instead of a Python traceback.
- Document that exclude_unreleased_integrations accepts both raw and folder-normalized catalog keys.

* Scope stale-entry detection to whole-file invocations

When the user runs `ddev validate agent-reqs <check>`, only the requested check should be validated; previously the new stale-entry detection still scanned the whole requirements file and surfaced unrelated stale packages, defeating per-check pre-commit usage.
* Show exact run URL and add lifecycle comment to wheel promotion

- Extend dispatch_workflow with return_run_details so callers can get back the
  new run's html_url instead of a generic recent-runs link.
- ddev dep promote now prints the exact workflow run URL and suppresses noisy
  httpx request logs around the API calls.
- Replace the single success comment in dependency-wheel-promotion.yaml with a
  lifecycle comment that updates on start, success, and failure, scoped per
  (PR, head SHA) via a hidden marker so re-dispatches edit the same comment.

* Harden lifecycle comment chaining and github-script inputs

- Started-comment step now references find_comment.outputs.comment-id (the
  previous version pointed at its own step output, so re-dispatches for the
  same SHA would not have updated the existing comment).
- Pass inputs.head_sha into actions/github-script via env: HEAD_SHA and read
  process.env.HEAD_SHA in the script body, so a hostile workflow_dispatch
  input cannot break out of the JS string literal and execute arbitrary code.

* Type-narrow dispatch_workflow and bail out cleanly on missing run details

- Add Literal[True]/Literal[False] overloads to GitHubManager.dispatch_workflow
  so callers asking for run details get a non-nullable dict back at the type
  level.
- Replace the bare assert in ddev dep promote with an explicit app.abort, run
  the validity check before printing the success message, and keep the success
  output inside the httpx-suppression scope.
- Add ddev/changelog.d/23828.added so the PR-changelog check passes for the
  ddev source changes.
- Lift the github credentials setup into ddev/tests/cli/dep/conftest.py as an
  autouse fixture, hoist the test-side logging import, and add coverage for the
  no-run-details abort path and the failure-path httpx level restoration.
- Match the cleaner api_post.call_args.kwargs form already used in the
  companion test in tests/utils/test_github.py.
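
The Literal[True]/Literal[False] overload pattern mentioned in this commit can be sketched as below. `WorkflowDispatcher` and the returned payload are hypothetical; the real `GitHubManager.dispatch_workflow` performs API calls and may take different parameters.

```python
from __future__ import annotations

from typing import Any, Literal, overload


class WorkflowDispatcher:
    """Hypothetical stand-in; the real GitHubManager performs HTTP calls."""

    @overload
    def dispatch_workflow(self, workflow: str, *, return_run_details: Literal[True]) -> dict[str, Any]: ...

    @overload
    def dispatch_workflow(self, workflow: str, *, return_run_details: Literal[False] = ...) -> None: ...

    def dispatch_workflow(self, workflow: str, *, return_run_details: bool = False) -> dict[str, Any] | None:
        if not return_run_details:
            return None
        # Placeholder payload; a real call would return the API's run details.
        return {"workflow": workflow, "html_url": f"https://example.invalid/runs/{workflow}"}
```

With these overloads a type checker treats the result as a non-optional dict whenever the caller passes `return_run_details=True`, so no assert is needed to narrow it.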

* Trim runtime imports and share httpx-debug fixture across promote tests

- Move Any and Literal under TYPE_CHECKING in github.py; they are only used
  inside annotations that PEP 563 keeps as strings, so they have no runtime
  cost. The overload decorator stays at module scope because it runs at class
  definition time.
- Add an httpx_at_debug fixture in tests/cli/dep/conftest.py and use it from
  both httpx-suppression tests so the get-logger/set-DEBUG/restore boilerplate
  lives in one place.
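
The httpx log suppression and failure-path level restoration that these commits mention could look roughly like this. `quiet_logger` is a hypothetical helper name, not necessarily what ddev uses; only the standard `logging` module is involved, so httpx itself does not need to be installed.

```python
import logging
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def quiet_logger(name: str, level: int = logging.WARNING) -> Iterator[None]:
    """Temporarily raise a logger's level and restore it afterwards."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        # Runs on both success and failure, so the level never leaks.
        logger.setLevel(previous)
```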

* Type-annotate the new ddev/tests/cli/dep fixtures
…#23144)

* feat(downloader): add TUFPointerDownloader for v2 pointer-file format

The new agent-integrations-tuf pipeline produces TUF targets as JSON
pointer files (targets/<project>/<version>.json) rather than the old
HTML simple index + in-toto approach. This commit adds:

- TUFPointerDownloader in download_v2.py: TUF-verifies the pointer
  file, then fetches and sha256-verifies the wheel from S3.
- DigestMismatch exception for sha256/length failures.
- --format v2 CLI flag: routes through TUFPointerDownloader.
  --unsafe-disable-verification carries forward; --type and
  --ignore-python-version are no-ops in v2 with a warning.
- 8 offline unit tests covering happy path, missing target, digest
  mismatch, length mismatch, and disable_verification mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(downloader): use --repository URL for wheel fetch, not pointer's baked value

The pointer file always contains the prod S3 repository URL. When
validating staging, the caller passes --repository <staging-url> to
point at the staging bucket; that URL should be used for both the TUF
metadata fetch AND the wheel download, not just the metadata.

Adds a test that asserts the wheel is fetched from the caller-supplied
URL even when the pointer contains a different (prod) repository value.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(downloader): resolve latest via S3 listing, drop latest.json reliance

Replace the ``latest.json`` rolling pointer fetch with an S3
``ListObjectsV2`` walk over ``targets/<project>/``: filter keys to PEP 440
stable versions and pick the maximum.  The chosen version is then fetched
through TUF as before, so the pointer file the client trusts is still
cryptographically verified.

Why list S3 instead of parsing the signed targets metadata: once
``path_hash_prefixes`` delegations are in use, a client cannot tell from
metadata alone which delegation signs the latest version of a given
project.  Listing the bucket sidesteps that — TUF still authoritatively
verifies the chosen version's pointer.

The publisher counterpart in agent-integrations-tuf drops ``latest.json``
entirely; see DataDog/agent-integrations-tuf PR #9.

- ``_resolve_latest_version`` lists ``targets/<project>/`` via the S3
  REST API (no boto3 dep), parses the XML response, follows the
  continuation-token pagination, and applies a PEP 440 stable filter
- ``get_pointer(project, version=None)`` resolves ``version`` itself
  before delegating to the TUF Updater
- 6 new offline tests cover max-version selection, pre-release/dev
  filtering, post-release support, the no-stable error, paginated
  listings, and non-pointer key skipping

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "refactor(downloader): resolve latest via S3 listing, drop latest.json reliance"

This reverts commit 70688d8.

* feat(downloader): bundle 1.root.json; rename --format to --index; drop --root-json

- Bundle metadata/root_history/1.root.json from agent-integrations-tuf
  as a package resource; TUFPointerDownloader loads it via
  importlib.resources — no TOFU, no --root-json flag needed
- Rename --format v2 to --index (boolean flag); v1 remains the default
  when --index is absent
- Remove trust_anchor parameter from TUFPointerDownloader.__init__
- Drop --format and --root-json from instantiate_downloader (v1 path)
- Register 1.root.json as a wheel artifact in pyproject.toml
- Update tests to match new interface

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(downloader): rename --index to --v2

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(downloader): default to v2 with v1 fallback; add prod URL constant

Without any flag the downloader now attempts v2 (against the prod S3
bucket) and falls back to v1 on any failure, so callers get the new
format automatically without code changes. Passing --v2 explicitly keeps
the strict v2 path with no fallback (used by the pipeline's validate-
staging step).

V2_REPOSITORY_URL is the prod bucket constant used for the default
repository value in _download_v2(); callers can still override it with
--repository.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(downloader): resolve hash-prefixed targets via N.targets.json

The v2 TUF repository uses consistent-snapshot format: pointer files are
stored as {sha256}.{version}.json on S3. Two changes to support this:

1. _make_updater now sets UpdaterConfig(prefix_targets_with_hash=True) so
   the TUF Updater resolves hash-prefixed paths automatically when calling
   download_target().

2. get_pointer() now parses N.targets.json (after Updater.refresh()) to
   enumerate available versions for the project. This replaces the removed
   latest.json: when version=None, _resolve_version() scans all
   <project>/<ver>.json entries in targets metadata and returns the highest
   stable PEP 440 version.

The disable_verification path fetches the metadata chain (timestamp →
snapshot → targets) without verifying signatures to find the hash-prefixed
URL, then fetches the pointer directly.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(downloader): resolve latest via latest pointer target

* Move v2 TUF root metadata

* Simplify v2 downloader implementation

* feat(downloader): add MissingVersion and MalformedPointerError exceptions

Dedicated types replace the prior reuse of TargetNotFoundError for argument
validation (which mislabeled the failure category) and the unchecked KeyError
raised on a malformed pointer JSON.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(downloader): harden v2 wheel fetch and pointer handling

- Add explicit 60s timeout to urllib.request.urlopen so a stalled wheel
  fetch does not hang the Agent installer indefinitely.
- Validate required pointer JSON keys (digest, length, wheel_path) and
  raise the new MalformedPointerError instead of an opaque KeyError.
- Raise MissingVersion (a CLIError subclass) when
  --unsafe-disable-verification is set without --version, so the v1
  fallback log reports the actual cause instead of "target not found".
- Extract _verify_content to drop the pointer-is-None sentinel and make
  the verified and direct-download branches structurally parallel.
- Add `from __future__ import annotations` so the PEP 604 unions stay
  compatible with the declared requires-python = ">=3.8".
- Move logging.basicConfig out of the constructor and into the CLI entry
  point (separate commit); the class no longer mutates the root logger.
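
A minimal sketch of the required-key validation described above, assuming the pointer is a flat JSON object. `parse_pointer` is a hypothetical helper, and `MalformedPointerError` here is a local stand-in for the exception the commit introduces.

```python
from __future__ import annotations

import json

REQUIRED_POINTER_KEYS = ("digest", "length", "wheel_path")


class MalformedPointerError(Exception):
    """Local stand-in for the exception named in the commit message."""


def parse_pointer(raw: bytes) -> dict:
    """Decode a pointer file and check that every required key is present."""
    pointer = json.loads(raw)
    if not isinstance(pointer, dict):
        raise MalformedPointerError("pointer must be a JSON object")
    missing = [key for key in REQUIRED_POINTER_KEYS if key not in pointer]
    if missing:
        # A dedicated error replaces the opaque KeyError a later lookup would raise.
        raise MalformedPointerError(f"pointer is missing required keys: {', '.join(missing)}")
    return pointer
```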

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(downloader): make v2/v1 fallback handle validation errors and --force

- Split _download_v2() into instantiate_v2_downloader() and
  run_v2_downloader() to mirror the v1 instantiate/run split and let the
  warning/validation branches be tested without patching sys.argv.
- Re-raise user-input errors (CLIError, MissingVersion) before the broad
  except so they propagate as-is instead of triggering a spurious v1
  retry and a misleading "v2 download failed" log line.
- Add --force as a no-op compat stub on the v2 parser so v1-only callers
  do not trip parse_args -> SystemExit and silently skip the fallback.
- Hoist `import logging` to module top (was lazy-imported in the except
  block) and own the verbose-to-level + logging.basicConfig setup that
  used to live inside TUFPointerDownloader.__init__.
- Drop the meaningless `--v2 default=True` re-declaration; rename
  underscore-prefixed argparse dests to plain names.
- Note in the fallback block that v1 offline tests now traverse v2 first
  on every invocation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(downloader): broaden v2 coverage and parametrize failure categories

- Parametrize _v2_failure_category across all five (exc, category) cases
  and add DownloadError / TimeoutError coverage that the categorizer
  already handles but previous tests never asserted.
- Replace direct calls to TUFPointerDownloader._target_path with a
  parametrized test that drives get_pointer and asserts on
  Updater.get_targetinfo so the behavior, not the private helper, is
  what's pinned.
- Add failure-mode tests for malformed pointer JSON (one per required
  key), urllib HTTPError/URLError mid-download, and wheel_path without
  a leading slash so the URL-composition contract is visible.
- Update test_direct_download_requires_explicit_version to expect
  MissingVersion now that argument-validation no longer reuses
  TargetNotFoundError.
- Move @pytest.mark.offline from each class to a module-level
  pytestmark; drop the leading-underscore prefix on module constants
  to match AGENTS.md style.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style(downloader): sort test imports per project ruff config

Ruff in CI uses the root ../pyproject.toml which treats datadog_checks as
first-party. Reorder the test imports to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(downloader): address PR #23144 review feedback

- exceptions.py: type-hint MalformedPointerError/DigestMismatch __init__;
  add LengthMismatch (split from the overloaded DigestMismatch).
- download_v2.py: drop underscore from WHEEL_FETCH_TIMEOUT_SECONDS and
  REQUIRED_POINTER_KEYS per AGENTS.md; validate wheel_path leading slash
  via MalformedPointerError; verify length first (cheap early-out)
  before the sha256 digest check.
- cli.py: add type hints on download(), _v2_parser(),
  instantiate_v2_downloader(), run_v2_downloader(); drop the unused
  _args parameter from run_v2_downloader; collapse the redundant
  (CLIError, MissingVersion) except clause to just CLIError.
- test_v2_downloader.py: assert MalformedPointerError when wheel_path
  lacks a leading slash; split TestLengthMismatch from TestDigestMismatch;
  cover instantiate_v2_downloader validation/warning branches and the
  cli.download() v2-then-v1 fallback orchestration; drop the inline
  Updater patch in TestDisableVerification in favour of the fixture.
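
The length-first, digest-second ordering from this commit can be illustrated as below. `verify_content` and both exception classes are local stand-ins written for this sketch; the real signatures in download_v2.py and exceptions.py are not shown in this thread.

```python
import hashlib


class LengthMismatch(Exception):
    """Stand-in: downloaded size differs from the pointer's length."""


class DigestMismatch(Exception):
    """Stand-in: downloaded sha256 differs from the pointer's digest."""


def verify_content(data: bytes, expected_digest: str, expected_length: int) -> None:
    # Cheap check first: a wrong length skips hashing entirely.
    if len(data) != expected_length:
        raise LengthMismatch(f"expected {expected_length} bytes, got {len(data)}")
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_digest:
        raise DigestMismatch(f"expected sha256 {expected_digest}, got {actual}")
```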

* Fix v2 downloader blockers: narrow fallback, future import

Narrow the v1 fallback in download() to a tuple of network/lookup
errors. Previously every non-CLIError exception triggered v1 retry,
including DigestMismatch / LengthMismatch / MalformedPointerError —
i.e. integrity failures the v2 path is meant to surface were silently
masked. Now those propagate; only TargetNotFoundError, DownloadError,
TimeoutError, and urllib.error.URLError fall back.

Add `from __future__ import annotations` to cli.py: the new module
uses PEP 604 unions and PEP 585 subscripted generics at definition
time, which crash on Python 3.8/3.9 (pyproject.toml declares
requires-python = ">=3.8"). download_v2.py already had the import.

Add parametrized test pinning the new behavior — DigestMismatch,
LengthMismatch, and MalformedPointerError propagate without invoking
the v1 downloader.

Other review feedback (refactor download(), gate compat warnings on
--v2, validate pointer field types, split download() into verified /
direct, etc.) is deferred to a follow-up to keep this PR focused.
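
A sketch of the narrowed fallback this commit message describes: only lookup and network failures retry through v1, while integrity failures propagate. The exception classes are local stand-ins and `download` takes two hypothetical callables; the real cli.py is structured differently.

```python
from __future__ import annotations

import urllib.error
from typing import Callable


class TargetNotFoundError(Exception):
    """Stand-in for the lookup failure named in the commit."""


class DownloadError(Exception):
    """Stand-in for a transport-level download failure."""


class DigestMismatch(Exception):
    """Stand-in for an integrity failure, which must never fall back."""


# Only availability-style failures may fall back to v1.
FALLBACK_ERRORS = (TargetNotFoundError, DownloadError, TimeoutError, urllib.error.URLError)


def download(run_v2: Callable[[], str], run_v1: Callable[[], str]) -> str:
    try:
        return run_v2()
    except FALLBACK_ERRORS:
        # Anything outside the tuple, such as DigestMismatch, propagates.
        return run_v1()
```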

* Preserve v1 downloader fallback behavior

* Format v2 downloader tests

* Add v2 downloader reviewer test coverage

* Reuse v2 downloader test wheel name

* Restore unsafe v1 fallback regression test

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* [MOPU-288] Add related links to kubernetes monitor templates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [MOPU-288] Add related links to nginx monitor templates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [MOPU-288] Add related links to postgres and redis monitor templates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix broken Infrastructure links in monitor templates

The /infrastructure?filters=... links pointed to a non-existent path
with an unsupported query param, and used template variables not in
each monitor's group-by.

- nginx (4xx, 5xx, upstream_peer_fails): remove (upstream is not a
  host/pod/container resource)
- k8s deployments_replicas, statefulset_replicas, pods_failed_state:
  remove (no host/pod template var in group-by)
- k8s node_unavailable: replace with Hosts page scoped to
  kube_cluster_name
- k8s pod_crashloopbackoff, pod_imagepullbackoff, pod_oomkilled,
  pods_restarting: replace with Pod Explorer scoped to pod_name

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Remove live messages reading from kafka_consumer

This functionality moved to the kafka_actions integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add changelog entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix lint

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t setting up analytics keys (#23852)

* README changes to add analytics information

* more readme updates

* more readme

* ascii

* Apply suggestions from code review

Co-authored-by: Eva Parish <eva.parish@datadoghq.com>

* Apply suggestion from @evazorro

---------

Co-authored-by: Eva Parish <eva.parish@datadoghq.com>
…-containerized Linux hosts (#23853)

* Update GPU README to document required system-probe.yaml flag and gpu.enabled for non-containerized hosts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Address review feedback: style and consistency fixes in GPU README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…23850)

* Wait for the gamesim_primary index to be online before running tests

test_index_stats_metrics asserts metrics tagged with index_name:gamesim_primary,
which the Couchbase indexer only reports once the bundled GSI index for
gamesim-sample finishes building. Add a WaitFor against /api/v1/stats on port
9102 (Couchbase 7+) so the environment fixture blocks until the indexer is
actually reporting that keyspace.

Also fix the inverted exit condition in load_sample_bucket(): the loop was
breaking the moment the task appeared in /pools/default/tasks instead of
when it disappeared, so it returned while the sample load was still running.
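
The corrected exit condition can be sketched as a polling loop that stops once the task is gone from the task list. `wait_for_task_to_finish` and its parameters are hypothetical; the real load_sample_bucket() queries /pools/default/tasks over HTTP.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


def wait_for_task_to_finish(
    list_task_ids: Callable[[], Iterable[str]],
    task_id: str,
    attempts: int = 60,
    delay: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> bool:
    """Poll until task_id is no longer listed; return False on timeout."""
    for _ in range(attempts):
        # The inverted version returned as soon as the task WAS listed,
        # which is exactly when the sample load is still running.
        if task_id not in set(list_task_ids()):
            return True
        sleep(delay)
    return False
```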

* Confirm gamesim_primary build completes before yielding environment

Three follow-ups from review:

- Check `initial_build_progress == 100` on the keyspace; the indexer also
  publishes stats while an index is still building.
- Use `raise_for_status()` so 401/404/5xx surface immediately instead of
  collapsing into a 60s WaitFor timeout.
- Drop the >=7 conditional around the new WaitFor; the test matrix only
  covers 7.x and the test that needs the index is itself gated on 7+.
- Bail out of load_sample_bucket if the install response carries no
  matching task entry, instead of spinning on `None == None`.

* Require every gamesim_primary keyspace to report build progress 100

* Condense gamesim_primary_index_ready docstring to a one-liner

* Clarify gamesim_primary_index_ready and log keyspaces on the negative path

- Refactor to a list comprehension that separates 'no keyspace yet' from
  'still building' so each retry path is self-documenting.
- Print the keyspaces seen on the negative path so a future WaitFor timeout
  doesn't leave a debugging dead end.
- Document why load_sample_bucket bails out when the install response carries
  no matching task — the next gamesim_primary_index_ready WaitFor is the real
  readiness gate.
* Honor source ref in release dispatch

Keep release workflow tooling checked out separately from the source tree being released. This lets the workflow use current setup actions and release scripts while ddev tags and validates the requested source-repo-ref.

Add source-repo-branch to manual release dispatches so stable versus pre-release validation follows the branch that contains source-repo-ref instead of the branch used to launch the workflow.

* align source-repo-branch description with release-trigger.yml
* Fix Phase.on_error signature and test_orchestrator

* Tighten on_error's error type

* Log instead of raise in on_finalize
)

* Fix Phase.on_error signature and test_orchestrator

* Move callbacks from react/ to its own layer

* Tighten on_error's error type

* Update Callbacks docstring

* Fix typo

* Improve CallbackSet docstring
* Add cycles detection in flow config

* Add two edge-cases tests

* Remove _detect_cycle tests and add one for disjoint graphs

* Find all possible cycles in _detect_cycles

* Changes in _detect_cycles: change comment, add cycle limit and tighten test
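
One way to find every cycle in a flow graph with a cap on how many are reported is sketched below. This is not the PR's `_detect_cycles`; the function name, the adjacency-dict input, and the default limit are assumptions. Each cycle is reported once, rooted at its smallest node, by only walking through nodes that sort after the start node.

```python
from __future__ import annotations


def detect_cycles(graph: dict[str, list[str]], limit: int = 10) -> list[list[str]]:
    """Return up to `limit` elementary cycles, each rooted at its smallest node."""
    cycles: list[list[str]] = []

    def walk(start: str, node: str, path: list[str]) -> None:
        for nxt in graph.get(node, []):
            if len(cycles) >= limit:
                return
            if nxt == start:
                # Closed the loop back to the root: record the full cycle.
                cycles.append(path + [start])
            elif nxt > start and nxt not in path:
                walk(start, nxt, path + [nxt])

    # Trying every node as a root also covers disjoint subgraphs.
    for start in sorted(graph):
        walk(start, start, [start])
    return cycles
```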
* Add cache breakpoints to anthropic_client

* Fix little bugs

* Move imports only used for type checking to if TYPE_CHECKING
* Add ddev validate

* Drop ddev validate all and fix some bugs

* Use --sync for every subcommand

* Add ddev validate all again
…ifecycle base (#23663)

* refactor(ai/phases): introduce PhaseOutcome and abstract Phase.execute()

- Add PhaseOutcome dataclass (memory_text, token counts, extra_checkpoint)
- Add validate_config() classmethod to Phase (no-op default)
- Add execute() method that implements the agent pipeline (later to be overridden by AgentPhase)
- Rewrite process_message() to call execute() and assemble the checkpoint from PhaseOutcome
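
A hedged sketch of the PhaseOutcome idea: execute() returns a small dataclass and the caller assembles the checkpoint from it. The field names beyond memory_text and extra_checkpoint, and the `assemble_checkpoint` helper, are assumptions. The merge order reflects the later commit in this list that prevents extra_checkpoint from overriding the checkpoint payload.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class PhaseOutcome:
    """Assumed shape; the commit lists memory_text, token counts, extra_checkpoint."""

    memory_text: str
    input_tokens: int = 0
    output_tokens: int = 0
    extra_checkpoint: dict[str, Any] = field(default_factory=dict)


def assemble_checkpoint(phase_id: str, outcome: PhaseOutcome) -> dict[str, Any]:
    core = {
        "phase_id": phase_id,
        "memory_text": outcome.memory_text,
        "input_tokens": outcome.input_tokens,
        "output_tokens": outcome.output_tokens,
    }
    # Core keys are applied last so extra_checkpoint cannot override them.
    return {**outcome.extra_checkpoint, **core}
```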

* refactor(ai/phases): extract AgentPhase from Phase

- Create agent_phase.py with AgentPhase(Phase) that owns the LLM pipeline:
  before_react/after_react hooks, run_tasks, execute()
- Move render_task_prompt and render_memory_prompt to agent_phase.py
- AgentPhase.validate_config enforces agent, known-agent, and non-empty tasks
- Phase.execute() now raises NotImplementedError — subclasses must implement it
- Strip base.py of all agent-specific code and imports
- Split test_base.py into lifecycle-only tests (using _StubPhase) and
  test_agent_phase.py for the agent-driven behaviour tests

* refactor(ai/phases): make PhaseConfig.agent and .tasks optional

- type default: "Phase" → "AgentPhase"
- agent: str (required) → str | None = None
- tasks: list[TaskConfig] (required) → list[TaskConfig] = []
- Remove at_least_one_task field validator (now enforced by AgentPhase.validate_config)
- FlowConfig.cross_references: skip unknown-agent check when agent is None
- orchestrator: guard agent_config lookup against None, import AgentConfig
- test_config.py: update type assertion, remove empty_tasks test, add
  test_flow_config_phase_without_agent_validates
- test_base.py / test_agent_phase.py: drop model_construct workarounds

* refactor(ai/phases): invoke Phase.validate_config from orchestrator

- Call phase_cls.validate_config(phase_id, config, agents) immediately after
  resolving the phase class in on_initialize — only for phases scheduled in flow:
- Orphan phases (defined but absent from flow:) are skipped before the call
- test_orchestrator.py: drop explicit type: Phase lines from fixtures (use AgentPhase default),
  assert AgentPhase is registered by discovery, add tests for validate_config
  invocation and orphan-skip behaviour

* Rename AgentPhase to AgenticPhase

* Split AgenticPhase's execute into smaller functions and add tests for them

* Move agent and client parameters to AgenticPhase and make Phase abstract

* Add e2e Phase contract test

* Move some tests from agentic phase to conftest

* Phase not registered and improve tests

* Prevent extra_checkpoint from overriding checkpoint_payload

* Make Phase and Orchestrator model-agnostic

* Add Phase.extra_init_kwargs and agent/build.py tests
@luisorofino luisorofino requested review from a team as code owners May 29, 2026 08:27
@luisorofino luisorofino requested review from FlorianVeaux and removed request for a team May 29, 2026 08:27
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.


Labels

ddev, qa/skip-qa (Automatically skip this PR for the next QA), team/agent-integrations
