feat(agents,tools)!: SP-3 native human-in-the-loop tool approval #304

Merged
ancongui merged 2 commits into main from feat/sp3-native-hitl-deferred-tools on Jun 22, 2026
@ancongui

Contributor

Summary

SP-3 from the pydantic-ai modernization roadmap: re-base tool approval / human-in-the-loop onto pydantic-ai's native deferred-tools protocol, replacing the bespoke ApprovalGuard.

A tool marked requires_approval=True makes the agent run pause before executing it and return a DeferredToolRequests as result.output (detect with is_deferred(result)). The caller resumes with the human's decision:

result = await agent.run("Delete record 42.")
if is_deferred(result):
    approvals = {c.tool_call_id: True for c in result.output.approvals}  # True / ToolApproved(override_args=…) / ToolDenied(message=…)
    result = await agent.run(message_history=result.all_messages(),
                             deferred_tool_results=DeferredToolResults(approvals=approvals))
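
The snippet above approves every pending call. A slightly fuller sketch of the decision step, mixing approve, deny, and argument override, could look like the following. To stay runnable without pydantic-ai installed, `ToolApproved`, `ToolDenied`, and `PendingCall` are hypothetical stand-in dataclasses that only mirror the names used in this PR; the tool names (`delete_record`, `bulk_update`), `PROTECTED_IDS`, and the `decide` policy are illustrative assumptions, not part of the change.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolApproved:  # stand-in mirroring the native name; not the real class
    override_args: dict[str, Any] | None = None


@dataclass
class ToolDenied:  # stand-in mirroring the native name; not the real class
    message: str = "The tool call was denied."


@dataclass
class PendingCall:  # hypothetical stand-in for one entry of result.output.approvals
    tool_call_id: str
    tool_name: str
    args: dict[str, Any] = field(default_factory=dict)


PROTECTED_IDS = {1}  # assumption: records a human policy never lets the agent delete


def decide(call: PendingCall) -> bool | ToolApproved | ToolDenied:
    """Illustrative policy: deny protected deletes, cap bulk limits, approve the rest."""
    if call.tool_name == "delete_record" and call.args.get("record_id") in PROTECTED_IDS:
        return ToolDenied(message="Record is protected.")
    if call.tool_name == "bulk_update" and call.args.get("limit", 0) > 100:
        # Approve, but replace the arguments the tool will actually receive.
        return ToolApproved(override_args={**call.args, "limit": 100})
    return True


def build_approvals(pending: list[PendingCall]) -> dict[str, bool | ToolApproved | ToolDenied]:
    """Map each tool_call_id to its decision, the shape passed as approvals=."""
    return {call.tool_call_id: decide(call) for call in pending}
```

With the real classes, the resulting mapping would be passed as `DeferredToolResults(approvals=...)` exactly as in the snippet above.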

FireflyAgent auto-detects HITL (any approval-requiring tool / ToolKit / as_toolset(), or an ApprovalRequiredToolset) and widens its output union to allow the pause only then — non-HITL agents are unchanged. hitl=True forces it (e.g. for tools that defer dynamically by raising ApprovalRequired); approval_handler= resolves approvals inline (no pause) via a native HandleDeferredToolCalls capability.
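
The detection rule described above can be pictured with a minimal sketch. This is not FireflyAgent's code: `ToolSpec`, `hitl_enabled`, and `output_types` are hypothetical names, and the deferred output type is represented by a plain string label so the example runs without pydantic-ai.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ToolSpec:  # hypothetical stand-in for a registered tool
    name: str
    requires_approval: bool = False


def hitl_enabled(tools: Iterable[ToolSpec], hitl: bool = False) -> bool:
    """HITL is on when forced (hitl=True) or when any tool requires approval."""
    return hitl or any(tool.requires_approval for tool in tools)


def output_types(base: type, tools: Iterable[ToolSpec], hitl: bool = False) -> tuple:
    """Widen the output union only for HITL agents; others keep their original type."""
    if hitl_enabled(tools, hitl):
        return (base, "DeferredToolRequests")  # label standing in for the native type
    return (base,)
```

The `hitl=True` escape hatch matters for tools that carry no static flag and only defer at call time, which static inspection of the tool list cannot see.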

Design

  • Three distinct HITL layers, kept separate by design: tool approval (native deferred-tools, agent layer — this PR), workflow human() / WorkflowInterrupt (journal-replay), pipeline Pause / approve_pause (checkpoint). Built after an Understand-workflow mapped all four current HITL surfaces against the native protocol.
  • Post-run cross-cutting code (memory persist, output-guard, validation, cache, logging, explainability) skips a paused DeferredToolRequests output — it is a control object, not a final answer.
  • Guard denials now raise ToolGuardError (a ToolError subclass — existing except ToolError unaffected).
  • Bug fix (caught by the edge-case tests): _guarded_execute now lets ApprovalRequired/CallDeferred propagate untouched (like ModelRetry), so a tool that defers dynamically from its body actually pauses instead of being wrapped in ToolError.
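
The last two bullets can be illustrated with a small sketch of the wrapping logic. It is a simplified, synchronous approximation under stated assumptions, not the actual `BaseTool._guarded_execute`: all exception classes below are local stand-ins that mirror the names in this PR, and `guarded_execute` with its `guards` parameter is a hypothetical signature.

```python
from typing import Any, Callable, Iterable


class ToolError(Exception):
    """Stand-in for the framework's tool error."""


class ToolGuardError(ToolError):
    """Stand-in: guard denial, still caught by `except ToolError`."""


class ApprovalRequired(Exception):
    """Stand-in for the control signal raised to request approval."""


class CallDeferred(Exception):
    """Stand-in for the control signal raised to defer a call."""


class ModelRetry(Exception):
    """Stand-in for the control signal asking the model to retry."""


# Control signals must reach the agent loop unchanged, so they are never wrapped.
CONTROL_SIGNALS = (ApprovalRequired, CallDeferred, ModelRetry)


def guarded_execute(
    tool: Callable[..., Any],
    args: dict[str, Any],
    guards: Iterable[Callable[[dict[str, Any]], bool]] = (),
) -> Any:
    for guard in guards:
        if not guard(args):
            raise ToolGuardError("Tool call rejected by guard.")
    try:
        return tool(**args)
    except CONTROL_SIGNALS:
        raise  # propagate untouched so the run can pause or retry
    except Exception as exc:
        # Ordinary failures are wrapped so callers see a single error type.
        raise ToolError(str(exc)) from exc
```

Listing the control signals before the generic `except Exception` is the whole fix in miniature: in the pre-fix behaviour described above, `ApprovalRequired` fell into the generic branch and the pause was lost.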

Breaking

  • ApprovalGuard + ApprovalCallback removed — replaced by the native protocol. Migration: docs/migration.md §6.

Tests & verification

  • 25 unit tests in test_hitl_deferred.py (pause/resume, deny, override_args, metadata round-trip, dynamic ApprovalRequired, ApprovalRequiredToolset e2e, multi-approval, inline handler None/resolve, middleware/memory skips, detection) + 6 in test_requires_approval.py.
  • 1486 PR-gate tests, ruff + ruff format clean, pyright 0 errors.
  • Live Anthropic e2e 10/10, including a real-model pause→approve→resume; plus ephemeral live checks of override_args, and dynamic ApprovalRequired approve + deny.

Docs

  • New "Human-in-the-Loop Tool Approval" sections in docs/tools.md (with a pause→approve→resume sequence diagram) and docs/agents.md; tutorial.md, architecture.md, migration.md updated; README Feature Highlights / Module Reference / Quick Start updated (drop ApprovalGuard, add HITL + SP-5 output modes); stale diagram/claims fixed.
  • Version 26.06.11, CHANGELOG.md + uv.lock updated.

Andres Contreras added 2 commits June 22, 2026 11:47
Re-base tool approval / HITL onto pydantic-ai's native deferred-tools protocol
(DeferredToolRequests/DeferredToolResults/ApprovalRequired/ToolApproved/ToolDenied),
replacing the bespoke ApprovalGuard.

Tools declare requires_approval=True (firefly_tool/BaseTool, threaded through
ToolKit.as_pydantic_tools/as_toolset). When the model calls such a tool the agent
run PAUSES before executing it and returns a DeferredToolRequests as result.output;
is_deferred(result) detects it. Resume with:
  agent.run(message_history=paused.all_messages(),
            deferred_tool_results=DeferredToolResults(approvals={id: True|ToolApproved|ToolDenied}))

FireflyAgent auto-detects HITL (requires_approval tool/ToolKit/as_toolset(), or an
ApprovalRequiredToolset in toolsets) and widens its output union to allow the pause
ONLY then — non-HITL agents are unchanged. hitl=True forces it; approval_handler=
resolves approvals inline (no pause) via a native HandleDeferredToolCalls capability.

Post-run cross-cutting code (memory persist, output-guard, validation, cache,
logging, explainability) skips a paused DeferredToolRequests output — it is a
control object, not a final answer. Guard denials now raise ToolGuardError (a
ToolError subclass).
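
A minimal sketch of the skip described above, assuming post-run steps are plain callables applied to the final output. `DeferredToolRequests` here is a local stand-in class and `run_post_hooks` is a hypothetical helper, not the framework's middleware API.

```python
from typing import Any, Callable, Iterable


class DeferredToolRequests:
    """Stand-in for the native control object returned when a run pauses."""


def run_post_hooks(output: Any, hooks: Iterable[Callable[[Any], None]]) -> Any:
    """Apply post-run hooks to final answers only; paused runs pass through untouched."""
    if isinstance(output, DeferredToolRequests):
        # A pause is not a final answer: nothing to persist, validate, cache, or log.
        return output
    for hook in hooks:
        hook(output)
    return output
```

Returning the control object unchanged keeps it available to the caller, who still needs it to collect decisions and resume the run.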

BREAKING: ApprovalGuard + ApprovalCallback removed (migration: docs/migration.md §6).

Native symbols re-exported from .tools; is_deferred/ApprovalHandler from .agents.
Workflow human()/WorkflowInterrupt and pipeline Pause/approve_pause intentionally
left intact (three distinct HITL layers). Validated live against Anthropic
(real run pauses on approval-required tool; resume runs it exactly once).
Adds unit + nightly e2e coverage; docs (tools/agents/tutorial/architecture/migration)
updated.
… dynamic-approval fix

Documentation:
- README: drop removed ApprovalGuard from the "Five guard types" bullet (now four
  + ToolGuardError), add native HITL tool approval + SP-5 reasoning output modes;
  update Module Reference + Quick Start.
- docs/tools.md: add a pause -> approve -> resume mermaid sequence diagram.
- Fix stale diagrams/claims: tutorial.md dangling `CG --> AG` edge + guard lead-in;
  architecture.md "5 guards" -> "4 guards" + exceptions count 34 -> 42; docs index
  surfaces agent-layer HITL.

Fix (caught by new edge-case tests):
- BaseTool._guarded_execute now lets pydantic-ai ApprovalRequired / CallDeferred
  control signals propagate untouched (like ModelRetry) instead of wrapping them as
  ToolError. Without this, a tool that defers DYNAMICALLY (raising ApprovalRequired
  from its body) was swallowed and the run never paused. The static requires_approval
  path was unaffected.

Tests (12 new HITL edge cases in test_hitl_deferred.py):
- run_sync pause/resume; ToolApproved(override_args=) replacing args; metadata
  round-trip (ApprovalRequired -> DeferredToolRequests.metadata, and
  DeferredToolResults.metadata -> RunContext.tool_call_metadata/approved); dynamic
  ApprovalRequired + hitl=True (and UserError without it); ApprovalRequiredToolset
  end-to-end; OutputGuard/Explainability/Logging deferred-skip branches; multiple
  pending approvals (approve one / deny one); approval_handler returning None;
  raw pydantic Tool(requires_approval=) detection; memory resume mutual-exclusivity.

Verified: 1486 PR-gate tests, ruff/pyright clean. Live Anthropic e2e 10/10, plus
ephemeral live checks of override_args, dynamic ApprovalRequired approve+deny.
ancongui enabled auto-merge June 22, 2026 10:06
 import re
 import time
-from collections.abc import Sequence
+from collections.abc import Awaitable, Callable, Sequence
ancongui merged commit cc69fab into main Jun 22, 2026
9 checks passed