feat(agents,tools)!: SP-3 native human-in-the-loop tool approval #304
Merged
added 2 commits
June 22, 2026 11:47
Re-base tool approval / HITL onto pydantic-ai's native deferred-tools protocol
(DeferredToolRequests/DeferredToolResults/ApprovalRequired/ToolApproved/ToolDenied),
replacing the bespoke ApprovalGuard.
Tools declare requires_approval=True (firefly_tool/BaseTool, threaded through
ToolKit.as_pydantic_tools/as_toolset). When the model calls such a tool the agent
run PAUSES before executing it and returns a DeferredToolRequests as result.output;
is_deferred(result) detects it. Resume with:
agent.run(message_history=paused.all_messages(),
deferred_tool_results=DeferredToolResults(approvals={id: True|ToolApproved|ToolDenied}))
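To make the pause/resume contract above concrete, here is a minimal stdlib-only sketch. ToyTool, PendingCall, DeferredRequests, run_tool_call and is_paused are hypothetical stand-ins invented for illustration. They mimic the shape of the protocol and are not the pydantic-ai or FireflyAgent API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ToyTool:
    """Hypothetical stand-in for a tool that may be approval-gated."""
    name: str
    func: Callable[..., str]
    requires_approval: bool = False


@dataclass
class PendingCall:
    """Hypothetical record of one tool call awaiting a decision."""
    call_id: str
    tool_name: str
    args: dict


@dataclass
class DeferredRequests:
    """Hypothetical control object returned in place of a final answer."""
    approvals: list


def run_tool_call(tool: ToyTool, call_id: str, args: dict,
                  approvals: Optional[Dict[str, bool]] = None):
    """Run one tool call, pausing if it is gated and not yet decided."""
    approvals = approvals or {}
    if tool.requires_approval and call_id not in approvals:
        # Pause: report the pending call without executing the tool body.
        return DeferredRequests([PendingCall(call_id, tool.name, args)])
    if tool.requires_approval and not approvals[call_id]:
        return f"{tool.name} denied"
    return tool.func(**args)


def is_paused(output) -> bool:
    """Toy analogue of is_deferred(result)."""
    return isinstance(output, DeferredRequests)


deleted = []


def delete_file(path: str) -> str:
    deleted.append(path)
    return f"deleted {path}"


tool = ToyTool("delete_file", delete_file, requires_approval=True)
call_args = {"path": "/tmp/report.txt"}

# First pass: the gated call pauses and the tool body has not run.
paused = run_tool_call(tool, "call-1", call_args)
paused_without_side_effect = is_paused(paused) and deleted == []

# Second pass: the human approved call-1, so the tool runs exactly once.
final = run_tool_call(tool, "call-1", call_args, approvals={"call-1": True})
print(paused_without_side_effect, final, deleted)
```

The toy keys decisions by call id, as the approvals mapping in the resume call does. Everything else (a single call per run, a plain string for a denial) is a simplification.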
FireflyAgent auto-detects HITL (requires_approval tool/ToolKit/as_toolset(), or an
ApprovalRequiredToolset in toolsets) and widens its output union to allow the pause
ONLY then — non-HITL agents are unchanged. hitl=True forces it; approval_handler=
resolves approvals inline (no pause) via a native HandleDeferredToolCalls capability.
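One way to picture inline resolution is a callback that decides each pending call without handing control back to the caller. The sketch below is stdlib-only. PendingCall, Handler, resolve_inline and policy are hypothetical names, and treating a None verdict as "no decision, leave it pending" is an assumption of this sketch, not documented behaviour of approval_handler= or HandleDeferredToolCalls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class PendingCall:
    """Hypothetical record of one tool call awaiting a decision."""
    call_id: str
    tool_name: str
    args: dict


# Hypothetical handler shape: True approves, False denies, None abstains.
Handler = Callable[[PendingCall], Optional[bool]]


def resolve_inline(pending: List[PendingCall], handler: Handler
                   ) -> Tuple[Dict[str, bool], List[PendingCall]]:
    """Ask the handler about each pending call.

    Returns the decisions keyed by call id, plus the calls the handler
    declined to decide (assumed here to stay pending).
    """
    decided: Dict[str, bool] = {}
    unresolved: List[PendingCall] = []
    for call in pending:
        verdict = handler(call)
        if verdict is None:
            unresolved.append(call)
        else:
            decided[call.call_id] = bool(verdict)
    return decided, unresolved


def policy(call: PendingCall) -> Optional[bool]:
    """Example policy: allow reads, block destructive calls, else abstain."""
    if call.tool_name == "read_file":
        return True
    if call.tool_name == "drop_table":
        return False
    return None


pending = [
    PendingCall("c1", "read_file", {"path": "notes.txt"}),
    PendingCall("c2", "drop_table", {"table": "users"}),
    PendingCall("c3", "send_email", {"to": "ops@example.com"}),
]
decided, unresolved = resolve_inline(pending, policy)
print(decided, [call.call_id for call in unresolved])
```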
Post-run cross-cutting code (memory persist, output-guard, validation, cache,
logging, explainability) skips a paused DeferredToolRequests output — it is a
control object, not a final answer. Guard denials now raise ToolGuardError (a
ToolError subclass).
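The two points in this paragraph can be sketched in a few lines: post-run hooks that ignore a paused output, and a guard error that existing except ToolError handlers still catch. DeferredRequests, run_post_hooks and persist are hypothetical stand-ins. ToolError and ToolGuardError are redefined locally only to show the subclass relationship and are not the project's real classes.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredRequests:
    """Hypothetical control object signalling a paused run."""
    approvals: list = field(default_factory=list)


def run_post_hooks(output, hooks):
    """Apply post-run hooks only to final answers, never to a paused run."""
    if isinstance(output, DeferredRequests):
        return []  # control object: nothing to persist, validate or log
    return [hook(output) for hook in hooks]


class ToolError(Exception):
    """Local stand-in for the base tool error."""


class ToolGuardError(ToolError):
    """Local stand-in for a guard denial; subclasses ToolError."""


memory = []


def persist(output) -> str:
    memory.append(output)
    return "persisted"


skipped = run_post_hooks(DeferredRequests(["call-1"]), [persist])
ran = run_post_hooks("final answer", [persist])

# A handler written against ToolError keeps catching guard denials.
try:
    raise ToolGuardError("blocked by guard")
except ToolError as exc:
    caught = type(exc).__name__

print(skipped, ran, memory, caught)
```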
BREAKING: ApprovalGuard + ApprovalCallback removed (migration: docs/migration.md §6).
Native symbols re-exported from .tools; is_deferred/ApprovalHandler from .agents.
Workflow human()/WorkflowInterrupt and pipeline Pause/approve_pause intentionally
left intact (three distinct HITL layers). Validated live against Anthropic
(real run pauses on approval-required tool; resume runs it exactly once).
Adds unit + nightly e2e coverage; docs (tools/agents/tutorial/architecture/migration)
updated.
… dynamic-approval fix

Documentation:
- README: drop removed ApprovalGuard from the "Five guard types" bullet (now four + ToolGuardError), add native HITL tool approval + SP-5 reasoning output modes; update Module Reference + Quick Start.
- docs/tools.md: add a pause -> approve -> resume mermaid sequence diagram.
- Fix stale diagrams/claims: tutorial.md dangling `CG --> AG` edge + guard lead-in; architecture.md "5 guards" -> "4 guards" + exceptions count 34 -> 42; docs index surfaces agent-layer HITL.

Fix (caught by new edge-case tests):
- BaseTool._guarded_execute now lets pydantic-ai ApprovalRequired / CallDeferred control signals propagate untouched (like ModelRetry) instead of wrapping them as ToolError. Without this, a tool that defers DYNAMICALLY (raising ApprovalRequired from its body) was swallowed and the run never paused. The static requires_approval path was unaffected.

Tests (12 new HITL edge cases in test_hitl_deferred.py):
- run_sync pause/resume; ToolApproved(override_args=) replacing args; metadata round-trip (ApprovalRequired -> DeferredToolRequests.metadata, and DeferredToolResults.metadata -> RunContext.tool_call_metadata/approved); dynamic ApprovalRequired + hitl=True (and UserError without it); ApprovalRequiredToolset end-to-end; OutputGuard/Explainability/Logging deferred-skip branches; multiple pending approvals (approve one / deny one); approval_handler returning None; raw pydantic Tool(requires_approval=) detection; memory resume mutual-exclusivity.

Verified: 1486 PR-gate tests, ruff/pyright clean. Live Anthropic e2e 10/10, plus ephemeral live checks of override_args, dynamic ApprovalRequired approve+deny.
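The _guarded_execute fix described in this commit comes down to re-raising control-flow exceptions before the catch-all that wraps real failures. A minimal sketch of that pattern follows. The exception classes are local stand-ins that share names with the real ones, and guarded_execute, needs_signoff, broken and outcome are hypothetical. This is not the body of BaseTool._guarded_execute.

```python
class ToolError(Exception):
    """Local stand-in for the wrapper raised on genuine tool failures."""


class ApprovalRequired(Exception):
    """Local stand-in for the 'needs human approval' control signal."""


class CallDeferred(Exception):
    """Local stand-in for the 'result arrives later' control signal."""


class ModelRetry(Exception):
    """Local stand-in for the 'ask the model to retry' control signal."""


# Signals the agent loop must see unchanged.
CONTROL_SIGNALS = (ApprovalRequired, CallDeferred, ModelRetry)


def guarded_execute(func, *args, **kwargs):
    """Run a tool body, wrapping real failures but not control signals."""
    try:
        return func(*args, **kwargs)
    except CONTROL_SIGNALS:
        raise  # propagate untouched so the run can pause or retry
    except Exception as exc:
        raise ToolError(f"{func.__name__} failed: {exc}") from exc


def needs_signoff(amount: int) -> str:
    """Tool that defers dynamically, from inside its body."""
    if amount > 100:
        raise ApprovalRequired("amount over limit")
    return f"paid {amount}"


def broken() -> str:
    raise ValueError("boom")


def outcome(func, *args):
    """Return the result, or the name of the exception that escaped."""
    try:
        return guarded_execute(func, *args)
    except Exception as exc:
        return type(exc).__name__


print(outcome(needs_signoff, 50), outcome(needs_signoff, 500), outcome(broken))
```

The order of the except clauses matters: with the catch-all first, the dynamic ApprovalRequired would be wrapped as ToolError, which is the swallowed-pause behaviour the commit describes.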
  import re
  import time
- from collections.abc import Sequence
+ from collections.abc import Awaitable, Callable, Sequence
Summary
SP-3 from the pydantic-ai modernization roadmap: re-base tool approval / human-in-the-loop onto pydantic-ai's native deferred-tools protocol, replacing the bespoke ApprovalGuard.

A tool marked requires_approval=True makes the agent run pause before executing it and return a DeferredToolRequests as result.output (detect with is_deferred(result)). The caller resumes with the human's decision:

    agent.run(message_history=paused.all_messages(),
              deferred_tool_results=DeferredToolResults(approvals={id: True|ToolApproved|ToolDenied}))

FireflyAgent auto-detects HITL (any approval-requiring tool / ToolKit / as_toolset(), or an ApprovalRequiredToolset) and widens its output union to allow the pause only then; non-HITL agents are unchanged. hitl=True forces it (e.g. for tools that defer dynamically by raising ApprovalRequired); approval_handler= resolves approvals inline (no pause) via a native HandleDeferredToolCalls capability.

Design
- Three distinct HITL layers remain: agent-level tool approval (this PR), workflow human()/WorkflowInterrupt (journal-replay), pipeline Pause/approve_pause (checkpoint). Built after an Understand-workflow mapped all four current HITL surfaces against the native protocol.
- Post-run cross-cutting code (memory persist, output-guard, validation, cache, logging, explainability) skips a paused DeferredToolRequests output: it is a control object, not a final answer.
- Guard denials now raise ToolGuardError (a ToolError subclass, so existing except ToolError handlers are unaffected).
- _guarded_execute now lets ApprovalRequired / CallDeferred propagate untouched (like ModelRetry), so a tool that defers dynamically from its body actually pauses instead of being wrapped in ToolError.

Breaking
ApprovalGuard + ApprovalCallback removed, replaced by the native protocol. Migration: docs/migration.md §6.

Tests & verification
- Tests in test_hitl_deferred.py (pause/resume, deny, override_args, metadata round-trip, dynamic ApprovalRequired, ApprovalRequiredToolset e2e, multi-approval, inline handler None/resolve, middleware/memory skips, detection) + 6 in test_requires_approval.py.
- PR-gate tests pass; ruff format clean, pyright 0 errors.
- Live Anthropic e2e, plus ephemeral live checks of override_args and dynamic ApprovalRequired approve + deny.

Docs
- HITL documented in docs/tools.md (with a pause→approve→resume sequence diagram) and docs/agents.md; tutorial.md, architecture.md, migration.md updated; README Feature Highlights / Module Reference / Quick Start updated (drop ApprovalGuard, add HITL + SP-5 output modes); stale diagram/claims fixed.
- CHANGELOG.md + uv.lock updated.