Merged
53 changes: 53 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,59 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

Copyright 2026 Firefly Software Foundation. Licensed under the Apache License 2.0.

## [26.06.11] - 2026-06-22

SP-3: human-in-the-loop tool approval re-based onto pydantic-ai native deferred-tools.

### Added

- **Native tool approval / HITL.** Tools can declare `requires_approval=True`
(`firefly_tool(...)`, `BaseTool`, and threaded through `ToolKit.as_pydantic_tools()`
/ `as_toolset()`). When the model calls such a tool, the agent run **pauses before
executing it** and returns a `DeferredToolRequests` as `result.output`. Detect with
the new `is_deferred(result)` helper; **resume** via
`agent.run(message_history=..., deferred_tool_results=DeferredToolResults(approvals={call_id: True | ToolApproved(override_args=...) | ToolDenied(message=...)}))`.
`FireflyAgent` auto-detects HITL (any approval-requiring tool/`ToolKit`/`as_toolset()`,
or an `ApprovalRequiredToolset` in `toolsets`) and widens its output union to allow the
pause **only then** — non-HITL agents are unchanged. Force with `hitl=True`.
- **Inline (non-pausing) approval.** `FireflyAgent(approval_handler=...)` resolves
approvals *inside* the run via a native `HandleDeferredToolCalls` capability — for
programmatic / policy-based auto-approval.
- **Native re-exports** from `fireflyframework_agentic.tools`: `DeferredToolRequests`,
`DeferredToolResults`, `ToolApproved`, `ToolDenied`, `ApprovalRequired` (plus the
already-exported `ApprovalRequiredToolset`). `is_deferred` and the `ApprovalHandler`
type are exported from `fireflyframework_agentic.agents`.
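
The three decision kinds accepted in `approvals` (`True`, `ToolApproved(override_args=...)`, `ToolDenied(message=...)`) can be sketched as a per-call review policy. This is a minimal, self-contained illustration: the dataclasses below are stand-ins that only mirror the names above (the real types are re-exported from `fireflyframework_agentic.tools`), and `PendingCall`, its `tool_name` / `args` fields, `decide`, and the tool names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


# Stand-ins for illustration only; they mirror the names in the changelog
# but are NOT the real classes.
@dataclass
class ToolApproved:
    override_args: Optional[Dict[str, Any]] = None


@dataclass
class ToolDenied:
    message: str = "The tool call was denied."


@dataclass
class PendingCall:
    """Hypothetical shape of one entry in `result.output.approvals`."""
    tool_call_id: str
    tool_name: str
    args: Dict[str, Any]


Decision = Union[bool, ToolApproved, ToolDenied]


def decide(call: PendingCall) -> Decision:
    """Hypothetical review policy: map one pending call to a decision."""
    if call.tool_name == "delete_record":
        return ToolDenied(message="Deletes need a second reviewer.")
    if call.tool_name == "send_email":
        # Approve, but force the recipient to an audited address.
        return ToolApproved(override_args={**call.args, "to": "audit@example.com"})
    return True  # plain approval with the original arguments


pending = [
    PendingCall("call_1", "delete_record", {"record_id": "42"}),
    PendingCall("call_2", "send_email", {"to": "x@example.com", "body": "hi"}),
    PendingCall("call_3", "rename_record", {"record_id": "7", "name": "new"}),
]

# One decision per tool_call_id, the shape passed as `approvals=...` on resume.
approvals = {call.tool_call_id: decide(call) for call in pending}
```

In real use the resulting mapping would be wrapped in `DeferredToolResults(approvals=approvals)` and passed to `agent.run(...)` as described above.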

### Changed

- Post-run cross-cutting code now treats a paused run as a control object, not a final
answer: `_persist_memory`, the output-guard, validation, cache, logging, and
explainability middleware all **skip** a `DeferredToolRequests` output (preventing
corrupted memory turns, spurious `OutputGuardError`/`OutputReviewError`, and caching a
pause).
- Tool **guard denials** (validation / rate-limit / sandbox) now raise `ToolGuardError`
instead of a plain `ToolError`. `ToolGuardError` subclasses `ToolError`, so existing
`except ToolError` handlers are unaffected.
- `BaseTool._guarded_execute` now lets pydantic-ai's `ApprovalRequired` / `CallDeferred`
control signals propagate untouched (like `ModelRetry`), instead of wrapping them as
`ToolError`. This makes **dynamic** approval work — a tool body (with `takes_ctx=True`)
may `raise ApprovalRequired(metadata=...)` to defer that specific call; pair with
`FireflyAgent(hitl=True)` so the output union allows the pause.
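
The "paused run is a control object" rule can be sketched as a post-run hook that returns early on a paused result. This is an assumption-laden illustration, not the framework's code: `DeferredToolRequests`, `RunResult`, `is_deferred`, and `persist_turn` below are local stand-ins, and the real `is_deferred` may be implemented differently.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class DeferredToolRequests:
    """Stand-in for the pause object returned as `result.output`."""
    approvals: list = field(default_factory=list)


@dataclass
class RunResult:
    """Hypothetical minimal run result with only an `output` attribute."""
    output: Any


def is_deferred(result: RunResult) -> bool:
    # Assumed behaviour: a run is paused when its output is the pause object.
    return isinstance(result.output, DeferredToolRequests)


def persist_turn(memory: List[Tuple[str, Any]], prompt: str, result: RunResult) -> bool:
    """Store a finished turn; skip a paused run so memory is not corrupted."""
    if is_deferred(result):
        return False  # control object, not a final answer
    memory.append((prompt, result.output))
    return True


memory: List[Tuple[str, Any]] = []
paused_saved = persist_turn(memory, "Delete record 42.", RunResult(DeferredToolRequests()))
final_saved = persist_turn(memory, "Delete record 42.", RunResult("Record 42 deleted."))
```

The same early return applies to any hook that would otherwise treat the output as an answer (caching, output validation, logging of final results).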

### Removed (breaking)

- **`ApprovalGuard`** (and the `ApprovalCallback` alias). The bespoke guard-chain approval
(sync bool callback → `ToolError` on denial, no pause/resume/metadata) is replaced by the
native protocol above. Migration: `docs/migration.md` §6.

### Notes

- HITL stays three distinct layers by design: tool approval (native deferred-tools, agent
layer), workflow `human()` / `WorkflowInterrupt` (journal-replay), and pipeline `Pause`
/ `approve_pause` (checkpoint). They are not collapsed.
- Validated against a live Anthropic model: a `requires_approval` tool pauses the real run
(tool body does not execute), and resuming with approval runs it exactly once.

## [26.06.10] - 2026-06-22

SP-5: native structured-output modes for reasoning patterns.
29 changes: 22 additions & 7 deletions README.md
@@ -156,9 +156,14 @@ create your own components; the framework discovers them via duck typing.

- **Tools** — `ToolProtocol` (duck-typed) and `BaseTool` (inheritance) let you choose
your extensibility style. `ToolBuilder` provides a fluent API for building tools
-without subclassing. Five guard types (`ValidationGuard`, `RateLimitGuard`,
-`ApprovalGuard`, `SandboxGuard`, `CompositeGuard`) intercept calls before execution.
-Three composition patterns (`SequentialComposer`, `FallbackComposer`,
+without subclassing. Four guard types (`ValidationGuard`, `RateLimitGuard`,
+`SandboxGuard`, `CompositeGuard`) intercept calls before execution (a rejected guard
+raises `ToolGuardError`). For **human-in-the-loop**, mark a tool `requires_approval=True`:
+the agent run **pauses** before executing it and returns a `DeferredToolRequests`
+(detected via `is_deferred(result)`), which you resume with `deferred_tool_results=` —
+approving (`ToolApproved`), denying (`ToolDenied`), or auto-deciding inline via an
+`approval_handler=`. The native deferred-tools types are re-exported from
+`fireflyframework_agentic.tools`. Three composition patterns (`SequentialComposer`, `FallbackComposer`,
`ConditionalComposer`) build higher-order tools. `ToolKit` groups tools for
bulk registration. Nine built-in tools (calculator, datetime, filesystem, HTTP,
JSON, search, shell, text, database) are ready to attach to any agent.
@@ -177,7 +182,11 @@ create your own components; the framework discovers them via duck typing.
**Reflexion** (execute → critique → retry), **Tree of Thoughts** (branch →
evaluate → select), and **Goal Decomposition** (goal → phases → tasks).
All produce structured `ReasoningResult` with `ReasoningTrace`. Prompts are
-slot-overridable. `OutputReviewer` can validate final outputs. `ReasoningPipeline`
+slot-overridable. Each pattern's structured output is wrapped in a pydantic-ai
+output mode — selected per-pattern via `output_mode=` or framework-wide via the
+`reasoning_output_mode` config — `"tool"` (`ToolOutput`), `"native"` (provider
+structured output), or `"prompted"` (`PromptedOutput`, portable to any model).
+`OutputReviewer` can validate final outputs. `ReasoningPipeline`
chains patterns sequentially.

<p align="center">
@@ -536,6 +545,12 @@ async def lookup(query: str) -> str:
    return f"Result for {query}"
```

> **Human-in-the-loop:** mark a tool `@firefly_tool(name=..., requires_approval=True)` and the
> agent run **pauses** before executing it — `run()` returns a `DeferredToolRequests`
> (detect with `is_deferred(result)`). Resume with
> `agent.run(message_history=paused.all_messages(), deferred_tool_results=DeferredToolResults(approvals={call_id: True}))`.
> Full detail in [docs/tools.md](docs/tools.md#human-in-the-loop-tool-approval).

### 4. Add Memory for Multi-Turn Conversations

```python
@@ -713,11 +728,11 @@ content processing, validation, explainability, and pipelines.
Detailed guides for each module:

- [Architecture](docs/architecture.md) — Design principles and layer diagram
-- [Agents](docs/agents.md) — Lifecycle, registry, delegation, decorators
+- [Agents](docs/agents.md) — Lifecycle, registry, delegation, decorators, human-in-the-loop approval
- [Template Agents](docs/templates.md) — Summarizer, classifier, extractor, conversational, router
-- [Tools](docs/tools.md) — Protocol, builder, guards, composition, built-ins
+- [Tools](docs/tools.md) — Protocol, builder, guards, composition, built-ins, native HITL approval (`requires_approval`, deferred resume)
- [Prompts](docs/prompts.md) — Templates, versioning, composition, validation
-- [Reasoning Patterns](docs/reasoning.md) — 6 patterns, structured outputs, custom patterns
+- [Reasoning Patterns](docs/reasoning.md) — 6 patterns, structured outputs, output modes (`output_mode`/`reasoning_output_mode`), custom patterns
- [Content](docs/content.md) — Chunking, compression, batch processing
- [Memory](docs/memory.md) — Conversation history, working memory, storage backends
- [Validation](docs/validation.md) — Rules, QoS guards, output reviewer
2 changes: 1 addition & 1 deletion docs/README.md
@@ -43,7 +43,7 @@ below it, keeping the dependency graph acyclic and each module independently tes
|---|---|
| **[Agents](agents.md)** | `FireflyAgent`, `AgentRegistry`, `AgentLifecycle`, `@firefly_agent` decorator, middleware stack (`AgentMiddleware`, `MiddlewareChain`, `Logging`/`PromptGuard`/`CostGuard`/`Observability`/`Explainability`/`Cache`/`OutputGuard`/`Validation`/`Retry`/`PromptCache` middleware), 7 delegation strategies (round-robin, capability, content-based, cost-aware, chain, fallback, weighted), `FallbackModelWrapper` / `run_with_fallback`, `ResultCache` |
| **[Template Agents](templates.md)** | Five factory functions: summarizer, classifier, extractor, conversational, router |
-| **[Tools](tools.md)** | `ToolProtocol`, `BaseTool`, `ToolBuilder`, guards, composition, caching, 9 built-in tools; full-fidelity schemas via `ParameterSpec(python_type=…)`, `RunContext` opt-in (`takes_ctx`), `ToolKit.as_toolset()` + re-exported native combinators (`FilteredToolset`, `WrapperToolset`, `ApprovalRequiredToolset`, …) |
+| **[Tools](tools.md)** | `ToolProtocol`, `BaseTool`, `ToolBuilder`, guards, composition, caching, 9 built-in tools; full-fidelity schemas via `ParameterSpec(python_type=…)`, `RunContext` opt-in (`takes_ctx`), `ToolKit.as_toolset()` + re-exported native combinators (`FilteredToolset`, `WrapperToolset`, `ApprovalRequiredToolset`, …); human-in-the-loop tool approval (`requires_approval` / `is_deferred` / `deferred_tool_results` / `approval_handler`) |
| **[Prompts](prompts.md)** | `PromptTemplate`, `PromptRegistry`, composers, validation, loaders |
| **[Content](content.md)** | `TextChunker`, `MarkdownChunker`, `DocumentSplitter`, `ImageTiler`, `BatchProcessor`, compression; binary normalization (`content.binary`, `[binary]` extra: `BinaryNormalizer`, office/PDF/image/archive/email converters) |
| **[Memory](memory.md)** | `ConversationMemory`, `WorkingMemory`, `MemoryManager`, `InMemoryStore` / `FileStore` / `SQLiteStore` backends, `MemoryScope`, LLM summarisation |
28 changes: 28 additions & 0 deletions docs/agents.md
@@ -137,6 +137,34 @@ agent = FireflyAgent(

---

## Human-in-the-Loop Tool Approval

When a tool declares `requires_approval=True` (or an `ApprovalRequiredToolset` gates the
toolset), `run()` / `run_sync()` **pause before the tool executes** and return a
`DeferredToolRequests` as `result.output`. Detect this with `is_deferred(result)`, then
resume by calling the agent again with the paused messages and the human's decision:

```python
from fireflyframework_agentic.agents import FireflyAgent, is_deferred
from fireflyframework_agentic.tools import DeferredToolResults

result = await agent.run("Delete record 42.")
if is_deferred(result):
    approvals = {c.tool_call_id: True for c in result.output.approvals}  # True / ToolApproved / ToolDenied
    result = await agent.run(
        message_history=result.all_messages(),
        deferred_tool_results=DeferredToolResults(approvals=approvals),
    )
```

`FireflyAgent` auto-detects HITL from approval-requiring tools/toolsets (widening its output
union only then); force it with `hitl=True`, or resolve approvals inline without pausing via
`approval_handler=`. Post-run middleware (output guard, validation, cache) and memory all skip
a paused result — it is a control object, not a final answer. See
[Human-in-the-Loop Tool Approval](tools.md#human-in-the-loop-tool-approval) for the full guide.
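
The snippet above handles a single pause. Assuming a resumed run can pause again (for instance if the model then requests another approval-gated tool), the resume step can be wrapped in a bounded loop. The sketch below is self-contained and uses stand-ins throughout: `FakeAgent`, `Call`, `Result`, the local `is_deferred`, and `run_until_final` with its `max_rounds` parameter are hypothetical and only imitate the shapes described in this section.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Call:
    tool_call_id: str


@dataclass
class DeferredToolRequests:  # stand-in for the pause object
    approvals: List[Call] = field(default_factory=list)


@dataclass
class DeferredToolResults:  # stand-in for the resume payload
    approvals: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Result:
    output: Any

    def all_messages(self) -> list:
        return []  # a real result would return the message history


def is_deferred(result: Result) -> bool:
    return isinstance(result.output, DeferredToolRequests)


class FakeAgent:
    """Hypothetical agent that pauses `pauses` times, then answers."""

    def __init__(self, pauses: int) -> None:
        self.remaining = pauses

    async def run(self, prompt=None, *, message_history=None, deferred_tool_results=None):
        if self.remaining > 0:
            self.remaining -= 1
            return Result(DeferredToolRequests([Call(f"call_{self.remaining}")]))
        return Result("done")


async def run_until_final(agent, prompt: str, max_rounds: int = 5):
    """Resume after every pause, approving each call, up to `max_rounds`."""
    result = await agent.run(prompt)
    rounds = 0
    while is_deferred(result):
        if rounds >= max_rounds:
            raise RuntimeError("too many approval rounds")
        rounds += 1
        approvals = {c.tool_call_id: True for c in result.output.approvals}
        result = await agent.run(
            message_history=result.all_messages(),
            deferred_tool_results=DeferredToolResults(approvals=approvals),
        )
    return result, rounds


final, rounds = asyncio.run(run_until_final(FakeAgent(pauses=2), "Delete record 42."))
```

The `max_rounds` cap is a design choice for the sketch: it keeps a misbehaving model from requesting approvals indefinitely.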

---

## Agent Registry

The `AgentRegistry` is a singleton that maps agent names to `FireflyAgent` instances.
5 changes: 2 additions & 3 deletions docs/architecture.md
@@ -61,7 +61,7 @@ graph TD

subgraph Agent Layer
AGT["Agents<br/><small>FireflyAgent · AgentRegistry<br/>DelegationRouter · AgentLifecycle<br/>@firefly_agent · 5 templates · 11 middleware<br/>7 delegation strategies · FallbackModelWrapper<br/>ResultCache · run timeout</small>"]
-TOOLS["Tools<br/><small>BaseTool · ToolBuilder · ToolKit · CachedTool<br/>5 guards · 3 composers · tool timeout<br/>ToolRegistry · 9 built-ins</small>"]
+TOOLS["Tools<br/><small>BaseTool · ToolBuilder · ToolKit · CachedTool<br/>4 guards · 3 composers · tool timeout · HITL approval<br/>ToolRegistry · 9 built-ins</small>"]
PROMPTS["Prompts<br/><small>PromptTemplate · PromptRegistry<br/>3 composers · PromptValidator<br/>PromptLoader</small>"]
CONTENT["Content<br/><small>TextChunker · DocumentSplitter · MarkdownChunker<br/>ImageTiler · BatchProcessor<br/>ContextCompressor · SlidingWindowManager<br/>content.binary (BinaryNormalizer · office converters)</small>"]
MEM["Memory<br/><small>MemoryManager · ConversationMemory<br/>WorkingMemory · TokenEstimator<br/>InMemoryStore · FileStore · SQLiteStore<br/>summarization · create_llm_summarizer<br/>export/import · async wrappers</small>"]
@@ -166,7 +166,6 @@ classDiagram
ToolProtocol <|.. ConditionalComposer
GuardProtocol <|.. ValidationGuard
GuardProtocol <|.. RateLimitGuard
-GuardProtocol <|.. ApprovalGuard
GuardProtocol <|.. SandboxGuard
GuardProtocol <|.. CompositeGuard
ReasoningPattern <|.. AbstractReasoningPattern
@@ -215,7 +214,7 @@ system. Every other module depends on at least one Core component.
from environment variables and `.env` files. It actively rejects removed serving/exposure
config fields (e.g. `otlp_endpoint`, `rbac_enabled`, `cors_allowed_origins`,
`cost_calculator`) with a `ValueError`.
-- **exceptions.py** -- A structured exception hierarchy of 34 classes rooted at
+- **exceptions.py** -- A structured exception hierarchy of 42 classes rooted at
`FireflyAgenticError`.
- **plugin.py** -- `PluginDiscovery` discovers and loads entry-point plugins at startup.
- **resilience/circuit_breaker.py** -- `CircuitBreaker` (with `CircuitState` and
48 changes: 48 additions & 0 deletions docs/migration.md
@@ -134,6 +134,50 @@ await triage(args, runner=FireflyAgentRunner()) # both resolved from the regis

---

## 6. `ApprovalGuard` removed — human-in-the-loop is now native (breaking)

**Why.** Tool approval was a bespoke guard (`ApprovalGuard(callback)`) that ran inside
Firefly's guard chain and raised `ToolError` on a denied call — a synchronous, all-or-nothing
gate with no pause/resume, metadata, or per-call granularity, parallel to pydantic-ai's own
deferred-tools protocol. It has been **removed** in favour of the native protocol
(`requires_approval`, `DeferredToolRequests`/`DeferredToolResults`, `ApprovalRequired`,
`ApprovalRequiredToolset`).

```python
# Before — guard that blocks the run on denial
from fireflyframework_agentic.tools.guards import ApprovalGuard

async def approve(tool_name, kwargs) -> bool:
    return await ask_admin(tool_name, kwargs)

@guarded(ApprovalGuard(callback=approve))
@firefly_tool("delete_record", description="Delete a record")
async def delete_record(record_id: str) -> str: ...

# After — native: the run PAUSES for sign-off, then resumes
from fireflyframework_agentic.agents import is_deferred
from fireflyframework_agentic.tools import DeferredToolResults

@firefly_tool("delete_record", description="Delete a record", requires_approval=True)
async def delete_record(record_id: str) -> str: ...

result = await agent.run("delete record 42")
if is_deferred(result):
    approvals = {c.tool_call_id: await ask_admin(c) for c in result.output.approvals}  # bool / ToolApproved / ToolDenied
    result = await agent.run(message_history=result.all_messages(),
                             deferred_tool_results=DeferredToolResults(approvals=approvals))
```

For the old **inline, non-pausing** behaviour (a callback decides programmatically), pass
`FireflyAgent(approval_handler=...)` — wired as a native `HandleDeferredToolCalls` capability.
See [Human-in-the-Loop Tool Approval](tools.md#human-in-the-loop-tool-approval).

Also: guard denials (validation, rate-limit, sandbox) now raise `ToolGuardError` instead of a
plain `ToolError`. `ToolGuardError` **subclasses** `ToolError`, so existing `except ToolError`
handlers keep working.
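
The effect of the subclass relationship on existing handlers can be shown with plain Python exceptions. The two classes below are local stand-ins that mirror only the hierarchy described above (they are not imported from the framework), and `run_tool`, `legacy_handler`, and `new_handler` are hypothetical helpers.

```python
class ToolError(Exception):
    """Stand-in base class for tool failures."""


class ToolGuardError(ToolError):
    """Stand-in for a guard (validation, rate-limit, sandbox) rejecting a call."""


def run_tool(guard_allows: bool, body_fails: bool = False) -> str:
    """Hypothetical tool call: a guard check followed by the tool body."""
    if not guard_allows:
        raise ToolGuardError("rate limit exceeded")
    if body_fails:
        raise ToolError("upstream service unavailable")
    return "ok"


def legacy_handler(guard_allows: bool) -> str:
    """Pre-existing code that only knows about ToolError keeps working."""
    try:
        return run_tool(guard_allows)
    except ToolError as exc:
        return f"tool failed: {exc}"


def new_handler(guard_allows: bool, body_fails: bool = False) -> str:
    """New code can tell guard denials apart from other tool failures."""
    try:
        return run_tool(guard_allows, body_fails)
    except ToolGuardError as exc:  # must be listed before the base class
        return f"guard denied: {exc}"
    except ToolError as exc:
        return f"tool failed: {exc}"
```

Ordering matters in `new_handler`: because `ToolGuardError` is a `ToolError`, an `except ToolError` clause placed first would also catch guard denials.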

---

## Checklist

- [ ] Replace every `type_annotation="..."` with `python_type=<real type>` in
@@ -144,3 +188,7 @@ await triage(args, runner=FireflyAgentRunner()) # both resolved from the regis
- [ ] Review workflows for global cost/budget effects now that sub-agents run through
`FireflyAgent`; pass `runner=DefaultAgentRunner()` if you want the old path.
- [ ] Import toolset combinators / `RunContext` from `fireflyframework_agentic.tools`.
- [ ] Replace `ApprovalGuard` with `requires_approval=True` + the native pause/resume flow
(`is_deferred()` + `deferred_tool_results=`), or an inline `approval_handler=`.
- [ ] If you matched on `ToolError` from guard denials specifically, note it is now the
`ToolGuardError` subclass (still caught by `except ToolError`).