Milestone: v0.3.0 | Tier: Strategic | Effort: Medium
Problem
Firewall.apply() enforces per-invocation budgets via budgets.py, but there is no cross-invocation budget tracking. An agent can exhaust the LLM context window by making many small invocations that individually fit within budget but cumulatively overflow.
This is the #2 problem the library was designed to solve (after security). The context firewall docs describe budget management, but the implementation only enforces it within a single invoke() call.
Proposed Change
1. BudgetManager class (src/agent_kernel/firewall/budgets.py)
class BudgetManager:
    """Tracks cumulative token usage across invocations within a session."""

    def __init__(self, total_budget: int = 100_000, *, token_counter: TokenCounter | None = None):
        ...

    def allocate(self, requested: int) -> int:
        """Allocate budget for an invocation. Returns actual allocation (may be less)."""

    def record_usage(self, actual: int) -> None:
        """Record actual tokens consumed by an invocation."""

    @property
    def remaining(self) -> int: ...

    @property
    def usage_fraction(self) -> float: ...

    def suggested_mode(self) -> ResponseMode:
        """Suggest response mode based on remaining budget."""
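To make the intended bookkeeping concrete, here is a minimal sketch of the accounting side of the interface above. The class name SessionBudget and the clamping behavior (remaining never goes below zero, allocate() does not reserve anything) are assumptions for illustration, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class SessionBudget:
    """Illustrative stand-in for the proposed BudgetManager (hypothetical name)."""

    total_budget: int = 100_000
    used: int = 0

    @property
    def remaining(self) -> int:
        # Assumption: never report a negative budget, even if usage overshoots.
        return max(self.total_budget - self.used, 0)

    @property
    def usage_fraction(self) -> float:
        if self.total_budget <= 0:
            return 1.0
        return min(self.used / self.total_budget, 1.0)

    def allocate(self, requested: int) -> int:
        """Grant at most what is left; nothing is consumed until record_usage()."""
        if requested < 0:
            raise ValueError("requested must be non-negative")
        return min(requested, self.remaining)

    def record_usage(self, actual: int) -> None:
        if actual < 0:
            raise ValueError("actual must be non-negative")
        self.used += actual


budget = SessionBudget(total_budget=1_000)
print(budget.allocate(600))  # 600: the full request fits
budget.record_usage(550)
print(budget.allocate(600))  # 450: only the remainder is granted
```

Whether allocate() should reserve tokens up front or only cap the request, as it does here, is a design decision the proposal leaves open.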
2. Adaptive response mode escalation
Based on cumulative budget consumption:
| Budget remaining | Suggested mode |
| --- | --- |
| > 50% | Caller's requested mode |
| 20%–50% | table (if caller requested raw) |
| 5%–20% | summary |
| < 5% | handle_only |
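The escalation thresholds could be implemented roughly as follows. This is a sketch under assumptions: Mode is a hypothetical stand-in for ResponseMode, the exact boundary handling (which band 50%, 20%, and 5% fall into) is a guess, and the rule that a mode is never made more verbose than the caller requested is an interpretation of "if caller requested raw".

```python
from enum import IntEnum


class Mode(IntEnum):
    """Hypothetical stand-in for ResponseMode, ordered from most to least verbose."""

    RAW = 0
    TABLE = 1
    SUMMARY = 2
    HANDLE_ONLY = 3


def suggest_mode(remaining_fraction: float, requested: Mode) -> Mode:
    """Map the fraction of budget still available to a response mode."""
    if remaining_fraction > 0.50:
        return requested
    if remaining_fraction >= 0.20:
        floor = Mode.TABLE
    elif remaining_fraction >= 0.05:
        floor = Mode.SUMMARY
    else:
        floor = Mode.HANDLE_ONLY
    # Assumption: only ever move toward a more compact mode than requested.
    return max(requested, floor)


print(suggest_mode(0.30, Mode.RAW).name)      # TABLE
print(suggest_mode(0.30, Mode.SUMMARY).name)  # SUMMARY
print(suggest_mode(0.02, Mode.TABLE).name)    # HANDLE_ONLY
```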
3. Kernel integration
- Kernel.__init__() accepts optional budget_manager: BudgetManager.
- invoke() calls budget_manager.allocate() before firewall, record_usage() after.
- If budget is exhausted, raise BudgetExhausted (new error type) instead of silently returning empty.
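The call order described above might look like the following sketch. TinyBudget, the free-standing invoke() function, and the run callback are hypothetical stand-ins for BudgetManager, Kernel.invoke(), and the firewall step. The base class of BudgetExhausted is also an assumption.

```python
from typing import Callable


class BudgetExhausted(Exception):
    """Sketch of the proposed error; its real base class is not specified."""


class TinyBudget:
    """Minimal stand-in for BudgetManager, just enough for this example."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.used = 0

    @property
    def remaining(self) -> int:
        return max(self.total - self.used, 0)

    def allocate(self, requested: int) -> int:
        return min(requested, self.remaining)

    def record_usage(self, actual: int) -> None:
        self.used += actual


def invoke(budget: TinyBudget, requested: int, run: Callable[[int], str]) -> str:
    """Allocate before the (hypothetical) firewall step, record usage after."""
    if budget.remaining <= 0:
        raise BudgetExhausted("session token budget is fully consumed")
    granted = budget.allocate(requested)
    result = run(granted)
    # chars / 4 is the default approximation named in the proposal.
    budget.record_usage(len(result) // 4)
    return result


def fake_firewall(limit: int) -> str:
    # Pretend the firewall trims a large payload to the granted token limit.
    return ("x" * 10_000)[: limit * 4]


session = TinyBudget(total=100)
invoke(session, 60, fake_firewall)  # granted 60 tokens
invoke(session, 60, fake_firewall)  # granted only the remaining 40
try:
    invoke(session, 1, fake_firewall)
except BudgetExhausted as exc:
    print(f"stopped: {exc}")
```

Raising before the firewall runs means an exhausted session never does work whose result would be discarded.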
4. Token counting
- Default: character-based approximation (chars / 4).
- Optional: tiktoken integration for accurate token counting (gated behind agent-kernel[tiktoken]).
- Pluggable via TokenCounter protocol.
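One way the pluggable counter could be shaped is sketched below. The method name count(), the helper names, the rounding-up behavior, and the choice of the cl100k_base encoding are all assumptions. Only tiktoken.get_encoding() and Encoding.encode() are real tiktoken calls.

```python
import math
from typing import Protocol


class TokenCounter(Protocol):
    """Assumed shape of the protocol: one method that counts tokens in text."""

    def count(self, text: str) -> int: ...


class CharApproxCounter:
    """Default counter: roughly four characters per token."""

    def count(self, text: str) -> int:
        # Assumption: round up so non-empty text never counts as zero tokens.
        return math.ceil(len(text) / 4)


class TiktokenCounter:
    """Accurate counter; importing tiktoken is deferred until it is needed."""

    def __init__(self, encoding_name: str = "cl100k_base") -> None:
        import tiktoken  # optional dependency (agent-kernel[tiktoken])

        self._encoding = tiktoken.get_encoding(encoding_name)

    def count(self, text: str) -> int:
        return len(self._encoding.encode(text))


def pick_counter(use_tiktoken: bool = False) -> TokenCounter:
    """Return the tiktoken-backed counter if requested and installed."""
    if use_tiktoken:
        try:
            return TiktokenCounter()
        except ImportError:
            pass  # fall back to the approximation below
    return CharApproxCounter()


counter = pick_counter()
print(counter.count("a" * 8))  # 2
```

Because TokenCounter is a structural protocol, a custom counter only needs a matching count() method and does not have to inherit from anything.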
Acceptance Criteria
- BudgetManager tracks cumulative usage across multiple invoke() calls
- remaining property returns correct remaining budget at all times
- BudgetExhausted raised when budget is fully consumed
- tiktoken integration works when installed (optional)
- TokenCounter protocol allows custom counting implementations
Affected Files
- src/agent_kernel/firewall/budgets.py (BudgetManager, TokenCounter protocol)
- src/agent_kernel/kernel.py (integrate BudgetManager into invoke())
- src/agent_kernel/errors.py (add BudgetExhausted error)
- tests/test_firewall.py (cross-invocation budget tests)
- pyproject.toml (optional tiktoken dependency)