Cross-invocation context budget manager #44

@dgenio

Milestone: v0.3.0 | Tier: Strategic | Effort: Medium

Problem

Firewall.apply() enforces per-invocation budgets via budgets.py, but there is no cross-invocation budget tracking. An agent can exhaust the LLM context window by making many small invocations that individually fit within budget but cumulatively overflow.

This is the #2 problem the library was designed to solve (after security). The context firewall docs describe budget management, but the implementation only enforces budgets within a single invoke() call.
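
The failure mode described above can be illustrated with a little arithmetic. This is only a sketch: the per-invocation budget, context window size, and call sizes are made-up numbers, and `cumulative_usage` is a hypothetical helper, not part of the library.

```python
# Hypothetical numbers, chosen only to illustrate the failure mode.
PER_INVOCATION_BUDGET = 4_000   # tokens allowed per call (assumed)
CONTEXT_WINDOW = 100_000        # tokens the LLM can hold (assumed)


def cumulative_usage(call_sizes: list[int]) -> int:
    """Sum token usage, checking each call only against the per-call budget."""
    total = 0
    for size in call_sizes:
        if size > PER_INVOCATION_BUDGET:
            raise ValueError("call exceeds per-invocation budget")
        total += size
    return total


calls = [3_500] * 30                 # every call passes the per-call check
total = cumulative_usage(calls)      # 30 * 3_500 tokens in total
overflow = total > CONTEXT_WINDOW    # the session as a whole does not fit
```

Each call is accepted on its own, yet the session total exceeds the assumed window, which is the gap a cross-invocation tracker would close.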

Proposed Change

1. BudgetManager class (src/agent_kernel/firewall/budgets.py)

from __future__ import annotations  # defer evaluation of the TokenCounter / ResponseMode annotations


class BudgetManager:
    """Tracks cumulative token usage across invocations within a session."""

    def __init__(self, total_budget: int = 100_000, *, token_counter: TokenCounter | None = None):
        ...

    def allocate(self, requested: int) -> int:
        """Allocate budget for an invocation. Returns the actual allocation (may be less than requested)."""

    def record_usage(self, actual: int) -> None:
        """Record actual tokens consumed by an invocation."""

    @property
    def remaining(self) -> int: ...

    @property
    def usage_fraction(self) -> float: ...

    def suggested_mode(self) -> ResponseMode:
        """Suggest response mode based on remaining budget."""

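One possible way to fill in the bookkeeping behind the interface above is sketched here. It is a minimal illustration, not the proposed implementation: the `token_counter` argument and `suggested_mode()` are left out for brevity, and the choices to clamp `remaining` at zero and to have `allocate()` grant without reserving are assumptions.

```python
class BudgetManager:
    """Sketch: tracks cumulative token usage across invocations in a session."""

    def __init__(self, total_budget: int = 100_000) -> None:
        if total_budget <= 0:
            raise ValueError("total_budget must be positive")
        self._total = total_budget
        self._used = 0

    @property
    def remaining(self) -> int:
        # Assumption: never report a negative remainder, even after overshoot.
        return max(self._total - self._used, 0)

    @property
    def usage_fraction(self) -> float:
        # Fraction of the budget consumed so far, capped at 1.0.
        return min(self._used / self._total, 1.0)

    def allocate(self, requested: int) -> int:
        """Grant at most what is left. Nothing is reserved (assumption)."""
        if requested < 0:
            raise ValueError("requested must be non-negative")
        return min(requested, self.remaining)

    def record_usage(self, actual: int) -> None:
        """Add the tokens an invocation actually consumed to the running total."""
        if actual < 0:
            raise ValueError("actual must be non-negative")
        self._used += actual


# Usage: two invocations sharing one session budget.
manager = BudgetManager(total_budget=1_000)
first = manager.allocate(400)     # fits, granted in full
manager.record_usage(first)
second = manager.allocate(900)    # only 600 left, so the grant is reduced
manager.record_usage(second)
```

Because `allocate()` does not reserve anything in this sketch, concurrent invocations could both be granted the same remaining tokens; a real implementation may need reservation or locking.
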
2. Adaptive response mode escalation

Based on cumulative budget consumption:

Budget remaining | Suggested mode
> 50%            | Caller's requested mode
20%–50%          | table (if caller requested raw)
5%–20%           | summary
< 5%             | handle_only
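
The thresholds in the table can be sketched as a small pure function. Several things here are assumptions rather than part of the proposal: the `ResponseMode` enum is a stand-in whose members are inferred from the mode names in the table, `suggest_mode` is a hypothetical free function that takes the caller's requested mode as an argument, boundary values (exactly 50%, 20%, 5%) are assigned to the less restrictive neighbouring band except 50%, and a caller who already asked for a stricter mode is never moved to a looser one.

```python
from enum import Enum


class ResponseMode(Enum):
    """Stand-in for the library's ResponseMode; members are assumed."""
    RAW = "raw"
    TABLE = "table"
    SUMMARY = "summary"
    HANDLE_ONLY = "handle_only"


# Modes ordered from least to most restrictive (assumed ordering).
_STRICTNESS = [
    ResponseMode.RAW,
    ResponseMode.TABLE,
    ResponseMode.SUMMARY,
    ResponseMode.HANDLE_ONLY,
]


def suggest_mode(remaining_fraction: float, requested: ResponseMode) -> ResponseMode:
    """Map the fraction of budget remaining (0.0 to 1.0) to a response mode."""
    if remaining_fraction > 0.50:
        return requested                      # plenty left: honour the caller
    if remaining_fraction >= 0.20:
        floor = ResponseMode.TABLE            # 20%-50%: raw is downgraded to table
    elif remaining_fraction >= 0.05:
        floor = ResponseMode.SUMMARY          # 5%-20%
    else:
        floor = ResponseMode.HANDLE_ONLY      # below 5%
    # Return whichever of the two is stricter, so a request is never loosened.
    return max(requested, floor, key=_STRICTNESS.index)
```

Keeping the mapping free of state makes the band boundaries easy to unit-test independently of the budget bookkeeping.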

3. Kernel integration

  • Kernel.__init__() accepts an optional budget_manager: BudgetManager argument.
  • invoke() calls budget_manager.allocate() before the firewall runs and budget_manager.record_usage() afterwards.
  • If the budget is exhausted, invoke() raises BudgetExhausted (a new error type) instead of silently returning an empty result.
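
The allocate, run, record sequence described in the bullets above might look roughly like the following. This is a sketch under stated assumptions: `SessionBudget` is a hypothetical minimal stand-in for BudgetManager, `invoke_with_budget` and its `run` callback are invented names standing in for Kernel.invoke() and the capability-plus-firewall step, and the default counter is the chars / 4 approximation.

```python
from typing import Callable


class BudgetExhausted(RuntimeError):
    """Raised when the session budget has no tokens left (name from the issue)."""


class SessionBudget:
    """Hypothetical minimal stand-in for BudgetManager."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.used = 0

    @property
    def remaining(self) -> int:
        return max(self.total - self.used, 0)

    def allocate(self, requested: int) -> int:
        return min(requested, self.remaining)

    def record_usage(self, actual: int) -> None:
        self.used += actual


def invoke_with_budget(
    budget: SessionBudget,
    requested: int,
    run: Callable[[int], str],
    count_tokens: Callable[[str], int] = lambda text: len(text) // 4,
) -> str:
    """Allocate before the work, record actual usage after it."""
    granted = budget.allocate(requested)
    if granted <= 0:
        # Fail loudly rather than returning an empty result.
        raise BudgetExhausted("session budget exhausted")
    result = run(granted)  # stands in for capability execution + firewall
    budget.record_usage(count_tokens(result))
    return result
```

A caller could catch BudgetExhausted to end the session or to start a fresh one with a new budget; which of those is appropriate is left open by the issue.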

4. Token counting

  • Default: character-based approximation (chars / 4).
  • Optional: tiktoken integration for accurate token counting (gated behind agent-kernel[tiktoken]).
  • Pluggable via TokenCounter protocol.
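
A pluggable counter along these lines could be sketched as follows. The issue names the TokenCounter protocol but not its shape, so the single `count(text)` method is an assumption, as are the class names `CharApproxCounter` and `TiktokenCounter`, the choice to round the chars / 4 estimate up, and the `cl100k_base` default encoding. The tiktoken import is deferred so the module loads without the optional dependency.

```python
import math
from typing import Protocol, runtime_checkable


@runtime_checkable
class TokenCounter(Protocol):
    """Assumed protocol shape: a single count() method."""

    def count(self, text: str) -> int: ...


class CharApproxCounter:
    """Dependency-free approximation: roughly four characters per token."""

    def __init__(self, chars_per_token: int = 4) -> None:
        self._chars_per_token = chars_per_token

    def count(self, text: str) -> int:
        # Round up so that any non-empty text costs at least one token (assumption).
        return math.ceil(len(text) / self._chars_per_token)


class TiktokenCounter:
    """Counter backed by tiktoken; requires the optional dependency."""

    def __init__(self, encoding_name: str = "cl100k_base") -> None:
        import tiktoken  # imported lazily; raises ImportError if not installed

        self._encoding = tiktoken.get_encoding(encoding_name)

    def count(self, text: str) -> int:
        # disallowed_special=() treats special-token strings as ordinary text.
        return len(self._encoding.encode(text, disallowed_special=()))


counter = CharApproxCounter()
approx = counter.count("abcdefgh")   # 8 characters at 4 per token
```

Since the protocol is structural, any object with a compatible `count` method would satisfy it without inheriting from TokenCounter.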

Acceptance Criteria

  • BudgetManager tracks cumulative usage across multiple invoke() calls
  • After 80% of the budget is consumed (less than 20% remaining), the response mode auto-escalates to more aggressive summarization (summary, then handle_only)
  • remaining property returns correct remaining budget at all times
  • BudgetExhausted raised when budget is fully consumed
  • Character-based token counting works without extra dependencies
  • tiktoken integration works when installed (optional)
  • TokenCounter protocol allows custom counting implementations

Affected Files

  • src/agent_kernel/firewall/budgets.py (BudgetManager, TokenCounter protocol)
  • src/agent_kernel/kernel.py (integrate BudgetManager into invoke())
  • src/agent_kernel/errors.py (add BudgetExhausted error)
  • tests/test_firewall.py (cross-invocation budget tests)
  • pyproject.toml (optional tiktoken dependency)

Metadata

Assignees

No one assigned

Labels

  • complexity:average (Moderate effort, some design needed)
  • phase:firewall (Context firewall, budgets, redaction)
  • priority:high (Core functionality)
  • size:M (Medium change, 50 to 200 lines)
  • type:feature (New functionality)
