feat: initial scaffold and core implementation of agent-kernel #1

Merged
dgenio merged 19 commits into main from copilot/init-agent-kernel-implementation on Mar 4, 2026

Copilot AI commented Mar 2, 2026

Implements agent-kernel from scratch — a capability-based security kernel for AI agents operating in large tool ecosystems (1000+ tools via MCP, A2A, internal APIs). Provides the authorization, execution, and audit layer sitting above raw tool execution and below the LLM context window.

Package structure (src/ layout, Python ≥ 3.10, Apache-2.0)

  • enums.py: SafetyClass (READ/WRITE/DESTRUCTIVE), SensitivityTag (PII/PCI/SECRETS/NONE)
  • errors.py: 10-class exception hierarchy; no bare ValueError/KeyError anywhere
  • models.py: Core dataclasses: Capability, Principal, Frame, Handle, ActionTrace, Budgets, etc.
  • registry.py: CapabilityRegistry with deterministic keyword-overlap search (no LLM, no vector DB)
  • tokens.py: CapabilityToken + HMACTokenProvider (HMAC-SHA256); tokens bind principal_id + capability_id + constraints for confused-deputy prevention
  • policy.py: DefaultPolicyEngine (READ always allowed; WRITE requires a justification ≥ 15 chars plus the writer or admin role; DESTRUCTIVE requires admin; PII/PCI enforces the tenant attribute + allowed_fields; max_rows capped at 50 for users and 500 for services)
  • router.py: StaticRouter with ordered fallback driver chains
  • drivers/: InMemoryDriver (Python callables + 200-record deterministic billing dataset), HTTPDriver (httpx async)
  • firewall/: Firewall transforms RawResult → Frame; four response modes (summary/table/handle_only/raw); enforces Budgets; regex-based PII/PCI redaction; deterministic summarisation
  • handles.py: HandleStore with TTL, lazy eviction, pagination (offset/limit), field selection, equality filtering
  • trace.py / kernel.py: TraceStore + Kernel main entry point wiring all components
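
To make the DefaultPolicyEngine rules listed above concrete, here is a minimal sketch of the same decision table. It is not the code from policy.py: the `evaluate` function, its return shape, the reason strings, and the `is_service` flag are assumptions made for illustration, and the PII/PCI tenant check is left out.

```python
from dataclasses import dataclass
from enum import Enum


class SafetyClass(Enum):
    READ = "READ"
    WRITE = "WRITE"
    DESTRUCTIVE = "DESTRUCTIVE"


@dataclass
class Principal:
    principal_id: str
    roles: frozenset[str] = frozenset()
    is_service: bool = False  # hypothetical flag; the real model may differ


def evaluate(principal, safety, justification="", max_rows=None):
    """Return (allowed, reason, effective_max_rows) for one request."""
    if safety is SafetyClass.WRITE:
        # WRITE: justification of at least 15 characters plus writer or admin.
        if len(justification) < 15:
            return False, "justification_too_short", None
        if not principal.roles & {"writer", "admin"}:
            return False, "missing_role", None
    elif safety is SafetyClass.DESTRUCTIVE:
        # DESTRUCTIVE: admin only.
        if "admin" not in principal.roles:
            return False, "admin_required", None
    # READ falls through: always allowed.
    cap = 500 if principal.is_service else 50
    rows = cap if max_rows is None else min(max_rows, cap)
    return True, "allowed", rows
```

Under these assumptions, a user asking for 1000 rows on a READ capability is allowed but capped at 50, while a service principal is capped at 500.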

Quickstart

kernel = Kernel(registry, router=StaticRouter(routes={"tasks.list": ["memory"]}))
kernel.register_driver(driver)

token = kernel.get_token(CapabilityRequest("tasks.list", goal="list tasks"), principal, justification="")
frame = await kernel.invoke(token, principal=principal, args={})
# frame.facts  →  ['Total rows: 20', 'Top keys: id, title, done', ...]
# frame.handle →  Handle(handle_id='...', total_rows=20, ...)

expanded = kernel.expand(frame.handle, query={"limit": 3, "fields": ["id", "title"]})
trace = kernel.explain(frame.action_id)   # full audit record
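
The `kernel.expand(...)` call in the quickstart relies on the HandleStore features described earlier (pagination, field selection, equality filtering). A rough sketch of that query logic over a stored list of rows might look like this; `expand_rows` and the `where` parameter are hypothetical names, not the library's API.

```python
def expand_rows(rows, *, offset=0, limit=50, fields=None, where=None):
    """Filter by equality, then paginate, then project the requested fields."""
    where = where or {}
    matched = [r for r in rows if all(r.get(k) == v for k, v in where.items())]
    page = matched[offset : offset + limit]
    if fields is not None:
        # Keep only requested fields; silently skip fields a row does not have.
        page = [{k: r[k] for k in fields if k in r} for r in page]
    return page


# Illustrative data shaped like the quickstart's task rows.
tasks = [{"id": i, "title": f"task {i}", "done": i % 2 == 0} for i in range(20)]
first_three = expand_rows(tasks, limit=3, fields=["id", "title"])
```

Filtering before paginating keeps `offset` and `limit` meaningful relative to the filtered result, which is one plausible ordering; the actual HandleStore may order these steps differently.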

Testing & tooling

  • 107 pytest tests, 94% coverage across all modules
  • pyproject.toml (PEP 621, hatchling), Makefile (fmt/lint/type/test/example/ci)
  • GitHub Actions CI matrix: Python 3.10 / 3.11 / 3.12 with explicit permissions: contents: read
  • Three self-contained examples (no internet): basic_cli.py, billing_demo.py, http_driver_demo.py
  • Docs: architecture.md, security.md, integrations.md, capabilities.md, context_firewall.md

<details>
<summary>Original prompt</summary>

Create the initial scaffold and core implementation for agent-kernel, a Python library that implements a capability-based security kernel for AI agents operating in large tool ecosystems (1000+ tools via MCP, A2A, internal APIs).

This library sits ABOVE contextweaver (a context compilation library, available as a dependency) and provides the authorization, execution, and audit layer.

What this library does

  1. Capability Registry: register task-shaped capabilities (not raw tools) with safety classes and sensitivity tags.
  2. Capability Tokens: HMAC-signed, time-bounded, principal-scoped tokens that authorize specific actions.
  3. Policy Engine: role-based access control with confused-deputy prevention. READ/WRITE/DESTRUCTIVE safety classes, PII/PCI sensitivity handling.
  4. Drivers: pluggable execution layer (InMemoryDriver for testing, HTTPDriver for real APIs, protocol-agnostic MCP adapter interface).
  5. Context Firewall: transforms raw tool output into budgeted Frames (facts + table preview + handles). Never exposes raw output to the LLM by default.
  6. Audit Trail: every action is traced and explainable via kernel.explain(action_id).
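
As an illustration of item 2, the sketch below shows one way an HMAC-signed, time-bounded, principal-scoped token could work using only the standard library. The dict-based token, the `issue`/`verify` helpers, the result strings, and the hard-coded secret are assumptions for the example and do not reflect HMACTokenProvider's actual interface.

```python
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # assumption: a real deployment would load this from the environment


def _sign(principal_id, capability_id, constraints, expires_at):
    # Canonical JSON so the same fields always produce the same signature.
    body = json.dumps(
        {"p": principal_id, "c": capability_id, "k": constraints, "e": expires_at},
        sort_keys=True,
        separators=(",", ":"),
    ).encode()
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def issue(principal_id, capability_id, constraints, ttl_seconds=300, now=None):
    now = time.time() if now is None else now
    expires_at = int(now) + ttl_seconds
    return {
        "principal_id": principal_id,
        "capability_id": capability_id,
        "constraints": constraints,
        "expires_at": expires_at,
        "signature": _sign(principal_id, capability_id, constraints, expires_at),
    }


def verify(token, presenting_principal_id, now=None):
    now = time.time() if now is None else now
    expected = _sign(token["principal_id"], token["capability_id"],
                     token["constraints"], token["expires_at"])
    if not hmac.compare_digest(expected, token["signature"]):
        return "invalid"  # any tampered field changes the signature
    if now >= token["expires_at"]:
        return "expired"
    if token["principal_id"] != presenting_principal_id:
        return "wrong_principal"  # confused-deputy check: token is bound to its principal
    return "ok"
```

Because the principal, capability, and constraints are all inside the signed payload, a token issued to one principal for one capability cannot be replayed by another principal or retargeted at a different capability without failing verification.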

Package details

  • Package name: agent_kernel
  • Python >= 3.10
  • pyproject.toml with PEP 621, src/ layout
  • License: Apache-2.0
  • Runtime deps: httpx (for HTTPDriver)
  • Dev deps: pytest, pytest-cov, pytest-asyncio, ruff, mypy
  • [tool.pytest.ini_options] asyncio_mode = "auto"

Repository structure

agent-kernel/
├── pyproject.toml
├── Makefile                    # fmt, lint, type, test, example, ci
├── LICENSE                     # Apache-2.0
├── README.md
├── CHANGELOG.md
├── CONTRIBUTING.md
├── AGENTS.md                   # AI agent instructions for working in this repo
├── .gitignore
├── .github/workflows/ci.yml   # Python 3.10, 3.11, 3.12: ruff + mypy + pytest
├── docs/
│   ├── architecture.md         # Component deep-dive + Mermaid diagram
│   ├── security.md             # Threat model, confused deputy, token scopes
│   ├── integrations.md         # MCP integration, custom drivers, capability mapping
│   ├── capabilities.md         # Designing good capabilities, naming conventions
│   └── context_firewall.md     # Budgets, frames, handles, redaction, expand
├── examples/
│   ├── basic_cli.py            # Full flow: request → grant → invoke → expand
│   ├── billing_demo.py         # InMemoryDriver with dataset, budgets, handles, pagination
│   └── http_driver_demo.py     # Local mini HTTP server + HTTPDriver (no internet needed)
├── src/
│   └── agent_kernel/
│       ├── __init__.py         # Public API exports + __version__
│       ├── py.typed            # PEP 561
│       ├── models.py           # Core dataclasses: Capability, CapabilityRequest, CapabilityGrant,
│       │                       #   Principal, PolicyDecision, RoutePlan, ImplementationRef,
│       │                       #   RawResult, Frame, Handle, Provenance, ActionTrace,
│       │                       #   Budgets, FieldSpec, ResponseMode
│       ├── enums.py            # SafetyClass (READ/WRITE/DESTRUCTIVE),
│       │                       #   SensitivityTag (PII/PCI/SECRETS/NONE)
│       ├── errors.py           # AgentKernelError, TokenExpired, TokenInvalid, TokenScopeError,
│       │                       #   PolicyDenied, DriverError, FirewallError, CapabilityNotFound,
│       │                       #   HandleNotFound, HandleExpired
│       ├── registry.py         # CapabilityRegistry: register, lookup, keyword-based request matching
│       ├── policy.py           # PolicyEngine protocol + DefaultPolicyEngine (rule-based):
│       │                       #   READ allowed, WRITE needs justification+role, DESTRUCTIVE needs admin,
│       │                       #   PII/PCI requires tenant attribute, max_rows enforcement
│       ├── tokens.py           # CapabilityToken dataclass, TokenProvider protocol,
│       │                       #   HMACTokenProvider (SHA-256, env secret, expiry, signature verify)
│       ├── router.py           # Router protocol + StaticRouter (first match + fallback)
│       ├── drivers/
│       │   ├── __init__.py
│       │   ├── base.py         # Driver protocol, ExecutionContext, RawResult
│       │   ├── memory.py       # InMemoryDriver (simulated capabilities with Python functions)
│       │   └── http.py         # HTTPDriver (httpx-based, timeouts, error mapping)
│       ├── firewall/
│       │   ├── __init__.py
│       │   ├── budgets.py      # Budgets dataclass (max_rows, max_fields, max_chars, max_depth)
│       │   ├── transform.py    # Firewall class: RawResult → Frame with budget enforcement
│       │   ├── redaction.py    # PII/PCI field redaction (email, phone, card_number, ssn)
│       │   └── summarize.py    # Deterministic summarization heuristics (no LLM):
│       │                       #   list-of-dicts → count+stats+top_keys,
│       │                       #   dict → keys+aggregates, str...

</details>
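
The list-of-dicts heuristic noted for summarize.py (count plus top keys, no LLM) could be sketched as follows. This is an assumed shape, chosen to produce facts like those shown in the quickstart comment; `summarize_rows` is a hypothetical helper and omits the stats part.

```python
from collections import Counter


def summarize_rows(rows, top_n=3):
    """Produce short, deterministic facts about a list of dict rows."""
    # Count how often each key appears; ties keep first-seen order.
    key_counts = Counter(key for row in rows for key in row)
    top_keys = [key for key, _ in key_counts.most_common(top_n)]
    return [f"Total rows: {len(rows)}", "Top keys: " + ", ".join(top_keys)]


facts = summarize_rows([{"id": i, "title": "t", "done": False} for i in range(20)])
# facts == ['Total rows: 20', 'Top keys: id, title, done']
```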




Copilot AI and others added 2 commits March 2, 2026 20:01
Co-authored-by: dgenio <12731907+dgenio@users.noreply.github.com>
Co-authored-by: dgenio <12731907+dgenio@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Create initial scaffold for agent-kernel library" to "feat: initial scaffold and core implementation of agent-kernel" on Mar 2, 2026
dgenio requested a review from Copilot on March 3, 2026 23:25
Copilot AI left a comment

Pull request overview

Initial implementation of the agent-kernel library: a capability-based authorization + execution kernel for agents, including policy gating, HMAC-signed capability tokens, driver routing/execution, a context firewall (Frame/Handle), and an audit trail.

Changes:

  • Added core runtime modules (models, registry, policy, tokens, router, kernel, handle/trace stores, drivers, firewall).
  • Added a full pytest suite plus fixtures to validate end-to-end flows and security properties.
  • Added packaging/tooling, CI workflow, docs, and runnable examples.

Reviewed changes

Copilot reviewed 46 out of 47 changed files in this pull request and generated 10 comments.

Show a summary per file
| File | Description |
| --- | --- |
| `.github/workflows/ci.yml` | CI matrix for lint/format/type/test/examples. |
| `AGENTS.md` | Repo conventions and security/quality guidelines for agents. |
| `CHANGELOG.md` | Project changelog scaffold. |
| `CONTRIBUTING.md` | Contributor workflow and quality bar. |
| `Makefile` | Local developer commands aligned with CI. |
| `README.md` | Project overview, architecture, and quickstart. |
| `docs/architecture.md` | High-level architecture and component diagram. |
| `docs/capabilities.md` | Guidance for capability naming and design. |
| `docs/context_firewall.md` | Firewall response modes, budgets, handles, redaction. |
| `docs/integrations.md` | Driver integration guidance (MCP/HTTP/custom). |
| `docs/security.md` | Threat model and security properties. |
| `examples/basic_cli.py` | End-to-end demo of request → token → invoke → expand → explain. |
| `examples/billing_demo.py` | Demo using deterministic billing dataset + budgets + expansion. |
| `examples/http_driver_demo.py` | Demo running a local HTTP server with HTTPDriver. |
| `pyproject.toml` | Packaging metadata + dependencies + ruff/mypy/pytest config. |
| `src/agent_kernel/__init__.py` | Public API exports and version. |
| `src/agent_kernel/drivers/__init__.py` | Driver subpackage exports. |
| `src/agent_kernel/drivers/base.py` | Driver protocol + execution context. |
| `src/agent_kernel/drivers/http.py` | Async HTTP execution driver based on httpx. |
| `src/agent_kernel/drivers/memory.py` | In-memory driver + deterministic billing dataset factory. |
| `src/agent_kernel/enums.py` | SafetyClass and SensitivityTag enums. |
| `src/agent_kernel/errors.py` | Custom exception hierarchy. |
| `src/agent_kernel/firewall/__init__.py` | Firewall subpackage exports. |
| `src/agent_kernel/firewall/budgets.py` | Firewall budgets dataclass. |
| `src/agent_kernel/firewall/redaction.py` | Regex + field-name based redaction utilities. |
| `src/agent_kernel/firewall/summarize.py` | Deterministic summarization heuristics. |
| `src/agent_kernel/firewall/transform.py` | Core RawResult → Frame transformer enforcing budgets/modes. |
| `src/agent_kernel/handles.py` | HandleStore with TTL + expand (pagination/filters/fields). |
| `src/agent_kernel/kernel.py` | Main orchestration: token verify → route → execute → firewall → trace. |
| `src/agent_kernel/models.py` | Core dataclasses: Capability, Principal, Frame, Handle, ActionTrace, etc. |
| `src/agent_kernel/policy.py` | DefaultPolicyEngine rules + constraint enforcement. |
| `src/agent_kernel/py.typed` | PEP 561 marker for typed package. |
| `src/agent_kernel/registry.py` | Capability registry + keyword-overlap search. |
| `src/agent_kernel/router.py` | Static routing from capability_id → ordered driver chain. |
| `src/agent_kernel/tokens.py` | CapabilityToken serialization + HMACTokenProvider signing/verify. |
| `src/agent_kernel/trace.py` | TraceStore for in-memory audit traces. |
| `tests/conftest.py` | Shared fixtures for kernel, principals, registry, drivers. |
| `tests/test_drivers.py` | Driver unit tests (InMemoryDriver + HTTPDriver). |
| `tests/test_firewall.py` | Firewall mode/budget/redaction behavior tests. |
| `tests/test_handles.py` | HandleStore TTL/eviction/expand behavior tests. |
| `tests/test_kernel.py` | Integration tests for full kernel flows + fallback + token scope. |
| `tests/test_models.py` | Dataclass construction and serialization tests. |
| `tests/test_policy.py` | DefaultPolicyEngine rule tests. |
| `tests/test_registry.py` | CapabilityRegistry registration/search tests. |
| `tests/test_router.py` | StaticRouter routing semantics tests. |
| `tests/test_tokens.py` | HMACTokenProvider issuance/verify/tamper/expiry tests. |
| `tests/test_trace.py` | TraceStore record/get/list tests. |
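
The routing summarised for router.py (capability_id → ordered driver chain) together with the fallback behaviour covered by tests/test_kernel.py can be pictured with a small synchronous sketch. The real kernel is async and uses Driver objects; here drivers are plain callables and `execute_with_fallback` is a hypothetical helper.

```python
class DriverError(Exception):
    """Raised by a driver that cannot serve the request."""


def execute_with_fallback(routes, drivers, capability_id, args):
    """Try each driver in the configured order; return the first success."""
    failures = []
    for driver_id in routes.get(capability_id, []):
        try:
            return driver_id, drivers[driver_id](capability_id, args)
        except DriverError as exc:
            failures.append(f"{driver_id}: {exc}")  # remember why, then fall through
    raise DriverError(f"no driver succeeded for {capability_id!r}: {failures}")


def flaky_http(capability_id, args):
    raise DriverError("upstream unavailable")


def memory(capability_id, args):
    return [{"id": 1, "title": "demo"}]


used, result = execute_with_fallback(
    {"tasks.list": ["http", "memory"]},
    {"http": flaky_http, "memory": memory},
    "tasks.list",
    {},
)
```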

Comment thread src/agent_kernel/registry.py
Comment thread tests/test_registry.py
Comment thread src/agent_kernel/drivers/http.py
Comment thread src/agent_kernel/handles.py Outdated
Comment thread src/agent_kernel/kernel.py
Comment thread src/agent_kernel/policy.py
Comment thread src/agent_kernel/firewall/transform.py Outdated
Comment thread src/agent_kernel/firewall/transform.py
Comment thread src/agent_kernel/kernel.py Outdated
Comment thread src/agent_kernel/kernel.py
dgenio marked this pull request as ready for review on March 4, 2026 07:08
dgenio merged commit f50b245 into main on Mar 4, 2026
6 checks passed
dgenio mentioned this pull request on Mar 6, 2026
8 tasks
dgenio added a commit that referenced this pull request May 11, 2026
…ackaging

Apply the 19 Copilot inline review findings on PR #67, grouped:

Packaging / optional deps (#1, #2, #12)
- Defer `yaml` and `tomllib`/`tomli` imports into the
  `DeclarativePolicyEngine.from_yaml` / `from_toml` loaders so
  `import agent_kernel` works without the `policy` extra installed.
  Missing parser → `PolicyConfigError` with an install hint.
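
A minimal sketch of the deferred-import pattern described above, assuming PyYAML as the parser. The standalone `load_yaml_policy` function and the exact wording of the install hint (including the `agent-kernel[policy]` spelling) are illustrative guesses, not the loader's real text.

```python
class PolicyConfigError(Exception):
    """Stand-in for the library's policy configuration error."""


def load_yaml_policy(text: str) -> dict:
    # Import inside the loader so importing the package never requires PyYAML.
    try:
        import yaml
    except ImportError as exc:
        raise PolicyConfigError(
            "YAML policies need the optional 'policy' extra; "
            "try: pip install 'agent-kernel[policy]'"
        ) from exc
    data = yaml.safe_load(text)
    if not isinstance(data, dict):
        raise PolicyConfigError("policy file must contain a mapping at the top level")
    return data
```

The trade-off is that a missing dependency surfaces at load time rather than import time, so the error message has to carry enough context to be actionable.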

Policy DSL parsing (#3, #4)
- Validate types of `roles` (list[str]), `attributes` (dict[str, str]),
  `min_justification` (int — bool rejected), and `constraints` (mapping)
  in `_parse_rule()`; raise `PolicyConfigError` with precise messages
  instead of silently producing misbehaving rules or crashing at
  evaluation time.
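
The validation described above might look roughly like this. The field names come from the commit message; the function body, default values, and error wording are assumptions. The `bool` check matters because `bool` is a subclass of `int` in Python, so `True` would otherwise pass as a justification length.

```python
from collections.abc import Mapping


class PolicyConfigError(Exception):
    """Stand-in for the library's policy configuration error."""


def parse_rule(raw):
    name = raw.get("name", "<unnamed>")

    roles = raw.get("roles", [])
    if not isinstance(roles, list) or not all(isinstance(r, str) for r in roles):
        raise PolicyConfigError(f"rule {name!r}: 'roles' must be a list of strings")

    attributes = raw.get("attributes", {})
    if not isinstance(attributes, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in attributes.items()
    ):
        raise PolicyConfigError(f"rule {name!r}: 'attributes' must map strings to strings")

    min_justification = raw.get("min_justification", 0)
    # Reject bool explicitly: isinstance(True, int) is True in Python.
    if isinstance(min_justification, bool) or not isinstance(min_justification, int):
        raise PolicyConfigError(f"rule {name!r}: 'min_justification' must be an integer")

    constraints = raw.get("constraints", {})
    if not isinstance(constraints, Mapping):
        raise PolicyConfigError(f"rule {name!r}: 'constraints' must be a mapping")

    return {"name": name, "roles": roles, "attributes": attributes,
            "min_justification": min_justification, "constraints": dict(constraints)}
```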

Policy DSL explain() (#5)
- Correctly report explicit deny rules that fully match (previously
  fell through to a misleading `no_matching_rule` fallback and dropped
  the rule's `reason`). Skip partial-match deny rules so the
  explanation focuses on the actionable allow rule rather than
  suggesting changes that would only trigger the deny.

Example policy files (#6, #7, #8, #9, #10, #11)
- Rename `default_action` → `default` (the parser reads `default`,
  the previous key was silently ignored).
- Express PII-with-tenant as an allow rule paired with default-deny;
  the prior `deny-pii-no-tenant` was inverted under first-match-wins.
- Move `allow-secrets-service` before `deny-secrets-non-service`;
  the deny was previously unreachable.
- Tighten `allow-read-*` / `allow-write-*` to `sensitivity: [NONE]`
  so PII reads route through the dedicated allow-pii rule.
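
The ordering fixes above all follow from first-match-wins evaluation. The toy evaluator below uses simplified equality matching (the real DSL accepts lists such as `sensitivity: [NONE]`), and the rule names and request keys are illustrative. It shows how an over-broad read rule lets a tenant-less PII read through, and how tightening it sends that request to default-deny.

```python
def first_match(rules, request, default="deny"):
    """Return (action, rule_name) for the first rule whose fields all match."""
    for rule in rules:
        if all(request.get(k) == v for k, v in rule["match"].items()):
            return rule["action"], rule["name"]
    return default, "default"


allow_pii = {"name": "allow-pii-with-tenant", "action": "allow",
             "match": {"safety": "READ", "sensitivity": "PII", "has_tenant": True}}

# Broad rule: matches every READ, including PII reads.
loose = [{"name": "allow-read", "action": "allow", "match": {"safety": "READ"}}, allow_pii]

# Tightened rule: only READs with no sensitivity tag.
tight = [{"name": "allow-read", "action": "allow",
          "match": {"safety": "READ", "sensitivity": "NONE"}}, allow_pii]

pii_no_tenant = {"safety": "READ", "sensitivity": "PII", "has_tenant": False}

loose_result = first_match(loose, pii_no_tenant)   # broad rule wins first
tight_result = first_match(tight, pii_no_tenant)   # falls through to default
```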

Kernel dry-run (#13, #14, #17)
- Resolve `DryRunResult.operation` the same way drivers do
  (`args.get("operation", capability_id)`) so it matches what a driver
  would actually receive — instead of `capability.impl.operation`,
  which can diverge.
- Mirror the Firewall's admin-only gate for `raw` mode: non-admin
  principals see their requested `raw` downgraded to `summary` in
  `DryRunResult`, matching real-invoke behaviour. Prevents probing
  for raw availability via dry-run.
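
Both dry-run parity rules are small enough to sketch directly. The operation lookup mirrors the expression quoted above; the two helper names and the role representation are assumptions made for the example.

```python
def resolve_operation(capability_id, args):
    # Same resolution the commit message attributes to drivers.
    return args.get("operation", capability_id)


def effective_response_mode(requested_mode, roles):
    # Non-admin principals never get raw output; report the downgrade in dry-run too.
    if requested_mode == "raw" and "admin" not in roles:
        return "summary"
    return requested_mode
```

Reporting the downgraded mode in the dry-run result means a caller learns nothing from dry-run that a real invocation would not also reveal.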

Docs / annotations (#15, #16, #18)
- `Kernel.explain_denial()` docstring no longer contradicts itself
  ("never raises" vs. `CapabilityNotFound`).
- `drivers/mcp.py` adds an explicit `_McpError: type[Exception] | None`
  annotation so mypy --strict is happy across the try/except branches.
- `DryRunResult.budget_remaining` docstring no longer references the
  unimplemented `BudgetManager`; documented as reserved for a future
  cross-invocation budget mechanism.

Protocol softening (#19)
- Split `explain()` out of `PolicyEngine` into a new
  `ExplainingPolicyEngine` protocol so downstream engines that
  implement only `evaluate()` keep satisfying `PolicyEngine`.
  `Kernel.explain_denial()` uses `getattr` and raises a clear
  `AgentKernelError` when the configured engine cannot explain.
  Both built-in engines satisfy the richer protocol.
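
The protocol split can be illustrated with `typing.Protocol`. The method signatures and the two toy engines are assumptions; only the protocol names, the `getattr` lookup, and the `AgentKernelError` behaviour come from the description above.

```python
from typing import Protocol


class AgentKernelError(Exception):
    """Stand-in for the library's base error."""


class PolicyEngine(Protocol):
    def evaluate(self, request: dict) -> bool: ...


class ExplainingPolicyEngine(PolicyEngine, Protocol):
    def explain(self, request: dict) -> str: ...


class MinimalEngine:
    """Implements evaluate() only; still satisfies PolicyEngine structurally."""

    def evaluate(self, request: dict) -> bool:
        return False


class VerboseEngine(MinimalEngine):
    def explain(self, request: dict) -> str:
        return "no_matching_rule"


def explain_denial(engine: PolicyEngine, request: dict) -> str:
    explain = getattr(engine, "explain", None)
    if not callable(explain):
        raise AgentKernelError(
            f"{type(engine).__name__} does not implement explain()"
        )
    return explain(request)
```

Keeping `explain()` out of the base protocol means third-party engines written against the smaller interface keep type-checking, at the cost of a runtime check where explanations are requested.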

Tests
- Add tests for: explicit-deny fully-matched explanation, partial-match
  deny skipping, every `_parse_rule` validation error, install-hint
  paths for `from_yaml` / `from_toml`, dry-run operation resolution,
  dry-run raw-mode downgrade for non-admin, raw preserved for admin,
  and explain_denial against an engine without `explain()`.

Docs
- `docs/agent-context/invariants.md` adds a "Dry-run response-mode
  parity" trap entry so future contributors keep dry-run in sync with
  the Firewall's admin gate and the driver's operation resolution.

CHANGELOG
- Documents all of the above under [Unreleased].

`make ci` equivalents: ruff format/check, mypy --strict, 306 passing
tests at 95% coverage, all three example scripts complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>