Add adapters/ with Claude Code, Cursor, and NAT reference implementations#22
bar-capsule wants to merge 9 commits into
…ions
Introduces a top-level adapters/ directory holding reference implementations that wire popular agent frameworks to an ACS Guardian through configuration only, with no agent code changes.
Three working adapters:
- adapters/claude-code/: shell-stdin adapter wired via Claude Code's settings.json. 13 unit/integration tests + 2 automated live tests that spawn `claude --print` against a project-level settings.json and exercise ALLOW and DENY paths end-to-end.
- adapters/cursor/: shell-stdin adapter wired via Cursor's hooks.json. Schema sourced from Cursor's bundled create-hook skill docs. 13 unit tests against the shared example Guardian. Manual live-verification procedure documented in tests/live_verification.md (Cursor is a desktop app with no documented headless mode).
- adapters/nat/: in-process Python FunctionMiddleware for NVIDIA Agent Toolkit (nvidia-nat-core 1.7.0). Registered via @register_middleware, inherits FunctionMiddlewareBaseConfig with name="acs_guardian". 7 integration tests against real NAT types + 5 live workflow tests exercising the actual function_middleware_invoke orchestration path.
Shared infrastructure:
- adapters/example-guardian/: minimal deterministic Guardian used by all three adapters' integration tests. Stdlib-only Python. Documented as a teaching artifact, not a production Guardian.
- adapters/README.md: framework -> adapter -> Guardian flow diagram, 6-step walkthrough with concrete JSON payloads at every step, cross-adapter comparison table, behavior-contract explanation.
Total tests: 40 automated tests passing across all adapters (13 + 2 claude-code, 13 cursor, 7 + 5 nat).
Schema gaps between docs and reality were closed via the live tests for Claude Code and NAT; Cursor was manually verified through a reproduction procedure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
98470b4 to 8e65380
Normalize directory layout so all three adapters have identical
structure (file naming differs only where the framework's own naming
convention dictates, e.g. config file extensions):
adapters/<framework>/
├── README.md
├── acs_adapter.py (was: cursor_adapter.py, acs_middleware.py)
├── mapping.md
├── <config>.example (settings.json / hooks.json / workflow.yml)
└── tests/
├── __init__.py
├── test_adapter.py
├── test_live.py (was: test_live_claude_code.py, test_live_nat_workflow.py)
├── example_payloads.md (NEW: masked real-world payload examples)
└── live_verification.md (Cursor only — manual procedure)
Renames:
- adapters/cursor/cursor_adapter.py -> acs_adapter.py
- adapters/nat/acs_middleware.py -> acs_adapter.py
- adapters/claude-code/tests/test_live_claude_code.py -> test_live.py
- adapters/nat/tests/test_live_nat_workflow.py -> test_live.py
New config example files (parallels claude-code/settings.json.example):
- adapters/cursor/hooks.json.example
- adapters/nat/workflow.yml.example
New tests/example_payloads.md per adapter — masked real-world payload
examples sourced from actual sessions, with identifying fields replaced
by placeholders. Lets adopters see the actual schema each framework
emits (including fields not in the public docs for Claude Code and
Cursor) without committing any real session data. Each file uses a
consistent masking convention table at the bottom.
Cursor gets a placeholder tests/test_live.py that skips with a pointer
to live_verification.md, so the file naming stays identical across all
three adapters.
Top-level adapters/README.md now shows the consistent directory layout
as a code block so reviewers can see the parallel structure at a glance.
All 40 automated tests still passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rocklambros
left a comment
@bar-capsule I went deep on this branch before writing anything. Checked it out, ran the three suites, read each adapter against the v0.1.0 schemas. The config-only pattern is the right call, and the per-adapter docs are honest about what's deferred. I want to flag a gap before "reference implementation" sticks, because people copy reference adapters line for line, and a few of these would ride along into their deployments. Marking this request-changes so the wire-format and fail-open items get a look before merge, not as a veto on the direction.
I checked all of this against the open issues first (#10 through #19). Those are spec-level: capability resolution, HMAC key management, the conformance program. Nothing below is a dupe. This is adapter-implementation ground. Three of mine have a protocol-level twin and I'll point those out.
The one that matters most: the adapters don't emit the v0.1.0 wire format. In claude-code/acs_adapter.py the request puts acs_version, request_id, timestamp, and metadata at the top level, but request-envelope.json wants them inside params and sets additionalProperties: false. A schema-validating Guardian rejects every request. timestamp goes out as epoch millis (int(time.time() * 1000)) where the schema asks for an ISO-8601 string. The payload uses tool / name / arguments where the hook schema wants the payload wrapper with arguments as {value, provenance}. Same shape in the cursor and nat adapters. The tests stay green because example_guardian.py reads params.get("tool") too, so both sides agree with each other and disagree with the spec. Nothing validates an emitted envelope against acs_schema.json. One test that does would have caught all of it, and it's the highest-leverage thing you can add here.
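A minimal sketch of the shape this paragraph describes, assuming only the field names mentioned in this thread (the four version/ID/time/metadata fields plus the payload wrapper nested inside params, and an ISO-8601 timestamp). The `tool_name` key, the `build_envelope` helper, and the `shape_errors` checker are hypothetical; a real test would validate against request-envelope.json itself, not a hand-written check like this one.

```python
import uuid
from datetime import datetime, timezone

# Assumption: these key sets paraphrase the review text, not the schema file.
TOP_LEVEL_KEYS = {"jsonrpc", "id", "method", "params"}
REQUIRED_PARAMS = {"acs_version", "request_id", "timestamp", "metadata", "payload"}


def build_envelope(method, payload, agent_id, session_id):
    """Build a request with the ACS fields nested inside params."""
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": method,
        "params": {
            "acs_version": "0.1.0",
            "request_id": str(uuid.uuid4()),
            # ISO-8601 string, not epoch milliseconds.
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "metadata": {"agent_id": agent_id, "session_id": session_id},
            "payload": payload,
        },
    }


def shape_errors(envelope):
    """Return a list of human-readable shape problems (empty if none)."""
    errors = []
    extra = set(envelope) - TOP_LEVEL_KEYS
    if extra:
        errors.append(f"unexpected top-level fields: {sorted(extra)}")
    params = envelope.get("params")
    if not isinstance(params, dict):
        return errors + ["params must be an object"]
    missing = REQUIRED_PARAMS - set(params)
    if missing:
        errors.append(f"missing params fields: {sorted(missing)}")
    ts = params.get("timestamp")
    if not isinstance(ts, str):
        errors.append("timestamp must be an ISO-8601 string")
    else:
        try:
            datetime.fromisoformat(ts)
        except ValueError:
            errors.append("timestamp is not parseable as ISO-8601")
    return errors


# Hypothetical payload: arguments wrapped as {value: ...}.
envelope = build_envelope(
    "steps/toolCallRequest",
    {"tool_name": "Bash", "arguments": {"value": {"command": "ls"}}},
    agent_id="example-agent",
    session_id=str(uuid.uuid4()),
)
```

An envelope that leaves `timestamp` at the top level, or sends it as an integer, comes back from `shape_errors` with a non-empty list, which is the kind of failure the current tests cannot see.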
The deny path fails open on anything it doesn't recognize. In translate_response, an unknown or empty decision returns {}, which Claude Code and Cursor both read as "proceed." ACS_DEFAULT_DENY only kicks in on an exception (Guardian unreachable), not on a Guardian that answers with a verdict the map doesn't know. So a v0.2 disposition, a typo, even a trailing space surviving .lower(), all proceed. NAT already does the right thing and blocks under default_deny. The other two should match it. One line each.
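A rough sketch of what routing unknown verdicts through the default-deny posture could look like. The response layout (`result.decision`, `result.reasoning`) and the `default_deny` parameter are assumptions for illustration; the block shape reuses the `{"decision": "block", "reason": ...}` form the code is reported to emit, and this is not the adapter's actual `translate_response`.

```python
def translate_response(guardian_response, default_deny=True):
    """Map a Guardian verdict to a hook response.

    Returns {} to let the call proceed, or a block dict to stop it.
    Anything that is not a recognized verdict follows the same posture
    as an unreachable Guardian instead of silently proceeding.
    """
    result = guardian_response.get("result") or {}
    decision = result.get("decision")
    reason = result.get("reasoning") or ""

    # Normalize so "Deny" and "deny " are not treated as unknown.
    if isinstance(decision, str):
        decision = decision.strip().lower()

    if decision == "allow":
        return {}
    if decision == "deny":
        return {"decision": "block", "reason": reason or "Denied by Guardian"}

    # Unknown, empty, or malformed verdict.
    if default_deny:
        return {
            "decision": "block",
            "reason": f"Unrecognized Guardian decision {decision!r}; failing closed",
        }
    return {}
```

With `default_deny=True`, a future disposition or a typo blocks instead of proceeding, which matches what NAT is described as doing.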
No signing at all (this one's adjacent to #11, not the same). The adapters don't HMAC the envelope, and the example Guardian neither verifies a signature nor checks for replays. The READMEs call this "deferred to transport," but conformance.md:28 lists the baseline signature as a Core MUST, and :67 says transport doesn't satisfy it. #11 is about key distribution and rotation once you're signing. This is the step before: the reference ships with no signing, so every copy starts from an unauthenticated channel. The default http://127.0.0.1:8787/acs in the config keeps that invisible until someone repoints the URL at another host.
Delegation walks around the gate. SubagentStart isn't in HOOK_MAP, Claude Code can't block on it anyway, and example_guardian.py allows Task by default. A subagent spawn isn't evaluated before it acts. That's the adapter-level version of #16, and it lands on the exact confused-deputy path the subagent hooks were promoted to cover. At minimum I'd surface it in mapping.md's "not mapped" list instead of leaving it silent.
On the tests: "40 tests, all passing" is true on your machine, not in CI. NAT's 12 tests skip when nvidia-nat-core isn't installed (I got Ran 12 tests in 0.000s, OK (skipped=12)), Cursor's live test is a skip placeholder, and no workflow runs the adapter tests at all (only sync_version.yml). Skips read as passes, so a regression that lets a denied call through lands green. There's no requirements.txt under adapters/, and the NAT install is unpinned (pip install nvidia-nat-core, no ==). The NAT deny test also catches ACSGuardianDenied and returns without asserting the call was actually aborted, so it passes as long as something raised. This cluster worries me second-most, because green tests on a security control are worse than no tests. Pin the dep, run the unit tests in CI, and have the deny tests assert a real side effect didn't happen (the file wasn't written, the counter stayed at 0).
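To illustrate the last point, here is a small self-contained sketch of a deny test that checks the side effect instead of only checking that something raised. `GuardianDenied` and `guarded_call` are hypothetical stand-ins, not the NAT middleware or `ACSGuardianDenied`.

```python
import unittest


class GuardianDenied(Exception):
    """Hypothetical stand-in for the adapter's deny exception."""


def guarded_call(decision, tool, *args):
    """Run the tool only when the (pre-computed) decision is allow."""
    if decision != "allow":
        raise GuardianDenied(decision)
    return tool(*args)


class DenyAbortsCall(unittest.TestCase):
    def test_denied_call_has_no_side_effect(self):
        calls = []  # the "side effect": every executed call lands here
        with self.assertRaises(GuardianDenied):
            guarded_call("deny", calls.append, "rm -rf /")
        # The assertion that matters: the tool never ran.
        self.assertEqual(calls, [])

    def test_allowed_call_runs_once(self):
        calls = []
        guarded_call("allow", calls.append, "ls")
        self.assertEqual(calls, ["ls"])
```

Run with `python -m unittest`. A guard that raised after invoking the tool would fail the first test, whereas a test that only catches the exception would stay green.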
Smaller stuff, and I'm less sure these are worth blocking on:
- example_guardian.py's regex misses rm -fr /, rm --recursive --force /, rm -rf ~, and find / -delete. It's labeled illustrative so I won't die on this hill, but it's the only thing a newcomer can run on day one, so I'd make it harder to fool or louder about being a toy.
- A PostToolUse deny can't undo a side effect that already ran (Claude Code's own hook docs say PostToolUse can't block the action). Cursor's beforeReadFile returns {}, so a denied file read still happens. The pre-hooks are the only real gate, and the docs should say so plainly.
- NAT's _build_request isn't inside the try/except, so a non-serializable kwarg throws before default_deny can catch it. And post_invoke ignores a result-side deny.
- mapping.md and the code disagree on the deny shape for non-PreToolUse hooks. The doc says {"continue": false, "stopReason": ...}, the code emits {"decision": "block", "reason": ...}. One is wrong against Claude Code's contract.
None of this changes my read on the direction. I like where this is going, and the cross-adapter table in the README is genuinely useful. I'd hold the "reference implementation" label until the envelope matches the schema and the deny paths fail closed, since those are the parts people copy without reading the footnotes. Happy to send a PR with the schema-validation test, or pair on the envelope fix if that's faster.
Tracking: I cross-linked the delegation gap onto #16 and the no-signing gap onto #11 so the protocol-level and adapter-level views sit together. The rest of the findings here are adapter-specific with no matching issue.
(cc @afogel since the envelope and signing points touch conformance.md.)
Merge-order note: this should land after #21, and after the change-request items above are addressed.
No git conflict with #20 or #21 (this only touches |
Rock's PR #22 review caught that the three reference adapters and the example Guardian shared a wire format that diverged from specification/v0.1.0/request-envelope.json: acs_version / request_id / timestamp / metadata at the envelope's top level instead of inside params, timestamp as epoch milliseconds instead of ISO-8601 string, tool payload missing the required payload wrapper, arguments not wrapped per tool-call-request.json. Tests passed because the adapter and the example Guardian agreed with each other; the canonical spec was outside the test loop.
This commit:
- Restructures every adapter's envelope to nest the AcsParams fields inside params, ISO-8601 timestamps, metadata.{agent_id, session_id} populated, payload wrapped per the relevant hook schema, arguments wrapped as {value: ...} per tool-call-request.json:26-37.
- Updates example_guardian.py to read from params.payload, gate the Task subagent tool by default, and expand the destructive-Bash regex set (rm -fr, --recursive --force, ~, --no-preserve-root, find / -delete / -exec rm, chmod 777 on system paths).
- Fixes the fail-open-on-unknown-disposition bug in claude-code and cursor translate_response; NAT pre_invoke and post_invoke now default-deny on unknown verdicts.
- NAT post_invoke now honors a Guardian deny verdict by clearing context.output and setting acs_post_invoke_redacted, matching Specification §6.4's output-redaction gate.
- NAT _build_request is now inside the try/except in both pre_invoke and post_invoke so build errors apply the same fail posture as transport errors.
- Adds tests/test_envelope_schema.py to each adapter. These validate every adapter-emitted envelope and per-hook payload against the canonical v0.1.0 JSON schemas loaded from $ACS_SPEC_DIR. They hard-FAIL if the schemas are missing (not skipped) because spec validation is non-negotiable.
- Adds .github/workflows/adapter_tests.yml to run the schema + round-trip + live tests per adapter on every push and PR, with the spec schemas pulled from upstream Agent-Control-Standard/ACS:main.
- Pins nvidia-nat-core==1.7.0 (adapters/nat/requirements.txt) and jsonschema>=4.20,<5 (adapters/requirements-test.txt).
- Updates each adapter's README conformance table to be MUST-honest against docs/spec/conformance.md: handshake, baseline HMAC-SHA256 integrity, replay nonce, system/ping, wrapped MCP are now marked ✗ not implemented, with citations. The previous "deferred to transport layer" claim for baseline integrity was inconsistent with conformance.md:28 and :67.
Test counts (all pass, zero hidden skips):
claude-code: 17 schema + 13 round-trip
cursor: 36 schema + 13 round-trip
nat: 6 schema + 7 round-trip + 5 live (NAT 1.7.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ke, ping, audit
Critical-review pass against the spec, MUST by MUST. Closes the gaps the previous commit (fbd5d7c) acknowledged but did not address. Adapters now exercise the full ACS-Core wire surface, not just the envelope shape:
- HMAC-SHA256 baseline signing (§10). HKDF-SHA256 derives a per-session key from ACS_HMAC_SECRET and session_id. Adapter signs every request; Guardian verifies every signed request and signs every response. JCS canonicalization with the signature field removed is the signed input, per §10. Unsigned/tampered → SIGNATURE_INVALID (-32004).
- Real rolling chain in example_guardian per §8.2. Per-session previous_hash tracking; entry_hash = SHA-256(JCS(entry) || prev_hash). Each session's chain head is computed and published on the response, covered by the signature per §8.6.
- Replay rejection (§10.3): duplicate request_id within a session → REPLAY_DETECTED (-32005). Per-session seen-id set.
- Timestamp skew rejection (§10.3): timestamps outside the negotiated skew window (default 300_000ms) → TIMESTAMP_OUT_OF_WINDOW (-32006).
- handshake/hello (§4): adapter sends ClientHello on first session call; ServerHello is cached in ~/.cache/acs-adapter-handshake/ keyed by (session_id, guardian_url). ServerHello carries on_decision_failure ("proceed", spec default per §6.4), skew_window_ms, profiles_accepted. Version mismatch → UNSUPPORTED_VERSION (-32001).
- system/ping (§13): Guardian always returns allow, doesn't enter the chain, doesn't require a signature, doesn't consume replay state.
- Fail-open audit events per §6.4: every adapter that proceeds without a decision emits a structured ACS_AUDIT JSON line so the bypass is visible, not silent. Deployments redirect or parse the line to feed a real audit sink.
- Default fail posture flipped to fail-open with audit (spec default per §6.4). Fail-closed is now explicit opt-in via ACS_DEFAULT_DENY=1. The previous default was the opposite of the spec default.
- request_id_ref in tool-call-result payloads (tool-call-result.json:19-23) is now populated by all three adapters via a deterministic uuid5 derivation, so toolCallResult correlates to its originating toolCallRequest on the Guardian side.
Testing methodology gaps closed:
- format_checker added to every schema-validation test so format: uuid and format: date-time constraints are enforced. Previously they were annotation-only and a malformed value would pass.
- Response-envelope schema validation added: every Guardian response shape (allow, deny, handshake, ping, error) is validated against response-envelope.json.
- New tests/test_spec_compliance.py (20 tests) targets the Core MUSTs: rolling chain, replay, skew, signing (sign/verify/tamper-detect), handshake (success + version mismatch), system/ping (always-allow + chain-bypass), response-envelope validation.
Honesty fixes in the conformance documentation:
- NAT downgraded from "Reference implementation" to "Partial reference" in adapters/README.md and adapters/nat/README.md. NAT alone emits steps/toolCallRequest + steps/toolCallResult only, not the full 6-hook minimum. A NAT deployment using only this adapter is not ACS-Core conformant on its own; documented explicitly.
- Cursor per-hook honesty table added: subagentStart, subagentStop, preCompact payloads are schema-valid but contain synthetic uuid5/sha256 placeholders because Cursor does not expose the required fields. The synthetic values satisfy the schema but do not carry the meaning the spec requires.
- Conformance tables across all three adapter READMEs now mark items ✓ that are actually verified by a test, with the test name cited. Items previously marked ✗ (handshake, baseline integrity, system/ping, replay enforcement, fail-open audit) are now ✓ with citation.
Shared helpers in adapters/_common/acs_common.py:
- jcs_canonicalize, derive_session_key (HKDF-SHA256)
- sign_envelope, verify_signature (HMAC-SHA256)
- iso8601_now, coerce_uuid, parse_iso8601
- audit_event (structured ACS_AUDIT line emitter)
- do_handshake (cached per-session ClientHello/ServerHello)
- ping (system/ping helper)
Test counts after this commit:
claude-code: 13 round-trip + 17 envelope-schema = 30 + 2 fail-posture
cursor: 13 round-trip + 36 envelope-schema = 49 + 1 fail-posture (1 manual skip is intentional)
nat: 18 (7 unit + 5 live + 6 envelope-schema; require nvidia-nat-core==1.7.0 for the first 12)
guardian: 20 spec-compliance (handshake, ping, chain, replay, skew, signing, response-envelope validation)
All pass. Zero hidden skips.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
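For readers unfamiliar with the signing scheme this commit describes, a stdlib-only sketch follows. It reuses the helper names listed above purely for readability and is not the code in acs_common.py. Several details are assumptions: the all-zero HKDF salt, using session_id as the HKDF info, hex encoding of the MAC, the signature living at `params["signature"]`, and sorted-keys compact JSON standing in for full RFC 8785 JCS.

```python
import copy
import hashlib
import hmac
import json


def canonicalize(obj) -> bytes:
    # Approximation of JCS: sorted keys, compact separators, UTF-8.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def derive_session_key(secret: bytes, session_id: str) -> bytes:
    """HKDF-SHA256 (RFC 5869), single 32-byte output block."""
    salt = b"\x00" * hashlib.sha256().digest_size  # assumption: default salt
    prk = hmac.new(salt, secret, hashlib.sha256).digest()  # extract
    info = session_id.encode("utf-8")  # assumption: info = session_id
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()  # expand T(1)


def sign_envelope(envelope: dict, key: bytes) -> dict:
    """Return a copy of the envelope with an HMAC over its unsigned form."""
    unsigned = copy.deepcopy(envelope)
    unsigned["params"].pop("signature", None)  # signed input excludes signature
    mac = hmac.new(key, canonicalize(unsigned), hashlib.sha256).hexdigest()
    unsigned["params"]["signature"] = mac
    return unsigned


def verify_signature(envelope: dict, key: bytes) -> bool:
    claimed = (envelope.get("params") or {}).get("signature")
    if not isinstance(claimed, str):
        return False  # unsigned requests never verify
    expected = sign_envelope(envelope, key)["params"]["signature"]
    return hmac.compare_digest(claimed, expected)
```

Because the key is derived per session, a signature lifted from one session does not verify under another session's key, and any change to a signed field changes the canonical bytes and therefore the MAC.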
…ail-without-secret, full JCS
Responds to the post-fbd5d7c critical review by closing the gaps that
were documented as "still not compliant":
- NAT lifecycle middleware (A). ACSMiddleware now subscribes to NAT's
IntermediateStepManager on first pre_invoke. WORKFLOW_START fires
steps/sessionStart + steps/userMessage; WORKFLOW_END fires
steps/agentResponse + steps/sessionEnd. NAT alone now satisfies the
ACS-Core 6-hook taxonomy minimum (conformance.md:19); the previous
"partial reference" status is gone. Verified in nat/tests/test_lifecycle.py.
- Cursor subagent + preCompact, honest (B). Stops fabricating UUIDs and
hashes. Adapter now keeps a per-session state file (~/.cache/acs-adapter-session/)
recording each step's request_id. subagentStart populates three of four
required fields from real data — subagent_session_id (uuid5 of
parent_session+subagent_id, stable across hooks), parent_session_id
(the envelope's session_id, real), parent_step_id (last step the
adapter has actually seen). Only intent_derivation stays a defensible
hardcoded default ("derived_from_parent" for IDE-spawned subagents).
preCompact's entries_to_compact comes from real seen step_ids.
subagentStop is dropped from HOOK_MAP entirely: final_chain_hash is
unknowable from Cursor (no chain on its side), and fabricating it is
worse than omitting the hook.
- Guardian refuses to start without a signing secret (C). On
startup, the Guardian requires ACS_HMAC_SECRET or ACS_HMAC_SECRET_FILE.
ACS_DEV_MODE=1 overrides with a stderr warning that §10 baseline
integrity is not satisfied. README points operators at
`openssl rand -hex 32 > /etc/acs/hmac.key && chmod 600`. All test
harnesses opt into ACS_DEV_MODE explicitly.
- Full RFC 8785 JCS via rfc8785 package (D). acs_common.jcs_canonicalize
uses the rfc8785 PyPI package when installed (full RFC compliance
including number edge cases) and falls back to the sorted-keys +
compact-separators implementation otherwise. requirements-test.txt
lists rfc8785>=0.1,<5.
- Chain entries use the client's timestamp (E). append_to_chain takes
the request's params.timestamp (already skew-validated upstream)
instead of stamping its own iso8601_now(). External observers can
now fully recompute the chain from the request stream and the
published chain_hash.
- ACS_HMAC_SECRET_FILE support (F). load_hmac_secret() resolves to a
file path first, env var second. File path is preferred for
production deployments (no exposure in ps eauxw, child-process
envs, core dumps). Trailing whitespace stripped.
- Per-session state helper. acs_common.load_session_state /
save_session_state / record_step provide a small JSON file in
~/.cache/acs-adapter-session/ for adapters that need to accumulate
state across separate hook-process invocations (Cursor uses this
for last_step_id and seen_step_ids; the same primitive is available
to claude-code and NAT if they need it).
- Handshake re-reads env on every call. The Guardian's
signature_algorithms_supported in ServerHello reflects the current
ACS_HMAC_SECRET / _FILE value at handshake time, not at process
start. Operators can rotate or set the secret without a restart.
Test counts (all pass, zero hidden skips):
claude-code: 32 round-trip + envelope-schema
cursor: 48 (47 + 1 intentional manual-Cursor skip)
guardian: 20 spec-compliance (handshake, ping, chain, replay, skew, signing, response-envelope validation)
nat: 20 (test_adapter 7 + test_live 5 + test_envelope_schema 6 + test_lifecycle 2)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
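Item (E) is what makes the chain externally checkable. The sketch below recomputes a chain head from a request stream using the formula quoted earlier in this thread, entry_hash = SHA-256(JCS(entry) || prev_hash). The all-zero genesis value, the hex-string encoding of prev_hash, the entry fields, and the sorted-keys JSON used in place of full JCS are all assumptions, so the values will not match the example Guardian's published chain_hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: the first entry chains from an all-zero hash


def canonicalize(obj) -> bytes:
    # Stand-in for RFC 8785 JCS: sorted keys, compact separators.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def entry_hash(entry: dict, prev_hash: str) -> str:
    """SHA-256 over the canonical entry followed by the previous hash."""
    return hashlib.sha256(canonicalize(entry) + prev_hash.encode("ascii")).hexdigest()


def recompute_chain(entries) -> str:
    """Fold the entries into a single chain head, oldest first."""
    head = GENESIS
    for entry in entries:
        head = entry_hash(entry, head)
    return head


# Hypothetical entries; timestamps come from the client request, so an
# observer holding the request stream has everything needed to recompute.
entries = [
    {"request_id": "r1", "timestamp": "2025-01-01T00:00:00+00:00"},
    {"request_id": "r2", "timestamp": "2025-01-01T00:00:01+00:00"},
    {"request_id": "r3", "timestamp": "2025-01-01T00:00:02+00:00"},
]
head = recompute_chain(entries)
```

If the Guardian stamped its own clock into each entry, the observer could not reproduce the entry bytes and the recomputation would be impossible.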
Two passes folded into one commit because they share helper code.
## Pass A: Security review (SECURITY.md threat model + tests + fixes)
12-threat model documented in adapters/SECURITY.md. 5 code fixes for high-priority threats; 7 out-of-scope items documented honestly.
- T5 SSRF via ACS_GUARDIAN_URL: validate_guardian_url() allowlist (http/https only); called from every adapter's call_guardian and from do_handshake / ping in _common.
- T6 Guardian DoS via oversized request body: MAX_REQUEST_BODY_BYTES (1 MiB) cap, matching the handshake's max_payload_size_bytes advertisement. Refuses Content-Length > cap with HTTP 413.
- T7 leaky HMAC secret file: _check_secret_file_perms() refuses mode & 0o077 != 0, wrong owner, or symlink. Raises SecretFilePermissionsError instead of silently using a leaked key.
- T8 cache poisoning: save_session_state and do_handshake create cache dirs mode 0700 and files mode 0600 via os.open(O_CREAT, 0o600) so other local users can't read or poison adapter state.
- T9 regex DoS via huge command: scan_destructive_bash_safely() refuses inputs > 8 KiB, emits audit event, returns "input_too_large" sentinel; caller MUST treat as suspicious.
16 new tests in _common/tests/test_security.py, each named for the specific attack it falsifies.
## Pass B: Harsh-reviewer audit + 3 production failure modes + fixes
Stepped back from the work, audited as a hostile reviewer. Identified the 3 most-likely production failure modes; wrote tests that fail on the current code; fixed the bugs.
- BUG #1 NAT _correlation_request_id collision: the old uuid5-from-(session, function, kwargs-hash) was deterministic. Two calls to the same tool with the same args (list_files(), get_status(), any idempotent tool; very common) produced identical request_ids and the Guardian's replay protection rejected the second with REPLAY_DETECTED. Fix: stash a fresh uuid4 on the context in pre_invoke; post_invoke reads it back. Per-call uniqueness + pre/post correlation both preserved.
- BUG #2 Guardian state lost on restart: GuardianState was RAM-only. A Guardian restart (deploy / OOM / autoscaling roll) wiped seen_request_ids, opening a replay window for every previously-sent envelope. §10.3 doesn't pause for the duration of a deploy. Fix: per-session state persisted to a JSON file under ACS_GUARDIAN_STATE_DIR (default ~/.cache/acs-guardian-state/), mode 0700/0600. State loads on first session-touch, persists on every mutation in check_replay and append_to_chain.
- BUG #3 lifecycle subscription race: _ensure_lifecycle_subscribed was a check-then-set with no lock. Two parallel pre_invoke calls (normal in NAT) both saw _lifecycle_subscribed=False and both subscribed; every WORKFLOW event then fired its ACS lifecycle hook twice. Fix: threading.Lock around the check-then-set, with re-check inside the lock.
4 new tests in nat/tests/test_failure_modes.py: 3 for the failure modes above, 1 regression guard ensuring the BUG #1 fix preserves pre/post correlation (post_invoke's request_id_ref must equal pre_invoke's request_id, per tool-call-result.json:19-23).
## Test-strengthening: catching 2 mutations that previously slipped
Two mutation tests passed previously because of weaknesses in the tests themselves:
- RollingChain::test_chain_hash_links_consecutive_requests only asserted hashes differed. Dropping previous_hash from the chain still produced different per-request hashes, so the mutation slipped. Strengthened test_chain_is_recomputable now EXTERNALLY recomputes the expected chain hash across 3 entries (now possible because the Guardian uses the client's timestamp) and asserts byte-equality. Also asserts the published hash does NOT match the "no previous_hash" computation.
- Cursor envelope-schema fixtures all used SESSION_UUID, so a skip-coercion mutation slipped. Added UuidCoercionForNonUuidCursorIds (2 tests) with conv-abc123 / chat_xyz / test-cc-session inputs, asserting the adapter coerces them to valid UUIDs and that the coercion is deterministic.
## Test counts after this commit (all green, zero hidden skips)
_common: 16 security
claude-code: 32 round-trip + envelope-schema
cursor: 50 round-trip + envelope-schema + uuid-coercion (1 intentional manual-Cursor skip)
example-guardian: 20 spec-compliance
nat: 24 (test_adapter 7 + test_live 5 + test_envelope_schema 6 + test_lifecycle 2 + test_failure_modes 4)
Total: 142 tests.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
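The BUG #3 fix is a standard double-checked lock. A minimal sketch of that pattern follows, assuming a hypothetical `LifecycleSubscriber` wrapper and a caller-supplied `subscribe` callable; it is not the NAT middleware's `_ensure_lifecycle_subscribed`.

```python
import threading


class LifecycleSubscriber:
    """Runs a subscribe callable at most once, even under concurrent calls."""

    def __init__(self, subscribe):
        self._subscribe = subscribe  # hypothetical one-time setup callable
        self._subscribed = False
        self._lock = threading.Lock()

    def ensure_subscribed(self):
        # Fast path: skip the lock once the subscription exists.
        if self._subscribed:
            return
        with self._lock:
            # Re-check inside the lock: another thread may have subscribed
            # between the first check and acquiring the lock.
            if self._subscribed:
                return
            self._subscribe()
            self._subscribed = True


subscription_calls = []
subscriber = LifecycleSubscriber(lambda: subscription_calls.append("subscribed"))

threads = [threading.Thread(target=subscriber.ensure_subscribed) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the re-check inside the lock, two threads that both passed the first check would each subscribe once they acquired the lock in turn, which is the double-firing described above.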
Each item has a falsifying test in adapters/_common/tests/test_edge_cases.py (17 tests total). Items not requiring code changes still have tests that codify the safe behavior so a future regression would be caught.
## Items 1-12
- #1 rfc8785 JCS consistency: test confirms fallback matches the rfc8785 package byte-for-byte on every ACS envelope shape we ship. No code change needed; a mixed-install signature mismatch would surface as test failure.
- #2 Guardian regex DoS, server-side: _matches_destructive_bash now returns "too_large" for inputs > DESTRUCTIVE_SCAN_MAX_LEN (8 KiB). The Guardian denies with reason_codes=["input_too_large"] (fail-safe direction). Previously, _common had the cap but the Guardian iterated patterns directly, leaving the server unprotected.
- #3 HA Guardian replay window: persist() now takes an exclusive flock on a .lock sidecar, re-reads on-disk state, merges (union of seen_request_ids / seen_nonces with earliest-timestamp wins), and atomically writes. check_replay re-reads the state on every call so Guardian A's writes are visible to Guardian B within one request. Cross-instance replay window closed under shared ACS_GUARDIAN_STATE_DIR.
- #4 Unbounded seen_request_ids: switched to dict {rid: timestamp}. New evict_old_request_ids() drops entries older than 2 × skew window (replay impossible past skew anyway). check_replay calls eviction opportunistically every 100 inserts. Memory bound is now O(skew_window / inter-request-time), not unbounded. Backwards-compat for list-format state files preserved.
- #5 Handshake cache TTL: do_handshake skips cache files older than ACS_HANDSHAKE_CACHE_TTL_SECONDS (default 3600s). Operator config changes propagate within the TTL.
- #6 NAT id(context) collision: WeakKeyDictionary fallback for contexts that reject attribute assignment. Last-resort path (object isn't weak-referenceable either) returns a fresh uuid4 per call and emits an audit event; pre→post correlation is lost in that path, but no silent collision.
- #7 Unicode / NULL / surrogate round-trip: emoji, NULL bytes, multi-plane unicode all sign+verify cleanly. JCS handles them via UTF-8 encoding; no code change needed.
- #8 ISO 8601 parse resilience: parse_iso8601 already accepts Z suffix, timezone offsets, millisecond + microsecond precision. Test codifies the accepted shapes + asserts garbage is rejected.
- #9 ACS_GUARDIAN_HOST_ALLOWLIST: optional env-var allowlist that restricts validate_guardian_url to specific hostnames in addition to the http/https scheme check. Defense in depth against env-var attacks that smuggle a valid http:// URL to internal services.
- #10 Cursor session-state file collision: _session_state_path now accepts an optional workspace parameter folded into the hash key. Cursor adapter passes the workspace_path / cwd so two Cursor windows with the same non-UUID conversation_id can't share state.
- #11 Guardian envelope schema validation: if jsonschema + ACS_SPEC_DIR are available, every incoming envelope is validated against request-envelope.json before policy evaluation. Malformed envelopes rejected with -32600 Invalid Request. system/ping and handshake/hello exempt because their payload shapes differ.
- #12 State-file hash length: bumped _session_state_path and _handshake_cache_path hashes from sha256[:16] (64-bit) to full sha256 (256-bit). Eliminates birthday collisions over deployment lifetime.
## Test counts after this commit (all green, 1 intentional manual skip)
_common: 33 (16 security + 17 edge-cases)
claude-code: 32
cursor: 50
example-guardian: 20
nat: 24
Total: 159 tests.
## Side-effects of the fixes
- Round-trip test fixtures updated to use real UUID session_ids (claude-code/test_adapter.py). Old "test-cc-session" fails the new Guardian-side envelope-schema check, which is correct: non-UUID session_ids never reached the Guardian from real Claude Code.
- Cursor adapter wires workspace through to load/save/record session state for #10 (new _workspace helper).
- example_guardian.py imports DESTRUCTIVE_SCAN_MAX_LEN from acs_common to keep the cap in one place.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
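Item 4 (bounded seen_request_ids) can be sketched as a small in-memory structure. The `ReplayWindow` class and its method names are hypothetical, and unlike the commit, which evicts opportunistically every 100 inserts, this version evicts on every check to keep the example short.

```python
class ReplayWindow:
    """Per-session replay tracking with time-based eviction."""

    def __init__(self, skew_window_ms: int = 300_000):
        self.skew_window_ms = skew_window_ms
        self.seen = {}  # request_id -> first-seen time in epoch milliseconds

    def evict_old(self, now_ms: int) -> None:
        # Entries older than 2x the skew window can be dropped: a request
        # that old would already be rejected by the timestamp-skew check.
        cutoff = now_ms - 2 * self.skew_window_ms
        self.seen = {rid: ts for rid, ts in self.seen.items() if ts >= cutoff}

    def check(self, request_id: str, now_ms: int) -> bool:
        """Return True for a fresh request_id, False for a replay."""
        self.evict_old(now_ms)
        if request_id in self.seen:
            return False
        self.seen[request_id] = now_ms
        return True


window = ReplayWindow(skew_window_ms=1_000)
first = window.check("req-1", now_ms=0)       # fresh
replayed = window.check("req-1", now_ms=500)  # duplicate inside the window
later = window.check("req-2", now_ms=5_000)   # triggers eviction of req-1
```

Eviction is only safe in combination with the skew check: once "req-1" has been evicted, this structure alone would accept it again, so the timestamp window is what rejects a very old replay.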
Stops the 'we keep finding bugs' pattern by enumerating the contract
once and testing it. One file (adapters/test_acs_core_conformance.py),
one command, one MUST per test method, each docstring quoting the
spec line it falsifies.
Coverage maps directly to docs/spec/conformance.md ACS-Core (lines 13-26):
  Core01_Handshake (§4)
    - handshake/hello returns ServerHello with required keys
    - version mismatch -> UNSUPPORTED_VERSION -32001
  Core02_EnvelopeShape (§3, request-envelope.json)
    - valid envelope passes canonical schema validation (with format-checker)
    - jsonrpc must be literal "2.0"
    - no additional top-level fields allowed
    - params required {acs_version, request_id, timestamp, metadata, payload}
    - metadata required {agent_id, session_id}
    - request_id format: uuid
    - timestamp format: date-time
    - acs_version pattern semver
    - method matches namespace pattern
  Core03_HookTaxonomyMinimum (conformance.md:19)
    - all six of sessionStart, userMessage, toolCallRequest,
      toolCallResult, agentResponse, sessionEnd accepted
  Core04_Dispositions (§6, response-envelope.json conditionals)
    - allow response shape
    - deny requires reasoning
    - modify requires reasoning + modifications (schema-level)
    - ask requires reasoning + ask_details
    - defer requires reasoning + defer_details
  Core05_SessionContext (§8)
    - response carries chain_hash
    - chain_hash is lowercase 64-hex SHA-256
    - consecutive entries chain
    - distinct sessions have independent chain heads
    - chain externally recomputable from request stream
  Core06_ReplayProtection (§10.3)
    - duplicate request_id -> -32005 REPLAY_DETECTED
    - timestamp outside skew -> -32006 TIMESTAMP_OUT_OF_WINDOW
    - same request_id across sessions is fine (per-session scope)
  Core07_BaselineIntegrity (§10)
    - signed request accepted
    - unsigned request rejected when secret configured -> -32004
    - tampered request -> -32004 SIGNATURE_INVALID
    - response is signed and verifies with HKDF key
    - per-session HKDF distinct keys for distinct sessions, same for same
    - signature covers session_id (cross-session signature lift rejected)
  Core08_DecisionHonoring (§6.4)
    - Guardian responds within negotiated timeout (wire-level)
    - fail-open emits ACS_AUDIT event (adapter-level, exercised via subprocess)
  Core09_SystemPing (§13)
    - ping returns allow regardless of policy / signature / session state
    - ping does not require signature
    - ping payload includes {status, echo, server_timestamp}
    - ping does not consume replay slot
  Core10_WrappedMcp (conformance.md:26)
    - protocols/MCP/* method namespace validates
    - Guardian returns structured envelope for MCP methods (partial: full
      MCP wrapping is documented as a follow-up; namespace + envelope OK)
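The Core06 replay rules in the list above can be illustrated with a small sketch. This is not the reference Guardian's code: the `ReplayGuard` class, the 300-second default skew, and the use of epoch-second timestamps (rather than the wire's date-time strings) are assumptions made for illustration. Only the two error codes and the per-session scoping come from the coverage list.

```python
import time
from dataclasses import dataclass, field

REPLAY_DETECTED = -32005
TIMESTAMP_OUT_OF_WINDOW = -32006
MAX_SKEW_SECONDS = 300.0  # assumption: the real window may be negotiated or configured


@dataclass
class ReplayGuard:
    """Hypothetical helper: tracks request_ids per session and rejects stale timestamps."""

    max_skew: float = MAX_SKEW_SECONDS
    _seen: dict = field(default_factory=dict)  # session_id -> set of request_ids

    def check(self, session_id, request_id, timestamp, now=None):
        """Return None if the request is fresh, otherwise a JSON-RPC error code."""
        now = time.time() if now is None else now
        # Timestamps are epoch seconds here; a real Guardian would parse date-time first.
        if abs(now - timestamp) > self.max_skew:
            return TIMESTAMP_OUT_OF_WINDOW
        seen = self._seen.setdefault(session_id, set())
        if request_id in seen:
            return REPLAY_DETECTED
        seen.add(request_id)
        return None
```

Scoping the seen-set by `session_id` is what lets the same `request_id` appear in two sessions without tripping the replay check.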
44/44 pass on the current reference implementation. Spec source defaults
to /tmp/acs-spec-source/specification/v0.1.0/; set ACS_SPEC_DIR to
override. Hard-fails (does not skip) if schemas are missing — spec
validation is non-negotiable.
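A minimal sketch of the hard-fail behavior described above, assuming the two envelope schemas sit directly inside the spec directory (the real layout may nest them). The function name `resolve_spec_dir` and the exception type are hypothetical; the default path, the `ACS_SPEC_DIR` variable, and the schema file names are taken from this commit message.

```python
import os
from pathlib import Path

DEFAULT_SPEC_DIR = "/tmp/acs-spec-source/specification/v0.1.0/"
# File names come from the coverage list; their location inside the directory is assumed.
REQUIRED_SCHEMAS = ("request-envelope.json", "response-envelope.json")


def resolve_spec_dir(env=None):
    """Return the spec directory, raising instead of skipping when schemas are missing."""
    env = os.environ if env is None else env
    spec_dir = Path(env.get("ACS_SPEC_DIR", DEFAULT_SPEC_DIR))
    missing = [name for name in REQUIRED_SCHEMAS if not (spec_dir / name).is_file()]
    if missing:
        # Raising (rather than calling a test-skip helper) turns a missing spec into a failure.
        raise RuntimeError(f"ACS schemas missing from {spec_dir}: {', '.join(missing)}")
    return spec_dir
```

Raising at resolution time means a misconfigured checkout shows up as a red run rather than a quietly skipped suite.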
Why this matters: previous test layers (per-adapter, schema, security,
edge-cases, failure-modes) covered properties but didn't give an
adopter a single yes/no on Core conformance. test_acs_core_conformance
is the contract. If an adopter forks this and modifies it, they run
the file and either keep their Core claim or know exactly which MUST
they broke.
adapters/README.md now leads with this command.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Self-audit caught six tests that were positive-only (asserted the
happy path but had no falsifier). A no-op validator or a non-chaining
chain would have passed several of them. Mutation-tested the
strengthened versions: each named bug is now caught with a clear
failure message citing the spec.
Strengthening per group:
  Core02 Envelope shape
    + test_contradiction_validator_actually_works: a deliberately
      broken envelope ({"jsonrpc": "2.0"} with no method/id/params)
      MUST be rejected. Without this, a no-op validator passes every
      drop-required-field test silently.
  Core03 Hook taxonomy minimum
    + Reframed each of 6 hook tests as parametrized over a table of
      (method, valid_payload, broken_payload, schema_file). Asserts
      both that the valid case produces a KNOWN disposition (not
      garbage) AND that the broken_payload is rejected by the
      canonical hook payload schema. Falsifier-per-hook.
  Core04 Dispositions
    + test_allow_response_without_required_envelope_fields_rejected:
      synthesizes 5 broken allow responses (missing type, acs_version,
      request_id, decision; bogus decision enum) and asserts each
      fails response-envelope.json. Without this, the positive
      "allow validates" test is tautological.
  Core05 SessionContext + chain
    * test_chain_externally_recomputable extended to THREE entries.
      Old version only checked entry 1, which doesn't have a
      previous_hash to fold in, so a "chain that doesn't chain"
      mutation produced the right value for the root and passed.
      Now: entry 2's recomputed hash with previous_hash MUST match
      what the Guardian published; AND must NOT match the computation
      that ignores previous_hash. Entry 3 verifies transitive chaining.
  Core08 Decision honoring
    * Replaced wire-level "Guardian responds fast" check (wrong
      property) with three adapter-side tests of the actual MUST:
        - test_adapter_actually_applies_guardian_deny: Guardian
          returns DENY, adapter MUST translate to deny, not allow.
        - test_adapter_waits_for_a_slow_guardian: Guardian sleeps
          1s, adapter MUST take at least 1s; an adapter that proceeds
          without waiting is caught by elapsed-time check.
        - test_fail_open_emits_audit_event: unchanged.
  Core10 Wrapped MCP
    * Strengthened the "no crash" test to also require the response
      validates against response-envelope.json. A no-op Guardian
      returning {} would no longer pass.
    + test_mcp_method_namespace_rejects_garbage_namespaces:
      contradiction test. Methods outside the reserved namespaces (e.g.,
      "arbitrary/method", "step/typo", "PROTOCOLS/upper") MUST be
      rejected by the schema. Without this, "namespace pattern works"
      is unverified.
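The Core05 three-entry check above can be sketched as follows. The hash construction here (SHA-256 over the previous hex digest followed by canonical request JSON) is an assumption for illustration; the actual input layout is defined by the spec and the reference `compute_entry_hash`. The `recompute_chain` helper and the sample requests are hypothetical.

```python
import hashlib
import json


def compute_entry_hash(request, previous_hash):
    """Assumed construction: SHA-256(previous_hash || canonical JSON of the request)."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256()
    if previous_hash is not None:  # the root entry has nothing to fold in
        digest.update(previous_hash.encode("ascii"))
    digest.update(canonical)
    return digest.hexdigest()


def recompute_chain(requests):
    """Recompute every chain_hash from the request stream alone."""
    hashes, previous = [], None
    for request in requests:
        previous = compute_entry_hash(request, previous)
        hashes.append(previous)
    return hashes


# Illustrative request stream; `published` stands in for the Guardian's chain_hash values.
requests = [{"request_id": f"r{i}", "method": "steps/toolCallRequest"} for i in range(1, 4)]
published = recompute_chain(requests)

# Entry 2 must match the chained computation ...
assert published[1] == compute_entry_hash(requests[1], published[0])
# ... and must NOT match the computation that ignores previous_hash (the falsifier).
assert published[1] != compute_entry_hash(requests[1], None)
# Entry 3 verifies that chaining is transitive.
assert published[2] == compute_entry_hash(requests[2], published[1])
```

Checking only the root entry cannot distinguish a chaining implementation from a non-chaining one, because both hash the same input when there is no previous hash.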
Mutation-tested the strengthened suite. Four representative bugs were
injected one at a time; each was caught:
  No-op _validate_request_envelope          → 9 Core02 tests fail
  compute_entry_hash ignores previous_hash  → Core05 3-entry test fails at entry 2
  Adapter silently fails open (no audit)    → Core08 audit test fails
  Adapter proceeds without waiting          → Core08 deny + slow-guardian tests fail
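The first mutation in the list above can be reproduced in miniature. This is a simplified stand-in, not the suite's code: `validate_request_envelope` only checks the four top-level keys, and `contradiction_test` is a hypothetical name for the falsifier pattern.

```python
def validate_request_envelope(envelope):
    """Simplified stand-in: require jsonrpc "2.0" plus method, id, and params."""
    if envelope.get("jsonrpc") != "2.0":
        return False
    return all(key in envelope for key in ("method", "id", "params"))


def noop_validator(envelope):
    """The injected mutation: accepts every envelope."""
    return True


def contradiction_test(validator):
    """Passes only if the validator rejects a deliberately broken envelope."""
    broken = {"jsonrpc": "2.0"}  # no method/id/params
    return validator(broken) is False


# The real validator survives the contradiction test; the mutant does not.
assert contradiction_test(validate_request_envelope)
assert not contradiction_test(noop_validator)
```

A suite made only of positive cases would pass under both validators, which is the gap the strengthened tests close.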
Net: still 44 tests, but every one now has a verifiable falsifier.
An adopter who forks and breaks a Core MUST gets a precise failure with
a spec citation, not a passing suite that papered over their bug.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Introduces a top-level `adapters/` directory holding reference implementations that wire popular agent frameworks to an ACS Guardian through configuration only, with no agent code changes. Ships three working adapters with passing tests for each.

What lands
- `adapters/claude-code/`: `claude --print` round-trip, ALLOW + DENY paths (`test_live_claude_code.py`)
- `adapters/nat/`: `function_middleware_invoke` against `nvidia-nat-core` 1.7.0 (`test_live_nat_workflow.py`)
- `adapters/cursor/`: `tests/live_verification.md` (Cursor is a desktop app with no documented headless mode)
- `adapters/example-guardian/`

Total: 40 automated tests + 1 documented manual verification procedure, all passing.

The adapter pattern
ACS-Core specifies what a hook event looks like on the wire and what the Guardian's decision looks like coming back. It does not dictate how a framework physically wires the interception in. Each adapter demonstrates the boundary choice for its framework:
- Claude Code: `hookSpecificOutput.permissionDecision`
- Cursor: `argv[1]`; `permission` (top-level, per-event) + exit code 2
- NAT: `FunctionMiddleware` class; `ACSGuardianDenied` (NAT 1.7.0) or `InvocationAction.SKIP` (NAT dev)

All three send the same ACS JSON-RPC shape to the Guardian. The example Guardian (`adapters/example-guardian/example_guardian.py`) is shared across all adapters: the wire format is the same regardless of which framework emits the event.

The top-level `adapters/README.md` contains a step-by-step walkthrough with concrete JSON payloads at each step, a cross-adapter comparison table, and a flow diagram. Read that first.

Claude Code adapter
Wire it up by editing `~/.claude/settings.json` (see `settings.json.example`); no code changes to your agent.

Live verification: `tests/test_live_claude_code.py` spawns `claude --print` in a subprocess with a project-level `settings.json` wiring the adapter into `PreToolUse`. It tests both the ALLOW and DENY paths; on ALLOW, the `echo` command runs and the marker string appears in Claude Code's output. Both pass in ~18s.
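
The DENY path depends on the adapter translating the Guardian's decision into the output shape Claude Code reads from a `PreToolUse` hook. A minimal sketch of that translation, not the adapter's actual code: the function name `to_claude_code_output` is hypothetical, the `hookEventName` and `permissionDecisionReason` fields are assumptions not stated in this PR, and mapping every non-allow disposition to deny is a conservative simplification.

```python
import json


def to_claude_code_output(decision, reasoning=""):
    """Map an ACS decision string onto Claude Code's PreToolUse hook output."""
    # Assumption: anything other than an explicit allow is treated as deny.
    permission = "allow" if decision == "allow" else "deny"
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",           # assumed field
            "permissionDecision": permission,         # the field this PR found to be honored
            "permissionDecisionReason": reasoning,    # assumed field
        }
    }


# The hook would write this JSON to stdout for Claude Code to read.
print(json.dumps(to_claude_code_output("deny", "blocked by policy")))
```

Note that the sketch never emits a top-level `decision` key; the deny is carried by the nested `permissionDecision` field.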

Schema corrections discovered via the live test (real Claude Code differs from public docs):
- Blocking uses `hookSpecificOutput.permissionDecision = "deny"`, NOT top-level `decision: "block"`.
- Tool results arrive in `tool_response` (object), NOT `tool_output` (string).
- Fields `tool_use_id`, `effort`, and `duration_ms` are present but not mentioned in public docs.

NAT adapter (NVIDIA Agent Toolkit)
Real NAT middleware class. Installs via `pip install nvidia-nat-core` and is configured in the NAT workflow YAML.

Live verification: `tests/test_live_nat_workflow.py` exercises NAT's actual orchestration method (`FunctionMiddleware.function_middleware_invoke`), the same code path NAT's runtime calls when a function with middleware is invoked. The tests prove the load-bearing property: when the Guardian denies, the target function does not execute (the test's side-effect counter stays at 0).

Covers: allow / deny / fail-closed / fail-open. 5/5 passing against `nvidia-nat-core` 1.7.0.

Schema corrections discovered while building:
- `InvocationAction.SKIP` is on the NAT dev branch, NOT in 1.7.0. On 1.7.0, block by raising. The adapter feature-detects and prefers the action-based path when available.
- The config class must inherit `FunctionMiddlewareBaseConfig` with the `name=` class kwarg (NAT's TypedBaseModel registration). A plain Pydantic `BaseModel` fails on `@register_middleware`.

Cursor adapter
Real Cursor schema sourced from Cursor's own bundled `~/.cursor/skills-cursor/create-hook/SKILL.md`. Maps all 20 documented Cursor hook events to ACS `steps/*` methods.

Cursor is a desktop application without a documented headless mode, so live verification is a documented manual procedure in `tests/live_verification.md`. The procedure has been run end-to-end (5+ hooks flowed through the adapter, zero adapter errors); captured payloads from that reproduction are not committed because Cursor's events include session-identifying fields. Anyone with Cursor installed can reproduce.

Why in-spec `adapters/` and not separate repos
Keeping the spec and the first batch of reference implementations in a single repo lets the spec evolve alongside the adapters that exercise it (the live tests on this PR found several real schema gaps between docs and actual behavior; that feedback loop is what makes the spec text trustworthy). When the pattern stabilizes and individual adapters need their own release cycle, splitting to separate repos is straightforward.

What this PR does NOT do
Running the tests
🤖 Generated with Claude Code