
Add adapters/ with Claude Code, Cursor, and NAT reference implementations #22

Draft
bar-capsule wants to merge 9 commits into Agent-Control-Standard:main from bar-capsule:bar/adapters


bar-capsule (Collaborator) commented Jun 15, 2026

Summary

Introduces a top-level adapters/ directory holding reference implementations that wire popular agent frameworks to an ACS Guardian through configuration only, with no agent code changes. Ships three working adapters with passing tests for each.

What lands

| Adapter | Status | Tests | Live verification |
|---|---|---|---|
| adapters/claude-code/ | Reference implementation | 13 unit + 2 automated live tests | ✓ Automated: real claude --print round-trip, ALLOW + DENY paths (test_live_claude_code.py) |
| adapters/nat/ | Reference implementation | 7 integration + 5 live workflow tests | ✓ Automated: exercises real function_middleware_invoke against nvidia-nat-core 1.7.0 (test_live_nat_workflow.py) |
| adapters/cursor/ | Reference implementation | 13 unit tests | ✓ Manual reproduction procedure in tests/live_verification.md (Cursor is a desktop app with no documented headless mode) |
| adapters/example-guardian/ | Shared test fixture | n/a | Used by all three adapters' integration tests |

Total: 40 automated tests + 1 documented manual verification procedure, all passing.

The adapter pattern

ACS-Core specifies what a hook event looks like on the wire and what the Guardian's decision looks like coming back. It does not dictate how a framework physically wires the interception in. Each adapter demonstrates the boundary choice for its framework:

| Adapter | Interception mechanism | Event dispatch | Block mechanism |
|---|---|---|---|
| Claude Code | Shell command per hook (settings.json) | Event type in stdin JSON | hookSpecificOutput.permissionDecision |
| Cursor | Shell command per hook (hooks.json) | Event type as argv[1] | permission (top-level, per-event) + exit code 2 |
| NAT | In-process Python FunctionMiddleware class | NAT's middleware pipeline | Raise ACSGuardianDenied (NAT 1.7.0) or InvocationAction.SKIP (NAT dev) |

All three send the same ACS JSON-RPC shape to the Guardian. The example Guardian (adapters/example-guardian/example_guardian.py) is shared across all adapters — same wire format regardless of which framework emits the event.

The top-level adapters/README.md contains a step-by-step walkthrough with concrete JSON payloads at each step, a cross-adapter comparison table, and a flow diagram. Read that first.

Claude Code adapter

Wire it up by editing ~/.claude/settings.json (see settings.json.example); no code changes to your agent.

Live verification: tests/test_live_claude_code.py spawns claude --print in a subprocess with a project-level settings.json that wires the adapter into PreToolUse. It tests both paths:

  • ALLOW path: benign echo command runs and the marker string appears in Claude Code's output.
  • DENY path: Guardian's destructive-Bash policy denies; Claude Code's response surfaces the block.

Both passing in ~18s.

Schema corrections discovered via the live test (real Claude Code differs from public docs):

  • PreToolUse output uses hookSpecificOutput.permissionDecision = "deny", NOT top-level decision: "block".
  • PostToolUse field is tool_response (object), NOT tool_output (string).
  • Real payloads include tool_use_id, effort, duration_ms not mentioned in public docs.
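For anyone wiring their own hook, the corrected Claude Code shapes look roughly like this. It is a minimal sketch, not the adapter's code: the helper names `pre_tool_use_deny` and `post_tool_use_result` are hypothetical, and beyond the fields listed above it assumes only `hookEventName` and `permissionDecisionReason` from Claude Code's hook documentation.

```python
import json
from typing import Any


def pre_tool_use_deny(reason: str) -> str:
    """Serialize a PreToolUse block in the shape the live test observed:
    hookSpecificOutput.permissionDecision, not a top-level "decision"."""
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
        }
    })


def post_tool_use_result(event: dict[str, Any]) -> dict[str, Any]:
    """Read the PostToolUse result from tool_response (an object),
    not from a tool_output string."""
    response = event.get("tool_response")
    if not isinstance(response, dict):
        raise ValueError("expected tool_response to be a JSON object")
    return response
```

A hook process would print `pre_tool_use_deny("destructive Bash blocked by Guardian")` to stdout to block the call.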

NAT adapter (NVIDIA Agent Toolkit)

Real NAT middleware class. Installs via pip install nvidia-nat-core, configured in NAT workflow YAML:

```yaml
middleware:
  acs:
    _type: acs_guardian
    guardian_url: http://127.0.0.1:8787/acs
    default_deny: true

workflow:
  _type: react_agent
  middleware: [acs]
```

Live verification: tests/test_live_nat_workflow.py exercises NAT's actual orchestration method (FunctionMiddleware.function_middleware_invoke) — the same code path NAT's runtime calls when a function with middleware is invoked. Tests prove the load-bearing property: when the Guardian denies, the target function does not execute (the test's side-effect counter stays at 0).

Covers: allow / deny / fail-closed / fail-open. 5/5 passing against nvidia-nat-core 1.7.0.

Schema corrections discovered while building:

  • InvocationAction.SKIP is on the NAT dev branch, NOT in 1.7.0. Block by raising. Adapter feature-detects and prefers action-based path when available.
  • Middleware configs must inherit FunctionMiddlewareBaseConfig with name= class kwarg (NAT's TypedBaseModel registration). Plain Pydantic BaseModel fails on @register_middleware.
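The feature-detect-then-raise pattern from the first bullet can be sketched as below. `DevInvocationAction` and `LegacyInvocationAction` are hypothetical stand-ins for NAT's `InvocationAction` on the dev branch and on 1.7.0; the real import path and enum members are not reproduced here, so this only illustrates the control flow.

```python
import enum
from typing import Any, Optional


class ACSGuardianDenied(Exception):
    """Raised to abort the call when the installed NAT has no SKIP action (the 1.7.0 path)."""


# Hypothetical stand-ins for InvocationAction on the two NAT versions.
class DevInvocationAction(enum.Enum):
    CONTINUE = "continue"
    SKIP = "skip"


class LegacyInvocationAction(enum.Enum):
    CONTINUE = "continue"


def block(invocation_action: Optional[Any], reason: str) -> Any:
    """Prefer the action-based block when SKIP exists; otherwise raise."""
    if invocation_action is not None and hasattr(invocation_action, "SKIP"):
        return invocation_action.SKIP
    raise ACSGuardianDenied(reason)
```

Detecting the attribute instead of comparing version strings keeps one code path working on both releases.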

Cursor adapter

Real Cursor schema sourced from Cursor's own bundled ~/.cursor/skills-cursor/create-hook/SKILL.md. Maps all 20 documented Cursor hook events to ACS steps/* methods.

Cursor is a desktop application without a documented headless mode, so live verification is a documented manual procedure in tests/live_verification.md. The procedure has been run end-to-end (5+ hooks flowed through the adapter, zero adapter errors); captured payloads from that reproduction are not committed because Cursor's events include session-identifying fields. Anyone with Cursor installed can reproduce.

Why in-spec adapters/ and not separate repos

Single repo for spec + reference implementations on the first batch makes the spec evolve alongside the adapters that exercise it (the live tests on this PR found several real schema gaps between docs and actual behavior — that feedback loop is what makes the spec text trustworthy). When the pattern stabilizes and individual adapters need their own release cycle, splitting to separate repos is straightforward.

What this PR does NOT do

  • Does not modify any normative spec text. Adapters are reference implementations, not spec.
  • Does not introduce new profiles. The adapters exercise ACS-Core; profile-tier adapters (acs-audit, acs-crypto, etc.) layer on later.
  • Does not address ACS signing the outbound traffic. That's a deployment concern handled at the transport layer for these minimal adapters.

Running the tests

```shell
# Claude Code (requires `claude` CLI on PATH for the live tests)
(cd adapters/claude-code && python3 -m unittest tests -v)

# Cursor (unit tests only; manual reproduction per tests/live_verification.md)
(cd adapters/cursor && python3 -m unittest tests -v)

# NAT (requires pip install nvidia-nat-core)
(cd adapters/nat && python3 -m unittest tests -v)
```

🤖 Generated with Claude Code

…ions

Introduces a top-level adapters/ directory holding reference
implementations that wire popular agent frameworks to an ACS Guardian
through configuration only, with no agent code changes.

Three working adapters:

- adapters/claude-code/: shell-stdin adapter wired via Claude Code's
  settings.json. 13 unit/integration tests + 2 automated live tests
  that spawn `claude --print` against a project-level settings.json
  and exercise ALLOW and DENY paths end-to-end.

- adapters/cursor/: shell-stdin adapter wired via Cursor's hooks.json.
  Schema sourced from Cursor's bundled create-hook skill docs.
  13 unit tests against the shared example Guardian.
  Manual live-verification procedure documented in
  tests/live_verification.md (Cursor is a desktop app with no
  documented headless mode).

- adapters/nat/: in-process Python FunctionMiddleware for NVIDIA Agent
  Toolkit (nvidia-nat-core 1.7.0). Registered via @register_middleware,
  inherits FunctionMiddlewareBaseConfig with name="acs_guardian".
  7 integration tests against real NAT types + 5 live workflow tests
  exercising the actual function_middleware_invoke orchestration path.

Shared infrastructure:

- adapters/example-guardian/: minimal deterministic Guardian used by
  all three adapters' integration tests. Stdlib-only Python.
  Documented as a teaching artifact, not a production Guardian.

- adapters/README.md: framework -> adapter -> Guardian flow diagram,
  6-step walkthrough with concrete JSON payloads at every step,
  cross-adapter comparison table, behavior-contract explanation.

Total tests: 40 automated tests passing across all adapters
(13 + 2 claude-code, 13 cursor, 7 + 5 nat). Schema gaps between docs
and reality were closed via the live tests for Claude Code and NAT;
Cursor was manually verified through a reproduction procedure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bar-capsule changed the title from "Add adapters/ with Claude Code reference implementation, NAT + Cursor scaffolds" to "Add adapters/ with Claude Code, Cursor, and NAT reference implementations" on Jun 15, 2026
Normalize directory layout so all three adapters have identical
structure (file naming differs only where the framework's own naming
convention dictates, e.g. config file extensions):

```text
adapters/<framework>/
├── README.md
├── acs_adapter.py             (was: cursor_adapter.py, acs_middleware.py)
├── mapping.md
├── <config>.example           (settings.json / hooks.json / workflow.yml)
└── tests/
    ├── __init__.py
    ├── test_adapter.py
    ├── test_live.py           (was: test_live_claude_code.py, test_live_nat_workflow.py)
    ├── example_payloads.md    (NEW: masked real-world payload examples)
    └── live_verification.md   (Cursor only: manual procedure)
```

Renames:
- adapters/cursor/cursor_adapter.py -> acs_adapter.py
- adapters/nat/acs_middleware.py -> acs_adapter.py
- adapters/claude-code/tests/test_live_claude_code.py -> test_live.py
- adapters/nat/tests/test_live_nat_workflow.py -> test_live.py

New config example files (parallels claude-code/settings.json.example):
- adapters/cursor/hooks.json.example
- adapters/nat/workflow.yml.example

New tests/example_payloads.md per adapter — masked real-world payload
examples sourced from actual sessions, with identifying fields replaced
by placeholders. Lets adopters see the actual schema each framework
emits (including fields not in the public docs for Claude Code and
Cursor) without committing any real session data. Each file uses a
consistent masking convention table at the bottom.

Cursor gets a placeholder tests/test_live.py that skips with a pointer
to live_verification.md, so the file naming stays identical across all
three adapters.
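A placeholder of this kind might look like the following (hypothetical class and test names). Running it reports OK with one skip, which is exactly the skips-read-as-passes concern raised in review.

```python
import unittest


class LiveCursorTest(unittest.TestCase):
    @unittest.skip("Cursor has no documented headless mode; "
                   "run the manual procedure in tests/live_verification.md")
    def test_live_round_trip(self) -> None:
        self.fail("unreachable: this test is always skipped")
```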

Top-level adapters/README.md now shows the consistent directory layout
as a code block so reviewers can see the parallel structure at a glance.

All 40 automated tests still passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rocklambros left a comment


@bar-capsule I went deep on this branch before writing anything. Checked it out, ran the three suites, read each adapter against the v0.1.0 schemas. The config-only pattern is the right call, and the per-adapter docs are honest about what's deferred. I want to flag a gap before "reference implementation" sticks, because people copy reference adapters line for line, and a few of these would ride along into their deployments. Marking this request-changes so the wire-format and fail-open items get a look before merge, not as a veto on the direction.

I checked all of this against the open issues first (#10 through #19). Those are spec-level: capability resolution, HMAC key management, the conformance program. Nothing below is a dupe. This is adapter-implementation ground. Three of mine have a protocol-level twin and I'll point those out.

The one that matters most: the adapters don't emit the v0.1.0 wire format. In claude-code/acs_adapter.py the request puts acs_version, request_id, timestamp, and metadata at the top level, but request-envelope.json wants them inside params and sets additionalProperties: false. A schema-validating Guardian rejects every request. timestamp goes out as epoch millis (int(time.time() * 1000)) where the schema asks for an ISO-8601 string. The payload uses tool / name / arguments where the hook schema wants the payload wrapper with arguments as {value, provenance}. Same shape in the cursor and nat adapters. The tests stay green because example_guardian.py reads params.get("tool") too, so both sides agree with each other and disagree with the spec. Nothing validates an emitted envelope against acs_schema.json. One test that does would have caught all of it, and it's the highest-leverage thing you can add here.
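To make the described shape concrete, here is a hedged sketch of an envelope with the AcsParams fields nested under params, an ISO-8601 timestamp, and arguments wrapped as {value: ...}. The payload field name `tool_name`, the `acs_version` string, and reusing request_id as the JSON-RPC id are assumptions; `provenance` is omitted. `envelope_problems` is a cheap stdlib check for the specific regressions listed here, not a substitute for validating against acs_schema.json with a JSON Schema library.

```python
import uuid
from datetime import datetime, timezone
from typing import Any

# JSON-RPC 2.0 request members; with additionalProperties: false, anything else
# at this level (acs_version, request_id, timestamp, metadata) is rejected.
ALLOWED_TOP_LEVEL = frozenset({"jsonrpc", "id", "method", "params"})
REQUIRED_PARAMS = frozenset({"acs_version", "request_id", "timestamp", "metadata", "payload"})


def build_tool_call_request(tool_name: str, arguments: dict[str, Any],
                            agent_id: str, session_id: str) -> dict[str, Any]:
    request_id = str(uuid.uuid4())
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # assumption: JSON-RPC id mirrors params.request_id
        "method": "steps/toolCallRequest",
        "params": {
            "acs_version": "0.1.0",  # assumption: exact version string
            "request_id": request_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO-8601, not epoch millis
            "metadata": {"agent_id": agent_id, "session_id": session_id},
            "payload": {
                "tool_name": tool_name,  # assumption: real field names come from the hook schema
                "arguments": {"value": arguments},  # wrapped per tool-call-request.json
            },
        },
    }


def envelope_problems(envelope: dict[str, Any]) -> list[str]:
    """Structural checks for the regressions above; not full schema validation."""
    problems = [f"unexpected top-level key: {key}"
                for key in sorted(set(envelope) - ALLOWED_TOP_LEVEL)]
    params = envelope.get("params")
    if not isinstance(params, dict):
        return problems + ["params must be an object"]
    problems += [f"missing params.{key}" for key in sorted(REQUIRED_PARAMS - set(params))]
    timestamp = params.get("timestamp")
    if not isinstance(timestamp, str):
        problems.append("params.timestamp must be an ISO-8601 string")
    else:
        try:
            datetime.fromisoformat(timestamp)
        except ValueError:
            problems.append("params.timestamp is not ISO-8601")
    return problems
```

Run against the old flat shape, the check reports the top-level fields and the integer timestamp; run against the nested shape, it reports nothing.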

The deny path fails open on anything it doesn't recognize. In translate_response, an unknown or empty decision returns {}, which Claude Code and Cursor both read as "proceed." ACS_DEFAULT_DENY only kicks in on an exception (Guardian unreachable), not on a Guardian that answers with a verdict the map doesn't know. So a v0.2 disposition, a typo, even a trailing space surviving .lower(), all proceed. NAT already does the right thing and blocks under default_deny. The other two should match it. One line each.
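A fail-closed normalization along these lines could be as small as the following sketch (hypothetical function name; it assumes the adapter only understands allow and deny). Anything else, including an empty string, None, a future disposition, or a verdict with stray whitespace, comes back as deny instead of {}.

```python
from typing import Any

# Assumption: the verdicts this adapter understands. Anything else is unknown.
KNOWN_VERDICTS = frozenset({"allow", "deny"})


def normalize_verdict(raw: Any) -> str:
    """Return "allow" only for a recognized allow; everything else is "deny".

    No .strip(): a malformed verdict is not repaired, it fails closed.
    """
    if isinstance(raw, str):
        verdict = raw.lower()
        if verdict in KNOWN_VERDICTS:
            return verdict
    return "deny"
```

This keeps "Guardian answered with something unrecognized" separate from "Guardian unreachable", so the ACS_DEFAULT_DENY posture only governs the latter.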

No signing at all (this one's adjacent to #11, not the same). The adapters don't HMAC the envelope, and the example Guardian neither verifies a signature nor checks for replays. The READMEs call this "deferred to transport," but conformance.md:28 lists the baseline signature as a Core MUST, and :67 says transport doesn't satisfy it. #11 is about key distribution and rotation once you're signing. This is the step before: the reference ships with no signing, so every copy starts from an unauthenticated channel. The default http://127.0.0.1:8787/acs in the config keeps that invisible until someone repoints the URL at another host.

Delegation walks around the gate. SubagentStart isn't in HOOK_MAP, Claude Code can't block on it anyway, and example_guardian.py allows Task by default. A subagent spawn isn't evaluated before it acts. That's the adapter-level version of #16, and it lands on the exact confused-deputy path the subagent hooks were promoted to cover. At minimum I'd surface it in mapping.md's "not mapped" list instead of leaving it silent.

On the tests: "40 tests, all passing" is true on your machine, not in CI. NAT's 12 tests skip when nvidia-nat-core isn't installed (I got Ran 12 tests in 0.000s, OK (skipped=12)), Cursor's live test is a skip placeholder, and no workflow runs the adapter tests at all (only sync_version.yml). Skips read as passes, so a regression that lets a denied call through lands green. There's no requirements.txt under adapters/, and the NAT install is unpinned (pip install nvidia-nat-core, no ==). The NAT deny test also catches ACSGuardianDenied and returns without asserting the call was actually aborted, so it passes as long as something raised. This cluster worries me second-most, because green tests on a security control are worse than no tests. Pin the dep, run the unit tests in CI, and have the deny tests assert a real side effect didn't happen (the file wasn't written, the counter stayed at 0).
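The gap between "something raised" and "the call was actually aborted" can be pinned down with a side-effect counter, as in this self-contained sketch. `guarded_call` and `GuardianDenied` are hypothetical stand-ins for the middleware and its deny exception; what matters is the final assertion on the counter, which a version that ran the tool before raising would fail.

```python
import unittest
from typing import Any, Callable


class GuardianDenied(Exception):
    """Hypothetical stand-in for the adapter's deny exception."""


def guarded_call(verdict: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Consult the verdict first; only an explicit allow reaches fn."""
    if verdict != "allow":
        raise GuardianDenied(f"guardian verdict: {verdict!r}")
    return fn(*args, **kwargs)


class DenyMeansNoSideEffect(unittest.TestCase):
    def setUp(self) -> None:
        self.calls = 0

    def _tool(self) -> str:
        self.calls += 1  # the side effect a deny must prevent
        return "ran"

    def test_denied_call_never_executes(self) -> None:
        with self.assertRaises(GuardianDenied):
            guarded_call("deny", self._tool)
        # assertRaises alone passes if anything raised; this proves the tool didn't run.
        self.assertEqual(self.calls, 0)

    def test_allowed_call_executes_once(self) -> None:
        self.assertEqual(guarded_call("allow", self._tool), "ran")
        self.assertEqual(self.calls, 1)
```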

Smaller stuff, and I'm less sure these are worth blocking on:

  • example_guardian.py's regex misses rm -fr /, rm --recursive --force /, rm -rf ~, and find / -delete. It's labeled illustrative so I won't die on this hill, but it's the only thing a newcomer can run on day one, so I'd make it harder to fool or louder about being a toy.
  • A PostToolUse deny can't undo a side effect that already ran (Claude Code's own hook docs say PostToolUse can't block the action). Cursor's beforeReadFile returns {}, so a denied file read still happens. The pre-hooks are the only real gate, and the docs should say so plainly.
  • NAT's _build_request isn't inside the try/except, so a non-serializable kwarg throws before default_deny can catch it. And post_invoke ignores a result-side deny.
  • mapping.md and the code disagree on the deny shape for non-PreToolUse hooks. The doc says {"continue": false, "stopReason": ...}, the code emits {"decision": "block", "reason": ...}. One is wrong against Claude Code's contract.
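On the first bullet: a regex keyed to one flag spelling is easy to route around. One alternative, sketched here with shlex tokenization rather than a single regex (a swapped-in technique, not what example_guardian.py does), checks flags by meaning instead of spelling. It inspects only a single simple command; chained commands such as `true && rm -rf /` would need a shell-aware splitter in front of it.

```python
import shlex

# Targets whose recursive deletion wipes a filesystem root or a home directory.
DANGEROUS_TARGETS = frozenset({"/", "/*", "~", "~/", "~/*", "$HOME", "${HOME}"})


def _is_recursive_flag(arg: str) -> bool:
    if arg == "--recursive":
        return True
    # Short flags may be bundled in any order: -rf, -fr, -Rf, -r ...
    return arg.startswith("-") and not arg.startswith("--") and any(c in "rR" for c in arg[1:])


def is_destructive(command: str) -> bool:
    """Flag a single simple command that recursively deletes / or ~ (conservatively)."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quoting: refuse to guess, treat as suspicious
    if not tokens:
        return False
    program = tokens[0].rsplit("/", 1)[-1]  # /bin/rm -> rm
    args = tokens[1:]
    operands = {a for a in args if not a.startswith("-")}
    hits_root = bool(operands & DANGEROUS_TARGETS)
    if program == "rm":
        return hits_root and any(_is_recursive_flag(a) for a in args)
    if program == "find":
        deletes = "-delete" in args or ("-exec" in args and "rm" in args)
        return hits_root and deletes
    return False
```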

None of this changes my read on the direction. I like where this is going, and the cross-adapter table in the README is genuinely useful. I'd hold the "reference implementation" label until the envelope matches the schema and the deny paths fail closed, since those are the parts people copy without reading the footnotes. Happy to send a PR with the schema-validation test, or pair on the envelope fix if that's faster.

Tracking: I cross-linked the delegation gap onto #16 and the no-signing gap onto #11 so the protocol-level and adapter-level views sit together. The rest of the findings here are adapter-specific with no matching issue.

(cc @afogel since the envelope and signing points touch conformance.md.)

@rocklambros


Merge-order note: this should land after #21, and after the change-request items above are addressed.

  • The adapters' conformance posture marks system/ping and Wrapped MCP as "not implemented." That story only holds once #21 (Slim ACS-Core: relax MODIFY, system/ping, and wrapped MCP to SHOULD) relaxes those to SHOULD; before that the floor lists them as MUST, so the reference would be claiming conformance against a floor it doesn't meet.
  • This PR has changes requested (wire format and the fail-open deny path), so it shouldn't merge until those are resolved regardless of ordering.

No git conflict with #20 or #21 (this only touches adapters/), so the sequencing is about consistency, not rebasing. Overall order I'd suggest: #21, then #20, then #22. @bar-capsule

bar-capsule and others added 7 commits June 17, 2026 15:19
Rock's PR #22 review caught that the three reference adapters and the
example Guardian shared a wire format that diverged from
specification/v0.1.0/request-envelope.json: acs_version / request_id /
timestamp / metadata at the envelope's top level instead of inside
params, timestamp as epoch milliseconds instead of ISO-8601 string,
tool payload missing the required payload wrapper, arguments not
wrapped per tool-call-request.json. Tests passed because the adapter
and the example Guardian agreed with each other; the canonical spec
was outside the test loop.

This commit:

- Restructures every adapter's envelope to nest the AcsParams fields
  inside params, ISO-8601 timestamps, metadata.{agent_id, session_id}
  populated, payload wrapped per the relevant hook schema, arguments
  wrapped as {value: ...} per tool-call-request.json:26-37.
- Updates example_guardian.py to read from params.payload, gate the
  Task subagent tool by default, and expand the destructive-Bash
  regex set (rm -fr, --recursive --force, ~, --no-preserve-root,
  find / -delete / -exec rm, chmod 777 on system paths).
- Fixes the fail-open-on-unknown-disposition bug in claude-code and
  cursor translate_response; NAT pre_invoke and post_invoke now
  default-deny on unknown verdicts.
- NAT post_invoke now honors a Guardian deny verdict by clearing
  context.output and setting acs_post_invoke_redacted, matching
  Specification §6.4's output-redaction gate.
- NAT _build_request is now inside the try/except in both pre_invoke
  and post_invoke so build errors apply the same fail posture as
  transport errors.
- Adds tests/test_envelope_schema.py to each adapter. These validate
  every adapter-emitted envelope and per-hook payload against the
  canonical v0.1.0 JSON schemas loaded from $ACS_SPEC_DIR. They are
  hard-FAIL if the schemas are missing — not skipped — because spec
  validation is non-negotiable.
- Adds .github/workflows/adapter_tests.yml to run the schema + round-
  trip + live tests per adapter on every push and PR, with the spec
  schemas pulled from upstream Agent-Control-Standard/ACS:main.
- Pins nvidia-nat-core==1.7.0 (adapters/nat/requirements.txt) and
  jsonschema>=4.20,<5 (adapters/requirements-test.txt).
- Updates each adapter's README conformance table to be MUST-honest
  against docs/spec/conformance.md: handshake, baseline HMAC-SHA256
  integrity, replay nonce, system/ping, wrapped MCP are now marked
  ✗ not implemented, with citations. The previous "deferred to
  transport layer" claim for baseline integrity was inconsistent with
  conformance.md:28 and :67.

Test counts (all pass, zero hidden skips):
  claude-code: 17 schema + 13 round-trip
  cursor:      36 schema + 13 round-trip
  nat:          6 schema + 7 round-trip + 5 live (NAT 1.7.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ke, ping, audit

Critical-review pass against the spec, MUST by MUST. Closes the gaps
the previous commit (fbd5d7c) acknowledged but did not address.

Adapters now exercise the full ACS-Core wire surface, not just the
envelope shape:

- HMAC-SHA256 baseline signing (§10). HKDF-SHA256 derives a per-session
  key from ACS_HMAC_SECRET and session_id. Adapter signs every request;
  Guardian verifies every signed request and signs every response. JCS
  canonicalization with the signature field removed is the signed
  input, per §10. Unsigned/tampered → SIGNATURE_INVALID (-32004).
- Real rolling chain in example_guardian per §8.2. Per-session
  previous_hash tracking; entry_hash = SHA-256(JCS(entry) || prev_hash).
  Each session's chain head is computed and published on the response,
  covered by the signature per §8.6.
- Replay rejection (§10.3): duplicate request_id within a session
  → REPLAY_DETECTED (-32005). Per-session seen-id set.
- Timestamp skew rejection (§10.3): timestamps outside the negotiated
  skew window (default 300_000ms) → TIMESTAMP_OUT_OF_WINDOW (-32006).
- handshake/hello (§4): adapter sends ClientHello on first session call;
  ServerHello is cached in ~/.cache/acs-adapter-handshake/ keyed by
  (session_id, guardian_url). ServerHello carries on_decision_failure
  ("proceed", spec default per §6.4), skew_window_ms, profiles_accepted.
  Version mismatch → UNSUPPORTED_VERSION (-32001).
- system/ping (§13): Guardian always returns allow, doesn't enter the
  chain, doesn't require a signature, doesn't consume replay state.
- Fail-open audit events per §6.4: every adapter that proceeds without
  a decision emits a structured ACS_AUDIT JSON line so the bypass is
  visible, not silent. Deployments redirect or parse the line to feed
  a real audit sink.
- Default fail posture flipped to fail-open with audit (spec default
  per §6.4). Fail-closed is now explicit opt-in via ACS_DEFAULT_DENY=1.
  The previous default was the opposite of the spec default.
- request_id_ref in tool-call-result payloads (tool-call-result.json:19-23)
  is now populated by all three adapters via a deterministic uuid5
  derivation, so toolCallResult correlates to its originating
  toolCallRequest on the Guardian side.
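The signing step in the first bullet above can be sketched with the standard library alone. Assumptions, each noted in a comment: the signature lives at params.signature as a hex string, session_id is fed to HKDF as the info parameter with an empty salt, and canonicalization uses sorted keys with compact separators rather than full RFC 8785 JCS (which differs on number formatting and on key ordering for non-BMP characters). The HKDF function follows RFC 5869.

```python
import copy
import hashlib
import hmac
import json
from typing import Any


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """RFC 5869 HKDF-SHA256: extract a PRK, then expand to `length` bytes."""
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def canonicalize(obj: Any) -> bytes:
    """Approximation of JCS: sorted keys, no whitespace, UTF-8."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def _session_key(secret: bytes, session_id: str) -> bytes:
    return hkdf_sha256(secret, info=session_id.encode("utf-8"))  # assumption: session_id as info


def _signing_input(envelope: dict[str, Any]) -> bytes:
    unsigned = copy.deepcopy(envelope)
    unsigned.get("params", {}).pop("signature", None)  # assumption: params.signature
    return canonicalize(unsigned)


def sign(envelope: dict[str, Any], secret: bytes, session_id: str) -> dict[str, Any]:
    mac = hmac.new(_session_key(secret, session_id), _signing_input(envelope), hashlib.sha256)
    signed = copy.deepcopy(envelope)
    signed.setdefault("params", {})["signature"] = mac.hexdigest()
    return signed


def verify(envelope: dict[str, Any], secret: bytes, session_id: str) -> bool:
    signature = envelope.get("params", {}).get("signature")
    if not isinstance(signature, str):
        return False  # unsigned requests are rejected, not waved through
    expected = hmac.new(_session_key(secret, session_id), _signing_input(envelope),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected)
```

`hmac.compare_digest` keeps the comparison constant-time so the verifier does not leak how many leading characters matched.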

Testing methodology gaps closed:

- format_checker added to every schema-validation test so format: uuid
  and format: date-time constraints are enforced. Previously they were
  annotation-only and a malformed value would pass.
- Response-envelope schema validation added: every Guardian response
  shape (allow, deny, handshake, ping, error) is validated against
  response-envelope.json.
- New tests/test_spec_compliance.py (20 tests) targets the Core MUSTs:
  rolling chain, replay, skew, signing (sign/verify/tamper-detect),
  handshake (success + version mismatch), system/ping (always-allow
  + chain-bypass), response-envelope validation.

Honesty fixes in the conformance documentation:

- NAT downgraded from "Reference implementation" to "Partial reference"
  in adapters/README.md and adapters/nat/README.md. NAT alone emits
  steps/toolCallRequest + steps/toolCallResult only — not the full
  6-hook minimum. A NAT deployment using only this adapter is not
  ACS-Core conformant on its own; documented explicitly.
- Cursor per-hook honesty table added: subagentStart, subagentStop,
  preCompact payloads are schema-valid but contain synthetic uuid5/
  sha256 placeholders because Cursor does not expose the required
  fields. The synthetic values satisfy the schema but do not carry
  the meaning the spec requires.
- Conformance tables across all three adapter READMEs now mark items
  ✓ that are actually verified by a test, with the test name cited.
  Items previously marked ✗ (handshake, baseline integrity, system/ping,
  replay enforcement, fail-open audit) are now ✓ with citation.

Shared helpers in adapters/_common/acs_common.py:
- jcs_canonicalize, derive_session_key (HKDF-SHA256)
- sign_envelope, verify_signature (HMAC-SHA256)
- iso8601_now, coerce_uuid, parse_iso8601
- audit_event (structured ACS_AUDIT line emitter)
- do_handshake (cached per-session ClientHello/ServerHello)
- ping (system/ping helper)

Test counts after this commit:
  claude-code: 13 round-trip + 17 envelope-schema = 30 + 2 fail-posture
  cursor:      13 round-trip + 36 envelope-schema = 49 + 1 fail-posture (1 manual skip is intentional)
  nat:         18 (7 unit + 5 live + 6 envelope-schema; require nvidia-nat-core==1.7.0 for the first 12)
  guardian:    20 spec-compliance (handshake, ping, chain, replay, skew, signing, response-envelope validation)

  All pass. Zero hidden skips.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ail-without-secret, full JCS

Responds to the post-fbd5d7c critical review by closing the gaps that
were documented as "still not compliant":

- NAT lifecycle middleware (A). ACSMiddleware now subscribes to NAT's
  IntermediateStepManager on first pre_invoke. WORKFLOW_START fires
  steps/sessionStart + steps/userMessage; WORKFLOW_END fires
  steps/agentResponse + steps/sessionEnd. NAT alone now satisfies the
  ACS-Core 6-hook taxonomy minimum (conformance.md:19); the previous
  "partial reference" status is gone. Verified in nat/tests/test_lifecycle.py.

- Cursor subagent + preCompact, honest (B). Stops fabricating UUIDs and
  hashes. Adapter now keeps a per-session state file (~/.cache/acs-adapter-session/)
  recording each step's request_id. subagentStart populates three of four
  required fields from real data — subagent_session_id (uuid5 of
  parent_session+subagent_id, stable across hooks), parent_session_id
  (the envelope's session_id, real), parent_step_id (last step the
  adapter has actually seen). Only intent_derivation stays a defensible
  hardcoded default ("derived_from_parent" for IDE-spawned subagents).
  preCompact's entries_to_compact comes from real seen step_ids.
  subagentStop is dropped from HOOK_MAP entirely: final_chain_hash is
  unknowable from Cursor (no chain on its side), and fabricating it is
  worse than omitting the hook.

- Guardian refuses to start without a signing secret (C). On
  startup, the Guardian requires ACS_HMAC_SECRET or ACS_HMAC_SECRET_FILE.
  ACS_DEV_MODE=1 overrides with a stderr warning that §10 baseline
  integrity is not satisfied. README points operators at
  `openssl rand -hex 32 > /etc/acs/hmac.key && chmod 600`. All test
  harnesses opt into ACS_DEV_MODE explicitly.

- Full RFC 8785 JCS via rfc8785 package (D). acs_common.jcs_canonicalize
  uses the rfc8785 PyPI package when installed (full RFC compliance
  including number edge cases) and falls back to the sorted-keys +
  compact-separators implementation otherwise. requirements-test.txt
  lists rfc8785>=0.1,<5.

- Chain entries use the client's timestamp (E). append_to_chain takes
  the request's params.timestamp (already skew-validated upstream)
  instead of stamping its own iso8601_now(). External observers can
  now fully recompute the chain from the request stream and the
  published chain_hash.

- ACS_HMAC_SECRET_FILE support (F). load_hmac_secret() resolves to a
  file path first, env var second. File path is preferred for
  production deployments (no exposure in ps eauxw, child-process
  envs, core dumps). Trailing whitespace stripped.

- Per-session state helper. acs_common.load_session_state /
  save_session_state / record_step provide a small JSON file in
  ~/.cache/acs-adapter-session/ for adapters that need to accumulate
  state across separate hook-process invocations (Cursor uses this
  for last_step_id and seen_step_ids; the same primitive is available
  to claude-code and NAT if they need it).

- Handshake re-reads env on every call. The Guardian's
  signature_algorithms_supported in ServerHello reflects the current
  ACS_HMAC_SECRET / _FILE value at handshake time, not at process
  start. Operators can rotate or set the secret without a restart.
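Item E is what makes the chain externally recomputable. A hedged sketch of the fold, entry_hash = SHA-256(JCS(entry) || prev_hash), assuming an all-zero genesis hash, raw-digest chaining, and the sorted-keys canonicalization approximation (not full RFC 8785); the entry contents are illustrative.

```python
import hashlib
import json
from typing import Any, Iterable

# Assumption: a session's first entry chains onto 32 zero bytes.
GENESIS = bytes(32)


def canonicalize(obj: Any) -> bytes:
    """Sorted-keys, compact-separator JSON; an approximation of JCS."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def chain_head(entries: Iterable[dict[str, Any]]) -> str:
    """Fold entries: h_i = SHA-256(canon(entry_i) || h_{i-1}); return the head as hex."""
    previous = GENESIS
    for entry in entries:
        previous = hashlib.sha256(canonicalize(entry) + previous).digest()
    return previous.hex()
```

Because each entry carries the client's already-validated timestamp rather than a Guardian-side clock reading, an observer holding the request stream can call `chain_head` and compare the result byte-for-byte with the published chain hash.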

Test counts (all pass, zero hidden skips):
  claude-code: 32 round-trip + envelope-schema
  cursor:      48 (47 + 1 intentional manual-Cursor skip)
  guardian:    20 spec-compliance (handshake, ping, chain, replay, skew, signing, response-envelope validation)
  nat:         20 (test_adapter 7 + test_live 5 + test_envelope_schema 6 + test_lifecycle 2)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two passes folded into one commit because they share helper code.

## Pass A — Security review (SECURITY.md threat model + tests + fixes)

12-threat model documented in adapters/SECURITY.md. 5 code fixes for
high-priority threats; 7 out-of-scope items documented honestly.

- T5 SSRF via ACS_GUARDIAN_URL: validate_guardian_url() allowlist
  (http/https only); called from every adapter's call_guardian and
  from do_handshake / ping in _common.
- T6 Guardian DoS via oversized request body: MAX_REQUEST_BODY_BYTES
  (1 MiB) cap, matching the handshake's max_payload_size_bytes
  advertisement. Refuses Content-Length > cap with HTTP 413.
- T7 leaky HMAC secret file: _check_secret_file_perms() refuses
  mode & 0o077 != 0, wrong owner, or symlink. Raises
  SecretFilePermissionsError instead of silently using a leaked key.
- T8 cache poisoning: save_session_state and do_handshake create
  cache dirs mode 0700 and files mode 0600 via os.open(O_CREAT, 0o600)
  so other local users can't read or poison adapter state.
- T9 regex DoS via huge command: scan_destructive_bash_safely()
  refuses inputs > 8 KiB, emits audit event, returns
  "input_too_large" sentinel — caller MUST treat as suspicious.

16 new tests in _common/tests/test_security.py, each named for the
specific attack it falsifies.
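Items F and T7 together describe how the secret is loaded. A minimal POSIX sketch, reusing the names `load_hmac_secret` and `SecretFilePermissionsError` from the text; the internals are illustrative, not the shipped code. It checks the permissions of the file descriptor it actually reads, which avoids a race between checking and opening.

```python
import os
from typing import Mapping, Optional


class SecretFilePermissionsError(RuntimeError):
    """Raised instead of silently using a key other local users could read or swap."""


def _read_secret_file(path: str) -> bytes:
    if os.path.islink(path):
        raise SecretFilePermissionsError(f"{path} is a symlink")
    # O_NOFOLLOW (POSIX) closes the gap between the islink check and the open.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0))
    with os.fdopen(fd, "rb") as fh:
        st = os.fstat(fh.fileno())  # inspect the file we actually opened
        if st.st_mode & 0o077:
            raise SecretFilePermissionsError(
                f"{path} grants group/other access ({oct(st.st_mode & 0o777)})")
        if hasattr(os, "getuid") and st.st_uid != os.getuid():
            raise SecretFilePermissionsError(f"{path} is not owned by the current user")
        return fh.read()


def load_hmac_secret(env: Optional[Mapping[str, str]] = None) -> Optional[bytes]:
    """ACS_HMAC_SECRET_FILE first, ACS_HMAC_SECRET second; trailing whitespace stripped."""
    env = os.environ if env is None else env
    path = env.get("ACS_HMAC_SECRET_FILE")
    if path:
        return _read_secret_file(path).rstrip()
    value = env.get("ACS_HMAC_SECRET")
    return value.rstrip().encode("utf-8") if value else None
```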

## Pass B — Harsh-reviewer audit + 3 production failure modes + fixes

Stepped back from the work, audited as a hostile reviewer. Identified
the 3 most-likely production failure modes; wrote tests that fail on
the current code; fixed the bugs.

- BUG #1 NAT _correlation_request_id collision: the old uuid5-from-
  (session, function, kwargs-hash) was deterministic. Two calls to
  the same tool with the same args (list_files(), get_status(), any
  idempotent tool — very common) produced identical request_ids and
  the Guardian's replay protection rejected the second with
  REPLAY_DETECTED. Fix: stash a fresh uuid4 on the context in
  pre_invoke; post_invoke reads it back. Per-call uniqueness + pre/
  post correlation both preserved.
- BUG #2 Guardian state lost on restart: GuardianState was RAM-only.
  A Guardian restart (deploy / OOM / autoscaling roll) wiped
  seen_request_ids, opening a replay window for every previously-sent
  envelope. §10.3 doesn't pause for the duration of a deploy. Fix:
  per-session state persisted to a JSON file under
  ACS_GUARDIAN_STATE_DIR (default ~/.cache/acs-guardian-state/),
  mode 0700/0600. State loads on first session-touch, persists on
  every mutation in check_replay and append_to_chain.
- BUG #3 lifecycle subscription race: _ensure_lifecycle_subscribed
  was a check-then-set with no lock. Two parallel pre_invoke calls
  (normal in NAT) both saw _lifecycle_subscribed=False and both
  subscribed; every WORKFLOW event then fired its ACS lifecycle hook
  twice. Fix: threading.Lock around the check-then-set, with re-check
  inside the lock.
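The BUG #3 fix is the classic check, lock, then re-check pattern. A sketch using a hypothetical `LifecycleSubscriber` wrapper (in the text, `_ensure_lifecycle_subscribed` and `_lifecycle_subscribed` live on the middleware itself):

```python
import threading
from typing import Callable


class LifecycleSubscriber:
    """Run a subscribe callback exactly once, even when many calls race on first use."""

    def __init__(self, subscribe: Callable[[], None]) -> None:
        self._subscribe = subscribe
        self._subscribed = False
        self._lock = threading.Lock()

    def ensure_subscribed(self) -> None:
        if self._subscribed:  # fast path after the first successful subscription
            return
        with self._lock:
            if not self._subscribed:  # re-check: another thread may have won the race
                self._subscribe()
                self._subscribed = True
```

The re-check inside the lock is the part the original check-then-set lacked; without it, every thread that passed the first check before the flag flipped would subscribe again.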

4 new tests in nat/tests/test_failure_modes.py: 3 for the failure
modes above, 1 regression guard ensuring the BUG #1 fix preserves
pre/post correlation (post_invoke's request_id_ref must equal
pre_invoke's request_id, per tool-call-result.json:19-23).

## Test-strengthening: catching 2 mutations that previously slipped through

Two mutations previously survived because of weaknesses in the tests
themselves:

- RollingChain::test_chain_hash_links_consecutive_requests only
  asserted that hashes differed. Dropping previous_hash from the chain
  still produced different per-request hashes, so the mutation slipped
  through. The strengthened test_chain_is_recomputable now EXTERNALLY
  recomputes the expected chain hash across 3 entries (possible now that
  the Guardian uses the client's timestamp) and asserts byte-equality.
  It also asserts that the published hash does NOT match the "no
  previous_hash" computation.
- Cursor envelope-schema fixtures all used SESSION_UUID, so a
  skip-coercion mutation slipped through. Added
  UuidCoercionForNonUuidCursorIds (2 tests) with conv-abc123 /
  chat_xyz / test-cc-session inputs, asserting that the adapter coerces
  them to valid UUIDs and that the coercion is deterministic.
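The external recomputation behind the strengthened chain test can be sketched as follows. The chaining rule here (SHA-256 over previous_hash followed by a compact, sorted-key JSON encoding of the entry) is a hypothetical stand-in, since the exact Guardian formula isn't spelled out in this text, and the JSON encoding only approximates RFC 8785 JCS for string-valued entries.

```python
import hashlib
import json
from typing import List, Optional


def canonical(obj) -> bytes:
    # Approximation of RFC 8785 JCS: sorted keys, no whitespace, UTF-8.
    # Real JCS also pins number formatting; these entries hold only strings.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def entry_hash(entry: dict, previous_hash: Optional[str]) -> str:
    # Hypothetical chaining rule: fold the previous head (if any) into the digest.
    h = hashlib.sha256()
    if previous_hash is not None:
        h.update(previous_hash.encode("ascii"))
    h.update(canonical(entry))
    return h.hexdigest()  # lowercase 64-hex


def recompute_chain(entries: List[dict]) -> List[str]:
    hashes, prev = [], None
    for entry in entries:
        prev = entry_hash(entry, prev)
        hashes.append(prev)
    return hashes
```

Note that the root entry hashes identically with or without chaining (there is no previous_hash yet), which is why a check that only looks at entry 1 cannot catch the "no previous_hash" mutation; the falsifier has to compare entry 2 or later.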

## Test counts after this commit (all green, zero hidden skips)

  _common:            16 security
  claude-code:        32 round-trip + envelope-schema
  cursor:             50 round-trip + envelope-schema + uuid-coercion
                      (1 intentional manual-Cursor skip)
  example-guardian:   20 spec-compliance
  nat:                24 (test_adapter 7 + test_live 5 +
                          test_envelope_schema 6 + test_lifecycle 2 +
                          test_failure_modes 4)

  Total: 142 tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Each item has a falsifying test in adapters/_common/tests/test_edge_cases.py
(17 tests total). Items not requiring code changes still have tests that
codify the safe behavior so a future regression would be caught.

## Items 1-12

#1  rfc8785 JCS consistency: a test confirms the fallback matches the
    rfc8785 package byte-for-byte on every ACS envelope shape we ship.
    No code change needed; a signature mismatch in a mixed install
    would surface as a test failure.

#2  Guardian regex DoS, server-side: _matches_destructive_bash now
    returns "too_large" for inputs > DESTRUCTIVE_SCAN_MAX_LEN (8 KiB).
    The Guardian denies with reason_codes=["input_too_large"], failing
    safe. Previously, _common had the cap but the Guardian iterated the
    patterns directly, leaving the server unprotected.

#3  HA Guardian replay window: persist() now takes an exclusive flock
    on a .lock sidecar, re-reads the on-disk state, merges it (union of
    seen_request_ids / seen_nonces, earliest timestamp wins), and
    writes atomically. check_replay re-reads the state on every call,
    so Guardian A's writes are visible to Guardian B within one
    request. This closes the cross-instance replay window under a
    shared ACS_GUARDIAN_STATE_DIR.

#4  Unbounded seen_request_ids: switched to a dict {rid: timestamp}.
    A new evict_old_request_ids() drops entries older than 2 × the skew
    window (replay is impossible past the skew window anyway).
    check_replay calls eviction opportunistically every 100 inserts.
    Memory is now bounded by O(skew_window / inter-request-time) instead
    of growing without limit. Backwards compatibility with list-format
    state files is preserved.

#5  Handshake cache TTL: do_handshake skips cache files older than
    ACS_HANDSHAKE_CACHE_TTL_SECONDS (default 3600s). Operator config
    changes propagate within the TTL.

#6  NAT id(context) collision: WeakKeyDictionary fallback for
    contexts that reject attribute assignment. Last-resort path (the
    object isn't weak-referenceable either): return a fresh uuid4 per
    call and emit an audit event. Pre→post correlation is lost in that
    path, but there is no silent collision.

#7  Unicode / NULL / surrogate round-trip: emoji, NULL bytes, and
    multi-plane Unicode all sign and verify cleanly. JCS handles them
    via UTF-8 encoding; no code change needed.

#8  ISO 8601 parse resilience: parse_iso8601 already accepts the Z
    suffix, timezone offsets, and millisecond and microsecond precision.
    The test codifies the accepted shapes and asserts garbage is rejected.

#9  ACS_GUARDIAN_HOST_ALLOWLIST: optional env-var allowlist that
    restricts validate_guardian_url to specific hostnames in addition
    to the http/https scheme check. Defense in depth against env-var
    attacks that smuggle a valid http:// URL to internal services.

#10 Cursor session-state file collision: _session_state_path now
    accepts an optional workspace parameter folded into the hash key.
    Cursor adapter passes the workspace_path / cwd so two Cursor
    windows with the same non-UUID conversation_id can't share state.

#11 Guardian envelope schema validation: if jsonschema + ACS_SPEC_DIR
    are available, every incoming envelope is validated against
    request-envelope.json before policy evaluation. Malformed envelopes
    rejected with -32600 Invalid Request. system/ping and
    handshake/hello exempt because their payload shapes differ.

#12 State-file hash length: bumped _session_state_path and
    _handshake_cache_path hashes from sha256[:16] (64-bit) to full
    sha256 (256-bit). Eliminates birthday collisions over deployment
    lifetime.
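Items #3 and #4 together describe a replay-state store with timestamped ids, bounded eviction and an HA merge. A minimal in-memory sketch, assuming float epoch timestamps and a hypothetical 300-second skew window (the actual value isn't stated here); the real Guardian additionally persists this state under an flock and re-reads it per call, which is omitted:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

SKEW_SECONDS = 300.0  # hypothetical skew window
EVICT_EVERY = 100     # item #4: opportunistic eviction every 100 inserts


@dataclass
class ReplayState:
    seen_request_ids: Dict[str, float] = field(default_factory=dict)
    inserts_since_evict: int = 0

    def check_replay(self, request_id: str, ts: float, now: float) -> Optional[str]:
        """Return an error name, or None if the request is fresh (and record it)."""
        if abs(now - ts) > SKEW_SECONDS:
            return "TIMESTAMP_OUT_OF_WINDOW"
        if request_id in self.seen_request_ids:
            return "REPLAY_DETECTED"
        self.seen_request_ids[request_id] = ts
        self.inserts_since_evict += 1
        if self.inserts_since_evict >= EVICT_EVERY:
            self.evict_old_request_ids(now)
        return None

    def evict_old_request_ids(self, now: float) -> None:
        # Anything older than 2x skew would already fail the timestamp check,
        # so forgetting it cannot reopen a replay window.
        cutoff = now - 2 * SKEW_SECONDS
        self.seen_request_ids = {
            rid: t for rid, t in self.seen_request_ids.items() if t >= cutoff
        }
        self.inserts_since_evict = 0

    def merge(self, other: Dict[str, float]) -> None:
        # HA merge: union of both instances' ids; the earliest timestamp wins.
        for rid, ts in other.items():
            mine = self.seen_request_ids.get(rid)
            self.seen_request_ids[rid] = ts if mine is None else min(mine, ts)
```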

## Test counts after this commit (all green, 1 intentional manual skip)

  _common:            33 (16 security + 17 edge-cases)
  claude-code:        32
  cursor:             50
  example-guardian:   20
  nat:                24

  Total: 159 tests.

## Side-effects of the fixes

- Round-trip test fixtures updated to use real UUID session_ids
  (claude-code/test_adapter.py). Old "test-cc-session" fails the new
  Guardian-side envelope-schema check, which is correct — non-UUID
  session_ids never reached the Guardian from real Claude Code.
- Cursor adapter wires workspace through to load/save/record session
  state for #10 (new _workspace helper).
- example_guardian.py imports DESTRUCTIVE_SCAN_MAX_LEN from acs_common
  to keep the cap in one place.
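Fixes #10 and #12 both land in the state-file path. Here is a sketch of what the workspace-aware path could look like. The function name mirrors _session_state_path but is a hypothetical re-creation; the NUL separator and the `.json` suffix are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Optional


def session_state_path(state_dir: Path, session_id: str,
                       workspace: Optional[str] = None) -> Path:
    # Fold the workspace into the key so two Cursor windows sharing a
    # non-UUID conversation_id get different files (#10). The NUL separator
    # keeps ("ab", "c") and ("a", "bc") from producing the same key.
    key = session_id if workspace is None else f"{workspace}\0{session_id}"
    # Full 256-bit digest instead of sha256[:16] (#12).
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return state_dir / f"{digest}.json"
```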

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stops the 'we keep finding bugs' pattern by enumerating the contract
once and testing it. One file (adapters/test_acs_core_conformance.py),
one command, one MUST per test method, each docstring quoting the
spec line it falsifies.

Coverage maps directly to docs/spec/conformance.md ACS-Core (lines 13-26):

  Core01_Handshake (§4)
    handshake/hello returns ServerHello with required keys
    version mismatch -> UNSUPPORTED_VERSION -32001

  Core02_EnvelopeShape (§3, request-envelope.json)
    valid envelope passes canonical schema validation (with format-checker)
    jsonrpc must be literal "2.0"
    no additional top-level fields allowed
    params required {acs_version, request_id, timestamp, metadata, payload}
    metadata required {agent_id, session_id}
    request_id format: uuid
    timestamp format: date-time
    acs_version pattern: semver
    method matches namespace pattern

  Core03_HookTaxonomyMinimum (conformance.md:19)
    all six of sessionStart, userMessage, toolCallRequest,
    toolCallResult, agentResponse, sessionEnd accepted

  Core04_Dispositions (§6, response-envelope.json conditionals)
    allow response shape
    deny requires reasoning
    modify requires reasoning + modifications (schema-level)
    ask requires reasoning + ask_details
    defer requires reasoning + defer_details

  Core05_SessionContext (§8)
    response carries chain_hash
    chain_hash is lowercase 64-hex SHA-256
    consecutive entries chain
    distinct sessions have independent chain heads
    chain externally recomputable from request stream

  Core06_ReplayProtection (§10.3)
    duplicate request_id -> -32005 REPLAY_DETECTED
    timestamp outside skew -> -32006 TIMESTAMP_OUT_OF_WINDOW
    same request_id across sessions is fine (per-session scope)

  Core07_BaselineIntegrity (§10)
    signed request accepted
    unsigned request rejected when secret configured -> -32004
    tampered request -> -32004 SIGNATURE_INVALID
    response is signed and verifies with HKDF key
    per-session HKDF distinct keys for distinct sessions, same for same
    signature covers session_id (cross-session signature lift rejected)

  Core08_DecisionHonoring (§6.4)
    Guardian responds within negotiated timeout (wire-level)
    fail-open emits ACS_AUDIT event (adapter-level, exercised via subprocess)

  Core09_SystemPing (§13)
    ping returns allow regardless of policy / signature / session state
    ping does not require signature
    ping payload includes {status, echo, server_timestamp}
    ping does not consume replay slot

  Core10_WrappedMcp (conformance.md:26)
    protocols/MCP/* method namespace validates
    Guardian returns structured envelope for MCP methods (partial — full
    MCP wrapping is documented as a follow-up; namespace + envelope OK)
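The Core07 items on per-session keys and cross-session signature lift can be illustrated with HKDF (RFC 5869) built from the standard library. The `info` label, the HMAC-SHA256-over-compact-JSON signing scheme and the params shape below are assumptions for illustration, not the reference implementation's wire format.

```python
import hashlib
import hmac
import json


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    # RFC 5869: extract (PRK = HMAC(salt, IKM)), then expand T(1), T(2), ...
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def session_key(secret: bytes, session_id: str) -> bytes:
    # Hypothetical info label; distinct session_ids yield distinct keys.
    return hkdf_sha256(secret, b"", b"acs-session-key:" + session_id.encode("utf-8"))


def canonical(obj) -> bytes:
    # Approximates JCS for string-valued params.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(secret: bytes, params: dict) -> str:
    # session_id is both the key-derivation input and part of the signed bytes.
    sid = params["metadata"]["session_id"]
    return hmac.new(session_key(secret, sid), canonical(params), hashlib.sha256).hexdigest()


def verify(secret: bytes, params: dict, signature: str) -> bool:
    return hmac.compare_digest(sign(secret, params), signature)
```

Because the key is derived per session and session_id is covered by the MAC, a signature lifted onto another session's envelope fails verification on both counts.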

44/44 pass on the current reference implementation. Spec source defaults
to /tmp/acs-spec-source/specification/v0.1.0/; set ACS_SPEC_DIR to
override. Hard-fails (does not skip) if schemas are missing — spec
validation is non-negotiable.
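The "hard-fail, don't skip" rule for the spec source might be resolved roughly as below. The function name and the assumption that both schema files sit directly under the spec dir are hypothetical; in a pytest suite this would typically back a session-scoped fixture.

```python
import os
from pathlib import Path
from typing import Mapping, Sequence

DEFAULT_SPEC_DIR = "/tmp/acs-spec-source/specification/v0.1.0/"
# Hypothetical layout: assumes the schemas sit directly under the spec dir.
REQUIRED_SCHEMAS = ("request-envelope.json", "response-envelope.json")


def resolve_spec_dir(env: Mapping[str, str] = os.environ,
                     required: Sequence[str] = REQUIRED_SCHEMAS) -> Path:
    spec_dir = Path(env.get("ACS_SPEC_DIR", DEFAULT_SPEC_DIR))
    missing = [name for name in required if not (spec_dir / name).is_file()]
    if missing:
        # Raise instead of skipping: a missing schema must never yield a green run.
        raise RuntimeError(f"ACS spec schemas missing under {spec_dir}: {missing}")
    return spec_dir
```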

Why this matters: previous test layers (per-adapter, schema, security,
edge-cases, failure-modes) covered properties but didn't give an
adopter a single yes/no on Core conformance. test_acs_core_conformance
is the contract. If an adopter forks this and modifies it, they run
the file and either keep their Core claim or know exactly which MUST
they broke.

adapters/README.md now leads with this command.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Self-audit caught six tests that were positive-only (asserted the
happy path but had no falsifier). A no-op validator or a non-chaining
chain would have passed several of them. Mutation-tested the
strengthened versions: each named bug is now caught with a clear
failure message citing the spec.

Strengthening per group:

  Core02 Envelope shape
    + test_contradiction_validator_actually_works — a deliberately
      broken envelope ({"jsonrpc": "2.0"} with no method/id/params)
      MUST be rejected. Without this, a no-op validator passes every
      drop-required-field test silently.

  Core03 Hook taxonomy minimum
    + Reframed each of 6 hook tests as parametrized over a table of
      (method, valid_payload, broken_payload, schema_file). Asserts
      both that the valid case produces a KNOWN disposition (not
      garbage) AND that the broken_payload is rejected by the
      canonical hook payload schema. Falsifier-per-hook.

  Core04 Dispositions
    + test_allow_response_without_required_envelope_fields_rejected —
      synthesizes 5 broken allow responses (missing type, acs_version,
      request_id, decision; bogus decision enum) and asserts each
      fails response-envelope.json. Without this, the positive
      "allow validates" test is tautological.

  Core05 SessionContext + chain
    * test_chain_externally_recomputable extended to THREE entries.
      Old version only checked entry 1, which doesn't have a
      previous_hash to fold in — a "chain that doesn't chain"
      mutation produced the right value for the root and passed.
      Now: entry 2's recomputed hash with previous_hash MUST match
      what the Guardian published; AND must NOT match the computation
      that ignores previous_hash. Entry 3 verifies transitive chaining.

  Core08 Decision honoring
    * Replaced wire-level "Guardian responds fast" check (wrong
      property) with three adapter-side tests of the actual MUST:
        - test_adapter_actually_applies_guardian_deny — Guardian
          returns DENY, adapter MUST translate to deny, not allow.
        - test_adapter_waits_for_a_slow_guardian — Guardian sleeps
          1s, adapter MUST take at least 1s; an adapter that proceeds
          without waiting is caught by elapsed-time check.
        - test_fail_open_emits_audit_event — unchanged.

  Core10 Wrapped MCP
    * Strengthened the "no crash" test to also require the response
      validates against response-envelope.json. A no-op Guardian
      returning {} would no longer pass.
    + test_mcp_method_namespace_rejects_garbage_namespaces —
      contradiction: methods outside the reserved namespaces (e.g.,
      "arbitrary/method", "step/typo", "PROTOCOLS/upper") MUST be
      rejected by the schema. Without this, "namespace pattern works"
      is unverified.
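The two new Core08 falsifiers (apply DENY; wait for a slow Guardian) can be sketched with a fake Guardian. Both adapter functions are hypothetical. A real adapter would translate the decision into its host's format, and the 0.2 s delay is shortened from the 1 s used in the suite.

```python
import threading
import time
from typing import Callable, Dict

GuardianCall = Callable[[], Dict[str, str]]


def honoring_adapter(call_guardian: GuardianCall) -> str:
    # Blocks until the Guardian answers, then applies its decision.
    return call_guardian()["decision"]


def non_waiting_adapter(call_guardian: GuardianCall) -> str:
    # The bug Core08 must catch: fire the request in the background and proceed.
    threading.Thread(target=call_guardian, daemon=True).start()
    return "allow"


def slow_deny_guardian(delay: float) -> GuardianCall:
    def call() -> Dict[str, str]:
        time.sleep(delay)
        return {"decision": "deny", "reasoning": "blocked by policy"}
    return call


def honors_decision(adapter: Callable[[GuardianCall], str], delay: float = 0.2) -> bool:
    start = time.monotonic()
    verdict = adapter(slow_deny_guardian(delay))
    elapsed = time.monotonic() - start
    # Small tolerance for timer granularity.
    return verdict == "deny" and elapsed >= delay - 0.01
```

Checking the verdict and the elapsed time together catches both failure shapes: an adapter that ignores DENY, and one that proceeds before the Guardian has answered.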

Mutation-tested the strengthened suite. Four representative bugs were
injected one at a time; each was caught:

  No-op _validate_request_envelope          → 9 Core02 tests fail
  compute_entry_hash ignores previous_hash  → Core05 3-entry test fails at entry 2
  Adapter silently fails open (no audit)    → Core08 audit test fails
  Adapter proceeds without waiting          → Core08 deny + slow-guardian tests fail
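The first row of that table shows why contradiction tests matter. A toy sketch, assuming a hypothetical validator that checks only the top-level envelope keys (far weaker than request-envelope.json): a positive-only suite lets a no-op mutant survive, and adding the contradiction test kills it.

```python
from typing import Callable, Dict, List

Validator = Callable[[dict], bool]
REQUIRED_TOP = ("jsonrpc", "method", "id", "params")  # toy subset of the schema

# Method value is a placeholder; this toy validator never inspects it.
VALID = {"jsonrpc": "2.0", "method": "placeholder/method", "id": 1, "params": {}}


def validate(envelope: dict) -> bool:
    return envelope.get("jsonrpc") == "2.0" and all(k in envelope for k in REQUIRED_TOP)


def noop_validate(envelope: dict) -> bool:  # injected mutant
    return True


def positive_test(v: Validator) -> bool:
    return v(VALID)


def contradiction_test(v: Validator) -> bool:
    # A deliberately broken envelope MUST be rejected.
    return not v({"jsonrpc": "2.0"})


def surviving_mutants(tests: List[Callable[[Validator], bool]],
                      mutants: Dict[str, Validator]) -> List[str]:
    """Names of mutants that every test still passes, i.e. the suite missed them."""
    return [name for name, m in mutants.items() if all(t(m) for t in tests)]
```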

Net: still 44 tests, but every one now has a verifiable falsifier.
An adopter who forks and breaks a Core MUST gets a precise failure with a
spec citation, not a passing suite that papers over their bug.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@bar-capsule bar-capsule marked this pull request as draft June 18, 2026 08:34