
[anatrace-core-integration] anatrace-core integration (provenance swap + behavioral attestation)#322

Merged
rpatricksmith merged 14 commits into main from feature/anatrace-core-integration on Jun 14, 2026

Conversation

@rpatricksmith

Collaborator

Anatomia Proof — anatrace-core integration (provenance swap + behavioral attestation)

PASS · 26/26 assertions satisfied · 10/12 ACs met · 1 deviation

Contract Compliance

| ID | Says | Status |
| --- | --- | --- |
| A001 | Provenance counts come from the published anatrace engine, pinned to a known version | ✅ SATISFIED |
| A002 | Anatomia no longer hand-parses agent transcripts | ✅ SATISFIED |
| A003 | Each provenance record states which engine version produced it | ✅ SATISFIED |
| A004 | Deriving the same session twice produces an identical record | ✅ SATISFIED |
| A005 | A captured session records a fingerprint of the transcript it was derived from | ✅ SATISFIED |
| A006 | An unreadable transcript still records who ran, never guessed numbers | ✅ SATISFIED |
| A007 | Older proof records written before this change still load | ✅ SATISFIED |
| A008 | Codex sessions now report the files they changed | ✅ SATISFIED |
| A009 | Raw transcript text never leaks into a committed provenance record | ✅ SATISFIED |
| A010 | An unpriced model is shown as unpriced, never as a guessed cost | ✅ SATISFIED |
| A011 | The price-table version shown is the one actually used to compute cost | ✅ SATISFIED |
| A012 | The transcript engine adds no network capability to the CLI | ✅ SATISFIED |
| A013 | Behavioral coverage is declared by the trusted launcher, not inferred | ✅ SATISFIED |
| A014 | A claim about sub-agents we did not capture is reported unverifiable, never satisfied | ✅ SATISFIED |
| A015 | A behavior we could not observe is never reported as satisfied | ✅ SATISFIED |
| A016 | A runtime test assertion is never faked as a behavioral pass | ✅ SATISFIED |
| A017 | The system never claims to have watched a sub-agent it did not capture | ✅ SATISFIED |
| A018 | Every agent session gets its own behavioral record — rework is never collapsed | ✅ SATISFIED |
| A019 | Each behavioral record is tied to the session that produced it | ✅ SATISFIED |
| A020 | Each behavioral record states which engine version judged it | ✅ SATISFIED |
| A021 | Each behavioral record states how much of the session it could actually check | ✅ SATISFIED |
| A022 | A broken transcript can never break saving your work | ✅ SATISFIED |
| A023 | Secrets in a session command never reach committed proof | ✅ SATISFIED |
| A024 | A behavioral concern is recorded as evidence — it never flips a passing run to failing | ✅ SATISFIED |
| A025 | The proof shows how the session behaved, in its own section | ✅ SATISFIED |
| A026 | When coverage was incomplete, the proof says so loudly | ✅ SATISFIED |

Deviations

A014: "A claim about sub-agents we did not capture is reported unverifiable, never satisfied" → "The soundness test for the delegate-inclusive arm augments a real adapter-extracted claim's subject to { kind: 'agent', selector: 'this', delegates: 'include' } before feeding it to runCompliance, rather than obtaining a delegate-inclusive claim directly from anatomiaAdapter.extract."
Reason: Probing core's actual behavior showed anatomiaAdapter does not emit any subject.delegates: 'include' claim from the current Anatomia agent-defs — every extracted claim carries an absent subject (the legacy flat session union). The spec's Step-1 plan assumed a delegate-inclusive claim would be available from the real mandate; it is not. The base mandate is still genuine adapter output; only the WHO-axis of one claim is set to the value the published ClaimSubject type defines.


Generated by Anatomia · Ship with proof.

Summary

  • Adds deterministic, coverage-aware behavioral attestation — verdicts about how an agent session behaved (egress, file-scope, verify-independence) — as the mirror of the existing provenance pipeline: a save-time producer writes one committed record per transcript, ana work complete assembles them onto the proof entry, and ana proof renders a new Session Attestation section.
  • The single correctness invariant — over-stated coverage must never produce satisfied — is built first and test-first in buildRootLaneContext: it declares a trusted-launcher root-only boundary and lets the published anatrace-core engine reconcile it against observed lineage, never fabricating a captured delegate lane.
  • Verdicts are evidence, never a gate: a violated verdict renders with a red glyph but never changes a proof's PASS/FAIL. Every record is scrubDeep'd before commit, so no transcript bytes (and no tokens) reach git history.
  • The producer is total: a malformed/unreadable transcript, an adapter exception, or a runCompliance failure leaves ana artifact save intact with the record simply absent. One record per transcript (keyed {role}-{session_id}) — rework is never collapsed.
  • Both harnesses (Claude and Codex) are exercised: a Codex fixture drives the codex-blind channel path.
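
The scrub-before-commit step described in the summary can be pictured with the sketch below. It is only an illustration: `scrubDeepSketch`, the `Json` type, and the two redaction patterns are assumptions made for this example, and the real `scrubDeep` rules are not shown in this PR.

```typescript
// Hypothetical sketch, NOT the real scrubDeep. The patterns are assumed
// examples of what a "secret" might look like in a recorded command.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

const REDACTED = "[REDACTED]";

function scrubString(value: string): string {
  return value
    // NAME_TOKEN=abc123 becomes NAME_TOKEN=[REDACTED]
    .replace(/\b([A-Z0-9_]*(?:TOKEN|SECRET|PASSWORD|API_KEY)[A-Z0-9_]*)=\S+/g, `$1=${REDACTED}`)
    // "Bearer abc.def" becomes "Bearer [REDACTED]"
    .replace(/\b(Bearer\s+)\S+/g, `$1${REDACTED}`);
}

// Walk every string in the record, however deeply nested, and scrub it.
function scrubDeepSketch(value: Json): Json {
  if (typeof value === "string") return scrubString(value);
  if (Array.isArray(value)) return value.map((item) => scrubDeepSketch(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) out[key] = scrubDeepSketch(inner);
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

Scrubbing the whole record recursively, rather than a list of known fields, means a new field added later is covered without a code change.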

Pipeline Artifacts

  • Scope: .ana/plans/active/anatrace-core-integration/scope.md
  • Spec: .ana/plans/active/anatrace-core-integration/spec.md
  • Build Report: .ana/plans/active/anatrace-core-integration/build_report_2.md
  • Verify Report: .ana/plans/active/anatrace-core-integration/verify_report_2.md

Verification

  • Result: PASS
  • Phases: 2 verified
  • Tests: See verify report

Co-authored-by: Ana <build@anatomia.dev>

rpatricksmith and others added 14 commits June 13, 2026 16:28
Exact pin (AC1) so provenance derive + pricing source from the published
engine. Single transitive runtime dep is yaml (lockfile verified).

Co-authored-by: Ana <build@anatomia.dev>
pricing.ts becomes a re-export surface for core's PRICES, PRICE_TABLE_VERSION,
computeCost, and the TokenCounts/PriceEntry/CostResult types — the local table
and computeCost body are deleted. The table is byte-identical at 0.2.0, so no
displayed cost changes (AC6).

proof.ts threads { priceTable: PRICES } into both computeCost sites (:292,:464)
and sources the displayed table version from the returned CostResult instead of
the per-record stamp, so the label always matches the table actually used (AC6,
A011). The stamped version stays in committed JSON as a historical fact.

Golden card: the AC6 fix surfaces the real 10-char version "2026-06-08" (the
fixtures used a synthetic short "v3"); two stress fixtures are trimmed so the
realistic TOTAL footer fits 80 columns (rendering untouched, per spec).

Co-authored-by: Ana <build@anatomia.dev>
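
The version-label fix in the pricing commit above can be sketched as follows. The shapes here (`PriceTable`, the `priced` discriminant, `computeCostSketch`, `costLabel`) are assumptions for illustration; the real `computeCost` and `CostResult` come from anatrace-core and may look different.

```typescript
// Hypothetical shapes; the published anatrace-core types may differ.
interface TokenCounts { input: number; output: number }
interface PriceEntry { inputPerMTok: number; outputPerMTok: number }
interface PriceTable { version: string; models: Record<string, PriceEntry> }

type CostResult =
  | { priced: true; usd: number; priceTableVersion: string }
  | { priced: false; priceTableVersion: string };

function computeCostSketch(
  model: string,
  tokens: TokenCounts,
  opts: { priceTable: PriceTable },
): CostResult {
  const entry = opts.priceTable.models[model];
  const priceTableVersion = opts.priceTable.version;
  // Unknown model: report "unpriced" instead of guessing a cost.
  if (entry === undefined) return { priced: false, priceTableVersion };
  const usd = (tokens.input * entry.inputPerMTok + tokens.output * entry.outputPerMTok) / 1e6;
  return { priced: true, usd, priceTableVersion };
}

// The label reads the version from the result, so it cannot drift from
// the table that was actually used for the computation.
function costLabel(result: CostResult): string {
  const amount = result.priced ? `$${result.usd.toFixed(4)}` : "unpriced";
  return `${amount} (prices ${result.priceTableVersion})`;
}
```

Because the displayed version travels with the computed cost, a record stamped with an older table version can stay in committed JSON as history without affecting the label.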
…gine

Delete the hand-rolled Claude/Codex derivers and their regex helpers
(deriveClaude, deriveCodex, readTranscriptLines, durationFromTimestamps,
parseTestCounts, toolResultText, readNumber, readObject). deriveTranscript now
reads the transcript bytes once, wraps them in a core NamedBlob, and runs
parseSession + deriveCounts — both synchronous (A002). readString stays
(readPendingPointer uses it). ProvenanceCounts is re-exported from core so
types/proof.ts keeps its import path; core's type adds derive_version (="3").

captureProvenanceAtSave reads the bytes once and stamps transcript_hash
(sha256 byte-identity attestation) on the SessionProvenance wrapper alongside
captured_at — present iff the transcript was readable, omitted (with derived)
when not. All core calls live inside the existing total try-catch (A13).

Closes session-capture-C12 (parseTestCounts best-effort), Codex files_touched=0,
and empty harness_version — all now delegated to core.

Tests re-baselined to invariants against core's actual output. Where core
re-baselines a literal it is documented inline (Codex input 300→220: core
subtracts cached from gross; duration 30000→28000: span of folded events).
Claude fixtures gain message.id (core dedups token usage by id) and tool_use_id
links (core gates test counts behind a command tool) — with those, the counts
match the old derive. Codex fixture gains a real patch_apply_end so files_touched
is derived > 0 (A008). New: transcript_hash present/absent honesty (A005/A006),
derive_version stamp (A003), legacy-record backward-compat read (A007). Test
fixtures across work/work-merge/proof/proof-card-golden gain derive_version to
satisfy core's required field.

Co-authored-by: Ana <build@anatomia.dev>
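
The "present iff readable" rule for transcript_hash in the commit above might look like this minimal sketch. `captureSketch`, the `SessionProvenanceSketch` shape, and the stand-in `derived` payload are assumptions; only the sha256 hashing via Node's `createHash` is a real API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical wrapper shape, loosely following the field names above.
interface SessionProvenanceSketch {
  captured_at: string;
  transcript_hash?: string;
  derived?: { byteLength: number };
}

function captureSketch(readTranscript: () => Uint8Array, now: Date): SessionProvenanceSketch {
  const record: SessionProvenanceSketch = { captured_at: now.toISOString() };
  try {
    // Read once: the hash and the derived counts share the same bytes.
    const bytes = readTranscript();
    const hash = createHash("sha256").update(bytes).digest("hex");
    const derived = { byteLength: bytes.byteLength }; // stand-in for real derived counts
    record.transcript_hash = hash;
    record.derived = derived;
  } catch {
    // Unreadable transcript: leave hash and derived absent rather than guess.
  }
  return record;
}
```

Hashing the same bytes that feed the derive step is what lets the hash act as a byte-identity attestation for the derived numbers.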
…freedom

Extend the capture-path no-network enforcement to the engine's own dependency
tree: read the installed anatrace-core package.json and assert its runtime deps
are a subset of { yaml } (A012) — fails loudly with the offending dep name if a
future bump adds a network-capable transitive dependency. Also assert the
anatrace-core dependency is pinned exact (A001).

Co-authored-by: Ana <build@anatomia.dev>
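
A rough sketch of the two checks in the commit above, written as pure functions over already-parsed package.json data. The function names and the `PackageJsonSketch` type are hypothetical; the actual test reads the installed anatrace-core package.json from disk.

```typescript
// Hypothetical helper names; only the checks themselves follow the commit text.
interface PackageJsonSketch { dependencies?: Record<string, string> }

// Returns the runtime deps that fall outside the allowed set, sorted so a
// failure message names the offenders in a stable order.
function disallowedRuntimeDeps(enginePkg: PackageJsonSketch, allowed: ReadonlySet<string>): string[] {
  return Object.keys(enginePkg.dependencies ?? {})
    .filter((name) => !allowed.has(name))
    .sort();
}

// True only for an exact version such as "0.2.0"; rejects ^, ~, ranges, and tags.
function isExactPin(spec: string): boolean {
  return /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/.test(spec);
}
```

A test built on these would fail with the offending dependency name as soon as an engine bump introduced anything beyond `yaml`.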
buildRootLaneContext builds the MandateEvaluationContext core evaluates
against — the single soundness hinge. Declares a trusted-launcher root-only
boundary and reconciles it against extractLineage so root is captured and
observed delegates stay uncaptured; never fabricates captured:true.

Adversarial soundness suite (test-first): delegate-inclusive negative →
unverifiable; unobserved Codex channel never satisfied (codex-blind);
runtime contract-matcher never satisfied (runtime-scoped); no delegate lane
ever captured. Reasons asserted by set membership, never a single literal.

Co-authored-by: Ana <build@anatomia.dev>
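
The soundness hinge described in the commit above can be illustrated with a toy model. Everything here is an assumption for the example: the `Lane` shape, `buildRootLaneContextSketch`, and especially `judgeNegativeClaim`, which is a simplified stand-in for the engine's evaluation and not anatrace-core's actual logic.

```typescript
// Hypothetical shapes; the real MandateEvaluationContext may differ.
interface Lane { id: string; role: "root" | "delegate"; captured: boolean }
type Verdict = "satisfied" | "violated" | "unverifiable";

// The launcher declares what it captured (here: root only). Delegates seen
// in the lineage are listed but never marked captured.
function buildRootLaneContextSketch(boundary: "root" | undefined, observedDelegateIds: string[]): Lane[] {
  const lanes: Lane[] = [{ id: "root", role: "root", captured: boundary === "root" }];
  for (const id of observedDelegateIds) {
    lanes.push({ id, role: "delegate", captured: false }); // never fabricate captured: true
  }
  return lanes;
}

// Toy judge for a negative claim ("no lane in scope did X").
function judgeNegativeClaim(lanes: Lane[], includeDelegates: boolean, violationSeenIn: ReadonlySet<string>): Verdict {
  const inScope = lanes.filter((lane) => includeDelegates || lane.role === "root");
  if (inScope.some((lane) => lane.captured && violationSeenIn.has(lane.id))) return "violated";
  // Any uncaptured lane in scope means the absence of X cannot be confirmed.
  if (inScope.some((lane) => !lane.captured)) return "unverifiable";
  return "satisfied";
}
```

In this model, a delegate-inclusive claim over a session with an observed delegate resolves to unverifiable, while the same claim scoped to root alone can still be satisfied.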
…re-boundary env

types/proof.ts: ComplianceAttestation (one durable, scrubbed behavioral record
per transcript) + ComplianceVerdictRecord, and an optional compliance?[] field
on ProofChainEntry — optional, never gates, proof valid without it (mirrors
process?). run.ts: buildCaptureEnv emits ANA_CAPTURE_BOUNDARY: 'root', the
trusted launcher declaring which lanes it captured (read by buildRootLaneContext).

Co-authored-by: Ana <build@anatomia.dev>
captureComplianceAtSave: the save-time producer, mirror of captureProvenance-
AtSave. Resolves the session (reads but never consumes the pointer — provenance
owns deletion and runs after), parses it, builds the mandate from the role's
agent-def + the work item's contract.yaml, hands core a sound root-only context,
runs runCompliance, and writes one compact scrubDeep'd record per transcript at
compliance/{role}-{session_id}.json. Total: any failure → null, save intact.

assembleComplianceAttestations: reads committed compliance/*.json from the
completed dir, skips unparseable, orders deterministically, never throws.

Tests: per-transcript keying (two sessions → two records), version/hashes/
coverage/framework shape, runtime contract-matcher never satisfied, secret
scrubbed from committed record, unreadable transcript → no record/no throw,
capture-off and missing-agent-def → no record, Codex exercised (codex-blind),
reader skip-unparseable + deterministic order.

Co-authored-by: Ana <build@anatomia.dev>
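
The "total producer" and "skip unparseable, order deterministically" behaviors in the commit above can be sketched like this. The helpers (`recordKey`, `totalProducer`, `assembleSketch`) and the in-memory `files` map are hypothetical; the real code reads committed `compliance/*.json` files from disk.

```typescript
// Hypothetical record shape with only the two keying fields.
interface ComplianceRecordSketch { role: string; session_id: string }

// One file per transcript, so two sessions of the same role never collide.
function recordKey(role: string, sessionId: string): string {
  return `${role}-${sessionId}.json`;
}

// Total wrapper: any thrown error becomes null so the surrounding save continues.
function totalProducer<T>(produce: () => T): T | null {
  try {
    return produce();
  } catch {
    return null;
  }
}

function isRecord(value: unknown): value is ComplianceRecordSketch {
  if (typeof value !== "object" || value === null) return false;
  const fields = value as Record<string, unknown>;
  return typeof fields.role === "string" && typeof fields.session_id === "string";
}

// files maps a filename to its raw contents.
function assembleSketch(files: Record<string, string>): ComplianceRecordSketch[] {
  const records: ComplianceRecordSketch[] = [];
  for (const name of Object.keys(files).sort()) { // filename order keeps output deterministic
    const parsed = totalProducer(() => JSON.parse(files[name]) as unknown);
    if (isRecord(parsed)) records.push(parsed); // unparseable or malformed files are skipped
  }
  return records;
}
```

Sorting by filename rather than by read order keeps the assembled list identical across machines and filesystems.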
…sites

captureComplianceAtSave runs immediately BEFORE captureProvenanceAtSave at each
save site (provenance consumes the pending pointer; Codex has no env fallback
once it's gone). Its file is staged into the SAME separate non-artifact path
list as provenance — kept out of the no-changes guard, git reset on the no-op
path, folded into the commit pathspec only when artifacts actually changed
(cross-machine-provenance-C1).

Co-authored-by: Ana <build@anatomia.dev>
…proof entry

writeProofChain assembles the committed compliance records (capture-on only)
and conditionally spreads compliance[] onto the entry, alongside process. Emits
a loud chalk.yellow warning when any record has incomplete coverage — evidence,
never a gate (a violated verdict never changes proof.result).

Co-authored-by: Ana <build@anatomia.dev>
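
A minimal sketch of "evidence, never a gate" as described in the commit above. The types and `attachCompliance` are assumptions for illustration; the real `writeProofChain` emits its warning through chalk rather than returning it.

```typescript
// Hypothetical shapes for the proof entry and its optional compliance field.
interface VerdictSketch { status: "satisfied" | "violated" | "unverifiable" }
interface ComplianceSketch { coverage_complete: boolean; verdicts: VerdictSketch[] }
interface ProofEntrySketch { result: "PASS" | "FAIL"; compliance?: ComplianceSketch[] }

function attachCompliance(
  result: "PASS" | "FAIL",
  records: ComplianceSketch[],
): { entry: ProofEntrySketch; warnings: string[] } {
  // result is copied through untouched: no verdict can change it.
  const entry: ProofEntrySketch = {
    result,
    ...(records.length > 0 ? { compliance: records } : {}),
  };
  const warnings = records.some((record) => !record.coverage_complete)
    ? ["Session attestation coverage was incomplete"]
    : [];
  return { entry, warnings };
}
```

The conditional spread keeps older-style entries (no compliance field at all) valid, matching the optional `compliance?[]` field on the entry type.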
…ection

formatHumanReadable renders a Session Attestation section after Provenance when
entry.compliance is present: per-transcript satisfied/violated/unverifiable
counts, a coverage line, abbreviated mandate/transcript hashes, compact scrubbed
detail for notable verdicts, and a loud warning on incomplete coverage. The new
render helper is module-private (learn-session-memory-C1). Presentation only — a
violated verdict renders with a red glyph but never changes the PASS/FAIL
headline.

Tests: section render with counts/coverage/abbreviated hashes; violated verdict
leaves PASS unchanged; incomplete record renders the loud warning; no records →
no section; reworked roles get a stable index.

Co-authored-by: Ana <build@anatomia.dev>
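
The rendering described in the commit above could be approximated as below. The line layout, the `AttestationSketch` shape, and the 8-character abbreviation are guesses made for the example; the real section also renders a coverage line, glyphs, and verdict detail that this sketch omits.

```typescript
// Hypothetical shapes and layout; not the actual formatHumanReadable output.
type Status = "satisfied" | "violated" | "unverifiable";
interface AttestationSketch { role: string; mandate_hash: string; verdicts: Status[] }

function abbreviate(hash: string, length = 8): string {
  return hash.slice(0, length);
}

function countByStatus(verdicts: Status[]): Record<Status, number> {
  const counts: Record<Status, number> = { satisfied: 0, violated: 0, unverifiable: 0 };
  for (const verdict of verdicts) counts[verdict] += 1;
  return counts;
}

function renderSection(records: AttestationSketch[]): string[] {
  if (records.length === 0) return []; // no records, no section
  const seenPerRole = new Map<string, number>();
  const lines = ["Session Attestation"];
  for (const record of records) {
    // Reworked roles get a stable 1-based index in input order.
    const index = (seenPerRole.get(record.role) ?? 0) + 1;
    seenPerRole.set(record.role, index);
    const counts = countByStatus(record.verdicts);
    lines.push(
      `  ${record.role} #${index}: ${counts.satisfied} satisfied, ${counts.violated} violated, ` +
        `${counts.unverifiable} unverifiable (mandate ${abbreviate(record.mandate_hash)})`,
    );
  }
  return lines;
}
```

The function only formats what it is given, which mirrors the "presentation only" constraint: nothing here has access to, or can alter, the PASS/FAIL headline.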
@vercel

vercel Bot commented Jun 14, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| anatomia | Ready | Preview, Comment | Jun 14, 2026 12:01am |


rpatricksmith merged commit 8ff2a1b into main on Jun 14, 2026
5 checks passed
rpatricksmith deleted the feature/anatrace-core-integration branch on June 14, 2026 00:04