verifiable-agent-demo

A minimal runnable demo for auditable AI agent workflows. It shows how an AI / Agent workflow goes from an intent to a trace, an evidence bundle, a replay verdict, and an audit receipt.

This repository is the walkthrough demo for the execution-evidence path. It is the guided walkthrough surface across the stack, not the canonical architecture hub and not the canonical evidence-profile spec.

What this demo shows

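This repository is not a theory entry point; it is a trusted-AI workflow demo that actually runs. It breaks a single Agent execution into a reviewable artifact chain:

intent input: states what the Agent is asked to do
policy / rule reference: states which rules or governance constraints were consulted before execution
execution trace: records the execution process and the event chain
evidence bundle: packages the execution evidence into a deliverable object
replay / verification result: reports the outcome of review or replay
audit receipt: condenses the key evidence of this execution into an audit receipt

The core goal is not to make the AI answer more, but to make a single AI / Agent execution traceable, reviewable, and auditable.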

Navigation

Evidence -> agent-evidence
Architecture -> digital-biosphere-architecture
Audit -> aro-audit
Governance -> token-governor

Quick Start

1. Minimal local path

python3 -m demo.agent

By default, output is written to artifacts/demo_output/, including:

interaction/intent.json
interaction/action.json
interaction/result.json
evidence/example_audit.json
evidence/result.json
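
After running the minimal path, one quick way to confirm that the run produced the files listed above is a small check like the following. This is a sketch, not part of the repository: `missing_outputs` is a hypothetical helper, and it only tests that the files exist, not what they contain.

```python
from pathlib import Path

# Relative paths as listed in this README for the minimal local path.
EXPECTED_OUTPUTS = [
    "interaction/intent.json",
    "interaction/action.json",
    "interaction/result.json",
    "evidence/example_audit.json",
    "evidence/result.json",
]


def missing_outputs(output_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected relative paths that are not files under output_dir."""
    root = Path(output_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_outputs("artifacts/demo_output")
    if missing:
        print("missing outputs:", ", ".join(missing))
    else:
        print("all expected outputs are present")
```

Run it from the repository root after `python3 -m demo.agent`; an empty result means every listed file is in place.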

2. Script wrapper path

bash scripts/run_demo.sh

This wrapper refreshes the local demo output under artifacts/demo_output/.

3. Enterprise sandbox artifact chain

python3 examples/enterprise_sandbox_demo/run.py

This path shows a more complete loop from intent to audit receipt. Its output directory is artifacts/enterprise_sandbox_demo/.

The receipt for the enterprise sandbox chain is checked through the canonical ARO surface aro_audit.receipt_validation with the minimal profile.

Generated Artifacts

The enterprise sandbox demo generates:

intent.json
policy.json
trace.jsonl
sep.bundle.json
replay_verdict.json
audit_receipt.json

These artifacts form a minimal audit chain: intent input, policy constraints, execution trace, evidence bundle, replay verdict, and audit receipt.
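
To inspect the chain by hand, a reviewer might walk the six files in order, parse each one, and record a content digest. The sketch below assumes only the file names listed above and that `trace.jsonl` is in JSON Lines form (one JSON document per line); it makes no assumption about the fields inside each artifact. `load_artifact` and `summarize_chain` are hypothetical helpers, not functions from this repository.

```python
import hashlib
import json
from pathlib import Path

# File names as listed in this README, in audit-chain order.
CHAIN = [
    "intent.json",
    "policy.json",
    "trace.jsonl",
    "sep.bundle.json",
    "replay_verdict.json",
    "audit_receipt.json",
]


def load_artifact(path):
    """Parse a .json file as one document, or a .jsonl file as a list of records."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.loads(text)


def summarize_chain(artifact_dir):
    """Return (name, sha256 hex digest, parsed content) for each artifact in order."""
    summary = []
    for name in CHAIN:
        path = Path(artifact_dir) / name
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        summary.append((name, digest, load_artifact(path)))
    return summary


if __name__ == "__main__":
    for name, digest, content in summarize_chain("artifacts/enterprise_sandbox_demo"):
        print(f"{name:22} sha256={digest[:16]}... type={type(content).__name__}")
```

Recording digests next to the parsed content gives a reviewer a stable fingerprint to compare across runs, which fits the demo's use of deterministic local mock data.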

Why it matters

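Many Agent demos only show that the model can complete a task. verifiable-agent-demo shows something else: once the task is done, whether you can still explain why it was allowed to run, what happened during execution, whether the output can be re-verified, and what evidence an auditor receives.

This is the piece that trusted AI / Agent Evidence / LangChain workflows need to fill in when they move into production.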

Screenshots / GIF

Audit evidence demo output:

Planned follow-up captures:

assets/demo-run.gif
assets/artifact-chain.png

See assets/README.md for the capture checklist.

For hiring managers

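This repository demonstrates that I can take an AI Agent workflow from PoC to a minimal closed loop that is deliverable, reviewable, and auditable: it has an intent, rules, a trace, an evidence bundle, a replay verdict, and an audit receipt.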

Current scope

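verifiable-agent-demo is the walkthrough demo for the execution-evidence path. It is not the canonical architecture hub, not the canonical evidence-profile spec, and not the audit control plane.

If you only want to see a loop that runs, start from the Quick Start in this README. If you want the evidence profile and validator, go to agent-evidence. If you want the fuller architecture map, go to digital-biosphere-architecture.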

Shared doctrine:

Sandbox controls execution; portable evidence verifies execution.

Governance decides what should be allowed.
Execution integrity proves what actually happened.
Audit evidence exports artifacts for independent review.

flowchart LR
    Persona["Persona (POP)"] --> Intent["Intent Object (AIP)"]
    Intent --> Governance["Governance Check"]
    Governance --> Trace["Execution Trace"]
    Trace --> Audit["Audit Evidence (ARO)"]
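
The doctrine line "Execution integrity proves what actually happened" is commonly realized with a tamper-evident log. The sketch below illustrates that general technique with a SHA-256 hash chain. It is an assumption for illustration only, not the format this repository's trace uses: the field names (`prev_hash`, `event`, `entry_hash`), the sample events, and both functions are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def _entry_hash(prev_hash, event):
    # Canonical JSON (sorted keys, compact separators) keeps the hash stable.
    payload = json.dumps({"prev_hash": prev_hash, "event": event},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(trace, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = trace[-1]["entry_hash"] if trace else GENESIS
    trace.append({"prev_hash": prev_hash, "event": event,
                  "entry_hash": _entry_hash(prev_hash, event)})
    return trace


def verify_trace(trace):
    """Return True only if every link and every entry hash is intact."""
    prev_hash = GENESIS
    for entry in trace:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["entry_hash"]
    return True


if __name__ == "__main__":
    trace = []
    append_event(trace, {"step": "intent", "detail": "summarize report"})
    append_event(trace, {"step": "governance", "decision": "allow"})
    append_event(trace, {"step": "execute", "status": "ok"})
    print(verify_trace(trace))              # True
    trace[1]["event"]["decision"] = "deny"  # tamper with a recorded event
    print(verify_trace(trace))              # False
```

Because each entry hash covers the previous one, editing any recorded event invalidates that entry and every later link, which is what lets an independent reviewer detect changes without trusting the system that produced the trace.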

Existing demo paths

Fastest external demo path:

bash scripts/run_demo.sh
make killer-demo
python3 -m http.server --directory docs 8000

Existing CrewAI demo path:

bash scripts/setup_framework_venv.sh
.venv/bin/python crew/crew_demo.py

Environment notes:

Python 3 is sufficient for the minimal local path.
Refresh the tracked deterministic sample bundle with python3 scripts/refresh_demo_samples.py.
The optional CrewAI and LangChain paths should run from a git-ignored local .venv/ created by scripts/setup_framework_venv.sh.
The pinned framework helper environment currently uses crewai 1.10.1, langchain 1.2.12, and langchain-core 1.2.18.
CrewAI currently requires Python <3.14.
Both demo paths use deterministic local mock data and do not require external API calls.
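
Before setting up the optional CrewAI path, it can help to check the interpreter against the version note above. This is a minimal sketch: `crewai_path_supported` is a hypothetical helper, and it encodes only what this README states (Python 3, below 3.14). Any lower bound required by CrewAI is not covered here.

```python
import sys


def crewai_path_supported(version_info=None):
    """Return True if the interpreter satisfies the README's 'Python <3.14' note."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[:2]
    return major == 3 and minor < 14


if __name__ == "__main__":
    if crewai_path_supported():
        print("This interpreter satisfies the CrewAI constraint (Python <3.14).")
    else:
        print("Use a Python 3 interpreter older than 3.14 for the CrewAI path.")
```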

Documentation

Research evaluation annex

The repository also includes a paper-ready evaluation harness for Execution Evidence Architecture for Agentic Software Systems: From Intent Objects to Verifiable Audit Receipts.

Primary entry points:

make eval-baseline
make eval-evidence
make eval-external-baseline
make eval-framework-pair
make eval-langchain-pair
make eval-ablation
make falsification-checks
make human-review-kit
make review-sample
make compare
make paper-eval
make top-journal-pack

The evaluation material is useful for deeper technical review, but it is secondary to the runnable demo path above.

Minimal reference surface

interaction/ for explicit interaction objects
evidence/ for audit and result artifacts
demo/ and crew/ for runnable entry points
integration/ for persona, intent, and ARO adapters
examples/enterprise_sandbox_demo/ for the intent-to-receipt artifact chain
docs/spec/ for schema notes and example payloads

