agent-reliability-receipts

Worked examples of the agent-reliability plugin for Claude Code, run against public fixtures, with the tool's own output committed as the receipt. The point of this repo is simple: do not describe what the reliability skills do, show them doing it on something anyone can clone and rerun.

How it is laid out

fixtures/   systems built to be diagnosed, each labeled as a teaching fixture
receipts/   the tool's own output from running a skill against a fixture

A fixture is a small, self-contained, openly synthetic system with a real reliability problem. A receipt is what the plugin produced when pointed at that fixture: a verification script and a diagnostic report, committed unedited.

Start here

fixtures/invoice-extraction is an extractor that scores 100% on evaluation and silently drops to about 86% on format-shifted production input. Two commands, no model and no GPU, run from the repo root:

python3 fixtures/invoice-extraction/generate.py    # regenerate the data and metrics, seeded
python3 receipts/invoice-extraction/verify.py      # re-derive every number in the report

Then read receipts/invoice-extraction/REPORT.md for the autopsy the plugin wrote. verify.py re-scores the raw data from scratch rather than trusting the committed metrics, and exits non-zero if a single figure fails to reproduce, so the receipt checks itself.

Why fixtures, not a live system

Every number here is reproducible by anyone, on any machine, with no model and no hardware. That is the whole value of a receipt: you do not take the claim on faith, you run it. The fixtures are deliberately constructed and say so plainly, which is the honest way to demonstrate a tool, no hidden setup, no staged catch.

What is here, and what is landing

Skill	Fixture	Receipt
production-autopsy	invoice-extraction	shipped
calibration-guard	a confidence-emitting fixture	landing next
trajectory-eval	a multi-step agent fixture	landing next

One airtight receipt now, the other two on the way. Each lands the same way: the skill run against a public fixture, its output committed unedited.

The plugin

Install and docs: https://github.com/ByteStack-Labs/claude-plugins

Built by Jesse Moses at ByteStack Labs.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
docs/assets		docs/assets
fixtures/invoice-extraction		fixtures/invoice-extraction
receipts		receipts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

agent-reliability-receipts

How it is laid out

Start here

Why fixtures, not a live system

What is here, and what is landing

The plugin

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

agent-reliability-receipts

How it is laid out

Start here

Why fixtures, not a live system

What is here, and what is landing

The plugin

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages