Universal audit harnesses for coding agents.
DeterminAgents is a portable library of self-discovering audit prompts. Hand one to an agent pointed at a repo; the agent discovers the project layout, runs a structured audit, and writes a report with severity, evidence, and concrete next steps.
What you get:
- Portable audits with no hardcoded service names or repo paths
- Repeatable harnesses: discovery phases, rubrics, report templates, and follow-up workflows
- Decision-ready artifacts in docs/reports/, including optional system digests via auto-report
Use it when a codebase is large enough that structure beats ad-hoc prompting.
Simple prompt + good harness > clever prompt + no harness.
Most of the lift in agentic work comes from the surrounding scaffolding — phases, severity rubrics, report formats, context overlays, fault-injection harnesses — not from clever wording inside the prompt. Every audit here is deliberately plain English; the structure around it does the work, and the structure is what gives the agent something to loop against until success criteria are met. The principle applies recursively to the library's own docs: trust over enumeration, defaults over variants. (See the v0.4 simplification commit for what that looked like applied to DeterminAgents itself.)
When this isn't worth it. A 200-line script or a one-off prototype doesn't need a phased audit with a severity rubric and a report file. Use direct prompts for trivial work.
curl -fsSL https://raw.githubusercontent.com/iansherr/determinagents/main/install.sh | sh

Installs to ~/.determinagents/ (override with $DETERMINAGENTS_HOME) and a determinagents shim to ~/.local/bin/ (override with $DETERMINAGENTS_BIN).
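As a rough illustration of how those two overrides resolve, the sketch below prints the locations an install would use in the current environment. It only mirrors the defaults described above and is not taken from install.sh; the function name resolve_determinagents_paths is hypothetical.

```shell
# Sketch only: mirrors the documented defaults, not install.sh's actual logic.
# resolve_determinagents_paths is a hypothetical helper name.
resolve_determinagents_paths() {
  # Fall back to the documented defaults when the override is unset or empty.
  home_dir="${DETERMINAGENTS_HOME:-$HOME/.determinagents}"
  bin_dir="${DETERMINAGENTS_BIN:-$HOME/.local/bin}"
  printf 'library: %s\nshim: %s/determinagents\n' "$home_dir" "$bin_dir"
}

resolve_determinagents_paths
```

If the shim directory is not on your PATH, the determinagents command will not be found after install, so it is worth comparing the printed shim path against $PATH.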
determinagents version # what's installed
determinagents doctor # check the install is healthy
determinagents update # check for updates, show diff, apply with confirmation
determinagents materialize # install slash commands for your host tool
determinagents completions <shell> # print tab-completion script (bash, zsh, fish)
determinagents uninstall # remove the library (prompts for confirmation)
determinagents help # full command list

After determinagents update: audit content (phases, severity rubrics, the doc bodies themselves) flows through automatically; materialized slash commands are thin pointers that re-read the audit doc each time they run. Re-run determinagents materialize only when a new behavior is added (new slash command), the shared invocation header changes, or the hub command template changes. Each release that requires re-materialization will say so at the top of its CHANGELOG entry.
Materialization now defaults to one canonical command family: use /determinagents for onboarding and /determinagents <behavior> [flags] for direct runs.
Troubleshooting wrong invocation: if you run determinagents --help while trying to execute an audit, you are on the installer CLI surface. Return to your host tool and run /determinagents <behavior> [flags] (example: /determinagents error-handling).
To pin a branch (e.g., dev for unreleased work):
curl -fsSL https://raw.githubusercontent.com/iansherr/determinagents/dev/install.sh | sh -s -- --branch=dev

After installing, the lowest-friction path:
- Pick a repo, yours or anyone's. The read-only audits don't modify code.
- Run an audit. STUB_AND_COMPLETENESS is a good starter: it surfaces phantom endpoints, dead handlers, and silent failures on most codebases without needing build infrastructure. Hand this prompt to your coding agent (Claude Code, Cursor, Gemini, etc.):

  Run audits/STUB_AND_COMPLETENESS.md from $DETERMINAGENTS_HOME against this repo. Report to docs/reports/STUB_AUDIT_<YYYY-MM-DD>.md.

- Read the report at docs/reports/. Every finding has a file:line, severity, and suggested fix. The report's ## Next steps section contains paste-ready follow-up prompts.
- Optional next moves: capture project-specific calibrations in docs/determinagents/AUDIT_CONTEXT.md so future audits skip known false positives (see specs/BOOTSTRAP.md); or run audits/RESOLVE_FROM_REPORT.md to work through findings with per-finding approval and one commit per fix.
Once that loop is comfortable, browse the audits table below for other audits to try, or INVOCATIONS.md for canonical paste-ready prompts. To install as slash commands in your host tool, see INSTALL.md.
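If you would rather not fill in the date by hand, the starter prompt from the quick-start steps can be generated. This is a minimal sketch: starter_prompt is a hypothetical helper, and it expands $DETERMINAGENTS_HOME to a concrete path (falling back to the documented default) instead of leaving the variable for the agent to resolve.

```shell
# Sketch only: starter_prompt is a hypothetical helper, not part of the library.
# Usage: starter_prompt [YYYY-MM-DD]   (defaults to today's date)
starter_prompt() {
  report_date="${1:-$(date +%Y-%m-%d)}"
  library_dir="${DETERMINAGENTS_HOME:-$HOME/.determinagents}"
  printf 'Run audits/STUB_AND_COMPLETENESS.md from %s against this repo. Report to docs/reports/STUB_AUDIT_%s.md.\n' \
    "$library_dir" "$report_date"
}

# Print the prompt for today; paste the output into your coding agent.
starter_prompt
```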
| Need | Run |
|---|---|
| Find stubs, phantom paths, and incomplete implementations | /determinagents stub |
| Run a security sweep | /determinagents security |
| Trace where a user action breaks | /determinagents data-flow --target=<flow> |
| Review runtime capacity and resource pressure | /determinagents resource-capacity |
| Find god-files and propose extraction seams | /determinagents structural-entropy |
| Find regression-prone complexity hotspots | /determinagents regression-surface |
| Don't know what to run — let it pick | /determinagents next |
| Set up and discover recursive improvement loops | /determinagents init-loops |
| Orchestrate recursive improvement loops automatically | /determinagents loop-orchestrator |
| Create a weekly or post-change system digest | /determinagents auto-report --mode=baseline |
| Work through report findings with approval gates | /determinagents resolve --report=<path> |
Reports are meant to be read by humans and reused by agents. A typical finding looks like:
### P1: Save failures are logged but not surfaced to the user
- Evidence: src/api/client.ts:42
- Impact: failed writes can leave the UI showing stale success state
- Suggested fix: return a typed error from saveProfile() and render retryable error state in SavePanel
- Next step: /determinagents resolve --report=docs/reports/ERROR_HANDLING_2026-05-11.md

determinagents/
├── README.md # this file
├── INVOCATIONS.md # paste-ready prompts for every behavior
├── INSTALL.md # how an agent installs this library into a host tool
├── audits/ # the runnable docs
│   ├── STUB_AND_COMPLETENESS.md
│   ├── SECURITY_PENTEST.md
│   ├── DATA_FLOW_TRACE.md
│   ├── ERROR_HANDLING.md
│   ├── TEST_GAPS.md
│   ├── DOCS_DRIFT.md
│   ├── UX_DESIGN_AUDIT.md
│   ├── RESOURCE_CAPACITY.md
│   ├── SCENARIO_CHAINER.md # meta: chains findings into simulations
│   ├── STRUCTURAL_ENTROPY.md
│   ├── PICK_NEXT.md # meta: recommends what to run next
│   ├── RESOLVE_FROM_REPORT.md # mutating: works through report findings
│   ├── STRUCTURAL_REFACTOR.md # mutating: executes structural-entropy seams
│   ├── SECURITY_HUNT.md # mutating: agentic vulnerability hunting
│   ├── DATA_FLOW_VERIFY.md # mutating: observed-vs-theorized data flow
│   ├── TESTING_CREATOR.md # mutating: writes new tests
│   └── HARNESS_CREATOR.md # mutating: generates verification harnesses
└── specs/ # conventions and per-project artifact specs
    ├── FORMAT.md # how to author a new audit; harness conventions
    ├── HARNESS_STUBS.md # boilerplate for common harnesses
    ├── BOOTSTRAP.md # how to generate AUDIT_CONTEXT.md (cold + warm)
    ├── FEATURE_REGISTRY.md # spec for the per-project feature registry
    ├── AUTOMATED_REPORTING.md # project-facing system digest orchestrator
    ├── SIGNAL_SCHEMA.md # JSON schema for auto-report signal output
    ├── AUDIT_CONTEXT_TEMPLATE.md # minimal starting overlay (Global only)
    ├── AUDIT_CONTEXT_SECTIONS.md # catalog of audit-specific sections (copy as needed)
    └── MAINTENANCE.md # maintainer-only: keep the library current (refresh / integrate / brainstorm)
| Audit | Finds |
|---|---|
| audits/STUB_AND_COMPLETENESS.md | Phantom endpoints, dead handlers, silent error swallowing, compiled-without-source files |
| audits/SECURITY_PENTEST.md | Auth bypass, injection, IDOR, hardcoded secrets, JWT issues, exposed internals |
| audits/DATA_FLOW_TRACE.md | Where a user action breaks between UI, network, handler, and DB |
| audits/ERROR_HANDLING.md | Silent catches, missing error UI, errors logged but not surfaced |
| audits/TEST_GAPS.md | Scenarios the test suite would miss — error paths, edge cases, integration boundaries |
| audits/DOCS_DRIFT.md | Claims in README and docs that the code no longer matches |
| audits/UX_DESIGN_AUDIT.md | CSS that violates DESIGN.md tokens — colors, spacing, radii, motion, typography |
| audits/DESIGN_HANDOFF_AUDIT.md | Audit design handoff bundles against target code, bypassing misleading READMEs |
| audits/RESOURCE_CAPACITY.md | Runtime-agnostic capacity and resource-pressure risks across k8s, docker/compose, bare metal, or unraid-style deployments |
| audits/STRUCTURAL_ENTROPY.md | God-files and god-modules. Severity is driven by responsibility count, fan-in/out, and change velocity — not LOC alone. Outputs seam proposals consumed by STRUCTURAL_REFACTOR.md |
| audits/REGRESSION_SURFACE.md | Regression-prone complexity hotspots: overlapping responsibilities, fragile error handlers, fallback ladders |
| audits/PICK_NEXT.md | Meta-audit. Recommends which audit to run next based on report staleness, recent git history, and AUDIT_CONTEXT.md cadence preferences. Writes no report by default |
Most audits run in 30–180 minutes at default scope, scaling with codebase size. Each audit doc supports --phases=N,M and --max-time=Xm to scope tighter.
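A scoped run might be composed as in the sketch below. It assumes the --phases and --max-time flags are passed through the /determinagents <behavior> [flags] form, which this README implies but does not show directly; scoped_invocation is a hypothetical helper that only builds the text to paste into your host tool.

```shell
# Sketch only: scoped_invocation is hypothetical, and passing --phases/--max-time
# through the slash command is an assumption based on the "[flags]" form.
# Usage: scoped_invocation <behavior> <phases, e.g. 1,2> <max minutes>
scoped_invocation() {
  behavior="$1"
  phases="$2"
  max_minutes="$3"
  # Reject anything that is not a comma-separated list of digits.
  case "$phases" in
    ''|*[!0-9,]*) echo "phases must look like 1,2" >&2; return 1 ;;
  esac
  # Reject non-numeric time limits.
  case "$max_minutes" in
    ''|*[!0-9]*) echo "max time must be whole minutes" >&2; return 1 ;;
  esac
  printf '/determinagents %s --phases=%s --max-time=%sm\n' "$behavior" "$phases" "$max_minutes"
}

scoped_invocation stub 1,2 45
```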
SECURITY_PENTEST.md is the static half of security. For serious vulnerability discovery in codebases with build/test infrastructure, also use the agentic SECURITY_HUNT.md below.
| Doc | What it does | Prerequisites |
|---|---|---|
| audits/RESOLVE_FROM_REPORT.md | Works through findings in any audit report — one at a time, with per-finding approval, separate commits, and verification | An audit report exists at docs/reports/; clean working tree |
| audits/STRUCTURAL_REFACTOR.md | Executes seam proposals from a STRUCTURAL_ENTROPY report. Per-seam loop with contract-before-code gate; one contract commit + one move commit per seam; before/after dependency artifacts | STRUCTURAL_ENTROPY report exists; disposable workspace; tests cover the target file |
| audits/SECURITY_HUNT.md | Agentic vulnerability hunting — agent gets execution capability to verify or refute bug hypotheses against one target file/function. Inspired by Mozilla's Firefox-hardening pipeline | Project builds locally; sanitizers configured; disposable workspace; AUDIT_CONTEXT.md SECURITY_HUNT section configured |
| audits/DATA_FLOW_VERIFY.md | Drives a real user flow end-to-end and observes wire traffic + DB state. The "observed" counterpart to DATA_FLOW_TRACE.md's "inferred"; catches silent layer drift that static analysis misses | Disposable workspace; app runs locally; AUDIT_CONTEXT.md DATA_FLOW_VERIFY section configured |
| audits/TESTING_CREATOR.md | Implements tests across four tiers (Adversarial, Chaos, Simulation, Forensics) beyond what TEST_GAPS.md covers | Run TEST_GAPS.md and SECURITY_PENTEST.md first |
| audits/HARNESS_CREATOR.md | Deterministically generates verification harnesses (Playwright, Docker, Fuzzing) to prove/refute audit findings | An audit report exists; disposable workspace |
| audits/RECURSIVE_IMPROVEMENT.md | Autonomously design, execute, and evaluate experiments to improve a specific metric or solve an open-ended problem. Generates hypotheses, mutates code, and verifies against a harness. | Measurable goal; deterministic harness exists; disposable workspace |
Two read-only audits — ERROR_HANDLING.md and STUB_AND_COMPLETENESS.md — include an optional mutating Phase 6 (fault injection and endpoint verification respectively) that follows the harness conventions in specs/FORMAT.md. Use scope +harness to enable.
The standard workflow: run an audit (read-only) → review the report → run RESOLVE_FROM_REPORT to work through findings → re-run the audit to verify clean state. For security-sensitive fixes, optionally chain into TESTING_CREATOR Tier 1 (Adversarial) afterwards to add executable coverage.
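The loop above can be written out as a short checklist for a given behavior and report. This is a minimal sketch: audit_loop_steps is a hypothetical helper that prints the steps, using only the /determinagents <behavior> and /determinagents resolve --report=<path> forms shown earlier.

```shell
# Sketch only: audit_loop_steps is a hypothetical helper that prints the
# audit -> review -> resolve -> re-audit loop for one behavior.
# Usage: audit_loop_steps <behavior> <report path>
audit_loop_steps() {
  behavior="$1"
  report="$2"
  printf '1. /determinagents %s\n' "$behavior"                 # read-only audit
  printf '2. review %s\n' "$report"                            # human review
  printf '3. /determinagents resolve --report=%s\n' "$report"  # mutating, per-finding approval
  printf '4. /determinagents %s\n' "$behavior"                 # re-run to verify clean state
}

audit_loop_steps error-handling docs/reports/ERROR_HANDLING_2026-05-11.md
```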
These describe an artifact each project generates its own instance of.
| Spec | Project artifact | Purpose |
|---|---|---|
| specs/FEATURE_REGISTRY.md | docs/determinagents/FEATURE_REGISTRY.md | Living catalog of every testable feature with URL, auth, steps, pass criteria, tags |
| specs/AUDIT_CONTEXT_TEMPLATE.md | docs/determinagents/AUDIT_CONTEXT.md | Minimal starting overlay (Global only). Audit-specific sections come from AUDIT_CONTEXT_SECTIONS.md, copied in only when filled. |
| specs/AUTOMATED_REPORTING.md | docs/reports/SYSTEM_DIGEST_<YYYY-MM-DD>.md plus optional docs/reports/signals/SYSTEM_DIGEST_<YYYY-MM-DD>.json | Read-only synthesis harness that turns existing audit reports and explicit runtime snapshots into decision-ready system digests. JSON follows SIGNAL_SCHEMA.md. |
Supporting docs: specs/FORMAT.md (audit authoring spec), specs/BOOTSTRAP.md (overlay generator workflow).
Every audit:
- Is read-only by default. The mutating docs (such as RESOLVE_FROM_REPORT.md, TESTING_CREATOR.md, and HARNESS_CREATOR.md) declare this prominently in their purpose sections.
- Has phases so you can scope: run Phase 1 only for a quick pass, all phases for a deep pass.
- Classifies findings by severity (P0/P1/P2/P3) with concrete criteria.
- Emits a report with file:line references and concrete fixes, never "fix this."
- Writes reports to docs/reports/ (in the target project) with a date-stamped name (e.g., STUB_AUDIT_2026-05-09.md).
- Reads docs/determinagents/AUDIT_CONTEXT.md first if it exists, to apply project-specific calibrations.
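Because findings carry a severity and a file:line reference, a report can be summarized with ordinary text tools. The sketch below assumes findings use the "### P<n>: title" heading and "- Evidence: path:line" bullet seen in the typical finding shown earlier; real reports may differ, and both function names are hypothetical.

```shell
# Sketch only: assumes "### P<n>: title" headings and "- Evidence: path:line"
# bullets. severity_summary and evidence_locations are hypothetical helpers.

# Print one "P<n> <count>" line per severity level.
severity_summary() {
  report="$1"
  for level in P0 P1 P2 P3; do
    # grep -c exits non-zero on zero matches but still prints 0.
    count="$(grep -c "^### $level:" "$report" || true)"
    printf '%s %s\n' "$level" "$count"
  done
}

# Print the file:line part of every Evidence bullet.
evidence_locations() {
  sed -n 's/^- Evidence: //p' "$1"
}

# Demo against a throwaway report containing the finding shown earlier.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
### P1: Save failures are logged but not surfaced to the user
- Evidence: src/api/client.ts:42
EOF
severity_summary "$sample"
evidence_locations "$sample"
rm -f "$sample"
```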
audits/UX_DESIGN_AUDIT.md assumes a DESIGN.md exists at the project root as the source of truth for design tokens. If your project doesn't have one, generate it first using the Google design.md spec:
- Spec & format: https://github.com/google-labs-code/design.md
- Overview: https://stitch.withgoogle.com/docs/design-md/overview/
- Format: https://stitch.withgoogle.com/docs/design-md/format/
The bootstrap prompt for DESIGN.md is in INVOCATIONS.md. The other six audits do not require DESIGN.md.
Thank you to Mozilla Security for publicly sharing Behind the Scenes: Hardening Firefox (May 2026). Their description of the agentic-harness pipeline, the inner-loop framing ("there is a bug in this part of the code, please find it and build a testcase"), and the severity-by-defect-class rubric directly shaped audits/SECURITY_HUNT.md and the broader v0.3 / v0.4 design. Open writeups from teams doing real production work like this are how the rest of us learn.
Thank you also to the frontier model engineers who keep saying — out loud, against the cultural reflex of secrecy and the collective instinct to grind for the perfect prompt — that working with an agent to improve a prompt produces better prompts than working alone. This library is an outgrowth of that practice: a personal collection of prompts that worked, refined over time, until the scaffolds of a standard set became visible. The spec emerged from the pattern, not the other way around. The hope now is that publishing it helps others skip a few of the same steps.
And to Andrej Karpathy, whose observation that "LLMs are exceptionally good at looping until they meet specific goals — don't tell it what to do, give it success criteria and watch it go" is the cleanest one-line statement of why the harness, not the prompt, does most of the work. (See also Forrest Chang's andrej-karpathy-skills for a compact CLAUDE.md distillation of the same observations.)
Orchestrated by Ian Sherr at Time Worthy Media.