DeterminAgents

Universal audit harnesses for coding agents.

DeterminAgents is a portable library of self-discovering audit prompts. Hand one to an agent pointed at a repo; the agent discovers the project layout, runs a structured audit, and writes a report with severity, evidence, and concrete next steps.

What you get:

  • Portable audits with no hardcoded service names or repo paths
  • Repeatable harnesses: discovery phases, rubrics, report templates, and follow-up workflows
  • Decision-ready artifacts in docs/reports/, including optional system digests via auto-report

Use it when a codebase is large enough that structure beats ad-hoc prompting.

Design principle

Simple prompt + good harness > clever prompt + no harness.

Most of the lift in agentic work comes from the surrounding scaffolding — phases, severity rubrics, report formats, context overlays, fault-injection harnesses — not from clever wording inside the prompt. Every audit here is deliberately plain English; the structure around it does the work, and the structure is what gives the agent something to loop against until success criteria are met. The principle applies recursively to the library's own docs: trust over enumeration, defaults over variants. (See the v0.4 simplification commit for what that looked like applied to DeterminAgents itself.)

When this isn't worth it. A 200-line script or a one-off prototype doesn't need a phased audit with a severity rubric and a report file. Use direct prompts for trivial work.

Install

curl -fsSL https://raw.githubusercontent.com/iansherr/determinagents/main/install.sh | sh

Installs to ~/.determinagents/ (override with $DETERMINAGENTS_HOME) and a determinagents shim to ~/.local/bin/ (override with $DETERMINAGENTS_BIN).
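The two locations and their overrides can be resolved from a script along these lines. This is a minimal sketch based only on the defaults stated above, not the installer's actual logic. All three function names are hypothetical, and audit_doc_path assumes the installed tree keeps the audits/ directory shown under Layout.

```shell
# Sketch only: mirrors the documented defaults, not the installer's real code.

# Resolve the library directory, honoring the DETERMINAGENTS_HOME override.
determinagents_home() {
  printf '%s\n' "${DETERMINAGENTS_HOME:-$HOME/.determinagents}"
}

# Resolve the shim directory, honoring the DETERMINAGENTS_BIN override.
determinagents_bin() {
  printf '%s\n' "${DETERMINAGENTS_BIN:-$HOME/.local/bin}"
}

# Hypothetical helper: path to an audit doc inside the installed library.
# Assumes audits/ sits directly under the library directory.
audit_doc_path() {
  printf '%s/audits/%s.md\n' "$(determinagents_home)" "$1"
}
```

For example, audit_doc_path STUB_AND_COMPLETENESS prints the path you would hand to an agent that cannot expand $DETERMINAGENTS_HOME itself.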

determinagents version             # what's installed
determinagents doctor              # check the install is healthy
determinagents update              # check for updates, show diff, apply with confirmation
determinagents materialize         # install slash commands for your host tool
determinagents completions <shell> # print tab-completion script (bash, zsh, fish)
determinagents uninstall           # remove the library (prompts for confirmation)
determinagents help                # full command list

After determinagents update: audit content (phases, severity rubrics, the doc bodies themselves) flows through automatically — materialized slash commands are thin pointers that re-read the audit doc each time they run. Re-run determinagents materialize only when a new behavior is added (new slash command), the shared invocation header changes, or the hub command template changes. Each release that requires re-materialization will say so at the top of its CHANGELOG entry.

Materialization now defaults to one canonical command family: use /determinagents for onboarding and /determinagents <behavior> [flags] for direct runs.

Troubleshooting wrong invocation: if you run determinagents --help while trying to execute an audit, you are on the installer CLI surface. Return to your host tool and run /determinagents <behavior> [flags] (example: /determinagents error-handling).

To pin a branch (e.g., dev for unreleased work):

curl -fsSL https://raw.githubusercontent.com/iansherr/determinagents/dev/install.sh | sh -s -- --branch=dev

First run

After installing, the lowest-friction path:

  1. Pick a repo — yours or anyone's. The read-only audits don't modify code.

  2. Run an audit. STUB_AND_COMPLETENESS is a good starter: it surfaces phantom endpoints, dead handlers, and silent failures on most codebases without needing build infrastructure. Hand this prompt to your coding agent (Claude Code, Cursor, Gemini, etc.):

    Run audits/STUB_AND_COMPLETENESS.md from $DETERMINAGENTS_HOME against
    this repo. Report to docs/reports/STUB_AUDIT_<YYYY-MM-DD>.md.
    
  3. Read the report at docs/reports/. Every finding has a file:line, severity, and suggested fix. The report's ## Next steps section contains paste-ready follow-up prompts.

  4. Optional next moves: capture project-specific calibrations in docs/determinagents/AUDIT_CONTEXT.md so future audits skip known-false-positives (see specs/BOOTSTRAP.md); or run audits/RESOLVE_FROM_REPORT.md to work through findings with per-finding approval and one commit per fix.
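The prompt in step 2 leaves the date as a placeholder. A small helper can fill it in before you paste the prompt into your agent. This is a sketch: first_run_prompt is a hypothetical name, and expanding $DETERMINAGENTS_HOME to its default is an assumption taken from the Install section.

```shell
# Hypothetical helper: print the first-run prompt with the date filled in.
# Usage: first_run_prompt [YYYY-MM-DD]   (defaults to today's date)
first_run_prompt() {
  report_date="${1:-$(date +%Y-%m-%d)}"
  # Expand the library location so the agent receives a concrete path.
  printf 'Run audits/STUB_AND_COMPLETENESS.md from %s against\n' \
    "${DETERMINAGENTS_HOME:-$HOME/.determinagents}"
  printf 'this repo. Report to docs/reports/STUB_AUDIT_%s.md.\n' "$report_date"
}
```

Call first_run_prompt with no arguments to stamp today's date, or pass a date explicitly when reproducing an earlier run.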

Once that loop is comfortable, browse the audits table below for other audits to try, or INVOCATIONS.md for canonical paste-ready prompts. To install as slash commands in your host tool, see INSTALL.md.

Choose a behavior

| Need | Run |
| --- | --- |
| Find stubs, phantom paths, and incomplete implementations | /determinagents stub |
| Run a security sweep | /determinagents security |
| Trace where a user action breaks | /determinagents data-flow --target=<flow> |
| Review runtime capacity and resource pressure | /determinagents resource-capacity |
| Find god-files and propose extraction seams | /determinagents structural-entropy |
| Find regression-prone complexity hotspots | /determinagents regression-surface |
| Don't know what to run; let it pick | /determinagents next |
| Set up and discover recursive improvement loops | /determinagents init-loops |
| Orchestrate recursive improvement loops automatically | /determinagents loop-orchestrator |
| Create a weekly or post-change system digest | /determinagents auto-report --mode=baseline |
| Work through report findings with approval gates | /determinagents resolve --report=<path> |

Reports are meant to be read by humans and reused by agents. A typical finding looks like:

### P1: Save failures are logged but not surfaced to the user
- Evidence: src/api/client.ts:42
- Impact: failed writes can leave the UI showing stale success state
- Suggested fix: return a typed error from saveProfile() and render retryable error state in SavePanel
- Next step: /determinagents resolve --report=docs/reports/ERROR_HANDLING_2026-05-11.md
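Because findings follow a regular shape, a report can be triaged with ordinary text tools. The sketch below assumes every finding starts with a "### P<n>:" heading and carries a "- Evidence:" line, as in the example above. Real reports may be laid out differently, and both function names are hypothetical.

```shell
# Sketch: count findings of one severity in a report.
# Assumes each finding starts with a heading like "### P1: ...".
# Usage: count_findings <report.md> <P0|P1|P2|P3>
count_findings() {
  grep -c "^### $2:" "$1"
}

# Sketch: list the file:line evidence references in a report.
# Assumes each finding has a line of the form "- Evidence: <file:line>".
# Usage: list_evidence <report.md>
list_evidence() {
  sed -n 's/^- Evidence: //p' "$1"
}
```

Running count_findings against a fresh report with P0 and then P1 gives a quick sense of how much urgent work it contains before you read it in full.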

Layout

determinagents/
├── README.md            # this file
├── INVOCATIONS.md       # paste-ready prompts for every behavior
├── INSTALL.md           # how an agent installs this library into a host tool
├── audits/              # the runnable docs
│   ├── STUB_AND_COMPLETENESS.md
│   ├── SECURITY_PENTEST.md
│   ├── DATA_FLOW_TRACE.md
│   ├── ERROR_HANDLING.md
│   ├── TEST_GAPS.md
│   ├── DOCS_DRIFT.md
│   ├── UX_DESIGN_AUDIT.md
│   ├── DESIGN_HANDOFF_AUDIT.md
│   ├── RESOURCE_CAPACITY.md
│   ├── SCENARIO_CHAINER.md     # meta: chains findings into simulations
│   ├── STRUCTURAL_ENTROPY.md
│   ├── REGRESSION_SURFACE.md
│   ├── PICK_NEXT.md            # meta: recommends what to run next
│   ├── RESOLVE_FROM_REPORT.md  # mutating: works through report findings
│   ├── STRUCTURAL_REFACTOR.md  # mutating: executes structural-entropy seams
│   ├── SECURITY_HUNT.md        # mutating: agentic vulnerability hunting
│   ├── DATA_FLOW_VERIFY.md     # mutating: observed-vs-theorized data flow
│   ├── TESTING_CREATOR.md      # mutating: writes new tests
│   ├── HARNESS_CREATOR.md      # mutating: generates verification harnesses
│   └── RECURSIVE_IMPROVEMENT.md # mutating: runs metric-driven experiments
└── specs/               # conventions and per-project artifact specs
    ├── FORMAT.md                  # how to author a new audit; harness conventions
    ├── HARNESS_STUBS.md           # boilerplate for common harnesses
    ├── BOOTSTRAP.md               # how to generate AUDIT_CONTEXT.md (cold + warm)
    ├── FEATURE_REGISTRY.md        # spec for the per-project feature registry
    ├── AUTOMATED_REPORTING.md     # project-facing system digest orchestrator
    ├── SIGNAL_SCHEMA.md           # JSON schema for auto-report signal output
    ├── AUDIT_CONTEXT_TEMPLATE.md  # minimal starting overlay (Global only)
    ├── AUDIT_CONTEXT_SECTIONS.md  # catalog of audit-specific sections (copy as needed)
    └── MAINTENANCE.md             # maintainer-only: keep the library current (refresh / integrate / brainstorm)

Available audits (read-only)

| Audit | Finds |
| --- | --- |
| audits/STUB_AND_COMPLETENESS.md | Phantom endpoints, dead handlers, silent error swallowing, compiled-without-source files |
| audits/SECURITY_PENTEST.md | Auth bypass, injection, IDOR, hardcoded secrets, JWT issues, exposed internals |
| audits/DATA_FLOW_TRACE.md | Where a user action breaks between UI, network, handler, and DB |
| audits/ERROR_HANDLING.md | Silent catches, missing error UI, errors logged but not surfaced |
| audits/TEST_GAPS.md | Scenarios the test suite would miss: error paths, edge cases, integration boundaries |
| audits/DOCS_DRIFT.md | Claims in README and docs that the code no longer matches |
| audits/UX_DESIGN_AUDIT.md | CSS that violates DESIGN.md tokens: colors, spacing, radii, motion, typography |
| audits/DESIGN_HANDOFF_AUDIT.md | Audit design handoff bundles against target code, bypassing misleading READMEs |
| audits/RESOURCE_CAPACITY.md | Runtime-agnostic capacity and resource-pressure risks across k8s, docker/compose, bare metal, or unraid-style deployments |
| audits/STRUCTURAL_ENTROPY.md | God-files and god-modules. Severity is driven by responsibility count, fan-in/out, and change velocity, not LOC alone. Outputs seam proposals consumed by STRUCTURAL_REFACTOR.md |
| audits/REGRESSION_SURFACE.md | Regression-prone complexity hotspots: overlapping responsibilities, fragile error handlers, fallback ladders |
| audits/PICK_NEXT.md | Meta-audit. Recommends which audit to run next based on report staleness, recent git history, and AUDIT_CONTEXT.md cadence preferences. Writes no report by default |

Most audits run in 30–180 minutes at default scope, scaling with codebase size. Each audit doc supports --phases=N,M and --max-time=Xm to scope tighter.

SECURITY_PENTEST.md is the static half of security. For serious vulnerability discovery in codebases with build/test infrastructure, also use the agentic SECURITY_HUNT.md below.

Available creators (mutating — writes code)

| Doc | What it does | Prerequisites |
| --- | --- | --- |
| audits/RESOLVE_FROM_REPORT.md | Works through findings in any audit report one at a time, with per-finding approval, separate commits, and verification | An audit report exists at docs/reports/; clean working tree |
| audits/STRUCTURAL_REFACTOR.md | Executes seam proposals from a STRUCTURAL_ENTROPY report. Per-seam loop with contract-before-code gate; one contract commit + one move commit per seam; before/after dependency artifacts | STRUCTURAL_ENTROPY report exists; disposable workspace; tests cover the target file |
| audits/SECURITY_HUNT.md | Agentic vulnerability hunting: the agent gets execution capability to verify or refute bug hypotheses against one target file/function. Inspired by Mozilla's Firefox-hardening pipeline | Project builds locally; sanitizers configured; disposable workspace; AUDIT_CONTEXT.md SECURITY_HUNT section configured |
| audits/DATA_FLOW_VERIFY.md | Drives a real user flow end-to-end and observes wire traffic + DB state. The "observed" counterpart to DATA_FLOW_TRACE.md's "inferred"; catches silent layer drift that static analysis misses | Disposable workspace; app runs locally; AUDIT_CONTEXT.md DATA_FLOW_VERIFY section configured |
| audits/TESTING_CREATOR.md | Implements tests across four tiers (Adversarial, Chaos, Simulation, Forensics) beyond what TEST_GAPS.md covers | Run TEST_GAPS.md and SECURITY_PENTEST.md first |
| audits/HARNESS_CREATOR.md | Deterministically generates verification harnesses (Playwright, Docker, Fuzzing) to prove/refute audit findings | An audit report exists; disposable workspace |
| audits/RECURSIVE_IMPROVEMENT.md | Autonomously designs, executes, and evaluates experiments to improve a specific metric or solve an open-ended problem. Generates hypotheses, mutates code, and verifies against a harness | Measurable goal; deterministic harness exists; disposable workspace |

Two read-only audits — ERROR_HANDLING.md and STUB_AND_COMPLETENESS.md — include an optional mutating Phase 6 (fault injection and endpoint verification respectively) that follows the harness conventions in specs/FORMAT.md. Use scope +harness to enable.

The standard workflow: run an audit (read-only) → review the report → run RESOLVE_FROM_REPORT to work through findings → re-run the audit to verify clean state. For security-sensitive fixes, optionally chain into TESTING_CREATOR Tier 1 (Adversarial) afterwards to add executable coverage.
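Before the RESOLVE_FROM_REPORT step of this workflow, its two stated prerequisites (a report under docs/reports/ and a clean working tree) can be checked from the shell. This is a sketch under assumptions: both function names are hypothetical and not part of the library, and "clean working tree" is interpreted here as git status --porcelain printing nothing.

```shell
# Sketch of a preflight for RESOLVE_FROM_REPORT's stated prerequisites.
# Both function names are hypothetical; they are not part of the library.

# Succeeds if the reports directory holds at least one Markdown report.
# Usage: has_reports [reports-dir]   (defaults to docs/reports)
has_reports() {
  set -- "${1:-docs/reports}"/*.md
  # If the glob matched nothing, "$1" is the literal pattern and -e fails.
  [ -e "$1" ]
}

# Succeeds if the git working tree has no uncommitted or untracked changes.
# Fails when git is unavailable or the directory is not a repository.
tree_is_clean() {
  status="$(git status --porcelain 2>/dev/null)" || return 1
  [ -z "$status" ]
}
```

A typical guard would be has_reports && tree_is_clean, run from the target project's root before invoking /determinagents resolve.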

Per-project specs

These describe an artifact each project generates its own instance of.

| Spec | Project artifact | Purpose |
| --- | --- | --- |
| specs/FEATURE_REGISTRY.md | docs/determinagents/FEATURE_REGISTRY.md | Living catalog of every testable feature with URL, auth, steps, pass criteria, tags |
| specs/AUDIT_CONTEXT_TEMPLATE.md | docs/determinagents/AUDIT_CONTEXT.md | Minimal starting overlay (Global only). Audit-specific sections come from AUDIT_CONTEXT_SECTIONS.md, copied in only when filled. |
| specs/AUTOMATED_REPORTING.md | docs/reports/SYSTEM_DIGEST_<YYYY-MM-DD>.md plus optional docs/reports/signals/SYSTEM_DIGEST_<YYYY-MM-DD>.json | Read-only synthesis harness that turns existing audit reports and explicit runtime snapshots into decision-ready system digests. JSON follows SIGNAL_SCHEMA.md. |

Supporting docs: specs/FORMAT.md (audit authoring spec), specs/BOOTSTRAP.md (overlay generator workflow).

Conventions

Every audit:

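  • Is read-only by default. The mutating docs listed under Available creators (RESOLVE_FROM_REPORT.md, STRUCTURAL_REFACTOR.md, SECURITY_HUNT.md, DATA_FLOW_VERIFY.md, TESTING_CREATOR.md, HARNESS_CREATOR.md, and RECURSIVE_IMPROVEMENT.md) declare this prominently in their purpose sections.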
  • Has phases so you can scope: run Phase 1 only for a quick pass, all phases for a deep pass.
  • Classifies findings by severity (P0/P1/P2/P3) with concrete criteria.
  • Emits a report with file:line references and concrete fixes — never "fix this."
  • Reports go to docs/reports/ (in the target project) with a date-stamped name (e.g., STUB_AUDIT_2026-05-09.md).
  • Reads docs/determinagents/AUDIT_CONTEXT.md first if it exists, to apply project-specific calibrations.
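The date-stamped naming convention makes the newest report easy to find, which helps when re-running an audit to compare against the previous result. The sketch below assumes names of the form <PREFIX>_<YYYY-MM-DD>.md as in the STUB_AUDIT example; latest_report is a hypothetical helper, not a library command.

```shell
# Sketch: print the newest report for a given name prefix.
# Relies on the YYYY-MM-DD suffix, so lexical order matches date order.
# Usage: latest_report <PREFIX> [reports-dir]   (defaults to docs/reports)
latest_report() {
  dir="${2:-docs/reports}"
  for f in "$dir"/"$1"_*.md; do
    # Skip the literal pattern that remains when nothing matches.
    [ -e "$f" ] && printf '%s\n' "$f"
  done | sort | tail -n 1
}
```

For example, latest_report STUB_AUDIT prints the most recent stub audit under docs/reports/, or nothing if none exists yet.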

Companion: DESIGN.md

audits/UX_DESIGN_AUDIT.md assumes a DESIGN.md exists at the project root as the source of truth for design tokens. If your project doesn't have one, generate it first using the Google design.md spec.

The bootstrap prompt for DESIGN.md is in INVOCATIONS.md. The other audits do not require DESIGN.md.

Acknowledgements

Thank you to Mozilla Security for publicly sharing Behind the Scenes: Hardening Firefox (May 2026). Their description of the agentic-harness pipeline, the inner-loop framing ("there is a bug in this part of the code, please find it and build a testcase"), and the severity-by-defect-class rubric directly shaped audits/SECURITY_HUNT.md and the broader v0.3 / v0.4 design. Open writeups like this, from teams doing real production work, are how the rest of us learn.

Thank you also to the frontier model engineers who keep saying — out loud, against the cultural reflex of secrecy and the collective instinct to grind for the perfect prompt — that working with an agent to improve a prompt produces better prompts than working alone. This library is an outgrowth of that practice: a personal collection of prompts that worked, refined over time, until the scaffolds of a standard set became visible. The spec emerged from the pattern, not the other way around. The hope now is that publishing it helps others skip a few of the same steps.

And to Andrej Karpathy, whose observation that "LLMs are exceptionally good at looping until they meet specific goals — don't tell it what to do, give it success criteria and watch it go" is the cleanest one-line statement of why the harness, not the prompt, does most of the work. (See also Forrest Chang's andrej-karpathy-skills for a compact CLAUDE.md distillation of the same observations.)


Orchestrated by Ian Sherr at Time Worthy Media.
