GitHub - gm2211/specify: Web capture toolkit for spec-based functional verification


   ███████╗██████╗ ███████╗ ██████╗██╗███████╗██╗   ██╗
   ██╔════╝██╔══██╗██╔════╝██╔════╝██║██╔════╝╚██╗ ██╔╝
   ███████╗██████╔╝█████╗  ██║     ██║█████╗   ╚████╔╝
   ╚════██║██╔═══╝ ██╔══╝  ██║     ██║██╔══╝    ╚██╔╝
   ███████║██║     ███████╗╚██████╗██║██║        ██║
   ╚══════╝╚═╝     ╚══════╝ ╚═════╝╚═╝╚═╝        ╚═╝

   Write specs. Validate behavior. Ship with evidence.

Specify turns functional requirements into machine-verifiable specs and runs an autonomous agent against them. Define what your app should do — pages, flows, assertions, API contracts — and Specify tells you what's met, what's not, and what's untested. Every assertion shows its work: expected value, actual value, raw output.

Cooperative QA: the agent runs, you watch the activity stream in the browser, flag what looks wrong, and the next run remembers. Per-spec memory, session transcripts, and a confidence model accumulate into optional learned skills when explicitly enabled.

No opinions about your test framework. No lock-in. Just structured truth.

Install

npm install
npm run build
(cd webapp && npm install && npm run build)   # builds the review UI into dist/webapp

The wrapper script at ./specify auto-builds on first run.

Quality Gates

npm run typecheck         # TypeScript for CLI, daemon, agent, and scripts
npm run typecheck:webapp  # TypeScript for the React review UI
npm run lint              # ESLint with TypeScript, SonarJS, security, Unicorn, and React rules
npm run format            # Prettier check
npm run quality           # Typecheck + lint

Local SonarQube/SonarCloud scanning is configured with sonar-project.properties. Start or point at a SonarQube server, ensure a Java runtime is installed for sonar-scanner, set SONAR_TOKEN when required, and run:

npm run sonar -- -Dsonar.host.url=http://localhost:9000

Quickstart

# 1. Generate a contract from existing capture data (or run `specify capture` first)
specify spec generate --input ./captures/my-app --output app.spec.yaml

# 2. Verify the implementation
specify verify --spec app.spec.yaml --url http://localhost:3000

# 3. Review results in the browser — flag what looks wrong, the next run remembers
specify review --spec app.spec.yaml

specify review opens the webapp shown above. Click any timeline event to flag it; flags become observations the agent reads as preamble next run.

Commands

Command	What
`spec generate`	Generate a spec from a capture directory
`capture`	Agent-driven capture from a live system (`--url`) or code (`--from code`)
`compare`	Live side-by-side comparison of remote vs local targets
`review`	Browser UI: narrative, activity stream, feedback, skill drafts
`verify`	Verify against a live target (`--url`) — emits a structured report
`replay`	Replay captured traffic against a target and diff results
`spec lint`	Structural validation (no captures needed)
`spec guide`	Authoring guide for LLM spec writers
`schema`	Emit JSON Schema for spec, report, or commands
`mcp`	MCP server — any LLM client can use Specify as a tool
`daemon`	Long-running HTTP inbox; other agents push verify/capture/compare jobs
`review --background` / `review --stop`	Daemonize or stop the review webapp
`human`	Interactive chat REPL

Run specify <cmd> --help for full flags. Source: src/cli/commands-manifest.ts.

Reports you can trust

Every validation report includes expected vs actual evidence for every assertion. No "100% passed, trust me" — you get the raw output, the exact match, and the assertion logic.

Formats: JSON (machine), Markdown (diff-friendly), HTML (interactive, filterable, single file).

| Status | Type           | Expected          | Actual                              |
|--------|----------------|-------------------|-------------------------------------|
| ✅     | text_contains  | spec validate     | ..."name": "spec validate", ...     |
| ✅     | json_path      | 0.1.0             | 0.1.0                               |
| ❌     | json_schema    | matches schema    | /items: must have >= 5 items        |

The learning loop

Specify is more than a one-shot verifier. Every run reads, writes, and refines state under <spec_dir>/.specify/:

.specify/
  memory/<spec_id>/<target_key>.json   # learned rows: quirks, playbooks, observations
  sessions.db                          # SQLite + FTS5 transcripts of every session
  confidence.json                      # accept/override tally per behavior
  specify.observations.yaml            # per-spec observations (user feedback + reflection)
  skill-drafts/<id>.md                 # optional learned-skill drafts
  skills/<name>/SKILL.md               # approved skills, replayed when enabled
  verify/verify-result.json            # latest agent run result

Memory rows (src/agent/memory-provider.ts, src/agent/memory.ts) persist across runs, scoped strictly by (spec_id, target_key) so staging and prod never cross-contaminate. The agent injects them into the next prompt as a preamble; subsequent runs read/update via memory_record + memory_list MCP tools.

Three context layers (src/agent/memory-layers.ts) are merged into every system prompt: user (~/.specify/memory.md), project (SPECIFY.md or CLAUDE.md), and per-spec (specify.observations.yaml). Missing layers are silently skipped.

Sessions store (src/agent/session-store.ts) indexes every event in SQLite with FTS5 so the agent (and you) can search prior runs by content.

Confidence model (src/agent/confidence-store.ts) tallies accept vs override per behavior id. The autonomy preset (ask_everything / ask_uncertain / autonomous) decides whether to ask before flagging, run silently, or skip.

Pattern miner → skill drafts (src/agent/pattern-miner.ts, src/agent/skill-synthesizer.ts) is experimental and disabled by default. Set SPECIFY_ENABLE_LEARNED_SKILLS=true to expose draft review endpoints and inject approved .specify/skills/<name>/SKILL.md entries into future runs.

Optional dialectic provider (src/agent/honcho-provider.ts) — when HONCHO_URL is set, an external dialectic user-model service is used instead of the file-backed memory provider. Optional env vars: HONCHO_APP (default specify), HONCHO_USER (default $USER), HONCHO_TOKEN. Without those vars, Specify uses the file-backed provider.

Cooperative QA via the review webapp

specify review --spec app.spec.yaml boots a Hono server with a React UI. The UI subscribes to a WebSocket of agent events and lets you flag rows inline.

Each flag is one of: note, important_pattern, missed_check, false_positive, ignore_pattern, file_bug. Behaviour (src/agent/feedback.ts):

writes an observation into specify.observations.yaml with source: user_feedback and the originating session id
updates the confidence store (important_pattern / file_bug reinforce; missed_check / false_positive / ignore_pattern override)
on file_bug, best-effort spawns bd create if available
when SPECIFY_ENABLE_LEARNED_SKILLS=true, important_pattern feedback can prompt the active agent to apply the same check to sibling behaviors

When SPECIFY_ENABLE_LEARNED_SKILLS=true, approved skill drafts surface in a dedicated panel:

MCP — use Specify from any LLM

# Local (stdio)
specify mcp

# Remote (HTTP)
specify mcp --http --port 8080

Claude Desktop / Cursor / Claude Code config:

{ "mcpServers": { "specify": { "command": "specify", "args": ["mcp"] } } }

Tools exposed include spec authoring helpers and bridge tools for the daemon (daemon_verify, daemon_submit, daemon_status).

Daemon — background agent

Run Specify as a long-lived background process. Idle = 0 tokens. Other agents (or chat bots, webhooks, CI runners) push jobs into an HTTP inbox; each job spawns an Agent SDK run, streams progress, and writes its structured result to disk.

specify daemon --port 4100
# → listens on 127.0.0.1:4100
# → writes a bearer token to ~/.specify/daemon.token on first start

Submit a verify job from any agent:

TOKEN=$(cat ~/.specify/daemon.token)

curl -s -H "Authorization: Bearer $TOKEN" \
     -H 'Content-Type: application/json' \
     -d '{"task":"verify","prompt":"Verify http://localhost:3000 against the spec.","spec":"/abs/path/spec.yaml","url":"http://localhost:3000"}' \
     http://127.0.0.1:4100/inbox
# → {"id":"msg_ab12","status":"queued","stream":"/inbox/msg_ab12/stream"}

# Stream agent events for this message (SSE)
curl -N -H "Authorization: Bearer $TOKEN" \
     http://127.0.0.1:4100/inbox/msg_ab12/stream

# Poll the final result (includes path to on-disk verify-result.json)
curl -s -H "Authorization: Bearer $TOKEN" \
     http://127.0.0.1:4100/inbox/msg_ab12

Endpoints (all require Authorization: Bearer <token> except /health):

Method	Path	Purpose
GET	`/health`	Liveness + active session count
POST	`/inbox`	Generic: `{task, prompt, spec?, url?, mode?, session?}`
GET	`/inbox`	Recent messages
GET	`/inbox/:id`	Status + result + `resultPath`
GET	`/inbox/:id/stream`	SSE stream of agent events
GET	`/events/stream`	SSE stream of all daemon events
GET	`/sessions`	Active persistent sessions
POST	`/sessions/:id/close`	Close a persistent session

Dispatch modes:

stateless (default) — fresh SDK run per message, bounded cost. Concurrent jobs run in forked worker processes up to --max-workers (default 2), each with its own Playwright/Chromium.
attach — injects into a persistent SDK session keyed by session. Holds context across messages; idle still uses 0 tokens. Always in-process, serial per session.

Live inspector: GET / on the daemon serves a zero-build HTML page that streams agent events, lists recent messages, and shows structured results. Prompts for the token on first load.

Deploy as a QA agent in Kubernetes

Specify ships a container image and a Terraform module so it can run as a long-lived QA agent inside a cluster. One pod per spec, PVC-backed memory that survives restarts, and pluggable triggers (k8s informer, webhook, or both).

module "qa" {
  source = "github.com/gm2211/specify//deploy/terraform/modules/specify-qa?ref=main"

  name      = "renzo-qa"
  namespace = "qa"

  target_url  = "http://renzo.app.svc.cluster.local:8080"
  spec_inline = file("${path.module}/specify.spec.yaml")

  discovery = { mode = "watch", namespaces = ["app"] }

  report_slack_webhook     = var.slack_webhook_url
  anthropic_api_key_secret = "anthropic-api-key"
}

Group	Pick exactly one
Target	`target_url` · `target_dns` · `target_cluster_ip` · `target_from_configmap`
Spec	`spec_inline` · `spec_url` (+ optional bearer) · `spec_git`
Discovery	`webhook` (default) · `watch` · `both` · `none`
Reports	`report_file_dir` (default) + optional `report_slack_webhook`

Self-describing install for agents. specify deploy describe --format=json emits a structured manifest: image coordinates, module ref, oneof groups, required Secrets, outputs, and an agent_install_recipe. Drop specify deploy print-tf <preset> into a consumer repo for a working .tf skeleton (minimal, watch-mode, webhook-mode, gitops-spec).

specify deploy describe --format=json | jq .
specify deploy print-tf watch-mode > specify-qa.tf

Worked examples live in deploy/terraform/examples/: minimal, watch-mode, gitops-spec. Each example is a runnable terraform apply directory with a per-example README.

The pod's /work PVC keeps everything the daemon learns:

Path	Content
`/work/.specify/memory/<spec_id>/<target>.json`	learned memory rows
`/work/.specify/sessions.db`	session SQLite + FTS5
`/work/.specify/skill-drafts/`	optional learned-skill drafts
`/work/.specify/skills/`	active skills replayed when `SPECIFY_ENABLE_LEARNED_SKILLS=true`
`/work/reports/`	per-run JSON reports (file sink)

See deploy/terraform/modules/specify-qa/README.md for the full input / output reference.

Spec format

YAML or JSON. Human-readable, machine-verifiable.

version: "1.0"
name: "My App"
description: "Behavioral contract for My App"

pages:
  - id: dashboard
    path: /dashboard
    title: "Dashboard"
    visual_assertions:
      - type: element_exists
        selector: "nav.sidebar"
        description: "Navigation sidebar is present"
    expected_requests:
      - method: GET
        url_pattern: "/api/v1/stats"
    scenarios:
      - id: user-login
        description: "User logs in and sees dashboard"
        steps:
          - action: fill
            selector: "#email"
            value: "{{email}}"
          - action: click
            selector: "button[type=submit]"
          - action: wait_for_navigation
            url_pattern: "/dashboard"
          - action: assert_visible
            selector: ".welcome-message"

variables:
  base_url: "${TARGET_BASE_URL}"

Self-verifying

Specify eats its own dogfood. The repo includes specify.spec.yaml — a spec for Specify itself — validated on every release.

License

GPL-3.0

Name		Name	Last commit message	Last commit date
Latest commit History 144 Commits
.beads		.beads
.claude-plugin		.claude-plugin
.claude		.claude
.githooks		.githooks
.github/workflows		.github/workflows
assets		assets
deploy/terraform		deploy/terraform
docs		docs
scripts		scripts
src		src
templates		templates
webapp		webapp
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
.prettierignore		.prettierignore
.prettierrc.json		.prettierrc.json
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
eslint.config.js		eslint.config.js
package-lock.json		package-lock.json
package.json		package.json
sonar-project.properties		sonar-project.properties
specify		specify
specify.spec.yaml		specify.spec.yaml
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Install

Quality Gates

Quickstart

Commands

Reports you can trust

The learning loop

Cooperative QA via the review webapp

MCP — use Specify from any LLM

Daemon — background agent

Deploy as a QA agent in Kubernetes

Spec format

Self-verifying

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Install

Quality Gates

Quickstart

Commands

Reports you can trust

The learning loop

Cooperative QA via the review webapp

MCP — use Specify from any LLM

Daemon — background agent

Deploy as a QA agent in Kubernetes

Spec format

Self-verifying

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages