GitHub - globulus/checkirai: Checkir AI is a spec-driven verification runtime using local LLMs

  / __|  | || |   | __|   / __|  | |/ /   |_ _|   | _ \     o O O  /   \   |_ _|
 | (__   | __ |   | _|   | (__   | ' <     | |    |   /    o       | - |    | |
  \___|  |_||_|   |___|   \___|  |_|\_\   |___|   |_|_\   TS__[O]  |_|_|   |___|
_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""| {======|_|"""""|_|"""""|
"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'./o--000'"`-0-0-'"`-0-0-'

Local LLMs + MCP-backed tools = test your builds locally without paying token costs! Parse specs, plan probes, collect evidence and return requirement-level verdicts.

Checkir AI is a spec-driven verification runtime: it reads a human-readable spec, plans probes, runs tools (including MCP-backed capabilities) and returns requirement-level verdicts — pass, fail, inconclusive, or blocked — with evidence you can inspect offline. LLM-assisted phases default to Ollama on your machine; you can also use a remote provider (OpenAI-compatible HTTP API) for normalization and judging. Tool hosts connect over MCP the same way Cursor or Claude Code talks to other servers.

Why?

Tokens and API calls add up. Re-running the same “does the UI match the spec?” loop through a cloud model is slow and expensive.
Repeatable runs write structured reports, SQLite state and artifacts under a known output root, which is ideal for CI, dashboards and agent loops. You can also repeat a run from any phase, further saving time and costs.
Local-first verification keeps sensitive URLs, traces, and artifacts on your machine while still using an LLM where judgment helps (planning, interpretation). Optional remote LLMs use your API key and base URL (see docs/USAGE.md and checkirai.config.json), but the main idea is still to be able to run this locally.
CLI and MCP as the integration surface lets Cursor, Claude Code and other hosts treat verification as a first-class tool alongside Chrome DevTools, filesystem, etc.

Note that doing the same task (spec-based software verification using tools) with a remote agent is almost certainly going to be faster, but every run incurs a cost that local checks don't. Plus, unlike with a remote agent, you only need to run the spec -> IR -> execution plan pipeline once: the local artifacts let you re-run only the judgement/verification phase, further conserving resources.

Use cases

Agent implement → verify → fix: After a coding agent changes an app, call verify (or the MCP verify_spec tool) and feed failures back into the next edit.
Human acceptance checks: Maintain a markdown spec next to the repo; run verification before merge or release.
Exploratory “what would we test?”: Use suggest_probe_plan over MCP to plan probes without executing a full run.
Local model hygiene: Use ollama status, model list, model suggest, and model pull so the right instruct/tool-capable model is available before a run.
Chrome DevTools MCP wiring: Use chrome-devtools list-tools / self-check to confirm your MCP server exposes the expected tool surface (see checkirai.config.json for project defaults).
Dart/Flutter MCP wiring: Use dart-mcp list-tools / self-check and the fixtures/flutter_app + fixtures/flutter-spec.md showcase for run_tests and driver-style verification.

Requirements

Node.js 22 or newer (engines in package.json)
pnpm (recommended; scripts assume it)
Ollama (optional but default for LLM-assisted phases) — install separately and start the daemon

Installation

1. Clone and install dependencies

git clone https://github.com/globulus/checkirai
cd checkirai
pnpm install

postinstall runs a TypeScript build so the checkirai bin can load dist/.

2. Make the CLI available globally (pick one)

From the repo root:

pnpm link --global

Or install this package globally:

pnpm add -g .

Confirm the binary is on your PATH (pnpm’s global bin):

pnpm bin -g
checkirai --help

Note: The published entrypoint is bin/checkirai.js and delegates to dist/. If you see a build error, run pnpm build manually.

3. Optional project config

Copy or edit checkirai.config.json (or .checkirai/config.json) in your project root for:

defaults — targetUrl, tools, outRoot, optional profile (selects a key from profiles for LLM overrides), plus runtime tuning: maxRunMs, runCommandAllowlist (prefix with * or full command line; empty means no run_command runs), allowShellMetacharacters (opt-in to shell metacharacters in allowlisted commands), stepRetries, stepRetryDelayMs, isolateProbeSessions (one session per probe), artifactMaxRuns (prune old per-run artifact folders).
llm — Shared: ollamaHost, allowAutoPull, requireToolCapable. Per role normalizer, plannerAssist, judge, triage: each has provider (ollama | remote | none), model, optional fallbackModel, temperature, maxRetries, timeoutMs, and when remote: remoteBaseUrl, remoteApiKey. There is no single global ollamaModel: "auto"; each role names an explicit model tag (defaults ship in src/llm/types.ts and the sample config).
profiles — Optional map (e.g. laptop_16gb) of partial per-role overrides merged on top of llm when defaults.profile or CHECKIRAI_PROFILE is set.
mcpServers — e.g. chrome-devtools or dart-mcp with command / args so checkirai verify can spawn the matching MCP server when --tools includes that integration token.
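
As a rough illustration of how profiles layer on top of llm, the sketch below models the per-role settings and overlays a profile's partial overrides. The field names come from the list above; the exact types, the applyProfile helper, and the example-small-model tag are assumptions made for illustration, not the project's actual code (the real defaults live in src/llm/types.ts).

```typescript
// Sketch only: field names follow the README; the exact types are assumptions.
type Provider = "ollama" | "remote" | "none";
type RoleName = "normalizer" | "plannerAssist" | "judge" | "triage";

interface RolePolicy {
  provider: Provider;
  model: string;
  fallbackModel?: string;
  temperature?: number;
  maxRetries?: number;
  timeoutMs?: number;
  remoteBaseUrl?: string; // used when provider is "remote"
  remoteApiKey?: string; // used when provider is "remote"
}

interface LlmPolicy {
  ollamaHost: string;
  allowAutoPull: boolean;
  requireToolCapable: boolean;
  normalizer: RolePolicy;
  plannerAssist: RolePolicy;
  judge: RolePolicy;
  triage: RolePolicy;
}

type ProfileOverrides = Partial<Record<RoleName, Partial<RolePolicy>>>;

const ROLE_NAMES: RoleName[] = ["normalizer", "plannerAssist", "judge", "triage"];

// Hypothetical helper: overlay a profile's partial per-role settings on llm.
function applyProfile(llm: LlmPolicy, profile?: ProfileOverrides): LlmPolicy {
  const merged: LlmPolicy = { ...llm };
  for (const role of ROLE_NAMES) {
    const override: Partial<RolePolicy> = profile?.[role] ?? {};
    merged[role] = { ...llm[role], ...override };
  }
  return merged;
}

const ollamaRole = (model: string): RolePolicy => ({ provider: "ollama", model });

const baseLlm: LlmPolicy = {
  ollamaHost: "http://127.0.0.1:11434",
  allowAutoPull: false,
  requireToolCapable: true,
  normalizer: ollamaRole("llama3.1:8b-instruct"),
  plannerAssist: ollamaRole("llama3.1:8b-instruct"),
  judge: ollamaRole("llama3.1:8b-instruct"),
  triage: ollamaRole("llama3.1:8b-instruct"),
};

// "example-small-model" is a placeholder tag, not a recommendation.
const profiles: Record<string, ProfileOverrides> = {
  laptop_16gb: { judge: { model: "example-small-model", temperature: 0 } },
};

const effectiveLlm = applyProfile(baseLlm, profiles["laptop_16gb"]);
```

Only the judge role changes in effectiveLlm; the other roles and the shared settings are carried over from baseLlm unchanged.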

Web dashboard

The repo ships a local web UI plus a small API so you can kick off runs, watch progress, and browse results without living only in the terminal. checkirai.config.json defaults (timeouts, runCommandAllowlist, retries, isolation, artifact pruning, allowShellMetacharacters) are merged server-side when the request omits them.

The LLM tab edits the full LlmPolicy (per-role providers, models, fallbacks, temperatures, Ollama host, auto-pull, require-tool-capable) and sends that object on verify_spec. The API still runs mergeLlmPolicyWithProjectProfile against the project file, so profiles / defaults.profile apply on top. The General tab lists verifier capability ids (what probes may use) alongside the comma-separated tools tokens (what integrations are enabled).

The model_catalog API (and Model catalog in the UI) includes a hardware block: total system RAM from the API host (os.totalmem()), a suggested profiles.* key (laptop_16gb / workstation_24gb / high_end_40gb per the LLM implementation plan), a RAM-filtered recommended model list, and, when that profile exists in the project file, a merged previewLlmPolicy you can apply in the dashboard.
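
To make the hardware block more concrete, here is a minimal sketch of how a RAM reading could map to a suggested profile key and a RAM-filtered model list. The three profile keys come from the text above; the GiB thresholds, the CatalogModel shape, its minRamGiB field, and the model names are assumptions for illustration, and the real catalog logic may differ.

```typescript
// Sketch only: thresholds and the CatalogModel shape are assumptions.
type ProfileKey = "laptop_16gb" | "workstation_24gb" | "high_end_40gb";

const BYTES_PER_GIB = 1024 * 1024 * 1024;

// Pick the largest profile whose (assumed) threshold fits into total RAM.
// In Node.js the input would typically come from os.totalmem().
function suggestProfileKey(totalMemBytes: number): ProfileKey {
  const gib = totalMemBytes / BYTES_PER_GIB;
  if (gib >= 40) return "high_end_40gb";
  if (gib >= 24) return "workstation_24gb";
  return "laptop_16gb"; // fallback for everything smaller
}

// Hypothetical catalog entry: a model tag plus the RAM it is assumed to need.
interface CatalogModel {
  name: string;
  minRamGiB: number;
}

// Keep only the models that are assumed to fit into the host's RAM.
function filterByRam(models: CatalogModel[], totalMemBytes: number): CatalogModel[] {
  const gib = totalMemBytes / BYTES_PER_GIB;
  return models.filter((m) => m.minRamGiB <= gib);
}

// Placeholder names, not real recommendations.
const exampleCatalog: CatalogModel[] = [
  { name: "example-small", minRamGiB: 8 },
  { name: "example-medium", minRamGiB: 24 },
  { name: "example-large", minRamGiB: 40 },
];

const hostBytes = 32 * BYTES_PER_GIB; // pretend the API host has 32 GiB
const suggestedKey = suggestProfileKey(hostBytes);
const recommended = filterByRam(exampleCatalog, hostBytes).map((m) => m.name);
```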

Mode	Command	Notes
Development (API + Vite UI)	`pnpm web:dev`	UI: `http://127.0.0.1:5173` · API health: `http://127.0.0.1:8787/api/health`
Production build	`pnpm web:build`	Builds TypeScript + Vite static assets
Production serve	`pnpm web:start`	Serves built UI + API (`SERVE_STATIC_FROM=web/dist`)

For day-to-day work, pnpm web:dev is the usual choice.

CLI commands

Top-level program: checkirai (aliases in package.json: spec-driven-verifier, verify-app → same binary).

`checkirai verify`

Verify a target URL against a markdown spec (or restart from a previous run).

Option	Description
`--spec <path>`	Path to spec markdown (required unless restarting from `spec_ir` / `llm_plan` with `--restart-run`)
`--target <url>`	Base URL of the app under test (required)
`--tools <list>`	Comma-separated: `playwright-mcp`, `shell`, `fs`, `http`, `chrome-devtools`, `dart-mcp` (default `fs,http`)
`--dart-project-root <uri>`	Dart/Flutter project root (`file:` URI or absolute path) when using `dart-mcp`
`--dart-driver-device <id>`	Optional device id for `launch_app` preflight (driver-style runs)
`--out <dir>`	Output root (default `.verifier`)
`--policy <name>`	`read_only` or `ui_only`
`--llm-provider <p>`	`ollama`, `remote`, or `none` (default `ollama`). `none` turns off all four roles. `remote` is not fully selectable from flags alone: put the per-role remote fields (`remoteBaseUrl`, `remoteApiKey`) in `checkirai.config.json`; the CLI does not pass API keys on the command line.
`--ollama-host <url>`	Ollama HTTP API base URL (default `http://127.0.0.1:11434`); merged into policy for Ollama roles.
`--ollama-model <name>`	When set, overrides the `model` tag for all roles that use Ollama after config merge; omit to keep per-role models from `checkirai.config.json` / code defaults.
`--allow-auto-pull` / `--no-allow-auto-pull`	Allow pulling missing Ollama models
`--restart-from <phase>`	`start` · `spec_ir` · `llm_plan`
`--restart-run <runId>`	Parent run UUID when restarting

Exit codes: 0 pass · 1 fail · 2 inconclusive · 3 blocked.
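
For scripts or agent loops that wrap the CLI, the exit codes above can be turned back into verdict labels. The sketch below builds an argument list from the flags in the table and maps an exit code to a verdict; the VerifyOptions shape and both helper names are illustrative assumptions, not part of the project's API.

```typescript
// Sketch only: helper names and the options shape are hypothetical.
type Verdict = "pass" | "fail" | "inconclusive" | "blocked";

// Mapping taken from the documented exit codes.
const EXIT_CODE_TO_VERDICT: Record<number, Verdict> = {
  0: "pass",
  1: "fail",
  2: "inconclusive",
  3: "blocked",
};

function verdictFromExitCode(code: number): Verdict | "unknown" {
  return EXIT_CODE_TO_VERDICT[code] ?? "unknown";
}

interface VerifyOptions {
  spec: string;
  target: string;
  tools?: string[];
  out?: string;
}

// Build the argv for `checkirai verify` using only flags from the table above.
function buildVerifyArgs(options: VerifyOptions): string[] {
  const args = ["verify", "--spec", options.spec, "--target", options.target];
  if (options.tools && options.tools.length > 0) {
    args.push("--tools", options.tools.join(","));
  }
  if (options.out) {
    args.push("--out", options.out);
  }
  return args;
}

// Example values; the spec path and URL are placeholders.
const verifyArgs = buildVerifyArgs({
  spec: "spec.md",
  target: "http://127.0.0.1:3000",
  tools: ["fs", "http", "chrome-devtools"],
});
```

The resulting array can be handed to a process launcher such as Node's child_process.spawnSync("checkirai", verifyArgs), and the returned status passed through verdictFromExitCode.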

`checkirai ollama status`

Check that Ollama is reachable.

Option	Default
`--host <url>`	`http://127.0.0.1:11434`

`checkirai model list`

List installed Ollama models.

Option	Default
`--host <url>`	`http://127.0.0.1:11434`

`checkirai model suggest`

Print recommended models (structured / tool-friendly output).

Option	Default
`--tooling` / `--no-tooling`	Prefer tooling-capable models (default on)

`checkirai model pull <modelName>`

Download a model via Ollama’s HTTP API (e.g. llama3.1:8b-instruct).

Option	Default
`--host <url>`	`http://127.0.0.1:11434`

`checkirai chrome-devtools list-tools`

Spawn a Chrome DevTools MCP server process and log the tools it exposes.

Option	Description
`--command <cmd>`	Required — executable to launch the MCP server
`--args <args>`	Space-separated arguments (optional)
`--cwd <cwd>`	Working directory (default: current directory)

`checkirai chrome-devtools self-check`

Verify the Chrome DevTools MCP server exposes the expected tool surface.

Option	Description
`--command <cmd>`	Required
`--args <args>`	Optional
`--cwd <cwd>`	Optional

`checkirai dart-mcp list-tools`

Spawn the Dart/Flutter MCP server process and log the tools it exposes.

Option	Description
`--command <cmd>`	Required — executable to launch the MCP server
`--args <args>`	Space-separated arguments (optional)
`--cwd <cwd>`	Working directory (default: current directory)

`checkirai dart-mcp self-check`

Verify the Dart MCP server exposes the expected tool surface.

Option	Description
`--command <cmd>`	Required
`--args <args>`	Optional
`--cwd <cwd>`	Optional

MCP server and Cursor

Checkir AI exposes an MCP server (stdio) so editors and agents can call verification as tools instead of shelling out.

Implementation: src/interfaces/mcp/server.ts (startMcpServer())
Tools: verification (verify_spec, restart_verify_spec, suggest_probe_plan, list_capabilities), run inspection (get_report, get_run_graph, get_artifact, explain_failure), and Ollama helpers (ollama_status, model_list, model_suggest, model_pull, model_ensure)

Start the server locally (stdio):

pnpm mcp

Optional: set CHECKIRAI_OUT to override the verifier output root (default .verifier).

Cursor: register the MCP server with node and an absolute path to dist/src/interfaces/mcp/bin.js (built by pnpm install / pnpm build), with cwd set to the clone root; alternatively, use pnpm --silent mcp. Do not use plain pnpm mcp in the editor: the script banner on stdout breaks MCP stdio. If args use a relative dist/... path and you see Cannot find module '/Users/…/dist/...' (home or wrong prefix), Cursor did not apply cwd to the child process; switch the script path in args to an absolute path. If you use --import tsx and see ERR_MODULE_NOT_FOUND for tsx, fix cwd or install deps in that clone. See docs/USAGE.md for JSON snippets and verify_spec examples.

For end-to-end examples, probe output layout, and integration notes, docs/USAGE.md is the detailed guide.
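
As a hedged illustration of the Cursor advice above, the following sketch builds a server entry that always uses an absolute script path. The mcpServers wrapper and the command / args / cwd / env keys are assumed to match what your editor expects; the checkirai server key, the helper names, and /path/to/checkirai are placeholders. docs/USAGE.md remains the authoritative source for the JSON snippets.

```typescript
// Sketch only: entry shape and helper names are assumptions.
interface McpServerEntry {
  command: string;
  args: string[];
  cwd?: string;
  env?: Record<string, string>;
}

// Accept POSIX absolute paths and Windows drive-letter paths.
function isAbsolutePath(p: string): boolean {
  return p.startsWith("/") || /^[A-Za-z]:[\\/]/.test(p);
}

// Build an entry that points node at the built MCP bin via an absolute path,
// so the server still starts if the editor ignores cwd.
function checkiraiMcpEntry(cloneRoot: string, outRoot?: string): McpServerEntry {
  if (!isAbsolutePath(cloneRoot)) {
    throw new Error("cloneRoot must be an absolute path");
  }
  const root = cloneRoot.replace(/[\\/]+$/, ""); // drop trailing slashes
  const entry: McpServerEntry = {
    command: "node",
    args: [`${root}/dist/src/interfaces/mcp/bin.js`],
    cwd: root,
  };
  if (outRoot) {
    entry.env = { CHECKIRAI_OUT: outRoot }; // optional output-root override
  }
  return entry;
}

// "/path/to/checkirai" is a placeholder for your local clone.
const editorConfig = {
  mcpServers: {
    checkirai: checkiraiMcpEntry("/path/to/checkirai/"),
  },
};
```

Serializing editorConfig with JSON.stringify gives a starting point to compare against the snippets in docs/USAGE.md.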

Development scripts

Script	Purpose
`pnpm build`	Compile TypeScript to `dist/`
`pnpm dev`	Run CLI via tsx (`--help`)
`pnpm typecheck`	`tsc --noEmit`
`pnpm test`	Vitest
`pnpm lint` / `pnpm lint:fix`	Biome
`pnpm format` / `pnpm format:fix`	Biome formatter
`pnpm mcp`	MCP server (stdio)

Architecture overview

End-to-end, a run is a pipeline from natural-language intent to a frozen result. An LLM (Ollama by default, or remote) is used where structure and judgment are needed; deterministic code handles orchestration, policies, and parts of scoring.

Spec in — Markdown file, Spec bundle (inline markdown + URLs + files resolved to text), or a pre-built Spec IR object.
Normalize → Spec IR — The configured LLM turns prose into a structured intermediate representation: requirements, observables, and metadata the rest of the system consumes. Outputs are persisted (e.g. spec_ir artifacts) so a run is auditable and replayable.
Plan → test plan — The planner consults the capability graph for your --tools set (HTTP, filesystem, shell, Playwright / Chrome DevTools MCP, …). Verifier capabilities (e.g. navigate, read_ui_structure, run_command, call_http) are the atomic actions probes may request; see src/capabilities/types.ts (ALL_CAPABILITY_NAMES). An LLM (and/or procedural planners) produces executable steps aligned with what is actually available.
Execute — The executor bootstraps navigation to the run’s target URL when Chrome + navigate are available (so snapshots are not taken against whatever tab was already open). Between probes it can reset to that URL again to shed UI mutations; optional isolateProbeSessions uses one session per probe. run_command is allowlist-gated (default deny if the list is empty) and rejects shell metacharacters unless defaults.allowShellMetacharacters is true. Optional timeouts and step retries cap hung or flaky work.
Judge, triage & synthesize — Deterministic checks (including more observable kinds, URL from the page, HTTP evidence where present) and LLM judges (per-role Ollama or remote) assign per-requirement verdicts (pass / fail / inconclusive / blocked). Optional post-run triage uses the triage role. Optional depends_on on a requirement blocks dependents when a prerequisite fails. The runtime emits report.json, summary.md, and related rows for the dashboard and MCP tools.
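
The run_command gating described in the Execute step can be pictured with a small check like the one below. This is a sketch under assumptions: it treats an allowlist entry ending in * as a prefix match and any other entry as an exact command line, and the set of characters counted as shell metacharacters is a guess. The function name is hypothetical and the real executor may behave differently.

```typescript
// Sketch only: matching rules and the metacharacter set are assumptions.
const SHELL_METACHARACTERS = /[;&|<>`$()\n]/;

function isCommandAllowed(
  command: string,
  allowlist: string[],
  allowShellMetacharacters = false,
): boolean {
  // Default deny: an empty allowlist means no run_command runs.
  if (allowlist.length === 0) return false;

  // Reject shell metacharacters unless the project opted in.
  if (!allowShellMetacharacters && SHELL_METACHARACTERS.test(command)) {
    return false;
  }

  return allowlist.some((entry) =>
    entry.endsWith("*")
      ? command.startsWith(entry.slice(0, -1)) // assumed prefix entry
      : command === entry, // full command line must match exactly
  );
}

const exampleAllowlist = ["pnpm test", "flutter *"];

const allowedExact = isCommandAllowed("pnpm test", exampleAllowlist);
const allowedPrefix = isCommandAllowed("flutter analyze", exampleAllowlist);
const deniedUnknown = isCommandAllowed("rm -rf build", exampleAllowlist);
const deniedChained = isCommandAllowed("flutter analyze; rm -rf build", exampleAllowlist);
```

Checking metacharacters before the allowlist match keeps a permissive prefix entry from being used to smuggle in a chained command.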

flowchart LR
  MD[Markdown or bundle] --> N[LLM to Spec IR]
  IR[Spec IR object] --> P[Plan using tools]
  N --> P
  P --> X[Execute tools]
  X --> J[Judge]
  J --> S[Report and summary]
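
The depends_on behavior from the judging step can be sketched as a small propagation pass over per-requirement results. The RequirementResult shape and the function name are assumptions for illustration. The sketch also assumes that blocking is transitive (a dependent of a blocked requirement is blocked too), which the text above does not state explicitly.

```typescript
// Sketch only: result shape and transitive blocking are assumptions.
type Verdict = "pass" | "fail" | "inconclusive" | "blocked";

interface RequirementResult {
  id: string;
  verdict: Verdict;
  dependsOn?: string[];
}

// Mark a requirement as blocked when any prerequisite failed or is blocked.
// Repeats until nothing changes so that blocks flow down dependency chains.
function applyDependencyBlocks(results: RequirementResult[]): RequirementResult[] {
  const copy = results.map((r) => ({ ...r })); // do not mutate the input
  const verdictOf = (id: string): Verdict | undefined =>
    copy.find((r) => r.id === id)?.verdict;

  let changed = true;
  while (changed) {
    changed = false;
    for (const result of copy) {
      if (result.verdict === "blocked") continue;
      const prerequisiteBroken = (result.dependsOn ?? []).some((dep) => {
        const v = verdictOf(dep);
        return v === "fail" || v === "blocked";
      });
      if (prerequisiteBroken) {
        result.verdict = "blocked";
        changed = true;
      }
    }
  }
  return copy;
}

// Hypothetical requirement ids.
const judged: RequirementResult[] = [
  { id: "login", verdict: "fail" },
  { id: "dashboard", verdict: "pass", dependsOn: ["login"] },
  { id: "settings", verdict: "pass", dependsOn: ["dashboard"] },
  { id: "landing", verdict: "pass" },
];

const finalVerdicts = applyDependencyBlocks(judged).map((r) => `${r.id}:${r.verdict}`);
```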

Starting from a checkpoint (“phases”)

You do not have to redo every expensive step. A parent run stores artifacts; a child run can restart from a saved phase by passing --restart-run with the parent’s run id and --restart-from:

`--restart-from`	Meaning
`start`	Full pipeline from spec input (default).
`spec_ir`	Reuse the parent’s frozen Spec IR—skip normalization LLM work; continue with planning and later stages.
`llm_plan`	Reuse the parent’s saved test-plan artifact—skip normalization and the main planning phase; continue with execution and judgement. Requires the same kind of setup as a full MCP + LLM generic loop (e.g. `chrome-devtools` or `dart-mcp` in `--tools` and an LLM provider other than `none`).

The same restartFromPhase / restartFromRunId fields exist on verify_spec over MCP and on the web API; over MCP you can also call restart_verify_spec with parentRunId (and optional overrides). Pick the phase that matches how much of the parent run you want to reuse when iterating on plans, tooling, or judges.
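
One way to read the table above is as a mapping from restart phase to the pipeline stages that still run. The sketch below encodes that mapping and a few sanity checks before a restart is requested. The stage names, the RestartRequest shape, and validateRestart are illustrative assumptions; the MCP-tool check for llm_plan is based only on the examples named in the table.

```typescript
// Sketch only: stage names and request shape are assumptions.
type RestartPhase = "start" | "spec_ir" | "llm_plan";
type Stage = "normalize" | "plan" | "execute" | "judge";

// Stages that still run when restarting from each phase.
const STAGES_BY_PHASE: Record<RestartPhase, Stage[]> = {
  start: ["normalize", "plan", "execute", "judge"],
  spec_ir: ["plan", "execute", "judge"],
  llm_plan: ["execute", "judge"],
};

interface RestartRequest {
  restartFrom: RestartPhase;
  restartRun?: string; // parent run id
  tools: string[];
  llmProvider: "ollama" | "remote" | "none";
}

// Return a list of human-readable problems; an empty list means "looks fine".
function validateRestart(request: RestartRequest): string[] {
  const problems: string[] = [];
  if (request.restartFrom !== "start" && !request.restartRun) {
    problems.push("a parent run id is required when restarting from a saved phase");
  }
  if (request.restartFrom === "llm_plan") {
    const hasMcpTool = request.tools.some(
      (t) => t === "chrome-devtools" || t === "dart-mcp",
    );
    if (!hasMcpTool) problems.push("llm_plan expects an MCP integration in tools");
    if (request.llmProvider === "none") {
      problems.push("llm_plan expects an LLM provider other than none");
    }
  }
  return problems;
}

const remainingStages = STAGES_BY_PHASE["spec_ir"];
const restartProblems = validateRestart({
  restartFrom: "llm_plan",
  tools: ["fs", "http"],
  llmProvider: "none",
});
```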

Status and contributing

This project is work in progress: behavior and APIs evolve almost daily as probes, judges and MCP integrations mature. If something is rough or undocumented, that is expected for now.

Contributions are welcome — issues, specs, probe ideas, and PRs that tighten verification or docs all help. For deeper context on the current MVP scope (including known limitations) read docs/USAGE.md.
