The control plane for spec-driven software delivery.
README (Chinese) | Project Vision | Roadmap
SpecOrch orchestrates AI coding agents with a spec-first, gate-first, evidence-driven approach. It connects intent, tasks, execution, verification, and evolution into a coherent control plane — so you can stop babysitting agents and start operating a delivery system.
Not a chatbot. Not a multi-agent playground. Not an IDE.
A control plane that makes software delivery orchestratable, verifiable, and self-improving.
- Issue is not the requirement — Spec is the requirement.
- Merge is not done — Gate is done.
- Orchestration is not static — it evolves with every run.
- Prompt is advice — Harness is enforcement.
SpecOrch organizes the full delivery lifecycle into seven planes:
┌──────────────────────────────────────────────────────┐
│ Evolution  traces → evals → harness improvement      │
├──────────────────────────────────────────────────────┤
│ Control    mission / session / PR / gate ops         │
├──────────────────────────────────────────────────────┤
│ Evidence   findings / tests / review / gate          │
├──────────────────────────────────────────────────────┤
│ Execution  worktree / sandbox / agent adapters       │
├──────────────────────────────────────────────────────┤
│ Harness    context contract / skills / policies      │
├──────────────────────────────────────────────────────┤
│ Task       plan DAG / waves / work packets           │
├──────────────────────────────────────────────────────┤
│ Contract   spec / scope / acceptance / freeze        │
└──────────────────────────────────────────────────────┘
| Plane | Purpose |
|---|---|
| Contract | Freeze the spec — what to build, boundaries, acceptance criteria |
| Task | Decompose spec into executable task graph with dependencies |
| Harness | Make execution reliable: context contracts, policies, hooks, reactions |
| Execution | Run each task in isolated worktree/sandbox with pluggable agents |
| Evidence | Prove completion: gate evaluation, findings, deviations, reviews |
| Control | Operate the system: missions, sessions, PRs, dashboard |
| Evolution | Learn from evidence: evolve prompts, rules, strategies, policies |
See Seven Planes Architecture for full codebase mapping.
spec-orch discuss
# Interactive TUI brainstorming — type @freeze when ready
Or create a Mission directly:
spec-orch mission create "WebSocket Real-time Notifications"
spec-orch mission approve websocket-real-time-notifications
spec-orch plan websocket-real-time-notifications
# Output: 4 waves, 7 work packets
Waves execute sequentially; work packets within a wave run in parallel.
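The sequential-waves, parallel-packets scheme can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names (`run_plan`, `run_packet`), not the actual SpecOrch scheduler:

```python
from concurrent.futures import ThreadPoolExecutor

def run_plan(waves, run_packet):
    """Execute waves in order; run the packets inside each wave in parallel."""
    results = []
    for wave in waves:                      # waves are ordered by dependency
        with ThreadPoolExecutor() as pool:  # packets within a wave are independent
            results.extend(pool.map(run_packet, wave))
    return results

# Example: 2 waves, 3 work packets
plan = [["scaffold"], ["api", "ui"]]
done = run_plan(plan, lambda packet: f"{packet}:ok")
# done == ["scaffold:ok", "api:ok", "ui:ok"]
```

`pool.map` preserves input order, so results line up with the packet order inside each wave.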
One-shot CLI:
spec-orch run SON-20 --source linear --auto-pr
Full pipeline: load issue → plan → build → verify → review → gate → PR → Linear write-back.
Daemon mode — fully autonomous:
spec-orch daemon start --config spec-orch.toml --repo-root .
Polls Linear for ready issues → readiness triage → build → verify → gate → PR → review loop.
Mission mode:
spec-orch run-plan websocket-real-time-notifications
spec-orch accept-issue SON-20 --accepted-by chris
You verify results against spec — a compliance checklist and deviation summary, not raw diffs.
spec-orch retro websocket-real-time-notifications
Generates a retrospective from run evidence. The system learns and improves for the next cycle.
| Adapter | Agent | Protocol |
|---|---|---|
| `codex_exec` | OpenAI Codex | `codex exec --json` |
| `opencode` | OpenCode | JSONL event stream |
| `claude_code` | Claude Code | stream-json output |
| `droid` | Factory Droid | ACP events |
| `acpx` | 15+ agents | Agent Client Protocol |
Switch agents by changing one line in spec-orch.toml:
[builder]
adapter = "acpx"
agent = "opencode"
model = "minimax/MiniMax-M2.5"
Define models once, compose a named chain, then let roles inherit it:
[llm]
default_model_chain = "default_reasoning"
[models.minimax_reasoning]
model = "MiniMax-M2.7-highspeed"
api_type = "anthropic"
api_key_env = "MINIMAX_API_KEY"
api_base_env = "MINIMAX_ANTHROPIC_BASE_URL"
[models.fireworks_kimi]
model = "accounts/fireworks/routers/kimi-k2p5-turbo"
api_type = "anthropic"
api_key_env = "ANTHROPIC_AUTH_TOKEN"
api_base_env = "ANTHROPIC_BASE_URL"
[model_chains.default_reasoning]
primary = "minimax_reasoning"
fallbacks = ["fireworks_kimi"]
[planner]
[supervisor]
adapter = "litellm"
[acceptance_evaluator]
adapter = "litellm"
Fallbacks are used only for transient provider errors such as 429, 529, overload, timeout, or temporary unavailability. Missing credentials or missing base URLs remain explicit configuration failures.
Configurable merge conditions with profiles (full / standard / hotfix):
spec-orch gate evaluate SON-20 # Evaluate all conditions
spec-orch gate show-policy # Print gate policy
spec-orch explain SON-20 # Human-readable gate report
The system improves itself after every run:
spec-orch evidence summary # Pattern analysis from historical runs
spec-orch harness synthesize # Auto-generate compliance rules
spec-orch prompt evolve # A/B tested prompt variants
spec-orch strategy analyze # Learned scoper hints
spec-orch policy distill # Zero-LLM deterministic scripts
Skill Discovery (SkillCraft-inspired): When [evolution.skill_evolver] enabled = true is set in spec-orch.toml, the system automatically discovers repeating tool-call patterns from builder telemetry across runs and saves them as reusable SkillManifest YAML files. Matched skills are automatically injected into builder context for future runs.
# spec-orch.toml — enable skill auto-discovery
[evolution.skill_evolver]
enabled = true
How each mechanism activates:
| Mechanism | Activation | Configuration |
|---|---|---|
| SkillEvolver (save) | Config-driven, runs during evolution cycle | `[evolution.skill_evolver] enabled = true` |
| Skill Runtime (reuse) | Always active when `.spec_orch/skills/` has YAML files | No config needed |
| ContextRanker (hot/cold) | Always active in every `ContextAssembler.assemble()` call | Budget via `NodeContextSpec.max_tokens_budget` |
| Memory compaction | Auto-runs every 10th `_finalize_run()` | TTL default 30 days |
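The compaction cadence above (every 10th run finalization, 30-day TTL) can be sketched as follows. All names here are hypothetical; this is not the project's actual `_finalize_run()`:

```python
import time

TTL_SECONDS = 30 * 24 * 3600   # episodic entries expire after 30 days
COMPACT_EVERY = 10             # compaction runs on every 10th finalization

def maybe_compact(run_counter, entries, now=None):
    """Drop expired episodic entries, but only on every 10th run."""
    if run_counter % COMPACT_EVERY != 0:
        return entries                      # not a compaction run: keep everything
    now = now if now is not None else time.time()
    return [e for e in entries if now - e["created_at"] < TTL_SECONDS]

fresh = {"created_at": time.time()}
stale = {"created_at": time.time() - 31 * 24 * 3600}
# On run 10 the stale entry is dropped; on run 7 nothing is touched.
```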
v0.5.2 — Alpha, post-core-extraction baseline.
The system is used to develop itself and improves with each iteration: 1751 tests collected, 65+ commands.
What works on main:
- Shared execution semantics across issue and mission paths
- Extracted `runtime_core`, `decision_core`, `acceptance_core`, `acceptance_runtime`, and `contract_core` seams
- Bounded acceptance graph runtime: graph profiles, stepwise prompt reveal, per-step artifacts, graph traces, and fixture candidate seeds
- Memory, evolution, and contract linkage now sit on top of the extracted seams instead of legacy ad hoc carriers
- Seven-plane architecture with closed-loop evolution (FlowEngine DAGs defined, but `run_issue()` uses direct sequencing; unification planned)
- Spec-first approval gate: `run_issue()` requires explicit spec approval by default (`--auto-approve` to bypass)
- Pluggable builder/reviewer adapters (Codex, OpenCode, Claude Code, Droid, ACPX)
- ACPX unified adapter wrapping 15+ agents via Agent Client Protocol
- Fixture or Linear-backed issue loading with configurable issue sources
- Per-issue git worktree isolation
- Configurable verification (lint, typecheck, test, build) per project type
- Gate evaluation with profiles (full / standard / hotfix)
- Compliance engine with YAML-defined agent behavior contracts
- Daemon mode with readiness triage, review loop, merge check, retry
- GitHub PR auto-creation with gate as commit status
- Spec deviation tracking and structured findings
- Three-tier change management (Full / Standard / Hotfix)
- Web dashboard + Rich TUI (TypeScript/React/Ink)
- Mission Control Center with EventBus
- Conductor for progressive formalization
- Cross-session memory with SQLite WAL index + Qdrant semantic search (E2E verified) (ADR-0001)
- LLM-based memory distillation: expired episodic entries grouped by issue and consolidated into semantic summaries
- Builder telemetry ingestion: tool-call sequences stored in episodic memory during run finalization
- Human acceptance feedback: `accept-issue` stores acceptance in memory for learning
- Success trend aggregation: `get_trend_summary()` provides a windowed aggregate summary (success rate, top failure reasons) injected into LLM context
- Enriched run summaries: builder adapter, verification results, and key learnings captured in semantic memory
- 4-layer memory (WORKING/EPISODIC/SEMANTIC/PROCEDURAL) with all layers active: PROCEDURAL consumed by ContextAssembler
- Full self-evolution: evidence analysis, harness synthesis, prompt evolution, policy distillation
- `spec-orch init` for project type detection and config generation
- Low-cost model support (MiniMax-M2.5, ~$0.04/run)
- `spec-orch preflight` one-click system health check
- `spec-orch selftest` end-to-end smoke test with fixture issues
- FlowRouter hybrid routing (static rules + LLM-based complexity analysis)
- KnowledgeDistiller: cross-run learning notebook (`.spec_orch/knowledge.md`) (standalone, manual invocation)
- ContextRanker: priority-aware context truncation replacing naive text slicing
- RunProgressSnapshot: pipeline stage checkpointing for daemon retry continuity
- SkillDegradationDetector: routing audit, baseline tracking (standalone, not yet wired into pipeline)
- TraceSampler: online evaluation sampling with configurable rules
- CompactRetentionPriority: architecture-aware context compression
- Atomic JSON writes across all state files (crash-safe daemon)
- Cross-platform file locking for evolution counter (POSIX + Windows)
- LifecycleEvolver protocol: unified 4-phase observe/propose/validate/promote for all 7 evolvers
- SkillEvolver: auto-discovers reusable builder tool-call patterns → SkillManifest YAML (SkillCraft-inspired)
- Skill Runtime: ContextAssembler loads + matches skills by trigger keywords, injects into builder context
- ContextRanker hot/cold separation: learning context (hints, skills, failure samples, procedures, trends) included in priority-based budget allocation
- Memory compaction + LLM distillation: episodic memory auto-expires after 30 days, run outcomes consolidated and distilled to semantic layer
- Layered memory architecture: filesystem truth source + SQLite WAL index + Qdrant semantic search (BAAI/bge-small-zh-v1.5 local embedding, E2E validated)
- Memory vNext shipped: entity relation layer (entity_scope/entity_id/relation_type), ProjectProfile + 4 learning views, hybrid retrieval (FTS5 + RRF), async derivation queue (ADR-0002)
- Memory subsystem refactored: MemoryService split into MemoryAnalytics + MemoryDistiller + MemoryRecorder, SQL-pushed date filters, soft-delete compaction, cross-process safe derivation queue
- CJK-aware text matching: jieba segmentation with graceful bigram fallback for non-CJK mixed content
- Modular CLI: `cli/` package with 8 command submodules replacing a single 4092-line file
- LLM JSON output schema validation with fallback + observability events
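One item above mentions atomic JSON writes for crash-safe daemon state. The standard pattern is write-to-temp-then-rename; a minimal sketch under that assumption (this is not the project's actual helper):

```python
import json
import os
import tempfile

def atomic_write_json(path, data):
    """Write JSON so readers never observe a half-written file."""
    target = os.path.abspath(path)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(target), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())   # ensure bytes reach disk before the rename
        os.replace(tmp, target)    # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)             # clean up the temp file on any failure
        raise
```

Because `os.replace` is atomic within a filesystem, a crash leaves either the old file or the new one, never a torn write.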
# 1. Install
git clone https://github.com/fakechris/spec-orch.git
cd spec-orch
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
# 2. Configure environment — copy and edit .env
cp .env.example .env
# Edit .env: set SPEC_ORCH_LLM_API_KEY and SPEC_ORCH_LLM_API_BASE
# (see .env.example for provider-specific examples)
# 3. Initialize project config
spec-orch init # LLM-first detection (auto fallback to rules)
spec-orch init --offline # Force rule-based detection
spec-orch init --reconfigure # Re-detect and overwrite existing config
# 4. Verify everything works
spec-orch config check # Validate configuration
spec-orch discuss # Test LLM connectivity with interactive TUI
Minimum required environment variables (set in .env or export):
| Variable | Purpose | Example |
|---|---|---|
| `SPEC_ORCH_LLM_API_KEY` | LLM provider API key (planner, discuss, triage) | `sk-ant-xxx` or MiniMax key |
| `SPEC_ORCH_LLM_API_BASE` | LLM API endpoint | `https://api.anthropic.com` |
| `SPEC_ORCH_LINEAR_TOKEN` | Linear issue tracking (optional, for daemon) | `lin_api_xxx` |
See .env.example for full configuration reference.
pip install "spec-orch @ git+https://github.com/fakechris/spec-orch.git"
uv pip install "spec-orch @ git+https://github.com/fakechris/spec-orch.git"
pip install spec-orch
pip install "spec-orch[all]"
spec-orch --version # 0.5.2
spec-orch config check
- Python 3.11+ (3.11, 3.12, 3.13 tested)
- Git (for worktree isolation)
- Builder CLI — one of: Codex, OpenCode, Droid, Claude Code, or any ACPX-compatible agent
- Linear API token (optional, for issue tracking)
- LLM API key (optional, for planning / review / triage)
| Extra | Packages | Use case |
|---|---|---|
| `planner` | litellm | discuss, plan, readiness triage |
| `dashboard` | fastapi, uvicorn | `spec-orch dashboard` |
| `slack` | slack-bolt | Slack discussion adapter |
| `memory` | qdrant-client, fastembed | Semantic vector search for memory recall |
| `cjk` | jieba | Chinese word segmentation for memory text matching |
| `all` | all of the above | Full feature set |
| `dev` | all + pytest, ruff, mypy, build, twine | Development |
SpecOrch is configured via spec-orch.toml. Run spec-orch init to auto-detect your project and generate config:
spec-orch init # LLM-first detection; auto fallback to rules
spec-orch init --offline # Force rule-based detection
spec-orch init --reconfigure # Re-run detection and overwrite existing config
spec-orch init --force # Force overwrite existing config
spec-orch init persists the selected detection mode into [init].detection_mode inside spec-orch.toml for deterministic future reconfiguration.
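For example, after an online init the persisted section might look like this; the exact value strings are an assumption, not documented output:

```toml
# spec-orch.toml — written by spec-orch init
[init]
detection_mode = "llm"   # assumed values: "llm" (online) or a rule-based mode
```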
To enable semantic vector search for memory recall, install the memory extra and add to spec-orch.toml:
[memory]
provider = "filesystem_qdrant"
[memory.qdrant]
mode = "local" # "local" | "memory" | "server"
path = ".spec_orch_qdrant"
collection = "spec_orch_memory"
embedding_model = "BAAI/bge-small-zh-v1.5"
When provider = "filesystem_qdrant" is set, recall() uses Qdrant semantic search for EPISODIC and SEMANTIC layers while keeping the filesystem as the source of truth. Without the memory extra installed, the system silently falls back to filesystem-only mode.
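The silent fallback can be done with a guarded import; a sketch under that assumption (the function name and return values are hypothetical, not SpecOrch's real module layout):

```python
def make_recall_backend():
    """Prefer Qdrant semantic search; silently fall back to filesystem-only."""
    try:
        import qdrant_client  # noqa: F401 — present only with the [memory] extra
    except ImportError:
        return "filesystem"           # filesystem stays the source of truth either way
    return "filesystem_qdrant"

backend = make_recall_backend()       # works with or without the extra installed
```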
See spec-orch.toml Reference and AI Config Guide for full documentation. See ADR-0001 for the memory architecture decision.
spec-orch discuss # Interactive brainstorming TUI
spec-orch mission create "Title" # Create mission + spec skeleton
spec-orch mission approve <id> # Freeze spec for execution
spec-orch mission status # List all missions
spec-orch contract generate <id> # Generate TaskContract from issue
spec-orch plan <mission-id> # LLM scoper generates DAG
spec-orch plan-show <mission-id> # View wave/packet breakdown
spec-orch promote <mission-id> # Create Linear issues from plan
spec-orch pipeline <mission-id> # Show EODF pipeline progress
spec-orch run <id> --source linear # Full one-shot pipeline
spec-orch run <id> --auto-approve # Skip spec approval, auto-approve
spec-orch run-plan <mission-id> # Execute plan with parallel waves
spec-orch run-issue <id> # Build + verify + gate (requires spec approval)
spec-orch run-issue <id> --auto-approve # Auto-approve spec and run
spec-orch daemon start # Autonomous daemon mode
spec-orch gate evaluate <id> # Evaluate gate conditions
spec-orch review-issue <id> # Review with verdict
spec-orch accept-issue <id> # Human acceptance
spec-orch explain <id> # Gate explanation report
spec-orch retro <mission-id> # Mission retrospective
spec-orch status <id> # Current run state
spec-orch status --all # All issues table
spec-orch dashboard # Web dashboard
spec-orch watch <id> # Real-time activity log
spec-orch config check # Validate configuration
spec-orch evidence summary # Pattern analysis
spec-orch harness synthesize # Auto-generate rules
spec-orch prompt evolve # A/B tested prompts
spec-orch strategy analyze # Scoper hints
spec-orch policy distill # Zero-LLM scripts
- Self-Evolution Architecture
- Pipeline Roles and Stages
- Orchestration Brain Design
- Context Contract Design
- Spec-Contract Integration
- Change Management Policy
- SDD Landscape & Positioning
- Directional Review (Agent Engineering)
This project is licensed under the MIT License. See LICENSE.