The control plane for spec-driven software delivery.
README (Chinese) | Project Vision | Roadmap
SpecOrch orchestrates AI coding agents with a spec-first, gate-first, evidence-driven approach. It connects intent, tasks, execution, verification, and evolution into a coherent control plane — so you can stop babysitting agents and start operating a delivery system.
Not a chatbot. Not a multi-agent playground. Not an IDE.
A control plane that makes software delivery orchestratable, verifiable, and self-improving.
- Issue is not the requirement — Spec is the requirement.
- Merge is not done — Gate is done.
- Orchestration is not static — it evolves with every run.
- Prompt is advice — Harness is enforcement.
SpecOrch organizes the full delivery lifecycle into seven planes:
┌──────────────────────────────────────────────────────┐
│ Evolution  traces → evals → harness improvement      │
├──────────────────────────────────────────────────────┤
│ Control    mission / session / PR / gate ops         │
├──────────────────────────────────────────────────────┤
│ Evidence   findings / tests / review / gate          │
├──────────────────────────────────────────────────────┤
│ Execution  worktree / sandbox / agent adapters       │
├──────────────────────────────────────────────────────┤
│ Harness    context contract / skills / policies      │
├──────────────────────────────────────────────────────┤
│ Task       plan DAG / waves / work packets           │
├──────────────────────────────────────────────────────┤
│ Contract   spec / scope / acceptance / freeze        │
└──────────────────────────────────────────────────────┘
| Plane | Purpose |
|---|---|
| Contract | Freeze the spec — what to build, boundaries, acceptance criteria |
| Task | Decompose spec into executable task graph with dependencies |
| Harness | Make execution reliable: context contracts, policies, hooks, reactions |
| Execution | Run each task in isolated worktree/sandbox with pluggable agents |
| Evidence | Prove completion: gate evaluation, findings, deviations, reviews |
| Control | Operate the system: missions, sessions, PRs, dashboard |
| Evolution | Learn from evidence: evolve prompts, rules, strategies, policies |
See Seven Planes Architecture for full codebase mapping.
spec-orch discuss
# Interactive TUI brainstorming — type @freeze when ready
Or create a Mission directly:
spec-orch mission create "WebSocket Real-time Notifications"
spec-orch mission approve websocket-real-time-notifications
spec-orch plan websocket-real-time-notifications
# Output: 4 waves, 7 work packets
Waves execute sequentially; work packets within a wave run in parallel.
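The sequential-waves, parallel-packets scheme can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names (`run_plan`, `run_packet`), not the actual SpecOrch scheduler:

```python
from concurrent.futures import ThreadPoolExecutor

def run_plan(waves, run_packet):
    """Execute waves in order; run the packets inside each wave in parallel."""
    results = []
    for wave in waves:                      # waves are ordered by dependency
        with ThreadPoolExecutor() as pool:  # packets within a wave are independent
            results.extend(pool.map(run_packet, wave))
    return results

# Example: 2 waves, 3 work packets
plan = [["scaffold"], ["api", "ui"]]
done = run_plan(plan, lambda packet: f"{packet}:ok")
# done == ["scaffold:ok", "api:ok", "ui:ok"]
```

`pool.map` preserves input order, so results line up with the packet order inside each wave.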
One-shot CLI:
spec-orch run SON-20 --source linear --auto-pr
Full pipeline: load issue → plan → build → verify → review → gate → PR → Linear write-back.
Daemon mode — fully autonomous:
spec-orch daemon start --config spec-orch.toml --repo-root .
Polls Linear for ready issues → readiness triage → build → verify → gate → PR → review loop.
Mission mode:
spec-orch run-plan websocket-real-time-notifications
spec-orch accept-issue SON-20 --accepted-by chris
You verify results against spec — a compliance checklist and deviation summary, not raw diffs.
spec-orch retro websocket-real-time-notifications
Generates a retrospective from run evidence. The system learns and improves for the next cycle.
| Adapter | Agent | Protocol |
|---|---|---|
| `codex_exec` | OpenAI Codex | `codex exec --json` |
| `opencode` | OpenCode | JSONL event stream |
| `claude_code` | Claude Code | stream-json output |
| `droid` | Factory Droid | ACP events |
| `acpx` | 15+ agents | Agent Client Protocol |
Switch agents by changing one line in spec-orch.toml:
[builder]
adapter = "acpx"
agent = "opencode"
model = "minimax/MiniMax-M2.5"
Define models once, compose a named chain, then let roles inherit it:
[llm]
default_model_chain = "default_reasoning"
[models.minimax_reasoning]
model = "MiniMax-M2.7-highspeed"
api_type = "anthropic"
api_key_env = "MINIMAX_API_KEY"
api_base_env = "MINIMAX_ANTHROPIC_BASE_URL"
[models.fireworks_kimi]
model = "accounts/fireworks/routers/kimi-k2p5-turbo"
api_type = "anthropic"
api_key_env = "ANTHROPIC_AUTH_TOKEN"
api_base_env = "ANTHROPIC_BASE_URL"
[model_chains.default_reasoning]
primary = "minimax_reasoning"
fallbacks = ["fireworks_kimi"]
[planner]
[supervisor]
adapter = "litellm"
[acceptance_evaluator]
adapter = "litellm"
Fallbacks are used only for transient provider errors such as 429, 529, overload, timeout, or temporary unavailability. Missing credentials or missing base URLs remain explicit configuration failures.
Configurable merge conditions with profiles (full / standard / hotfix):
spec-orch gate evaluate SON-20 # Evaluate all conditions
spec-orch gate show-policy # Print gate policy
spec-orch explain SON-20 # Human-readable gate report
The system improves itself after every run:
spec-orch evidence summary # Pattern analysis from historical runs
spec-orch harness synthesize # Auto-generate compliance rules
spec-orch prompt evolve # A/B tested prompt variants
spec-orch strategy analyze # Learned scoper hints
spec-orch policy distill # Zero-LLM deterministic scripts
Skill Discovery (SkillCraft-inspired): When [evolution.skill_evolver] enabled = true is set in spec-orch.toml, the system automatically discovers repeating tool-call patterns from builder telemetry across runs and saves them as reusable SkillManifest YAML files. Matched skills are automatically injected into builder context for future runs.
# spec-orch.toml — enable skill auto-discovery
[evolution.skill_evolver]
enabled = true
How each mechanism activates:
| Mechanism | Activation | Configuration |
|---|---|---|
| SkillEvolver (save) | Config-driven, runs during evolution cycle | `[evolution.skill_evolver] enabled = true` |
| Skill Runtime (reuse) | Always active when `.spec_orch/skills/` has YAML files | No config needed |
| ContextRanker (hot/cold) | Always active in every `ContextAssembler.assemble()` call | Budget via `NodeContextSpec.max_tokens_budget` |
| Memory compaction | Auto-runs every 10th `_finalize_run()` | TTL default 30 days |
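The compaction cadence above (every 10th run finalization, 30-day TTL) can be sketched as follows. All names here are hypothetical; this is not the project's actual `_finalize_run()`:

```python
import time

TTL_SECONDS = 30 * 24 * 3600   # episodic entries expire after 30 days
COMPACT_EVERY = 10             # compaction runs on every 10th finalization

def maybe_compact(run_counter, entries, now=None):
    """Drop expired episodic entries, but only on every 10th run."""
    if run_counter % COMPACT_EVERY != 0:
        return entries                      # not a compaction run: keep everything
    now = now if now is not None else time.time()
    return [e for e in entries if now - e["created_at"] < TTL_SECONDS]

fresh = {"created_at": time.time()}
stale = {"created_at": time.time() - 31 * 24 * 3600}
# On run 10 the stale entry is dropped; on run 7 nothing is touched.
```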
v0.5.2 — Alpha, post-core-extraction baseline.
The system is used to develop itself and improves with each iteration: 1751 tests collected, 65+ commands.
What works on main:
- Shared execution semantics across issue and mission paths
- Extracted `runtime_core`, `decision_core`, `acceptance_core`, `acceptance_runtime`, and `contract_core` seams
- Bounded acceptance graph runtime: graph profiles, stepwise prompt reveal, per-step artifacts, graph traces, and fixture candidate seeds
- Memory, evolution, and contract linkage now sit on top of the extracted seams instead of legacy ad hoc carriers
- Seven-plane architecture with closed-loop evolution (FlowEngine DAGs defined, but `run_issue()` uses direct sequencing; unification planned)
- Spec-first approval gate: `run_issue()` requires explicit spec approval by default (`--auto-approve` to bypass)
- Pluggable builder/reviewer adapters (Codex, OpenCode, Claude Code, Droid, ACPX)
- ACPX unified adapter wrapping 15+ agents via Agent Client Protocol
- Fixture or Linear-backed issue loading with configurable issue sources
- Per-issue git worktree isolation
- Configurable verification (lint, typecheck, test, build) per project type
- Gate evaluation with profiles (full / standard / hotfix)
- Compliance engine with YAML-defined agent behavior contracts
- Daemon mode with readiness triage, review loop, merge check, retry
- GitHub PR auto-creation with gate as commit status
- Spec deviation tracking and structured findings
- Three-tier change management (Full / Standard / Hotfix)
- Web dashboard + Rich TUI (TypeScript/React/Ink)
- Mission Control Center with EventBus
- Conductor for progressive formalization
- Cross-session memory with SQLite WAL index + Qdrant semantic search (E2E verified) (ADR-0001)
- LLM-based memory distillation: expired episodic entries grouped by issue and consolidated into semantic summaries
- Builder telemetry ingestion: tool-call sequences stored in episodic memory during run finalization
- Human acceptance feedback: `accept-issue` stores acceptance in memory for learning
- Success trend aggregation: `get_trend_summary()` provides a windowed aggregate summary (success rate, top failure reasons) injected into LLM context
- Enriched run summaries: builder adapter, verification results, and key learnings captured in semantic memory
- 4-layer memory (WORKING/EPISODIC/SEMANTIC/PROCEDURAL) with all layers active: PROCEDURAL consumed by ContextAssembler
- Full self-evolution: evidence analysis, harness synthesis, prompt evolution, policy distillation
- `spec-orch init` for project type detection and config generation
- Low-cost model support (MiniMax-M2.5, ~$0.04/run)
- `spec-orch preflight` one-click system health check
- `spec-orch selftest` end-to-end smoke test with fixture issues
- FlowRouter hybrid routing (static rules + LLM-based complexity analysis)
- KnowledgeDistiller: cross-run learning notebook (`.spec_orch/knowledge.md`) (standalone, manual invocation)
- ContextRanker: priority-aware context truncation replacing naive text slicing
- RunProgressSnapshot: pipeline stage checkpointing for daemon retry continuity
- SkillDegradationDetector: routing audit, baseline tracking (standalone, not yet wired into pipeline)
- TraceSampler: online evaluation sampling with configurable rules
- CompactRetentionPriority: architecture-aware context compression
- Atomic JSON writes across all state files (crash-safe daemon)
- Cross-platform file locking for evolution counter (POSIX + Windows)
- LifecycleEvolver protocol: unified 4-phase observe/propose/validate/promote for all 7 evolvers
- SkillEvolver: auto-discovers reusable builder tool-call patterns → SkillManifest YAML (SkillCraft-inspired)
- Skill Runtime: ContextAssembler loads + matches skills by trigger keywords, injects into builder context
- ContextRanker hot/cold separation: learning context (hints, skills, failure samples, procedures, trends) included in priority-based budget allocation
- Memory compaction + LLM distillation: episodic memory auto-expires after 30 days, run outcomes consolidated and distilled to semantic layer
- Layered memory architecture: filesystem truth source + SQLite WAL index + Qdrant semantic search (BAAI/bge-small-zh-v1.5 local embedding, E2E validated)
- Memory vNext shipped: entity relation layer (entity_scope/entity_id/relation_type), ProjectProfile + 4 learning views, hybrid retrieval (FTS5 + RRF), async derivation queue (ADR-0002)
- Memory subsystem refactored: MemoryService split into MemoryAnalytics + MemoryDistiller + MemoryRecorder, SQL-pushed date filters, soft-delete compaction, cross-process safe derivation queue
- CJK-aware text matching: jieba segmentation with graceful bigram fallback for non-CJK mixed content
- Modular CLI: `cli/` package with 8 command submodules replacing a single 4092-line file
- LLM JSON output schema validation with fallback + observability events
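One item above mentions atomic JSON writes for crash-safe daemon state. The standard pattern is write-to-temp-then-rename; a minimal sketch under that assumption (this is not the project's actual helper):

```python
import json
import os
import tempfile

def atomic_write_json(path, data):
    """Write JSON so readers never observe a half-written file."""
    target = os.path.abspath(path)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(target), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())   # ensure bytes reach disk before the rename
        os.replace(tmp, target)    # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)             # clean up the temp file on any failure
        raise
```

Because `os.replace` is atomic within a filesystem, a crash leaves either the old file or the new one, never a torn write.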
# 1. Install
git clone https://github.com/fakechris/spec-orch.git
cd spec-orch
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
# 2. Configure environment — copy and edit .env
cp .env.example .env
# Edit .env: set SPEC_ORCH_LLM_API_KEY and SPEC_ORCH_LLM_API_BASE
# (see .env.example for provider-specific examples)
# 3. Initialize project config
spec-orch init # LLM-first detection (auto fallback to rules)
spec-orch init --offline # Force rule-based detection
spec-orch init --reconfigure # Re-detect and overwrite existing config
# 4. Verify everything works
spec-orch config check # Validate configuration
spec-orch discuss # Test LLM connectivity with interactive TUI
Minimum required environment variables (set in .env or export):
| Variable | Purpose | Example |
|---|---|---|
| `SPEC_ORCH_LLM_API_KEY` | LLM provider API key (planner, discuss, triage) | `sk-ant-xxx` or MiniMax key |
| `SPEC_ORCH_LLM_API_BASE` | LLM API endpoint | `https://api.anthropic.com` |
| `SPEC_ORCH_LINEAR_TOKEN` | Linear issue tracking (optional, for daemon) | `lin_api_xxx` |
See .env.example for full configuration reference.
pip install "spec-orch @ git+https://github.com/fakechris/spec-orch.git"
uv pip install "spec-orch @ git+https://github.com/fakechris/spec-orch.git"
pip install spec-orch
pip install "spec-orch[all]"
spec-orch --version # 0.5.2
spec-orch config check
- Python 3.11+ (3.11, 3.12, 3.13 tested)
- Git (for worktree isolation)
- Builder CLI — one of: Codex, OpenCode, Droid, Claude Code, or any ACPX-compatible agent
- Linear API token (optional, for issue tracking)
- LLM API key (optional, for planning / review / triage)
| Extra | Packages | Use case |
|---|---|---|
| `planner` | litellm | discuss, plan, readiness triage |
| `dashboard` | fastapi, uvicorn | `spec-orch dashboard` |
| `slack` | slack-bolt | Slack discussion adapter |
| `memory` | qdrant-client, fastembed | Semantic vector search for memory recall |
| `cjk` | jieba | Chinese word segmentation for memory text matching |
| `all` | all of the above | Full feature set |
| `dev` | all + pytest, ruff, mypy, build, twine | Development |
SpecOrch is configured via spec-orch.toml. Run spec-orch init to auto-detect your project and generate config:
spec-orch init # LLM-first detection; auto fallback to rules
spec-orch init --offline # Force rule-based detection
spec-orch init --reconfigure # Re-run detection and overwrite existing config
spec-orch init --force # Force overwrite existing config
spec-orch init persists the selected detection mode into [init].detection_mode inside spec-orch.toml for deterministic future reconfiguration.
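For example, after an online init the persisted section might look like this; the exact value strings are an assumption, not documented output:

```toml
# spec-orch.toml — written by spec-orch init
[init]
detection_mode = "llm"   # assumed values: "llm" (online) or a rule-based mode
```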
To enable semantic vector search for memory recall, install the memory extra and add to spec-orch.toml:
[memory]
provider = "filesystem_qdrant"
[memory.qdrant]
mode = "local" # "local" | "memory" | "server"
path = ".spec_orch_qdrant"
collection = "spec_orch_memory"
embedding_model = "BAAI/bge-small-zh-v1.5"
When provider = "filesystem_qdrant" is set, recall() uses Qdrant semantic search for EPISODIC and SEMANTIC layers while keeping the filesystem as the source of truth. Without the memory extra installed, the system silently falls back to filesystem-only mode.
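The silent fallback can be done with a guarded import; a sketch under that assumption (the function name and return values are hypothetical, not SpecOrch's real module layout):

```python
def make_recall_backend():
    """Prefer Qdrant semantic search; silently fall back to filesystem-only."""
    try:
        import qdrant_client  # noqa: F401 — present only with the [memory] extra
    except ImportError:
        return "filesystem"           # filesystem stays the source of truth either way
    return "filesystem_qdrant"

backend = make_recall_backend()       # works with or without the extra installed
```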
See spec-orch.toml Reference and AI Config Guide for full documentation. See ADR-0001 for the memory architecture decision.
spec-orch discuss # Interactive brainstorming TUI
spec-orch mission create "Title" # Create mission + spec skeleton
spec-orch mission approve <id> # Freeze spec for execution
spec-orch mission status # List all missions
spec-orch contract generate <id> # Generate TaskContract from issue
spec-orch plan <mission-id> # LLM scoper generates DAG
spec-orch plan-show <mission-id> # View wave/packet breakdown
spec-orch promote <mission-id> # Create Linear issues from plan
spec-orch pipeline <mission-id> # Show EODF pipeline progress
spec-orch run <id> --source linear # Full one-shot pipeline
spec-orch run <id> --auto-approve # Skip spec approval, auto-approve
spec-orch run-plan <mission-id> # Execute plan with parallel waves
spec-orch run-issue <id> # Build + verify + gate (requires spec approval)
spec-orch run-issue <id> --auto-approve # Auto-approve spec and run
spec-orch daemon start # Autonomous daemon mode
spec-orch gate evaluate <id> # Evaluate gate conditions
spec-orch review-issue <id> # Review with verdict
spec-orch accept-issue <id> # Human acceptance
spec-orch explain <id> # Gate explanation report
spec-orch retro <mission-id> # Mission retrospective
spec-orch status <id> # Current run state
spec-orch status --all # All issues table
spec-orch dashboard # Web dashboard
spec-orch watch <id> # Real-time activity log
spec-orch config check # Validate configuration
spec-orch evidence summary # Pattern analysis
spec-orch harness synthesize # Auto-generate rules
spec-orch prompt evolve # A/B tested prompts
spec-orch strategy analyze # Scoper hints
spec-orch policy distill # Zero-LLM scripts
- Self-Evolution Architecture
- Pipeline Roles and Stages
- Orchestration Brain Design
- Context Contract Design
- Spec-Contract Integration
- Change Management Policy
- SDD Landscape & Positioning
- Directional Review (Agent Engineering)
This project is licensed under the MIT License. See LICENSE.