You have a codebase. Wolfcastle will build it.
You give Wolfcastle a goal. It breaks that goal into a tree of targets and sends AI coding agents to destroy them until the tree is conquered. One at a time or three at once. While you do whatever it is you do.
Pre-release. Probably broken. Then fixed. Then broken again. On repeat until release. Install if you're feeling brave:
```shell
brew install dorkusprime/tap/wolfcastle
cd your-repo
wolfcastle
```

This opens the TUI. Press `I` to initialize the project, `i` then `a` to add work to the inbox, and `s` to start the daemon. The dashboard shows progress in real time: tree state, audit results, log stream, everything. The TUI is the primary interface.
You can also run Wolfcastle without the TUI, from the command line or in a headless environment:
```shell
wolfcastle init                                            # scaffold .wolfcastle/
wolfcastle inbox add "Build a website for my donut stand"  # queue up work
wolfcastle start                                           # the daemon wakes up
wolfcastle status -w                                       # watch it work
```

`init` creates the `.wolfcastle/` directory with config, prompts, and state scaffolding. `inbox add` queues a feature request for the daemon to decompose into projects and tasks. `start` launches the daemon, which runs the full pipeline until there's nothing left to do. `status` shows you what's happening (and `-w` keeps on showing it to you). Everything is configurable.
You can also run Wolfcastle in the background and just let it go. Give it work as you find work for it to do. No LLM calls are made if there's no work to be done:
```shell
wolfcastle start -d  # the daemon wakes up

# Whenever you get to it:
# press i then a in the TUI, or from the command line:
wolfcastle inbox add "Build a website for my donut stand"  # queue up work
```

Feed it work through the TUI inbox, the CLI, or chat with your coding agent to hammer out the details and have it inject the work directly with a markdown file path. A detailed spec or PRD gets the most predictable results, but Wolfcastle takes vague directions too without getting lost in the wrong state like a second lieutenant. "Make the API faster" becomes specs, task trees, and acceptance criteria before any code gets written. The daemon takes it from there: triage, planning, execution, audit, commit. Depth-first through the tree until it's conquered or something insurmountable gets in the way.
By default, the daemon works one task at a time. With parallel execution enabled, independent siblings under the same orchestrator run concurrently. Each agent declares which files it intends to touch via scope locks; the daemon enforces disjoint scopes and serializes git commits. Three agents writing three packages simultaneously, no collisions, no coordination overhead. The tree structure, the audits, the state machine: all identical. The only difference is how many agents are swinging at once.
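The scope-lock idea can be sketched in a few lines of Go. Everything here (`ScopeLock`, `Claim`, the prefix-overlap rule) is illustrative, not Wolfcastle's actual API:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ScopeLock tracks which file-path prefixes each agent has claimed.
// Illustrative only: the real daemon's types and rules may differ.
type ScopeLock struct {
	claims map[string][]string // agent ID -> claimed path prefixes
}

func NewScopeLock() *ScopeLock {
	return &ScopeLock{claims: map[string][]string{}}
}

// overlaps reports whether one path equals or nests under the other.
func overlaps(a, b string) bool {
	a, b = filepath.Clean(a), filepath.Clean(b)
	return a == b || strings.HasPrefix(a, b+"/") || strings.HasPrefix(b, a+"/")
}

// Claim grants the scopes to an agent only if they are disjoint from
// every scope already held by a different agent.
func (s *ScopeLock) Claim(agent string, scopes ...string) error {
	for holder, held := range s.claims {
		if holder == agent {
			continue // an agent may extend its own scope
		}
		for _, h := range held {
			for _, want := range scopes {
				if overlaps(h, want) {
					return fmt.Errorf("%s conflicts with %s held by %s", want, h, holder)
				}
			}
		}
	}
	s.claims[agent] = append(s.claims[agent], scopes...)
	return nil
}

func main() {
	locks := NewScopeLock()
	fmt.Println(locks.Claim("agent-1", "pkg/api")) // granted: <nil>
	fmt.Println(locks.Claim("agent-2", "pkg/web")) // granted: disjoint scope
	// Denied: pkg/api/v2 nests under agent-1's claim.
	fmt.Println(locks.Claim("agent-3", "pkg/api/v2"))
}
```

With claims disjoint by construction, the only remaining shared resource is git history, which the daemon serializes itself.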
Coding agents are remarkably good at writing specs and following specs. Where they fall apart is the middle: deciding what to build, building it, and maintaining quality across dozens of tasks without human intervention. The models and their coding harnesses are capable. What's missing is the scaffolding around them: the structure that lets them work for hours without a human in the loop. Four problems kill autonomous agents in practice.
Coding agents work best when they know exactly what to do. The problem is getting from a goal to a plan. Some agents offer a "planning mode" to bridge this gap, but the agent is still both planner and executor. Keep context and the two blur together: the agent plans five things, does two, gets creative on the third, and forgets the fourth. Clear context and the executor can reinterpret the plan because nothing enforces it. Either way, the plan is a suggestion the agent made to itself.
Wolfcastle separates planning from execution. A planning agent creates the structure: specs, projects, task trees. An execution agent follows it. Neither can overwrite the other. The daemon enforces the contract. If the executor blocks, the system decomposes the problem into smaller, weaker targets and destroys those instead.
Context degrades over time: windows fill up, conversations get compacted, and earlier decisions contradict later ones. When context contradicts itself, the model reconciles those conflicts on every turn, burning capacity on bookkeeping instead of work. Decisions evaporate, lessons from failures disappear, and the agent runs in circles solving the same problems over and over.
Wolfcastle gives each task a fresh invocation with clean context and persists knowledge as artifacts on disk. Architecture Decision Records (ADRs) capture the "why" behind design choices so the next agent doesn't reverse a deliberate decision. Specifications give executors a contract instead of re-interpreting the goal each time. After Action Reviews (AARs) capture lessons learned so mistakes die the first time. Codebase knowledge files accumulate informal observations (build quirks, hidden dependencies, undocumented conventions) that grow across tasks, giving each new agent the institutional memory that none of the formal artifacts capture. The daemon injects all of these into context automatically: each invocation starts clean but informed.
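As a rough mental model, the artifact store might look something like this. The layout below is hypothetical; the real paths and filenames are whatever `wolfcastle init` scaffolds:

```
.wolfcastle/           # scaffolding created by `wolfcastle init`
├── config.json        # base config (custom/local tiers layer on top)
├── adr/               # Architecture Decision Records: the "why"
├── specs/             # contracts that executors follow
├── aar/               # After Action Reviews: lessons learned
└── knowledge/         # informal codebase observations
```

Because everything is plain files in version control, the "institutional memory" survives restarts, crashes, and machine changes for free.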
Agents produce code, and nobody checks it. Or worse, the agent checks its own work in the same context where it wrote the code: the author proofreading their own novel. Errors of assumption survive because the assumptions were never questioned by a separate process. The code compiles, the tests pass because they were written to pass, and the architecture quietly rots.
Wolfcastle audits every piece of work from a separate invocation with fresh context. Audits are hierarchical: each leaf gets its own audit scoped to what it built, each orchestrator gets an audit scoped to how its children integrate. The auditor reads breadcrumbs (timestamped records of what each task produced), checks them against acceptance criteria, and files gaps. Gaps that can't be resolved locally escalate upward. No code ships without a second pair of eyes, and those eyes belong to a process that didn't write the code.
Most agent workflows assume someone is always watching: approving plans, reviewing code, deciding what to do next. The agent is capable, but the process treats it like an intern. Scale hits the human's availability long before it hits the agent's throughput.
Wolfcastle runs the full lifecycle without waiting for a human: intake, planning, execution, audit, remediation, commit. With parallel execution, throughput scales with the work: three independent packages build simultaneously instead of sequentially. Your role is to check the TUI dashboard, adjust priorities, and unblock the rare task that genuinely needs judgment. The system does the volume. You do the thinking.
The Ralph loop changed how people think about coding agents. Put an agent in a while loop, feed it a spec, and let it run: each iteration does one thing with fresh context, and backpressure from tests, compilation, and linting keeps the output honest. Simple, effective, and widely adopted, though not without rough edges: context rot degrades output as windows fill, placeholder implementations slip through when the model chases compilation over correctness, and architectural drift accumulates across iterations without independent review.
Ralph works because it respects a fundamental constraint: context degrades. The specs get re-read every pass, but they're read clean, with no competing noise from failed attempts or abandoned approaches. The signal-to-noise ratio is what matters.
Wolfcastle inherits these ideas: fresh context per task, backpressure validation, one thing at a time. It also pushes further on a principle Ralph hints at: don't make the model do work that doesn't require a model. Finding the next task, grooming a backlog, propagating state, validating structure: these are deterministic operations. Asking a non-deterministic agent to perform them reliably is a recipe for a soup sandwich. Wolfcastle handles all of it with CLI scripts and daemon logic. The agent reads code and writes code. Everything else is handled by Go binaries that never hallucinate, never drift, and never burn tokens on bookkeeping.
The difference is what sits around the core loop: planning agents that decide what to work on next, structured artifacts that carry knowledge forward without static files that grow until they hit the context ceiling, hierarchical audits that catch what compile-test-lint cannot, and a daemon that runs the loop without a human at the keyboard.
Ralph is the insight. Wolfcastle is the infrastructure.
Wolfcastle organizes code changes into a project tree, visible in the TUI tree pane. Orchestrators represent features, modules, or milestones. Leaves are where the actual coding tasks live. Every node ends with an automatic audit and, if gaps are found, remediation. Nodes have four states: not_started, in_progress, complete, blocked. State propagates upward deterministically: when a task completes, its leaf recomputes, then its parent, all the way to the root. No node decides its own fate. Insubordination is not a valid state.
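Deterministic upward propagation can be sketched like this. It's a Go illustration with invented names; the real daemon recomputes state from files on disk, and the precedence of blocked over in-progress children is an assumption here, not documented behavior:

```go
package main

import "fmt"

type State string

const (
	NotStarted State = "not_started"
	InProgress State = "in_progress"
	Complete   State = "complete"
	Blocked    State = "blocked"
)

// Node is a task-tree node. Illustrative only.
type Node struct {
	Name     string
	State    State
	Parent   *Node
	Children []*Node
}

// recompute derives a node's state purely from its children.
// Leaves keep whatever state execution set on them.
func (n *Node) recompute() {
	if len(n.Children) == 0 {
		return
	}
	allComplete, anyStarted, anyBlocked := true, false, false
	for _, c := range n.Children {
		if c.State != Complete {
			allComplete = false
		}
		if c.State == InProgress || c.State == Complete {
			anyStarted = true
		}
		if c.State == Blocked {
			anyBlocked = true
		}
	}
	switch {
	case allComplete:
		n.State = Complete
	case anyBlocked: // assumed precedence: blocked beats in_progress
		n.State = Blocked
	case anyStarted:
		n.State = InProgress
	default:
		n.State = NotStarted
	}
}

// Propagate walks from a changed leaf to the root, recomputing each
// ancestor. No node decides its own fate.
func Propagate(leaf *Node) {
	for n := leaf.Parent; n != nil; n = n.Parent {
		n.recompute()
	}
}

func main() {
	root := &Node{Name: "root", State: NotStarted}
	feat := &Node{Name: "feature", State: NotStarted, Parent: root}
	t1 := &Node{Name: "task-1", State: NotStarted, Parent: feat}
	t2 := &Node{Name: "task-2", State: NotStarted, Parent: feat}
	feat.Children = []*Node{t1, t2}
	root.Children = []*Node{feat}

	t1.State = Complete
	Propagate(t1)
	fmt.Println(feat.State, root.State) // in_progress in_progress
	t2.State = Complete
	Propagate(t2)
	fmt.Println(feat.State, root.State) // complete complete
}
```

The point of keeping this logic out of the model: propagation is pure bookkeeping, and bookkeeping done by a deterministic function never hallucinates a state transition.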
The daemon hunts down the next task and eliminates it, running a pipeline of stages: intake triages inbox items into projects, planning agents decompose projects into specs and task trees, execution claims tasks and points agents at code, and audits verify the results from a separate invocation with fresh context. In parallel mode, the daemon dispatches up to `max_workers` tasks simultaneously, each with its own agent and file scope. Each stage is a separate model call with a specific role. Stages are configured as a dictionary, and each one can be overridden individually across config tiers (base, custom, local) without replacing the entire pipeline.
The tree grows at runtime. Tasks that fail decompose into smaller, weaker problems. Agents can trigger decomposition mid-task when they recognize the work is too broad. Orchestrators spawn new children when planning reveals additional work. A hard cap stops any task from fighting forever. If the machine goes down mid-task, the daemon recovers on restart and resumes from the interrupted task. Completed trees are auto-archived. It does not waste time on the fallen.
Everything is deterministic except the model's output. State, specs, ADRs, AARs, breadcrumbs, and audit findings all live as files on disk, tracked in version control. Nothing important stays in memory. Every decision, every lesson, every result persists across invocations, restarts, and crashes. The model handles what models are good at: reading code, writing code, making judgment calls. CLI scripts handle what's deterministic: state transitions, propagation, validation, navigation. Neither does the other's job.
Three tiers: base, custom, local. Higher tiers override lower ones. New to configuration? Start with the quickstart. Base ships with the release and gets regenerated by `wolfcastle init`. Custom is committed to the repo and shared with your team. Local is gitignored, yours alone, for personal overrides. JSON objects deep-merge; arrays replace entirely; set a field to `null` to eliminate it. Configuration is not a democracy. Manage it through `wolfcastle config` commands or edit the JSON directly.
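Those three merge rules (objects deep-merge, arrays replace, `null` deletes) are easy to pin down in code. A minimal Go sketch, not Wolfcastle's implementation:

```go
package main

import "fmt"

// merge applies one config tier on top of another: objects deep-merge,
// arrays replace entirely, and an explicit null deletes the field.
// Illustrative only.
func merge(base, override map[string]any) map[string]any {
	out := map[string]any{}
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		if v == nil { // JSON null: eliminate the field
			delete(out, k)
			continue
		}
		bm, bok := out[k].(map[string]any)
		om, ook := v.(map[string]any)
		if bok && ook { // both objects: recurse
			out[k] = merge(bm, om)
			continue
		}
		out[k] = v // scalars and arrays: higher tier wins outright
	}
	return out
}

func main() {
	base := map[string]any{
		"stages":      map[string]any{"audit": true, "intake": true},
		"max_workers": 1,
	}
	local := map[string]any{
		"stages":      map[string]any{"audit": false}, // intake survives the merge
		"max_workers": 3,
	}
	fmt.Println(merge(base, local))
}
```

Deep-merging objects is what lets you override a single pipeline stage without restating the whole dictionary; replacing arrays avoids ambiguous element-wise merges.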
Agents are defined as CLI commands. Anything that reads stdin and writes stdout works: Claude Code, Cursor, Copilot, GPT, Gemini, Llama, a bash script wrapping `curl`. Your agents, your choice. Switch providers by editing a JSON file.
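A hypothetical agent definition might look like this. Every field name below is invented for illustration; check your generated config for the real schema:

```json
{
  "agents": {
    "executor": {
      "command": "my-agent-wrapper.sh",
      "args": ["--model", "whatever-you-like"]
    }
  }
}
```

The contract is deliberately thin: prompt in on stdin, result out on stdout, so swapping providers never touches the daemon.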
Each engineer's work lives in its own namespace. Everyone can see everyone else's state, but nobody writes to anyone else's. No merge conflicts. No coordination overhead. The daemon commits deterministically after each task, including partial work on failure, so nothing is lost. Agents never touch git. Run in an isolated worktree if you prefer.
Wolfcastle turns tokens into code. It uses a lot of them. Every planning pass, every execution, every audit, every remediation is a separate model invocation. The scaffolding makes each invocation more efficient: persistent artifacts mean less re-discovery, CLI scripts handle state instead of the model reasoning through it, and deterministic operations like propagation and validation never touch the model at all. But the volume is real. If you're using a metered API, expect meaningful spend. An unlimited plan (like Anthropic's Max plan for Claude Code) is the practical choice for sustained use.
- Git (branch verification, progress detection, auto-commit)
- A coding agent that reads stdin and writes stdout
- Local filesystem for `.wolfcastle/` (why)
- TUI Guide
- 63 CLI commands
- Architecture Decision Records (101 and counting)
- Specifications
- Developer guides
- AGENTS.md