TDD-driven task orchestration for local LLMs
A lightweight, bash-based pipeline for running AI agents on your local hardware. No API keys required - just your GPU and any OpenAI-compatible LLM server.
- TDD Workflow: RED → GREEN → REFACTOR cycle for reliable code generation
- Parallel Execution: Run multiple agents simultaneously (limited by GPU slots)
- Dynamic Agent Selection: LLM picks the best agent for each task
- Self-Improving: Generates new agents on-demand when needed
- Quality Gates: Automatic review and scoring of outputs
- Works with Any Local LLM: vLLM, LM Studio, llama.cpp, Ollama
- Token-Efficient: File mode for large specs saves 95%+ Claude tokens
- Local LLM Server: any OpenAI-compatible endpoint (default: `localhost:8081`)
- jq: JSON processor (`apt install jq` or `brew install jq`)
- curl: HTTP client (usually pre-installed)
- bash: version 4.0+ recommended
```bash
# 1. Clone the repo
git clone https://github.com/Platano78/local-llm-agents.git
cd local-llm-agents

# 2. Run the install script
./install.sh

# 3. Start your local LLM server (example with vLLM)
vllm serve your-model --port 8081

# 4. Run a task (two options)
# Option A: Direct script execution
./scripts/orchestrate.sh "Create a Python function to check if a number is prime" 2

# Option B: Claude Code slash command (after install)
# In Claude Code terminal, type:
/local-agents Create a Python function to check if a number is prime
```

After running `./install.sh`, the `/local-agents` slash command is available in Claude Code:
```
# In Claude Code:
/local-agents "Create a REST API with CRUD operations"
/local-agents "Write unit tests for the UserService class"
/local-agents "Refactor this module to use async/await"
```

The slash command:

- Installs scripts to `~/.claude/scripts/local-agents/`
- Adds the `/local-agents` command to `~/.claude/commands/`
- Copies starter agents to `~/.claude/agents/`
- Supports token-efficient file mode for large specifications
When you use /local-agents <task>, Claude Code:
- Runs the full orchestration pipeline
- Shows progress through each stage
- Displays the synthesized results
- Can be used for manual step-by-step execution if needed
```
┌─────────────────────────────────────────────────────┐
│                   ORCHESTRATE.SH                    │
├─────────────────────────────────────────────────────┤
│ 1. Health Check → Verify LLM server is running      │
│ 2. Decompose    → Break task into TDD subtasks      │
│ 3. Map Agents   → Select best agent for each task   │
│ 4. Execute      → Run agents in parallel batches    │
│ 5. Quality Gate → Review and score outputs          │
│ 6. Synthesize   → Combine results into final output │
└─────────────────────────────────────────────────────┘
```
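Every stage talks to the same OpenAI-compatible chat completions endpoint. As a rough sketch of the request shape (the model name, prompt text, and temperature here are placeholders, not the pipeline's actual prompts):

```shell
# Build a chat-completions request body; "local-model" is a placeholder
# for whatever model your server has loaded.
payload=$(cat <<'EOF'
{
  "model": "local-model",
  "messages": [
    {"role": "user", "content": "Break this task into TDD subtasks"}
  ],
  "temperature": 0.2
}
EOF
)

# The pipeline would send it along these lines:
#   curl -s http://localhost:8081/v1/chat/completions \
#     -H 'Content-Type: application/json' -d "$payload"
printf '%s\n' "$payload"
```

Any server that accepts this request shape (vLLM, LM Studio, llama.cpp, Ollama) can back the pipeline.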
```bash
# Run with default 2 parallel slots
./scripts/orchestrate.sh "Your task description"

# Run with custom slot count
./scripts/orchestrate.sh "Your task description" 4
```

For large task specifications (>5KB), use file mode to save Claude tokens:
```bash
# Write large spec to file
cat > /tmp/task-spec.md << 'EOF'
# My Complex Task
[Large specification content here...]
EOF

# Pass file path instead of inline text
./scripts/orchestrate.sh /tmp/task-spec.md 4

# Cleanup after completion
rm -f /tmp/task-spec.md
```

How it works:

- `orchestrate.sh` auto-detects whether the argument is a file path or inline text
- File paths are read and processed; inline text works as before
- Backward compatible - no breaking changes
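The detection step can be sketched in a few lines of bash. This is a simplified stand-in, not the actual `orchestrate.sh` logic, and `get_task_spec` is a hypothetical helper name:

```shell
# Hypothetical sketch of file-vs-inline detection; the real script may differ.
get_task_spec() {
  local arg="$1"
  if [ -f "$arg" ] && [ -r "$arg" ]; then
    cat "$arg"           # file mode: read the large spec from disk
  else
    printf '%s' "$arg"   # inline mode: the argument is the task text itself
  fi
}
```

With this, `get_task_spec /tmp/task-spec.md` prints the file contents, while `get_task_spec "Write tests"` echoes the text back unchanged.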
Token savings:
| Spec Size | Inline Mode | File Mode | Savings |
|---|---|---|---|
| 10KB | ~12K tokens | ~2K tokens | 83% |
| 50KB | ~52K tokens | ~2.5K tokens | 95% |
| 100KB | ~102K tokens | ~3K tokens | 97% |
See docs/TOKEN_EFFICIENCY.md for detailed analysis.
| Variable | Default | Description |
|---|---|---|
| `WORKER_PORT` | `8081` | Port for the main LLM worker |
| `ORCHESTRATOR_PORT` | `8085` | Port for the orchestrator model (optional) |
| `AGENT_POOL` | `./agents` | Directory containing agent definitions |
| `WORK_DIR` | `/tmp/local-agents-*` | Working directory for outputs |
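For example, to point a run at a non-default worker port and agent directory (the values below are illustrative, not recommended settings):

```shell
# Override defaults for this shell session; the variable names come from
# the configuration table above.
export WORKER_PORT=8085
export AGENT_POOL="./my-agents"   # illustrative path

# Then run the pipeline as usual:
#   ./scripts/orchestrate.sh "Add input validation" 2
```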
```bash
# Code generation
./scripts/orchestrate.sh "Create a Python class for a binary search tree"

# Testing
./scripts/orchestrate.sh "Write unit tests for the UserService class"

# Refactoring
./scripts/orchestrate.sh "Refactor the database module to use connection pooling"
```

Agents are Markdown files in the `agents/` directory. Each agent has:

- A system prompt defining its role
- Suggested tools it can use
- Output format expectations
| Agent | Purpose |
|---|---|
| `code-generator-agent` | Writes implementation code |
| `test-writer-agent` | Creates unit tests |
| `code-reviewer-agent` | Reviews code quality |
| `refactor-agent` | Improves code structure |
| `documentation-agent` | Writes documentation |
```markdown
---
name: my-custom-agent
category: development
phase: GREEN
---

# My Custom Agent

You are a specialized agent for [specific task].

## Instructions
1. Analyze the task
2. Produce output in the specified format
3. Include relevant details

## Output Format
[Define expected output structure]
```

The pipeline breaks your task into atomic TDD subtasks:
```json
{
  "parallel_groups": [
    {
      "group": 1,
      "description": "Write failing tests",
      "tasks": [{"id": "T1", "phase": "RED", "task": "Write test for..."}]
    },
    {
      "group": 2,
      "description": "Implement to pass tests",
      "tasks": [{"id": "T2", "phase": "GREEN", "task": "Implement..."}]
    }
  ]
}
```

For each task, the pipeline:

- Queries the LLM to select the best agent from the pool
- Falls back to phase-based defaults if no match
- Can generate new agents on-demand for specialized tasks
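The phase-based fallback might look like the following sketch. The mapping below is an assumption based on the starter agents, not the pipeline's exact table:

```shell
# Hypothetical phase → agent mapping; adjust to match your agent pool.
default_agent_for_phase() {
  case "$1" in
    RED)      echo "test-writer-agent" ;;     # write failing tests first
    GREEN)    echo "code-generator-agent" ;;  # implement to pass them
    REFACTOR) echo "refactor-agent" ;;        # then improve structure
    *)        echo "code-generator-agent" ;;  # safe default
  esac
}
```

A case statement keeps the fallback deterministic even when the LLM's agent selection returns nothing usable.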
```
Available Slots: 2

Batch 1 (RED phase):
┌─────────┐   ┌─────────┐
│ Slot 1  │   │ Slot 2  │   ← Run in parallel
│   T1    │   │   T2    │
└─────────┘   └─────────┘

[Wait for batch completion]

Batch 2 (GREEN phase):
┌─────────┐   ┌─────────┐
│ Slot 1  │   │ Slot 2  │   ← Run in parallel
│   T3    │   │   T4    │
└─────────┘   └─────────┘
```
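The batching above can be sketched with plain background jobs and `wait`; here `run_one` is a stand-in for a real agent execution:

```shell
# Launch at most SLOTS tasks per batch, then block on the whole batch
# before starting the next one.
run_one() { echo "task $1 done"; }   # placeholder for a real agent run

SLOTS=2
TASKS=(T1 T2 T3 T4)
for ((i = 0; i < ${#TASKS[@]}; i += SLOTS)); do
  for task in "${TASKS[@]:i:SLOTS}"; do
    run_one "$task" &   # fill a slot
  done
  wait                  # batch barrier: next batch starts only when all finish
done
```

With 4 tasks and 2 slots this produces two batches, mirroring the RED/GREEN diagram above.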
Every execution is reviewed:
- Overall quality score (0-100)
- Per-task assessment
- Critical issues flagged
- Recommendations provided
| Server | Tested | Notes |
|---|---|---|
| vLLM | ✅ | Recommended for multi-slot execution |
| LM Studio | ✅ | Easy setup, good for single slot |
| llama.cpp | ✅ | Lightweight option |
| Ollama | ✅ | Use with OLLAMA_HOST |
| LocalAI | ⚠️ | Should work, not extensively tested |
```
local-llm-agents/
├── scripts/
│   ├── orchestrate.sh        # Main pipeline
│   ├── health-check.sh       # Server health verification
│   ├── tdd-decomposer.sh     # Task decomposition
│   ├── agent-mapper.sh       # Agent selection
│   ├── agent-selector.sh     # Dynamic LLM-based selection
│   ├── agent-generator.sh    # On-demand agent creation
│   ├── execute-agents.sh     # Parallel execution
│   ├── react-executor.sh     # ReAct loop for agents
│   ├── quality-gate.sh       # Output review
│   ├── synthesize.sh         # Result aggregation
│   └── tool-executor.sh      # Tool execution
├── prompts/
│   ├── decompose.txt         # Decomposition prompt
│   ├── quality-review.txt    # Quality gate prompt
│   ├── synthesize.txt        # Synthesis prompt
│   └── tool-instructions.txt
├── agents/                   # Agent definitions
├── examples/                 # Example usage
└── install.sh                # Installation script
```
Start your local LLM server:

```bash
# vLLM
vllm serve model-name --port 8081

# LM Studio
# Start via GUI, enable server on port 8081

# Ollama
OLLAMA_HOST=0.0.0.0:8081 ollama serve
```

If the server seems unresponsive:

- Check that the model is loaded and responding
- Try increasing the timeout: `LLM_TIMEOUT=60`
- Verify the model supports the chat completions API

If an agent is not found:

- Ensure `AGENT_POOL` points to the correct directory
- Check that the agent filename matches the expected pattern: `agent-name.md`
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
MIT License - see LICENSE for details.