TDD-driven task orchestration for local LLMs
A lightweight, bash-based pipeline for running AI agents on your local hardware. No API keys required - just your GPU and any OpenAI-compatible LLM server.
- TDD Workflow: RED → GREEN → REFACTOR cycle for reliable code generation
- Parallel Execution: Run multiple agents simultaneously (limited by GPU slots)
- Dynamic Agent Selection: LLM picks the best agent for each task
- Self-Improving: Generates new agents on-demand when needed
- Quality Gates: Automatic review and scoring of outputs
- Works with Any Local LLM: vLLM, LM Studio, llama.cpp, Ollama
- Token-Efficient: File mode for large specs saves 95%+ Claude tokens
- Local LLM Server: any OpenAI-compatible endpoint (default: `localhost:8081`)
- jq: JSON processor (`apt install jq` or `brew install jq`)
- curl: HTTP client (usually pre-installed)
- bash: version 4.0+ recommended
```bash
# 1. Clone the repo
git clone https://github.com/Platano78/local-llm-agents.git
cd local-llm-agents

# 2. Run the install script
./install.sh

# 3. Start your local LLM server (example with vLLM)
vllm serve your-model --port 8081

# 4. Run a task (two options)
# Option A: Direct script execution
./scripts/orchestrate.sh "Create a Python function to check if a number is prime" 2

# Option B: Claude Code slash command (after install)
# In Claude Code terminal, type:
/local-agents Create a Python function to check if a number is prime
```

After running `./install.sh`, the `/local-agents` slash command is available in Claude Code:
```
# In Claude Code:
/local-agents "Create a REST API with CRUD operations"
/local-agents "Write unit tests for the UserService class"
/local-agents "Refactor this module to use async/await"
```

The slash command:

- Installs scripts to `~/.claude/scripts/local-agents/`
- Adds the `/local-agents` command to `~/.claude/commands/`
- Copies starter agents to `~/.claude/agents/`
- Supports token-efficient file mode for large specifications
When you use /local-agents <task>, Claude Code:
- Runs the full orchestration pipeline
- Shows progress through each stage
- Displays the synthesized results
- Can be used for manual step-by-step execution if needed
```
┌─────────────────────────────────────────────────────┐
│                   ORCHESTRATE.SH                    │
├─────────────────────────────────────────────────────┤
│ 1. Health Check → Verify LLM server is running      │
│ 2. Decompose    → Break task into TDD subtasks      │
│ 3. Map Agents   → Select best agent for each task   │
│ 4. Execute      → Run agents in parallel batches    │
│ 5. Quality Gate → Review and score outputs          │
│ 6. Synthesize   → Combine results into final output │
└─────────────────────────────────────────────────────┘
```
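Every stage talks to the same OpenAI-compatible chat completions endpoint. As a rough sketch of the request shape (the model name, prompt text, and temperature here are placeholders, not the pipeline's actual prompts):

```shell
# Build a chat-completions request body; "local-model" is a placeholder
# for whatever model your server has loaded.
payload=$(cat <<'EOF'
{
  "model": "local-model",
  "messages": [
    {"role": "user", "content": "Break this task into TDD subtasks"}
  ],
  "temperature": 0.2
}
EOF
)

# The pipeline would send it along these lines:
#   curl -s http://localhost:8081/v1/chat/completions \
#     -H 'Content-Type: application/json' -d "$payload"
printf '%s\n' "$payload"
```

Any server that accepts this request shape (vLLM, LM Studio, llama.cpp, Ollama) can back the pipeline.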
```bash
# Run with default 2 parallel slots
./scripts/orchestrate.sh "Your task description"

# Run with custom slot count
./scripts/orchestrate.sh "Your task description" 4
```

For large task specifications (>5KB), use file mode to save Claude tokens:
```bash
# Write large spec to file
cat > /tmp/task-spec.md << 'EOF'
# My Complex Task
[Large specification content here...]
EOF

# Pass file path instead of inline text
./scripts/orchestrate.sh /tmp/task-spec.md 4

# Cleanup after completion
rm -f /tmp/task-spec.md
```

How it works:

- `orchestrate.sh` auto-detects whether the argument is a file path or inline text
- File paths are read and processed; inline text works as before
- Backward compatible - no breaking changes
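The detection step can be sketched in a few lines of bash. This is a simplified stand-in, not the actual `orchestrate.sh` logic, and `get_task_spec` is a hypothetical helper name:

```shell
# Hypothetical sketch of file-vs-inline detection; the real script may differ.
get_task_spec() {
  local arg="$1"
  if [ -f "$arg" ] && [ -r "$arg" ]; then
    cat "$arg"           # file mode: read the large spec from disk
  else
    printf '%s' "$arg"   # inline mode: the argument is the task text itself
  fi
}
```

With this, `get_task_spec /tmp/task-spec.md` prints the file contents, while `get_task_spec "Write tests"` echoes the text back unchanged.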
Token savings:
| Spec Size | Inline Mode | File Mode | Savings |
|---|---|---|---|
| 10KB | ~12K tokens | ~2K tokens | 83% |
| 50KB | ~52K tokens | ~2.5K tokens | 95% |
| 100KB | ~102K tokens | ~3K tokens | 97% |
See docs/TOKEN_EFFICIENCY.md for detailed analysis.
| Variable | Default | Description |
|---|---|---|
| `WORKER_PORT` | `8081` | Port for the main LLM worker |
| `ORCHESTRATOR_PORT` | `8085` | Port for the orchestrator model (optional) |
| `AGENT_POOL` | `./agents` | Directory containing agent definitions |
| `WORK_DIR` | `/tmp/local-agents-*` | Working directory for outputs |
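For example, to point a run at a non-default worker port and agent directory (the values below are illustrative, not recommended settings):

```shell
# Override defaults for this shell session; the variable names come from
# the configuration table above.
export WORKER_PORT=8085
export AGENT_POOL="./my-agents"   # illustrative path

# Then run the pipeline as usual:
#   ./scripts/orchestrate.sh "Add input validation" 2
```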
```bash
# Code generation
./scripts/orchestrate.sh "Create a Python class for a binary search tree"

# Testing
./scripts/orchestrate.sh "Write unit tests for the UserService class"

# Refactoring
./scripts/orchestrate.sh "Refactor the database module to use connection pooling"
```

Agents are Markdown files in the `agents/` directory. Each agent has:

- A system prompt defining its role
- Suggested tools it can use
- Output format expectations
| Agent | Purpose |
|---|---|
| `code-generator-agent` | Writes implementation code |
| `test-writer-agent` | Creates unit tests |
| `code-reviewer-agent` | Reviews code quality |
| `refactor-agent` | Improves code structure |
| `documentation-agent` | Writes documentation |
```markdown
---
name: my-custom-agent
category: development
phase: GREEN
---

# My Custom Agent

You are a specialized agent for [specific task].

## Instructions
1. Analyze the task
2. Produce output in the specified format
3. Include relevant details

## Output Format
[Define expected output structure]
```

The pipeline breaks your task into atomic TDD subtasks:
```json
{
  "parallel_groups": [
    {
      "group": 1,
      "description": "Write failing tests",
      "tasks": [{"id": "T1", "phase": "RED", "task": "Write test for..."}]
    },
    {
      "group": 2,
      "description": "Implement to pass tests",
      "tasks": [{"id": "T2", "phase": "GREEN", "task": "Implement..."}]
    }
  ]
}
```

For each task, the pipeline:

- Queries the LLM to select the best agent from the pool
- Falls back to phase-based defaults if no match
- Can generate new agents on-demand for specialized tasks
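The phase-based fallback might look like the following sketch. The mapping below is an assumption based on the starter agents, not the pipeline's exact table:

```shell
# Hypothetical phase → agent mapping; adjust to match your agent pool.
default_agent_for_phase() {
  case "$1" in
    RED)      echo "test-writer-agent" ;;     # write failing tests first
    GREEN)    echo "code-generator-agent" ;;  # implement to pass them
    REFACTOR) echo "refactor-agent" ;;        # then improve structure
    *)        echo "code-generator-agent" ;;  # safe default
  esac
}
```

A case statement keeps the fallback deterministic even when the LLM's agent selection returns nothing usable.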
```
Available Slots: 2

Batch 1 (RED phase):
┌─────────┐   ┌─────────┐
│ Slot 1  │   │ Slot 2  │   ← Run in parallel
│   T1    │   │   T2    │
└─────────┘   └─────────┘

[Wait for batch completion]

Batch 2 (GREEN phase):
┌─────────┐   ┌─────────┐
│ Slot 1  │   │ Slot 2  │   ← Run in parallel
│   T3    │   │   T4    │
└─────────┘   └─────────┘
```
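The batching above can be sketched with plain background jobs and `wait`; here `run_one` is a stand-in for a real agent execution:

```shell
# Launch at most SLOTS tasks per batch, then block on the whole batch
# before starting the next one.
run_one() { echo "task $1 done"; }   # placeholder for a real agent run

SLOTS=2
TASKS=(T1 T2 T3 T4)
for ((i = 0; i < ${#TASKS[@]}; i += SLOTS)); do
  for task in "${TASKS[@]:i:SLOTS}"; do
    run_one "$task" &   # fill a slot
  done
  wait                  # batch barrier: next batch starts only when all finish
done
```

With 4 tasks and 2 slots this produces two batches, mirroring the RED/GREEN diagram above.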
Every execution is reviewed:
- Overall quality score (0-100)
- Per-task assessment
- Critical issues flagged
- Recommendations provided
| Server | Tested | Notes |
|---|---|---|
| vLLM | ✅ | Recommended for multi-slot execution |
| LM Studio | ✅ | Easy setup, good for single slot |
| llama.cpp | ✅ | Lightweight option |
| Ollama | ✅ | Use with OLLAMA_HOST |
| LocalAI | ⚠️ | Should work, not extensively tested |
```
local-llm-agents/
├── scripts/
│   ├── orchestrate.sh        # Main pipeline
│   ├── health-check.sh       # Server health verification
│   ├── tdd-decomposer.sh     # Task decomposition
│   ├── agent-mapper.sh       # Agent selection
│   ├── agent-selector.sh     # Dynamic LLM-based selection
│   ├── agent-generator.sh    # On-demand agent creation
│   ├── execute-agents.sh     # Parallel execution
│   ├── react-executor.sh     # ReAct loop for agents
│   ├── quality-gate.sh       # Output review
│   ├── synthesize.sh         # Result aggregation
│   └── tool-executor.sh      # Tool execution
├── prompts/
│   ├── decompose.txt         # Decomposition prompt
│   ├── quality-review.txt    # Quality gate prompt
│   ├── synthesize.txt        # Synthesis prompt
│   └── tool-instructions.txt
├── agents/                   # Agent definitions
├── examples/                 # Example usage
└── install.sh                # Installation script
```
Start your local LLM server:

```bash
# vLLM
vllm serve model-name --port 8081

# LM Studio
# Start via GUI, enable server on port 8081

# Ollama
OLLAMA_HOST=0.0.0.0:8081 ollama serve
```

If the server seems unresponsive:

- Check that the model is loaded and responding
- Try increasing the timeout: `LLM_TIMEOUT=60`
- Verify the model supports the chat completions API

If an agent is not found:

- Ensure `AGENT_POOL` points to the correct directory
- Check that the agent filename matches the expected pattern: `agent-name.md`
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
MIT License - see LICENSE for details.