Intelligent model routing for OpenClaw with quota prediction, task classification, and automatic optimization.
Smart Router helps you get the most out of your LLM quotas by:
- **Predicting exhaustion** - know when you'll run out of tokens before it happens (sketched below)
- **Analyzing workloads** - identify which cron jobs and agents can use cheaper models
- **Automatic optimization** - shift workloads to appropriate models based on task complexity
- **Local model support** - route simple tasks to MLX, Ollama, or other local servers
- **Budget tracking** - monitor spend on pay-per-token providers like OpenRouter
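To give a feel for the prediction feature, here is a minimal sketch of a linear burn-rate estimate, the simplest way to project exhaustion. The types and names below are illustrative only, not the plugin's internal API:

```typescript
// Illustrative sketch only: project quota exhaustion from a linear burn rate.
interface UsageSnapshot {
  used: number;      // tokens consumed so far in the current window
  limit: number;     // known (or user-supplied) quota limit
  windowStart: Date; // when the current quota window began
}

// Returns the projected exhaustion time, or null if there is nothing to project.
function predictExhaustion(s: UsageSnapshot, now: Date = new Date()): Date | null {
  const elapsedMs = now.getTime() - s.windowStart.getTime();
  if (elapsedMs <= 0 || s.used <= 0) return null;
  const burnRate = s.used / elapsedMs;  // tokens per millisecond
  const remaining = s.limit - s.used;
  if (remaining <= 0) return now;       // already exhausted
  return new Date(now.getTime() + remaining / burnRate);
}

// Example: 7.9M of 10M tokens used, three days into a weekly window.
const windowStart = new Date(Date.now() - 3 * 24 * 3600 * 1000);
console.log(predictExhaustion({ used: 7_900_000, limit: 10_000_000, windowStart }));
```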
Install:

```bash
cd ~/.openclaw/extensions
git clone https://github.com/joshuaswarren/openclaw-tactician.git
cd openclaw-tactician
npm install && npm run build
```

Then register the plugin in your `openclaw.json`:

```json
{
"plugins": {
"openclaw-tactician": {
"mode": "dry-run",
"providers": {
"anthropic": {
"quotaSource": "self-tracked",
"quotaType": "tokens",
"tier": "premium",
"resetSchedule": { "type": "weekly", "dayOfWeek": 3, "hour": 7 }
},
"openai-codex": {
"quotaSource": "self-tracked",
"quotaType": "messages",
"tier": "premium",
"resetSchedule": { "type": "fixed", "fixedDate": "2026-02-09T14:36:00Z" }
},
"openrouter": {
"quotaSource": "api",
"quotaType": "budget",
"tier": "budget"
}
}
}
}
}
```

Restart the gateway to load the plugin:

```bash
kill -USR1 $(pgrep openclaw-gateway)
```

Verify it's running:

```bash
openclaw router status
```

```bash
# Show provider status and usage
openclaw router status [provider]
# Predict quota exhaustion
openclaw router predict [--hours=24]
# List configured providers
openclaw router providers
# Manually set usage (e.g., after checking your account)
openclaw router set-usage <provider> <percent|tokens>
# Examples:
openclaw router set-usage anthropic 79%
openclaw router set-usage openai-codex 91%
# Reset quota counter after provider reset
openclaw router reset <provider>
# Analyze crons/agents for optimization opportunities
openclaw router analyze [--type=all|crons|agents]
# Generate and optionally apply optimizations
openclaw router optimize [--apply] [--safe-only]
# Detect local model servers
openclaw router detect-local
# Get or set operation mode
openclaw router mode [manual|dry-run|auto]
```

Chat with OpenClaw using these capabilities:
"What's my token usage looking like?"
→ Calls router_status tool
"When will I run out of Codex tokens?"
→ Calls router_predict tool
"Which of my cron jobs could use cheaper models?"
→ Calls router_analyze tool
"Optimize my model usage"
→ Calls router_optimize tool (with confirmation)
"Move everything off Anthropic"
→ Calls router_shift tool
| Mode | Behavior |
|---|---|
| `manual` | CLI only. No automatic changes. |
| `dry-run` | Preview optimizations; ask before applying. (Default) |
| `auto` | Automatically apply safe (reversible) optimizations. |
This plugin can fetch real-time usage from provider APIs, similar to CodexBar:
| Provider | Status | Method | Environment Variables |
|---|---|---|---|
| Claude | ✅ Working | Web API | CLAUDE_SESSION_KEY, CLAUDE_COOKIES |
| Codex | ✅ Working | OAuth | Auto from ~/.codex/auth.json |
| Kimi | ✅ Working | API | KIMI_AUTH_TOKEN |
| Z.ai | ✅ Working | API | Z_AI_API_KEY |
| OpenRouter | ✅ Working | API | Auto from provider config |
| Copilot | ✅ Working | GitHub API | Auto from `gh` CLI or GITHUB_TOKEN |
| Gemini | ✅ Working | OAuth | Auto from ~/.gemini/oauth_creds.json |
```bash
# Fetch from a specific provider
openclaw router fetch claude
openclaw router fetch kimi
openclaw router fetch zai
openclaw router fetch codex
openclaw router fetch gemini
# Check credential status
openclaw router credentials
```

Claude requires browser cookies to fetch usage. The session key alone isn't enough; Cloudflare also requires `cf_clearance`.
Getting the cookies:

1. Open https://claude.ai in your browser
2. Open DevTools → Application → Cookies → claude.ai
3. Copy `sessionKey` (starts with `sk-ant-sid01-`)
4. Copy `cf_clearance` (the Cloudflare token)
Environment variables:

```bash
export CLAUDE_SESSION_KEY="sk-ant-sid01-..."
export CLAUDE_COOKIES="sessionKey=sk-ant-sid01-...; cf_clearance=..."
```

Usage data returned:
- Session (5hr): Percentage used, reset time
- Weekly (7d): Percentage used, reset time
- Opus: Separate tracking for Opus model
Kimi requires a browser session JWT, NOT the API key.
Important: `KIMI_API_KEY` (starts with `sk-kimi-`) is for the coding API, which has no usage endpoint.

Getting the token:

1. Open https://kimi.com in your browser and log in
2. Open DevTools → Application → Cookies → kimi.com
3. Copy the `kimi-auth` cookie value (it's a JWT)

Environment variable:

```bash
export KIMI_AUTH_TOKEN="eyJ..."
```

Z.ai uses a standard API key for usage tracking.
Environment variable:

```bash
export Z_AI_API_KEY="your-api-key"
```

Regions:

- Global: `api.z.ai` (default)
- China: set `Z_AI_REGION=china` to use `open.bigmodel.cn`
Codex uses OAuth credentials stored by the Codex CLI.
Setup:

1. Install and run the `codex` CLI
2. Authenticate via the CLI
3. Credentials are stored at `~/.codex/auth.json`
No environment variables needed - the plugin reads from the auth file.
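Reading those stored credentials is a one-liner; here is a minimal sketch (the JSON shape of `auth.json` is an assumption of this sketch, so adjust field access to what your Codex CLI version actually writes):

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Read the Codex CLI's stored OAuth credentials from ~/.codex/auth.json.
// The parsed shape is deliberately left loose: the file's schema is an
// assumption here, not documented by the plugin.
function loadCodexAuth(): Record<string, unknown> | null {
  const path = join(homedir(), ".codex", "auth.json");
  try {
    return JSON.parse(readFileSync(path, "utf8"));
  } catch {
    return null; // file missing or unreadable: treat as "not authenticated"
  }
}
```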
Copilot uses a GitHub OAuth token.

Sources (checked in order):

1. `GITHUB_TOKEN` environment variable
2. `gh auth token` (GitHub CLI)
3. `~/.config/gh/hosts.yml`
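A sketch of that resolution order (the regex-based `hosts.yml` parsing is a shortcut for illustration; a real implementation would use a YAML parser):

```typescript
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Resolve a GitHub token in the documented order:
// env var -> gh CLI -> gh's hosts.yml.
function resolveGithubToken(): string | null {
  if (process.env.GITHUB_TOKEN) return process.env.GITHUB_TOKEN;
  try {
    const token = execSync("gh auth token", { encoding: "utf8" }).trim();
    if (token) return token;
  } catch {
    /* gh CLI not installed or not logged in */
  }
  try {
    const yml = readFileSync(join(homedir(), ".config", "gh", "hosts.yml"), "utf8");
    const match = yml.match(/oauth_token:\s*(\S+)/);
    if (match) return match[1];
  } catch {
    /* no hosts.yml */
  }
  return null;
}
```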
Gemini uses OAuth credentials stored by the Gemini CLI. Supports multiple accounts for quota rotation on free tier.
Single Account Setup:

1. Install and run the `gemini` CLI
2. Authenticate via the CLI
3. Credentials are stored at `~/.gemini/oauth_creds.json`
Multi-Account Setup (for free tier rotation):
Configure multiple accounts in `openclaw.json`:

```json
"google": {
"quotaSource": "api",
"accounts": [
{ "name": "personal", "credentialsPath": "~/.gemini/oauth_creds.json" },
{ "name": "work1", "credentialsPath": "~/.gemini/accounts/work1.json" },
{ "name": "work2", "credentialsPath": "~/.gemini/accounts/work2.json" }
]
}
```

Setting up additional accounts:

1. Run `gemini` in a temp directory to authenticate with a different Google account
2. Copy the credentials to a unique path: `cp ~/.gemini/oauth_creds.json ~/.gemini/accounts/work1.json`
3. Repeat for each account
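Rotation can then simply pick whichever account has the most quota left. A minimal sketch (the account and usage shapes are illustrative, not the plugin's API):

```typescript
// Illustrative free-tier rotation: choose the account with the lowest usage.
interface GeminiAccount {
  name: string;
  credentialsPath: string;
  usedPercent: number; // from the per-account usage data described below
}

function pickAccount(accounts: GeminiAccount[]): GeminiAccount | null {
  const available = accounts.filter((a) => a.usedPercent < 100);
  if (available.length === 0) return null; // every account is exhausted
  return available.reduce((best, a) => (a.usedPercent < best.usedPercent ? a : best));
}

const next = pickAccount([
  { name: "personal", credentialsPath: "~/.gemini/oauth_creds.json", usedPercent: 80 },
  { name: "work1", credentialsPath: "~/.gemini/accounts/work1.json", usedPercent: 35 },
]);
console.log(next?.name); // "work1"
```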
Usage data returned (per account):
- Pro models: Usage percentage, reset time
- Flash models: Usage percentage, reset time
- Tier: Your subscription tier (if available)
Note: Gemini CLI tokens expire and cannot be auto-refreshed. Re-run the `gemini` CLI when tokens expire.
For providers without usage APIs (like Google Gemini), we track usage ourselves via the `llm_end` hook:

- **We don't know your actual limits** - set them manually with `router set-usage`
- **Tracking starts from zero** - historical usage before plugin install is unknown
- **Reset timing may drift** - we reset when configured, not when your provider does
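As a rough sketch, a self-tracking handler for the `llm_end` hook might accumulate usage like this (the event payload fields are assumptions of this sketch; only the hook name comes from the plugin):

```typescript
// Illustrative llm_end handler: accumulate per-provider token counts.
const usageByProvider = new Map<string, number>();

interface LlmEndEvent {
  provider: string;     // assumed payload shape, for illustration only
  inputTokens: number;
  outputTokens: number;
}

function onLlmEnd(event: LlmEndEvent): void {
  const previous = usageByProvider.get(event.provider) ?? 0;
  // Self-tracked "tokens" quotas count input and output together.
  usageByProvider.set(event.provider, previous + event.inputTokens + event.outputTokens);
}
```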
Recommended workflow:

```bash
# Check your provider's dashboard for current usage, then sync:
openclaw router set-usage google 45%
```

Supported quota types:

| Type | Unit | Example Providers |
|---|---|---|
| `tokens` | Input + output tokens | Anthropic, OpenAI API |
| `messages` | Conversations/completions | OpenAI Codex (tier-based) |
| `requests` | API calls per day | Google free tier |
| `budget` | USD spend | OpenRouter |
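Because the units differ, comparing providers means normalizing each quota to a fraction of its limit. A minimal sketch (the type and field names are illustrative, not the plugin's internals):

```typescript
// Illustrative normalization: map any quota type to a 0-1 "fraction used".
type QuotaType = "tokens" | "messages" | "requests" | "budget";

interface Quota {
  type: QuotaType;
  used: number;  // tokens, messages, requests, or USD spent
  limit: number; // same unit as `used`; 0 means unknown/unlimited
}

function fractionUsed(q: Quota): number {
  if (q.limit <= 0) return 0; // nothing to warn about without a limit
  return Math.min(q.used / q.limit, 1);
}

console.log(fractionUsed({ type: "budget", used: 8, limit: 10 })); // 0.8
```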
For providers without usage APIs, we track usage ourselves via the `llm_end` hook:

- **We don't know your actual limits** - set them manually with `router set-usage`
- **Tracking starts from zero** - historical usage before plugin install is unknown
- **Reset timing may drift** - we reset when configured, not when your provider does
Recommended workflow:

```bash
# Check your provider's dashboard for current usage
# Then sync the plugin:
openclaw router set-usage anthropic 79%
openclaw router set-usage openai-codex 91%
```

Full configuration reference:

```jsonc
{
"plugins": {
"openclaw-tactician": {
// Operation mode: manual, dry-run, auto
"mode": "dry-run",
// Enable debug logging
"debug": false,
// Provider-specific configuration
"providers": {
"anthropic": {
// How to track: api, manual, unlimited, self-tracked
"quotaSource": "self-tracked",
// What the limit measures: tokens, requests, messages, budget
"quotaType": "tokens",
// Optional: set a limit for warnings (omit if unknown)
// "limit": 10000000,
// When quota resets
"resetSchedule": {
"type": "weekly", // daily, weekly, monthly, fixed
"dayOfWeek": 3, // 0=Sunday (for weekly)
"hour": 7, // Hour of reset (0-23)
"timezone": "America/Chicago"
},
// Cost tier: premium, standard, budget, free, local
"tier": "premium",
// Priority within tier (higher = preferred)
"priority": 100
},
"openrouter": {
"quotaSource": "api",
"budget": {
"monthlyLimit": 10.00,
"alertThreshold": 0.8
},
"tier": "budget"
},
"local-mlx": {
"quotaSource": "unlimited",
"tier": "local",
"local": {
"type": "mlx",
"endpoint": "http://localhost:8080",
"models": ["mlx-community/Llama-3.2-3B-Instruct-4bit"]
}
}
},
// Minimum quality scores by task type
"qualityThresholds": {
"coding": 0.8,
"reasoning": 0.75,
"creative": 0.6,
"simple": 0.4
},
// How far ahead to predict (hours)
"predictionHorizonHours": 24,
// Alert thresholds (0-1)
"warningThreshold": 0.8,
"criticalThreshold": 0.95,
// Auto-optimization interval (minutes)
"optimizationIntervalMinutes": 60,
// When to use local models: never, simple-only, when-available, prefer
"localModelPreference": "simple-only"
}
}
}
```

```
┌──────────────────────────────────────────────────────────────────┐
│ openclaw-tactician │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Quota │ │ Capability │ │ Optimization │ │
│ │ Tracker │ │ Scorer │ │ Engine │ │
│ └─────┬───────┘ └──────┬──────┘ └───────────┬─────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Provider Registry │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌────────┐ │ │
│ │ │Anthropic│ │ OpenAI │ │ Google │ │OpenRouter│ │ Local │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Interface Layer │ │
│ │ ┌──────────────────────┐ ┌───────────────────────────────┐ │ │
│ │ │ CLI Commands │ │ Agent Tools (Chat) │ │ │
│ │ └──────────────────────┘ └───────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
```
The plugin analyzes prompts to determine task complexity:
| Signal | Classification | Quality Threshold |
|---|---|---|
| Code keywords, ``` blocks | Coding | 0.8 |
| "analyze", "design", "strategy" | Reasoning | 0.75 |
| "write", "story", "creative" | Creative | 0.6 |
| "summarize", "list", "check" | Simple | 0.4 |
Each model is scored on capability dimensions (0-1):
- **coding** - Code generation and debugging
- **reasoning** - Logic, math, analysis
- **creative** - Writing, brainstorming
- **instruction** - Following complex instructions
- **context** - Long context handling
- **speed** - Response latency
Default scores are provided for common models. Override with manual scores in config.
1. **Analyze** - scan cron jobs and agents for optimization opportunities
2. **Score** - match task requirements to model capabilities
3. **Plan** - generate actions (change model, add fallback, split job)
4. **Apply** - execute changes (dry-run or live, depending on mode)
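Here is a sketch of how the Score step can feed the Plan step: pick the cheapest model whose capability score clears the quality threshold for the classified task. The tier costs, names, and shapes are illustrative; the threshold would come from `qualityThresholds` (e.g., 0.8 for coding):

```typescript
// Illustrative model selection: cheapest tier that meets the task threshold.
type TaskType = "coding" | "reasoning" | "creative" | "simple";

interface Model {
  name: string;
  tier: "premium" | "standard" | "budget" | "free" | "local";
  scores: Record<TaskType, number>; // 0-1 capability scores
}

// Relative cost ranking of tiers, cheapest last.
const TIER_COST: Record<Model["tier"], number> = {
  premium: 4, standard: 3, budget: 2, free: 1, local: 0,
};

function pickModel(models: Model[], task: TaskType, threshold: number): Model | null {
  const capable = models.filter((m) => m.scores[task] >= threshold);
  if (capable.length === 0) return null; // Plan would leave the job unchanged
  return capable.reduce((a, b) => (TIER_COST[a.tier] <= TIER_COST[b.tier] ? a : b));
}
```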
The plugin auto-detects these local servers:
| Server | Default Port | Detection |
|---|---|---|
| Ollama | 11434 | GET / returns "Ollama" |
| MLX-LM | 8080 | OpenAI-compatible /v1/models |
| LM Studio | 1234 | OpenAI-compatible /v1/models |
| vLLM | 8000 | /health endpoint |
Run `openclaw router detect-local` to check what's available.
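Detection can be as simple as probing each default port. A sketch using Node 18+'s built-in `fetch` (timeouts and response checks are simplified; this is not the plugin's actual implementation):

```typescript
// Illustrative local-server probe for the default ports listed above.
const PROBES = [
  { server: "Ollama", url: "http://127.0.0.1:11434/" },
  { server: "MLX-LM", url: "http://127.0.0.1:8080/v1/models" },
  { server: "LM Studio", url: "http://127.0.0.1:1234/v1/models" },
  { server: "vLLM", url: "http://127.0.0.1:8000/health" },
];

async function detectLocal(): Promise<string[]> {
  const found: string[] = [];
  for (const probe of PROBES) {
    try {
      const res = await fetch(probe.url, { signal: AbortSignal.timeout(1000) });
      if (res.ok) found.push(probe.server);
    } catch {
      /* nothing listening on that port */
    }
  }
  return found;
}

detectLocal().then((servers) => console.log("Local servers:", servers));
```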
Configure `localModelPreference`:

| Value | Behavior |
|---|---|
| `never` | Don't use local models |
| `simple-only` | Route simple tasks (summarize, list) to local |
| `when-available` | Use local when cloud is constrained |
| `prefer` | Prefer local over cloud when capable |
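A sketch of how these values could gate a routing decision (the function and parameter names are illustrative):

```typescript
// Illustrative gate: should this task go to a local model?
type Preference = "never" | "simple-only" | "when-available" | "prefer";
type TaskType = "coding" | "reasoning" | "creative" | "simple";

function useLocal(
  pref: Preference,
  task: TaskType,
  cloudConstrained: boolean, // cloud quotas near exhaustion
  localCapable: boolean,     // a local model clears the quality threshold
): boolean {
  switch (pref) {
    case "never": return false;
    case "simple-only": return task === "simple" && localCapable;
    case "when-available": return cloudConstrained && localCapable;
    case "prefer": return localCapable;
  }
}
```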
LM Studio provides a local OpenAI-compatible API server for running quantized models on macOS/Windows/Linux.
1. Install LM Studio
Download from lmstudio.ai or use the CLI:

```bash
# Check if installed
lms --version
# List available models
lms ls
# Start the server
lms server start
```

2. Configure Tactician
Add an `lmstudio` provider in your `openclaw.json`:

```json
{
"plugins": {
"entries": {
"openclaw-tactician": {
"config": {
"providers": {
"lmstudio": {
"quotaSource": "unlimited",
"quotaType": "tokens",
"tier": "local",
"priority": 30,
"local": {
"type": "lmstudio",
"endpoint": "http://127.0.0.1:1234/v1",
"models": [
"internlm2_5-20b-chat",
"mistral-nemo-instruct-2407",
"mathstral-7b-v0.1"
]
}
}
},
"localModelPreference": "simple-only"
}
}
}
}
}
```

3. Verify Connection

```bash
# Check if server is running
curl http://127.0.0.1:1234/v1/models
# Or use the router CLI
openclaw router detect-local
```

Quantized models run faster but have reduced accuracy compared to full-precision models. Tactician applies automatic degradation factors:
| Quantization | Performance Retention | Speed Boost |
|---|---|---|
| Q8_0 | ~95-98% | 1.5-2x |
| Q6_K | ~93-96% | 2-2.5x |
| Q5_K_M | ~90-95% | 2.5-3x |
| Q4_K_M | ~85-92% | 3-4x |
| Q4_0 | ~80-88% | 3.5-4.5x |
| Q3_K_M | ~75-85% | 4-5x |
For example, a Q4_K_M quantized Mistral Nemo 12B:
- Full model MMLU: 68%
- Quant-adjusted MMLU: ~62% (91% retention)
- But runs at 80+ tokens/sec locally vs cloud latency
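The adjustment itself is a single multiplication. As a sketch, using the retention figure from the example above:

```typescript
// Scale a benchmark score by the quantization retention factor.
function quantAdjusted(baseScore: number, retention: number): number {
  return baseScore * retention;
}

// Mistral Nemo 12B at Q4_K_M: 0.68 MMLU x 0.91 retention ~= 0.62.
console.log(quantAdjusted(0.68, 0.91).toFixed(2)); // "0.62"
```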
| Use Case | Recommended Model | Why |
|---|---|---|
| Simple tasks | mathstral-7b (Q4) | Fastest, low memory |
| General chat | mistral-nemo-12b (Q4) | Good balance, huge context |
| Coding | deepseek-coder-6.7b (Q4) | Optimized for code |
| Complex reasoning | internlm2_5-20b (Q4) | Strong reasoning despite quant |
The plugin adapts to your setup:
| Scenario | Behavior |
|---|---|
| No providers configured | Just monitors usage if hooks fire |
| Single provider | Warns on high usage, no shifting |
| No local models | Uses cloud free tiers first |
| No free tiers | Optimizes premium usage |
| Budget-only (OpenRouter) | Tracks spend, warns at threshold |
Add the provider to your plugin config:

```json
"providers": {
"my-provider": {
"quotaSource": "manual",
"limit": 1000000
}
}
```

Update manual usage to match your actual account:

```bash
openclaw router set-usage anthropic 79%
```

Local server not detected:

- Ensure the server is running
- Check the default port is being used
- Run `openclaw router detect-local` for diagnostics
Optimizations not applying:

- Check you're in `dry-run` or `auto` mode
- Use the `--apply` flag with the optimize command
- Restart the gateway after changes: `kill -USR1 $(pgrep openclaw-gateway)`
To contribute:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run `npm run check-types`
5. Submit a PR
Related projects:

- OpenClaw - The AI agent framework
- openclaw-engram - Memory plugin
- openclaw-patcher - Auto-patching utility
License: MIT