Intelligent model routing for OpenClaw with quota prediction, task classification, and automatic optimization.
Smart Router helps you get the most out of your LLM quotas by:
- **Predicting exhaustion** - know when you'll run out of tokens before it happens (sketched below)
- **Analyzing workloads** - identify which cron jobs and agents can use cheaper models
- **Automatic optimization** - shift workloads to appropriate models based on task complexity
- **Local model support** - route simple tasks to MLX, Ollama, or other local servers
- **Budget tracking** - monitor spend on pay-per-token providers like OpenRouter
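To give a feel for the prediction feature, here is a minimal sketch of a linear burn-rate estimate, the simplest way to project exhaustion. The types and names below are illustrative only, not the plugin's internal API:

```typescript
// Illustrative sketch only: project quota exhaustion from a linear burn rate.
interface UsageSnapshot {
  used: number;      // tokens consumed so far in the current window
  limit: number;     // known (or user-supplied) quota limit
  windowStart: Date; // when the current quota window began
}

// Returns the projected exhaustion time, or null if there is nothing to project.
function predictExhaustion(s: UsageSnapshot, now: Date = new Date()): Date | null {
  const elapsedMs = now.getTime() - s.windowStart.getTime();
  if (elapsedMs <= 0 || s.used <= 0) return null;
  const burnRate = s.used / elapsedMs;  // tokens per millisecond
  const remaining = s.limit - s.used;
  if (remaining <= 0) return now;       // already exhausted
  return new Date(now.getTime() + remaining / burnRate);
}

// Example: 7.9M of 10M tokens used, three days into a weekly window.
const windowStart = new Date(Date.now() - 3 * 24 * 3600 * 1000);
console.log(predictExhaustion({ used: 7_900_000, limit: 10_000_000, windowStart }));
```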
Install:

```bash
cd ~/.openclaw/extensions
git clone https://github.com/joshuaswarren/openclaw-tactician.git
cd openclaw-tactician
npm install && npm run build
```

Then register the plugin in your `openclaw.json`:

```json
{
"plugins": {
"openclaw-tactician": {
"mode": "dry-run",
"providers": {
"anthropic": {
"quotaSource": "self-tracked",
"quotaType": "tokens",
"tier": "premium",
"resetSchedule": { "type": "weekly", "dayOfWeek": 3, "hour": 7 }
},
"openai-codex": {
"quotaSource": "self-tracked",
"quotaType": "messages",
"tier": "premium",
"resetSchedule": { "type": "fixed", "fixedDate": "2026-02-09T14:36:00Z" }
},
"openrouter": {
"quotaSource": "api",
"quotaType": "budget",
"tier": "budget"
}
}
}
}
}
```

Restart the gateway to load the plugin:

```bash
kill -USR1 $(pgrep openclaw-gateway)
```

Verify it's running:

```bash
openclaw router status
```

```bash
# Show provider status and usage
openclaw router status [provider]
# Predict quota exhaustion
openclaw router predict [--hours=24]
# List configured providers
openclaw router providers
# Manually set usage (e.g., after checking your account)
openclaw router set-usage <provider> <percent|tokens>
# Examples:
openclaw router set-usage anthropic 79%
openclaw router set-usage openai-codex 91%
# Reset quota counter after provider reset
openclaw router reset <provider>
# Analyze crons/agents for optimization opportunities
openclaw router analyze [--type=all|crons|agents]
# Generate and optionally apply optimizations
openclaw router optimize [--apply] [--safe-only]
# Detect local model servers
openclaw router detect-local
# Get or set operation mode
openclaw router mode [manual|dry-run|auto]
```

Chat with OpenClaw using these capabilities:
"What's my token usage looking like?"
→ Calls router_status tool
"When will I run out of Codex tokens?"
→ Calls router_predict tool
"Which of my cron jobs could use cheaper models?"
→ Calls router_analyze tool
"Optimize my model usage"
→ Calls router_optimize tool (with confirmation)
"Move everything off Anthropic"
→ Calls router_shift tool
| Mode | Behavior |
|---|---|
| `manual` | CLI only. No automatic changes. |
| `dry-run` | Preview optimizations; ask before applying. (Default) |
| `auto` | Automatically apply safe (reversible) optimizations. |
This plugin can fetch real-time usage from provider APIs, similar to CodexBar:
| Provider | Status | Method | Environment Variables |
|---|---|---|---|
| Claude | ✅ Working | Web API | CLAUDE_SESSION_KEY, CLAUDE_COOKIES |
| Codex | ✅ Working | OAuth | Auto from ~/.codex/auth.json |
| Kimi | ✅ Working | API | KIMI_AUTH_TOKEN |
| Z.ai | ✅ Working | API | Z_AI_API_KEY |
| OpenRouter | ✅ Working | API | Auto from provider config |
| Copilot | ✅ Working | GitHub API | Auto from `gh` CLI or GITHUB_TOKEN |
| Gemini | ✅ Working | OAuth | Auto from ~/.gemini/oauth_creds.json |
```bash
# Fetch from a specific provider
openclaw router fetch claude
openclaw router fetch kimi
openclaw router fetch zai
openclaw router fetch codex
openclaw router fetch gemini
# Check credential status
openclaw router credentials
```

Claude requires browser cookies to fetch usage. The session key alone isn't enough; Cloudflare also requires `cf_clearance`.
Getting the cookies:

1. Open https://claude.ai in your browser
2. Open DevTools → Application → Cookies → claude.ai
3. Copy `sessionKey` (starts with `sk-ant-sid01-`)
4. Copy `cf_clearance` (the Cloudflare token)
Environment variables:

```bash
export CLAUDE_SESSION_KEY="sk-ant-sid01-..."
export CLAUDE_COOKIES="sessionKey=sk-ant-sid01-...; cf_clearance=..."
```

Usage data returned:
- Session (5hr): Percentage used, reset time
- Weekly (7d): Percentage used, reset time
- Opus: Separate tracking for Opus model
Kimi requires a browser session JWT, NOT the API key.
Important: `KIMI_API_KEY` (starts with `sk-kimi-`) is for the coding API, which has no usage endpoint.

Getting the token:

1. Open https://kimi.com in your browser and log in
2. Open DevTools → Application → Cookies → kimi.com
3. Copy the `kimi-auth` cookie value (it's a JWT)

Environment variable:

```bash
export KIMI_AUTH_TOKEN="eyJ..."
```

Z.ai uses a standard API key for usage tracking.
Environment variable:

```bash
export Z_AI_API_KEY="your-api-key"
```

Regions:

- Global: `api.z.ai` (default)
- China: set `Z_AI_REGION=china` to use `open.bigmodel.cn`
Codex uses OAuth credentials stored by the Codex CLI.
Setup:

1. Install and run the `codex` CLI
2. Authenticate via the CLI
3. Credentials are stored at `~/.codex/auth.json`
No environment variables needed - the plugin reads from the auth file.
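Reading those stored credentials is a one-liner; here is a minimal sketch (the JSON shape of `auth.json` is an assumption of this sketch, so adjust field access to what your Codex CLI version actually writes):

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Read the Codex CLI's stored OAuth credentials from ~/.codex/auth.json.
// The parsed shape is deliberately left loose: the file's schema is an
// assumption here, not documented by the plugin.
function loadCodexAuth(): Record<string, unknown> | null {
  const path = join(homedir(), ".codex", "auth.json");
  try {
    return JSON.parse(readFileSync(path, "utf8"));
  } catch {
    return null; // file missing or unreadable: treat as "not authenticated"
  }
}
```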
Copilot uses a GitHub OAuth token.

Sources (checked in order):

1. `GITHUB_TOKEN` environment variable
2. `gh auth token` (GitHub CLI)
3. `~/.config/gh/hosts.yml`
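A sketch of that resolution order (the regex-based `hosts.yml` parsing is a shortcut for illustration; a real implementation would use a YAML parser):

```typescript
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Resolve a GitHub token in the documented order:
// env var -> gh CLI -> gh's hosts.yml.
function resolveGithubToken(): string | null {
  if (process.env.GITHUB_TOKEN) return process.env.GITHUB_TOKEN;
  try {
    const token = execSync("gh auth token", { encoding: "utf8" }).trim();
    if (token) return token;
  } catch {
    /* gh CLI not installed or not logged in */
  }
  try {
    const yml = readFileSync(join(homedir(), ".config", "gh", "hosts.yml"), "utf8");
    const match = yml.match(/oauth_token:\s*(\S+)/);
    if (match) return match[1];
  } catch {
    /* no hosts.yml */
  }
  return null;
}
```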
Gemini uses OAuth credentials stored by the Gemini CLI. Supports multiple accounts for quota rotation on free tier.
Single Account Setup:

1. Install and run the `gemini` CLI
2. Authenticate via the CLI
3. Credentials are stored at `~/.gemini/oauth_creds.json`
Multi-Account Setup (for free tier rotation):
Configure multiple accounts in `openclaw.json`:

```json
"google": {
"quotaSource": "api",
"accounts": [
{ "name": "personal", "credentialsPath": "~/.gemini/oauth_creds.json" },
{ "name": "work1", "credentialsPath": "~/.gemini/accounts/work1.json" },
{ "name": "work2", "credentialsPath": "~/.gemini/accounts/work2.json" }
]
}
```

Setting up additional accounts:

1. Run `gemini` in a temp directory to authenticate with a different Google account
2. Copy the credentials to a unique path: `cp ~/.gemini/oauth_creds.json ~/.gemini/accounts/work1.json`
3. Repeat for each account
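Rotation can then simply pick whichever account has the most quota left. A minimal sketch (the account and usage shapes are illustrative, not the plugin's API):

```typescript
// Illustrative free-tier rotation: choose the account with the lowest usage.
interface GeminiAccount {
  name: string;
  credentialsPath: string;
  usedPercent: number; // from the per-account usage data described below
}

function pickAccount(accounts: GeminiAccount[]): GeminiAccount | null {
  const available = accounts.filter((a) => a.usedPercent < 100);
  if (available.length === 0) return null; // every account is exhausted
  return available.reduce((best, a) => (a.usedPercent < best.usedPercent ? a : best));
}

const next = pickAccount([
  { name: "personal", credentialsPath: "~/.gemini/oauth_creds.json", usedPercent: 80 },
  { name: "work1", credentialsPath: "~/.gemini/accounts/work1.json", usedPercent: 35 },
]);
console.log(next?.name); // "work1"
```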
Usage data returned (per account):
- Pro models: Usage percentage, reset time
- Flash models: Usage percentage, reset time
- Tier: Your subscription tier (if available)
Note: Gemini CLI tokens expire and cannot be auto-refreshed. Re-run the `gemini` CLI when tokens expire.
For providers without usage APIs (like Google Gemini), we track usage ourselves via the `llm_end` hook:

- **We don't know your actual limits** - set them manually with `router set-usage`
- **Tracking starts from zero** - historical usage before plugin install is unknown
- **Reset timing may drift** - we reset when configured, not when your provider does
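As a rough sketch, a self-tracking handler for the `llm_end` hook might accumulate usage like this (the event payload fields are assumptions of this sketch; only the hook name comes from the plugin):

```typescript
// Illustrative llm_end handler: accumulate per-provider token counts.
const usageByProvider = new Map<string, number>();

interface LlmEndEvent {
  provider: string;     // assumed payload shape, for illustration only
  inputTokens: number;
  outputTokens: number;
}

function onLlmEnd(event: LlmEndEvent): void {
  const previous = usageByProvider.get(event.provider) ?? 0;
  // Self-tracked "tokens" quotas count input and output together.
  usageByProvider.set(event.provider, previous + event.inputTokens + event.outputTokens);
}
```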
Recommended workflow:

```bash
# Check your provider's dashboard for current usage, then sync:
openclaw router set-usage google 45%
```

Supported quota types:

| Type | Unit | Example Providers |
|---|---|---|
| `tokens` | Input + output tokens | Anthropic, OpenAI API |
| `messages` | Conversations/completions | OpenAI Codex (tier-based) |
| `requests` | API calls per day | Google free tier |
| `budget` | USD spend | OpenRouter |
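Because the units differ, comparing providers means normalizing each quota to a fraction of its limit. A minimal sketch (the type and field names are illustrative, not the plugin's internals):

```typescript
// Illustrative normalization: map any quota type to a 0-1 "fraction used".
type QuotaType = "tokens" | "messages" | "requests" | "budget";

interface Quota {
  type: QuotaType;
  used: number;  // tokens, messages, requests, or USD spent
  limit: number; // same unit as `used`; 0 means unknown/unlimited
}

function fractionUsed(q: Quota): number {
  if (q.limit <= 0) return 0; // nothing to warn about without a limit
  return Math.min(q.used / q.limit, 1);
}

console.log(fractionUsed({ type: "budget", used: 8, limit: 10 })); // 0.8
```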
For providers without usage APIs, we track usage ourselves via the `llm_end` hook:

- **We don't know your actual limits** - set them manually with `router set-usage`
- **Tracking starts from zero** - historical usage before plugin install is unknown
- **Reset timing may drift** - we reset when configured, not when your provider does
Recommended workflow:

```bash
# Check your provider's dashboard for current usage
# Then sync the plugin:
openclaw router set-usage anthropic 79%
openclaw router set-usage openai-codex 91%
```

Full configuration reference:

```jsonc
{
"plugins": {
"openclaw-tactician": {
// Operation mode: manual, dry-run, auto
"mode": "dry-run",
// Enable debug logging
"debug": false,
// Provider-specific configuration
"providers": {
"anthropic": {
// How to track: api, manual, unlimited, self-tracked
"quotaSource": "self-tracked",
// What the limit measures: tokens, requests, messages, budget
"quotaType": "tokens",
// Optional: set a limit for warnings (omit if unknown)
// "limit": 10000000,
// When quota resets
"resetSchedule": {
"type": "weekly", // daily, weekly, monthly, fixed
"dayOfWeek": 3, // 0=Sunday (for weekly)
"hour": 7, // Hour of reset (0-23)
"timezone": "America/Chicago"
},
// Cost tier: premium, standard, budget, free, local
"tier": "premium",
// Priority within tier (higher = preferred)
"priority": 100
},
"openrouter": {
"quotaSource": "api",
"budget": {
"monthlyLimit": 10.00,
"alertThreshold": 0.8
},
"tier": "budget"
},
"local-mlx": {
"quotaSource": "unlimited",
"tier": "local",
"local": {
"type": "mlx",
"endpoint": "http://localhost:8080",
"models": ["mlx-community/Llama-3.2-3B-Instruct-4bit"]
}
}
},
// Minimum quality scores by task type
"qualityThresholds": {
"coding": 0.8,
"reasoning": 0.75,
"creative": 0.6,
"simple": 0.4
},
// How far ahead to predict (hours)
"predictionHorizonHours": 24,
// Alert thresholds (0-1)
"warningThreshold": 0.8,
"criticalThreshold": 0.95,
// Auto-optimization interval (minutes)
"optimizationIntervalMinutes": 60,
// When to use local models: never, simple-only, when-available, prefer
"localModelPreference": "simple-only"
}
}
}
```

```
┌──────────────────────────────────────────────────────────────────┐
│ openclaw-tactician │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ Quota │ │ Capability │ │ Optimization │ │
│ │ Tracker │ │ Scorer │ │ Engine │ │
│ └─────┬───────┘ └──────┬──────┘ └───────────┬─────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Provider Registry │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌────────┐ │ │
│ │ │Anthropic│ │ OpenAI │ │ Google │ │OpenRouter│ │ Local │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Interface Layer │ │
│ │ ┌──────────────────────┐ ┌───────────────────────────────┐ │ │
│ │ │ CLI Commands │ │ Agent Tools (Chat) │ │ │
│ │ └──────────────────────┘ └───────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
```
The plugin analyzes prompts to determine task complexity:
| Signal | Classification | Quality Threshold |
|---|---|---|
| Code keywords, ``` blocks | Coding | 0.8 |
| "analyze", "design", "strategy" | Reasoning | 0.75 |
| "write", "story", "creative" | Creative | 0.6 |
| "summarize", "list", "check" | Simple | 0.4 |
Each model is scored on capability dimensions (0-1):
- **coding** - Code generation and debugging
- **reasoning** - Logic, math, analysis
- **creative** - Writing, brainstorming
- **instruction** - Following complex instructions
- **context** - Long context handling
- **speed** - Response latency
Default scores are provided for common models. Override with manual scores in config.
1. **Analyze** - scan cron jobs and agents for optimization opportunities
2. **Score** - match task requirements to model capabilities
3. **Plan** - generate actions (change model, add fallback, split job)
4. **Apply** - execute changes (dry-run or live, depending on mode)
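Here is a sketch of how the Score step can feed the Plan step: pick the cheapest model whose capability score clears the quality threshold for the classified task. The tier costs, names, and shapes are illustrative; the threshold would come from `qualityThresholds` (e.g., 0.8 for coding):

```typescript
// Illustrative model selection: cheapest tier that meets the task threshold.
type TaskType = "coding" | "reasoning" | "creative" | "simple";

interface Model {
  name: string;
  tier: "premium" | "standard" | "budget" | "free" | "local";
  scores: Record<TaskType, number>; // 0-1 capability scores
}

// Relative cost ranking of tiers, cheapest last.
const TIER_COST: Record<Model["tier"], number> = {
  premium: 4, standard: 3, budget: 2, free: 1, local: 0,
};

function pickModel(models: Model[], task: TaskType, threshold: number): Model | null {
  const capable = models.filter((m) => m.scores[task] >= threshold);
  if (capable.length === 0) return null; // Plan would leave the job unchanged
  return capable.reduce((a, b) => (TIER_COST[a.tier] <= TIER_COST[b.tier] ? a : b));
}
```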
The plugin auto-detects these local servers:
| Server | Default Port | Detection |
|---|---|---|
| Ollama | 11434 | GET / returns "Ollama" |
| MLX-LM | 8080 | OpenAI-compatible /v1/models |
| LM Studio | 1234 | OpenAI-compatible /v1/models |
| vLLM | 8000 | /health endpoint |
Run `openclaw router detect-local` to check what's available.
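Detection can be as simple as probing each default port. A sketch using Node 18+'s built-in `fetch` (timeouts and response checks are simplified; this is not the plugin's actual implementation):

```typescript
// Illustrative local-server probe for the default ports listed above.
const PROBES = [
  { server: "Ollama", url: "http://127.0.0.1:11434/" },
  { server: "MLX-LM", url: "http://127.0.0.1:8080/v1/models" },
  { server: "LM Studio", url: "http://127.0.0.1:1234/v1/models" },
  { server: "vLLM", url: "http://127.0.0.1:8000/health" },
];

async function detectLocal(): Promise<string[]> {
  const found: string[] = [];
  for (const probe of PROBES) {
    try {
      const res = await fetch(probe.url, { signal: AbortSignal.timeout(1000) });
      if (res.ok) found.push(probe.server);
    } catch {
      /* nothing listening on that port */
    }
  }
  return found;
}

detectLocal().then((servers) => console.log("Local servers:", servers));
```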
Configure `localModelPreference`:

| Value | Behavior |
|---|---|
| `never` | Don't use local models |
| `simple-only` | Route simple tasks (summarize, list) to local |
| `when-available` | Use local when cloud is constrained |
| `prefer` | Prefer local over cloud when capable |
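A sketch of how these values could gate a routing decision (the function and parameter names are illustrative):

```typescript
// Illustrative gate: should this task go to a local model?
type Preference = "never" | "simple-only" | "when-available" | "prefer";
type TaskType = "coding" | "reasoning" | "creative" | "simple";

function useLocal(
  pref: Preference,
  task: TaskType,
  cloudConstrained: boolean, // cloud quotas near exhaustion
  localCapable: boolean,     // a local model clears the quality threshold
): boolean {
  switch (pref) {
    case "never": return false;
    case "simple-only": return task === "simple" && localCapable;
    case "when-available": return cloudConstrained && localCapable;
    case "prefer": return localCapable;
  }
}
```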
LM Studio provides a local OpenAI-compatible API server for running quantized models on macOS/Windows/Linux.
1. Install LM Studio
Download from lmstudio.ai or use the CLI:

```bash
# Check if installed
lms --version
# List available models
lms ls
# Start the server
lms server start
```

2. Configure Tactician
Add an `lmstudio` provider in your `openclaw.json`:

```json
{
"plugins": {
"entries": {
"openclaw-tactician": {
"config": {
"providers": {
"lmstudio": {
"quotaSource": "unlimited",
"quotaType": "tokens",
"tier": "local",
"priority": 30,
"local": {
"type": "lmstudio",
"endpoint": "http://127.0.0.1:1234/v1",
"models": [
"internlm2_5-20b-chat",
"mistral-nemo-instruct-2407",
"mathstral-7b-v0.1"
]
}
}
},
"localModelPreference": "simple-only"
}
}
}
}
}
```

3. Verify Connection

```bash
# Check if server is running
curl http://127.0.0.1:1234/v1/models
# Or use the router CLI
openclaw router detect-local
```

Quantized models run faster but have reduced accuracy compared to full-precision models. Tactician applies automatic degradation factors:
| Quantization | Performance Retention | Speed Boost |
|---|---|---|
| Q8_0 | ~95-98% | 1.5-2x |
| Q6_K | ~93-96% | 2-2.5x |
| Q5_K_M | ~90-95% | 2.5-3x |
| Q4_K_M | ~85-92% | 3-4x |
| Q4_0 | ~80-88% | 3.5-4.5x |
| Q3_K_M | ~75-85% | 4-5x |
For example, a Q4_K_M quantized Mistral Nemo 12B:
- Full model MMLU: 68%
- Quant-adjusted MMLU: ~62% (91% retention)
- But runs at 80+ tokens/sec locally vs cloud latency
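The adjustment itself is a single multiplication. As a sketch, using the retention figure from the example above:

```typescript
// Scale a benchmark score by the quantization retention factor.
function quantAdjusted(baseScore: number, retention: number): number {
  return baseScore * retention;
}

// Mistral Nemo 12B at Q4_K_M: 0.68 MMLU x 0.91 retention ~= 0.62.
console.log(quantAdjusted(0.68, 0.91).toFixed(2)); // "0.62"
```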
| Use Case | Recommended Model | Why |
|---|---|---|
| Simple tasks | mathstral-7b (Q4) | Fastest, low memory |
| General chat | mistral-nemo-12b (Q4) | Good balance, huge context |
| Coding | deepseek-coder-6.7b (Q4) | Optimized for code |
| Complex reasoning | internlm2_5-20b (Q4) | Strong reasoning despite quant |
The plugin adapts to your setup:
| Scenario | Behavior |
|---|---|
| No providers configured | Just monitors usage if hooks fire |
| Single provider | Warns on high usage, no shifting |
| No local models | Uses cloud free tiers first |
| No free tiers | Optimizes premium usage |
| Budget-only (OpenRouter) | Tracks spend, warns at threshold |
Add the provider to your plugin config:

```json
"providers": {
"my-provider": {
"quotaSource": "manual",
"limit": 1000000
}
}
```

Update manual usage to match your actual account:

```bash
openclaw router set-usage anthropic 79%
```

Local server not detected:

- Ensure the server is running
- Check the default port is being used
- Run `openclaw router detect-local` for diagnostics
Optimizations not applying:

- Check you're in `dry-run` or `auto` mode
- Use the `--apply` flag with the optimize command
- Restart the gateway after changes: `kill -USR1 $(pgrep openclaw-gateway)`
To contribute:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run `npm run check-types`
5. Submit a PR
Related projects:

- OpenClaw - The AI agent framework
- openclaw-engram - Memory plugin
- openclaw-patcher - Auto-patching utility
License: MIT