
openclaw-tactician


Intelligent model routing for OpenClaw with quota prediction, task classification, and automatic optimization.

What It Does

Tactician helps you get the most out of your LLM quotas by:

  • Predicting exhaustion - Know when you'll run out of tokens before it happens
  • Analyzing workloads - Identify which cron jobs and agents can use cheaper models
  • Optimizing automatically - Shift workloads to appropriate models based on task complexity
  • Supporting local models - Route simple tasks to MLX, Ollama, or other local servers
  • Tracking budgets - Monitor spend on pay-per-token providers like OpenRouter

Quick Start

1. Install

cd ~/.openclaw/extensions
git clone https://github.com/joshuaswarren/openclaw-tactician.git
cd openclaw-tactician
npm install && npm run build

2. Enable in openclaw.json

{
  "plugins": {
    "openclaw-tactician": {
      "mode": "dry-run",
      "providers": {
        "anthropic": {
          "quotaSource": "self-tracked",
          "quotaType": "tokens",
          "tier": "premium",
          "resetSchedule": { "type": "weekly", "dayOfWeek": 3, "hour": 7 }
        },
        "openai-codex": {
          "quotaSource": "self-tracked",
          "quotaType": "messages",
          "tier": "premium",
          "resetSchedule": { "type": "fixed", "fixedDate": "2026-02-09T14:36:00Z" }
        },
        "openrouter": {
          "quotaSource": "api",
          "quotaType": "budget",
          "tier": "budget"
        }
      }
    }
  }
}

3. Restart Gateway

kill -USR1 $(pgrep openclaw-gateway)

4. Check Status

openclaw router status

Usage

CLI Commands

# Show provider status and usage
openclaw router status [provider]

# Predict quota exhaustion
openclaw router predict [--hours=24]

# List configured providers
openclaw router providers

# Manually set usage (e.g., after checking your account)
openclaw router set-usage <provider> <percent|tokens>
# Examples:
openclaw router set-usage anthropic 79%
openclaw router set-usage openai-codex 91%

# Reset quota counter after provider reset
openclaw router reset <provider>

# Analyze crons/agents for optimization opportunities
openclaw router analyze [--type=all|crons|agents]

# Generate and optionally apply optimizations
openclaw router optimize [--apply] [--safe-only]

# Detect local model servers
openclaw router detect-local

# Get or set operation mode
openclaw router mode [manual|dry-run|auto]

Conversational Interface

Chat with OpenClaw using these capabilities:

"What's my token usage looking like?"
→ Calls router_status tool

"When will I run out of Codex tokens?"
→ Calls router_predict tool

"Which of my cron jobs could use cheaper models?"
→ Calls router_analyze tool

"Optimize my model usage"
→ Calls router_optimize tool (with confirmation)

"Move everything off Anthropic"
→ Calls router_shift tool

Operation Modes

Mode     Behavior
manual   CLI only. No automatic changes.
dry-run  Preview optimizations. Ask before applying. (Default)
auto     Automatically apply safe (reversible) optimizations.
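
Modes can be inspected and switched at any time with the mode command shown above:

openclaw router mode        # print the current mode
openclaw router mode auto   # enable automatic safe optimizations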

Provider Usage Fetching

Direct API Fetchers (Real-Time Usage)

This plugin can fetch real-time usage from provider APIs, similar to CodexBar:

Provider    Status      Method      Environment Variables
Claude      ✅ Working  Web API     CLAUDE_SESSION_KEY, CLAUDE_COOKIES
Codex       ✅ Working  OAuth       Auto from ~/.codex/auth.json
Kimi        ✅ Working  API         KIMI_AUTH_TOKEN
Z.ai        ✅ Working  API         Z_AI_API_KEY
OpenRouter  ✅ Working  API         Auto from provider config
Copilot     ⚠️ Partial  GitHub API  Auto from gh CLI or GITHUB_TOKEN
Gemini      ✅ Working  OAuth       Auto from ~/.gemini/oauth_creds.json

Fetching Usage

# Fetch from a specific provider
openclaw router fetch claude
openclaw router fetch kimi
openclaw router fetch zai
openclaw router fetch codex
openclaw router fetch gemini

# Check credential status
openclaw router credentials

Provider Setup

Claude (Anthropic)

Claude requires browser cookies to fetch usage. The session key alone isn't enough - Cloudflare also requires the cf_clearance cookie.

Getting the cookies:

  1. Open https://claude.ai in your browser
  2. Open DevTools → Application → Cookies → claude.ai
  3. Copy sessionKey (starts with sk-ant-sid01-)
  4. Copy cf_clearance (Cloudflare token)

Environment variables:

export CLAUDE_SESSION_KEY="sk-ant-sid01-..."
export CLAUDE_COOKIES="sessionKey=sk-ant-sid01-...; cf_clearance=..."

Usage data returned:

  • Session (5hr): Percentage used, reset time
  • Weekly (7d): Percentage used, reset time
  • Opus: Separate tracking for Opus model

Kimi

Kimi requires a browser session JWT, NOT the API key.

Important: KIMI_API_KEY (starts with sk-kimi-) is for the coding API, which has no usage endpoint.

Getting the token:

  1. Open https://kimi.com in your browser and log in
  2. Open DevTools → Application → Cookies → kimi.com
  3. Copy the kimi-auth cookie value (it's a JWT)

Environment variable:

export KIMI_AUTH_TOKEN="eyJ..."

Z.ai (Zhipu/GLM)

Z.ai uses a standard API key for usage tracking.

Environment variable:

export Z_AI_API_KEY="your-api-key"

Regions:

  • Global: api.z.ai (default)
  • China: Set Z_AI_REGION=china to use open.bigmodel.cn
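
For example, to target the China endpoint:

export Z_AI_API_KEY="your-api-key"
export Z_AI_REGION=china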

Codex (OpenAI)

Codex uses OAuth credentials stored by the Codex CLI.

Setup:

  1. Install and run codex CLI
  2. Authenticate via the CLI
  3. Credentials are stored at ~/.codex/auth.json

No environment variables needed - the plugin reads from the auth file.

GitHub Copilot

Uses GitHub OAuth token.

Sources (checked in order):

  1. GITHUB_TOKEN environment variable
  2. gh auth token (GitHub CLI)
  3. ~/.config/gh/hosts.yml
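
If the automatic lookup fails, you can export the token yourself (gh auth token prints the GitHub CLI's current token):

export GITHUB_TOKEN="$(gh auth token)"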

Gemini (Google)

Gemini uses OAuth credentials stored by the Gemini CLI and supports multiple accounts for quota rotation on the free tier.

Single Account Setup:

  1. Install and run gemini CLI
  2. Authenticate via the CLI
  3. Credentials are stored at ~/.gemini/oauth_creds.json

Multi-Account Setup (for free tier rotation):

Configure multiple accounts in openclaw.json:

"google": {
  "quotaSource": "api",
  "accounts": [
    { "name": "personal", "credentialsPath": "~/.gemini/oauth_creds.json" },
    { "name": "work1", "credentialsPath": "~/.gemini/accounts/work1.json" },
    { "name": "work2", "credentialsPath": "~/.gemini/accounts/work2.json" }
  ]
}

Setting up additional accounts:

  1. Run gemini in a temp directory to authenticate with a different Google account
  2. Copy the credentials to a unique path: cp ~/.gemini/oauth_creds.json ~/.gemini/accounts/work1.json
  3. Repeat for each account
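
A compact version of those steps for a second account (paths match the multi-account config above):

mkdir -p ~/.gemini/accounts
# after authenticating with the second Google account via the gemini CLI:
cp ~/.gemini/oauth_creds.json ~/.gemini/accounts/work1.json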

Usage data returned (per account):

  • Pro models: Usage percentage, reset time
  • Flash models: Usage percentage, reset time
  • Tier: Your subscription tier (if available)

Note: Gemini CLI tokens expire and cannot be auto-refreshed; re-run the gemini CLI to re-authenticate when they do.


Quota Types

Type      Unit                       Example Providers
tokens    Input + output tokens      Anthropic, OpenAI API
messages  Conversations/completions  OpenAI Codex (tier-based)
requests  API calls per day          Google free tier
budget    USD spend                  OpenRouter
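
For illustration, a requests-type provider might look like this (the name google-free and the limit value are placeholders - use your plan's actual daily cap; resetSchedule fields are as in the Configuration Reference below):

"google-free": {
  "quotaSource": "self-tracked",
  "quotaType": "requests",
  "limit": 1000,
  "resetSchedule": { "type": "daily", "hour": 0 },
  "tier": "free"
}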

Self-Tracked Providers

For providers without usage APIs, we track usage ourselves via the llm_end hook:

  1. We don't know your actual limits - Set them manually with router set-usage
  2. Tracking starts from zero - Historical usage before plugin install is unknown
  3. Reset timing may drift - We reset when configured, not when your provider does

Recommended workflow:

# Check your provider's dashboard for current usage
# Then sync the plugin:
openclaw router set-usage anthropic 79%
openclaw router set-usage openai-codex 91%

Configuration Reference

{
  "plugins": {
    "openclaw-tactician": {
      // Operation mode: manual, dry-run, auto
      "mode": "dry-run",

      // Enable debug logging
      "debug": false,

      // Provider-specific configuration
      "providers": {
        "anthropic": {
          // How to track: api, manual, unlimited, self-tracked
          "quotaSource": "self-tracked",
          // What the limit measures: tokens, requests, messages, budget
          "quotaType": "tokens",
          // Optional: set a limit for warnings (omit if unknown)
          // "limit": 10000000,
          // When quota resets
          "resetSchedule": {
            "type": "weekly",    // daily, weekly, monthly, fixed
            "dayOfWeek": 3,      // 0=Sunday (for weekly)
            "hour": 7,           // Hour of reset (0-23)
            "timezone": "America/Chicago"
          },
          // Cost tier: premium, standard, budget, free, local
          "tier": "premium",
          // Priority within tier (higher = preferred)
          "priority": 100
        },
        "openrouter": {
          "quotaSource": "api",
          "budget": {
            "monthlyLimit": 10.00,
            "alertThreshold": 0.8
          },
          "tier": "budget"
        },
        "local-mlx": {
          "quotaSource": "unlimited",
          "tier": "local",
          "local": {
            "type": "mlx",
            "endpoint": "http://localhost:8080",
            "models": ["mlx-community/Llama-3.2-3B-Instruct-4bit"]
          }
        }
      },

      // Minimum quality scores by task type
      "qualityThresholds": {
        "coding": 0.8,
        "reasoning": 0.75,
        "creative": 0.6,
        "simple": 0.4
      },

      // How far ahead to predict (hours)
      "predictionHorizonHours": 24,

      // Alert thresholds (0-1)
      "warningThreshold": 0.8,
      "criticalThreshold": 0.95,

      // Auto-optimization interval (minutes)
      "optimizationIntervalMinutes": 60,

      // When to use local models: never, simple-only, when-available, prefer
      "localModelPreference": "simple-only"
    }
  }
}

How It Works

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     openclaw-tactician                         │
├──────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────────────────┐ │
│  │   Quota     │   │ Capability  │   │      Optimization       │ │
│  │  Tracker    │   │   Scorer    │   │        Engine           │ │
│  └─────┬───────┘   └──────┬──────┘   └───────────┬─────────────┘ │
│        │                  │                       │               │
│        ▼                  ▼                       ▼               │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │                   Provider Registry                          │ │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌────────┐ │ │
│  │  │Anthropic│ │ OpenAI  │ │ Google  │ │OpenRouter│ │ Local  │ │ │
│  │  └─────────┘ └─────────┘ └─────────┘ └─────────┘ └────────┘ │ │
│  └─────────────────────────────────────────────────────────────┘ │
│                                                                   │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │                      Interface Layer                         │ │
│  │  ┌──────────────────────┐  ┌───────────────────────────────┐ │ │
│  │  │     CLI Commands     │  │     Agent Tools (Chat)        │ │ │
│  │  └──────────────────────┘  └───────────────────────────────┘ │ │
│  └─────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘

Task Classification

The plugin analyzes prompts to determine task complexity:

Signal                           Classification  Quality Threshold
Code keywords, ``` blocks        Coding          0.8
"analyze", "design", "strategy"  Reasoning       0.75
"write", "story", "creative"     Creative        0.6
"summarize", "list", "check"     Simple          0.4

Model Capability Scoring

Each model is scored on capability dimensions (0-1):

  • coding - Code generation and debugging
  • reasoning - Logic, math, analysis
  • creative - Writing, brainstorming
  • instruction - Following complex instructions
  • context - Long context handling
  • speed - Response latency

Default scores are provided for common models. Override with manual scores in config.
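
A minimal sketch of threshold-based selection - pick the cheapest model whose score for the classified task clears the quality threshold. The types and constants here (ModelProfile, THRESHOLDS, TIER_COST) are assumptions for illustration, not the plugin's real interfaces:

interface ModelProfile {
  name: string;
  tier: "premium" | "standard" | "budget" | "free" | "local";
  scores: Record<string, number>; // capability dimensions, 0-1
}

const THRESHOLDS: Record<string, number> = {
  coding: 0.8, reasoning: 0.75, creative: 0.6, simple: 0.4,
};

// Lower number = cheaper tier; local is effectively free
const TIER_COST: Record<ModelProfile["tier"], number> = {
  premium: 4, standard: 3, budget: 2, free: 1, local: 0,
};

// Cheapest model whose score for this task meets the quality threshold
function pickModel(task: string, models: ModelProfile[]): ModelProfile | undefined {
  return models
    .filter((m) => (m.scores[task] ?? 0) >= (THRESHOLDS[task] ?? 1))
    .sort((a, b) => TIER_COST[a.tier] - TIER_COST[b.tier])[0];
}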

Optimization Flow

  1. Analyze - Scan cron jobs and agents for optimization opportunities
  2. Score - Match task requirements to model capabilities
  3. Plan - Generate actions (change model, add fallback, split job)
  4. Apply - Execute changes (dry-run or live based on mode)
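
The same four steps, driven from the CLI:

openclaw router analyze --type=all             # 1-2: scan and score
openclaw router optimize                       # 3: preview the planned actions
openclaw router optimize --apply --safe-only   # 4: apply reversible changes only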

Local Model Support

The plugin auto-detects these local servers:

Server     Default Port  Detection
Ollama     11434         GET / returns "Ollama"
MLX-LM     8080          OpenAI-compatible /v1/models
LM Studio  1234          OpenAI-compatible /v1/models
vLLM       8000          /health endpoint

Run openclaw router detect-local to check what's available.
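
You can also probe the endpoints by hand; these curl checks assume the default ports above:

curl http://localhost:11434/            # Ollama
curl http://localhost:8080/v1/models    # MLX-LM
curl http://localhost:1234/v1/models    # LM Studio
curl http://localhost:8000/health       # vLLM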

Configure localModelPreference:

Value           Behavior
never           Don't use local models
simple-only     Route simple tasks (summarize, list) to local
when-available  Use local when cloud is constrained
prefer          Prefer local over cloud when capable

LM Studio Setup

LM Studio provides a local OpenAI-compatible API server for running quantized models on macOS/Windows/Linux.

1. Install LM Studio

Download from lmstudio.ai or use the CLI:

# Check if installed
lms --version

# List available models
lms ls

# Start the server
lms server start

2. Configure Tactician

Add an lmstudio provider in your openclaw.json:

{
  "plugins": {
    "openclaw-tactician": {
      "providers": {
        "lmstudio": {
          "quotaSource": "unlimited",
          "quotaType": "tokens",
          "tier": "local",
          "priority": 30,
          "local": {
            "type": "lmstudio",
            "endpoint": "http://127.0.0.1:1234/v1",
            "models": [
              "internlm2_5-20b-chat",
              "mistral-nemo-instruct-2407",
              "mathstral-7b-v0.1"
            ]
          }
        }
      },
      "localModelPreference": "simple-only"
    }
  }
}

3. Verify Connection

# Check if server is running
curl http://127.0.0.1:1234/v1/models

# Or use the router CLI
openclaw router detect-local

Quantization & Benchmark Adjustments

Quantized models run faster but have reduced accuracy compared to full-precision models. Tactician applies automatic degradation factors:

Quantization  Performance Retention  Speed Boost
Q8_0          ~95-98%                1.5-2x
Q6_K          ~93-96%                2-2.5x
Q5_K_M        ~90-95%                2.5-3x
Q4_K_M        ~85-92%                3-4x
Q4_0          ~80-88%                3.5-4.5x
Q3_K_M        ~75-85%                4-5x

For example, a Q4_K_M quantized Mistral Nemo 12B:

  • Full model MMLU: 68%
  • Quant-adjusted MMLU: ~62% (91% retention)
  • But runs at 80+ tokens/sec locally vs cloud latency
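
The adjustment is simple multiplication; here is a sketch using the midpoints of the retention ranges above (the factors are illustrative, not the plugin's exact values):

const RETENTION: Record<string, number> = {
  Q8_0: 0.965, Q6_K: 0.945, Q5_K_M: 0.925, Q4_K_M: 0.885, Q4_0: 0.84, Q3_K_M: 0.80,
};

function quantAdjusted(baseScore: number, quant: string): number {
  return baseScore * (RETENTION[quant] ?? 1); // unknown quant: leave score unchanged
}

// The example above uses 91% retention: 0.68 * 0.91 ≈ 0.62.
// The table midpoint for Q4_K_M gives 0.68 * 0.885 ≈ 0.60.
console.log(quantAdjusted(0.68, "Q4_K_M"));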

Model Recommendations for Local

Use Case           Recommended Model         Why
Simple tasks       mathstral-7b (Q4)         Fastest, low memory
General chat       mistral-nemo-12b (Q4)     Good balance, huge context
Coding             deepseek-coder-6.7b (Q4)  Optimized for code
Complex reasoning  internlm2_5-20b (Q4)      Strong reasoning despite quant

Graceful Degradation

The plugin adapts to your setup:

Scenario                  Behavior
No providers configured   Just monitors usage if hooks fire
Single provider           Warns on high usage, no shifting
No local models           Uses cloud free tiers first
No free tiers             Optimizes premium usage
Budget-only (OpenRouter)  Tracks spend, warns at threshold

Troubleshooting

"Unknown provider" errors

Add the provider to your plugin config:

"providers": {
  "my-provider": {
    "quotaSource": "manual",
    "limit": 1000000
  }
}

Predictions seem off

Update manual usage to match your actual account:

openclaw router set-usage anthropic 79%

Local models not detected

  1. Ensure the server is running
  2. Check the default port is being used
  3. Run openclaw router detect-local for diagnostics

Optimizations not applying

  • Check you're in dry-run or auto mode
  • Use --apply flag with optimize command
  • Restart gateway after changes: kill -USR1 $(pgrep openclaw-gateway)

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run npm run check-types
  5. Submit a PR

License

MIT
