Train MetaClaw on your own GPU servers instead of the Tinker cloud, keeping training data and model weights on your own infrastructure.
```
MetaClaw (local) ── HTTP ──► Training Server (your GPU server)
  config.yaml:                  ├── Training: PEFT/LoRA
    backend: remote             ├── Inference: vLLM or HuggingFace
                                └── Checkpoints: local filesystem
```
The remote backend is a drop-in alternative to Tinker. Switch with one config change — no code modifications needed.
On your GPU server:
```bash
# Clone the repo
git clone https://github.com/OctoClaws/MetaClaw.git
cd MetaClaw

# Install dependencies (most should already be present on GPU servers)
pip install -r metaclaw_training_server/requirements.txt

# Start the server
export METACLAW_API_KEY="your-secret-key"
bash scripts/start_training_server.sh --port 8000
```

Verify it's running:
```bash
curl http://localhost:8000/v1/health
# {"status":"healthy","gpu_count":8,"model":"","step":0,...}
```

In `~/.metaclaw/config.yaml`:
```yaml
mode: rl
rl:
  enabled: true
  backend: remote
  remote_url: http://your-gpu-server:8000
  remote_api_key: your-secret-key
  model: Qwen/Qwen3-4B   # or local path on GPU server
  lora_rank: 32
  batch_size: 4
```

That's it. MetaClaw will automatically route all training and inference to your server.
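Under the hood, the backend switch is just config resolution. A sketch of how such a switch might be resolved (the `choose_backend` helper and its "auto" rule are illustrative assumptions, not MetaClaw internals):

```python
def choose_backend(rl_cfg: dict) -> str:
    """Resolve the training backend from the rl: config section.

    Illustrative rule: an explicit backend wins; "auto" (or no setting)
    prefers "remote" when a remote_url is configured, else "tinker".
    """
    backend = rl_cfg.get("backend", "auto")
    if backend != "auto":
        return backend
    return "remote" if rl_cfg.get("remote_url") else "tinker"


cfg = {"backend": "remote", "remote_url": "http://your-gpu-server:8000"}
print(choose_backend(cfg))  # → remote
```

The point is that trainer code never branches on the backend itself; it only sees the resolved value, which is why switching requires no code changes.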
To switch back to Tinker:

```yaml
rl:
  backend: tinker   # or "auto"
  tinker_api_key: tk-xxx
```

For full control over server options, run the module directly:

```bash
python3 -m metaclaw_training_server.server \
    --host 0.0.0.0 \
    --port 8000 \
    --api-key "your-secret-key" \
    --checkpoint-dir /data/metaclaw_checkpoints \
    --training-devices 0,1 \
    --inference-devices 2,3 \
    --inference-backend vllm \
    --log-level INFO
```

| Variable | Default | Description |
|---|---|---|
| METACLAW_API_KEY | (empty) | Bearer token for authentication |
| METACLAW_PORT | 8000 | Server port |
| METACLAW_CHECKPOINT_DIR | /mnt/data/metaclaw_checkpoints | Where to save checkpoints |
| METACLAW_TRAINING_DEVICES | 0 | Comma-separated GPU ids for training |
| METACLAW_INFERENCE_DEVICES | 1 | Comma-separated GPU ids for inference |
| METACLAW_INFERENCE_BACKEND | hf | vllm or hf (HuggingFace generate) |
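These variables are plain strings; the device lists are comma-separated GPU ids. A sketch of how a server process might parse them, using the table's defaults (the `env` and `device_ids` helpers are illustrative, not the server's actual code):

```python
import os

# Defaults from the table above
DEFAULTS = {
    "METACLAW_PORT": "8000",
    "METACLAW_TRAINING_DEVICES": "0",
    "METACLAW_INFERENCE_DEVICES": "1",
    "METACLAW_INFERENCE_BACKEND": "hf",
}

def env(name: str) -> str:
    """Read an environment variable, falling back to the documented default."""
    return os.environ.get(name, DEFAULTS.get(name, ""))

def device_ids(name: str) -> list[int]:
    """Parse a comma-separated GPU id list, e.g. '0,1,2,3' -> [0, 1, 2, 3]."""
    return [int(d) for d in env(name).split(",") if d.strip()]


os.environ["METACLAW_TRAINING_DEVICES"] = "0,1,2,3"
print(device_ids("METACLAW_TRAINING_DEVICES"))   # → [0, 1, 2, 3]
print(device_ids("METACLAW_INFERENCE_DEVICES"))  # → [1] (assuming the variable is unset)
```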
For a typical 8×A100-80GB setup with Qwen3-4B:
| Use | GPUs | VRAM |
|---|---|---|
| Training (PEFT/LoRA) | GPU 0 | ~10 GB |
| Inference (vLLM) | GPU 1 | ~10 GB |
| Free | GPU 2–7 | Available |
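The small training footprint is a property of LoRA: the adapter adds roughly rank × (d_in + d_out) trainable parameters per adapted weight matrix, a small fraction of the base model, so adapter memory scales linearly with rank. A back-of-the-envelope sketch (the layer shapes are illustrative, not Qwen3-4B's actual dimensions):

```python
def lora_params(rank: int, shapes: list[tuple[int, int]]) -> int:
    """Trainable parameters LoRA adds: rank * (d_in + d_out) per matrix
    (one rank x d_in down-projection plus one d_out x rank up-projection)."""
    return sum(rank * (d_in + d_out) for d_in, d_out in shapes)


# e.g. 32 transformer blocks, adapting two 2048x2048 projections each
shapes = [(2048, 2048)] * 64
print(lora_params(32, shapes))  # → 8388608  (rank 32: ~8.4M params)
print(lora_params(16, shapes))  # → 4194304  (rank 16: half the params)
```

This is also why halving `lora_rank` is an effective first response to out-of-memory errors: it halves the adapter's parameter, gradient, and optimizer-state memory.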
For larger models (e.g. 72B), scale up the device allocation:
```bash
--training-devices 0,1,2,3 --inference-devices 4,5,6,7
```

Two inference backends are available:

- hf (HuggingFace generate) — simple, no extra setup. Uses the training model directly for inference. Recommended for getting started.
- vllm — fast inference with LoRA hot-swap. After each training step, the new LoRA adapter is loaded dynamically without restarting vLLM. Recommended for production.
All endpoints require an `Authorization: Bearer <api_key>` header (except `/v1/health`).
Health check. Always open (no auth required).
Response:

```json
{
  "status": "healthy",
  "gpu_count": 8,
  "model": "Qwen/Qwen3-4B",
  "step": 42,
  "inference_backend": "vllm",
  "version": "0.1.0"
}
```

Create a new LoRA training session. Loads the base model and initializes the LoRA adapter.
Request:

```json
{
  "base_model": "Qwen/Qwen3-4B",
  "rank": 32
}
```

base_model can be a HuggingFace model id or a local path on the GPU server.
Response:

```json
{
  "session_id": "a1b2c3d4",
  "status": "ready",
  "model": "Qwen/Qwen3-4B",
  "rank": 32,
  "trainable_params": "328.60M / 3890.12M"
}
```

Forward + backward pass on a batch of training datums. Does not update weights — call optim_step separately.
Request:

```json
{
  "session_id": "a1b2c3d4",
  "datums": [
    {
      "model_input_tokens": [1, 2, 3, ...],
      "target_tokens": [2, 3, 4, ...],
      "logprobs": [0.0, 0.0, -1.5, ...],
      "advantages": [0.0, 0.0, 0.3, ...]
    }
  ],
  "loss_fn": "importance_sampling"
}
```

Supported loss_fn values:

- importance_sampling — GRPO-style (default)
- ppo — clipped PPO
- cispo — conservative importance sampling
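As a rough sketch of what an importance-sampling loss computes per token: the current policy's probability ratio against the sampling-time logprobs, weighted by the advantage. The server's actual implementation may differ (e.g. clipping, token masking):

```python
import math

def importance_sampling_loss(new_logprobs, old_logprobs, advantages):
    """Per-token REINFORCE-style loss with importance ratios:
    loss = -mean( exp(logp_new - logp_old) * advantage )."""
    terms = [
        math.exp(lp_new - lp_old) * adv
        for lp_new, lp_old, adv in zip(new_logprobs, old_logprobs, advantages)
    ]
    return -sum(terms) / len(terms)


# When logp_new == logp_old (ratio 1.0), this reduces to -mean(advantages)
loss = importance_sampling_loss([-1.5, -0.3], [-1.5, -0.3], [0.3, 0.1])
print(round(loss, 2))  # → -0.2
```

This also explains why the request carries both logprobs (from sampling time) and advantages: the server recomputes the current-policy logprobs during the forward pass and only needs the old ones to form the ratio.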
Response:

```json
{
  "status": "ok",
  "loss": 0.123,
  "total_tokens": 256,
  "num_datums": 4
}
```

Run optimizer step (AdamW) and zero gradients.
Request:

```json
{
  "session_id": "a1b2c3d4",
  "learning_rate": 1e-4
}
```

Response:

```json
{
  "status": "ok",
  "step": 1,
  "grad_norm": 0.456,
  "avg_loss": 0.123,
  "learning_rate": 0.0001
}
```

Save LoRA adapter weights. If vLLM is the inference backend, the new adapter is hot-swapped automatically.
Request:

```json
{
  "session_id": "a1b2c3d4",
  "name": "step_0010"
}
```

Response:

```json
{
  "status": "ok",
  "adapter_path": "/data/metaclaw_checkpoints/adapters/step_0010",
  "step": 10
}
```

Save full training state (adapter weights + optimizer state + step count).
Request:

```json
{
  "session_id": "a1b2c3d4",
  "name": "checkpoint_v1"
}
```

Response:

```json
{
  "status": "ok",
  "path": "/data/metaclaw_checkpoints/checkpoints/checkpoint_v1"
}
```

Resume training from a saved checkpoint.
Request:

```json
{
  "session_id": "a1b2c3d4",
  "path": "/data/metaclaw_checkpoints/checkpoints/checkpoint_v1"
}
```

Generate tokens with the current model (base + LoRA adapter).
Request:

```json
{
  "tokens": [1, 2, 3, 4, 5],
  "num_samples": 1,
  "temperature": 0.7,
  "max_tokens": 2048,
  "top_k": 50,
  "top_p": 0.95,
  "stop": ["\n"]
}
```

Response:

```json
{
  "sequences": [
    {
      "tokens": [6, 7, 8, 9, 10],
      "logprobs": [-1.2, -0.8, -1.5, -0.3, -2.1],
      "stop_reason": "stop"
    }
  ]
}
```

Set METACLAW_API_KEY on the server. All requests (except /v1/health) require:
```
Authorization: Bearer your-secret-key
```

If your GPU server is on a public network, use an SSH tunnel instead of exposing the port directly:

```bash
# On your local machine
ssh -L 8000:localhost:8000 user@gpu-server -N
```

```yaml
# Then configure MetaClaw to use localhost
rl:
  remote_url: http://localhost:8000
```

For production, put the server behind a reverse proxy (nginx/caddy) with TLS:
MetaClaw → HTTPS → nginx → HTTP → training server
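When scripting health checks for monitoring, the response body can be validated directly rather than just checking for HTTP 200. A minimal sketch (the `is_healthy` helper is hypothetical, not part of MetaClaw):

```python
import json

def is_healthy(body: str) -> bool:
    """Validate a /v1/health response body: status must be 'healthy'
    and at least one GPU must be visible to the server."""
    info = json.loads(body)
    return info.get("status") == "healthy" and info.get("gpu_count", 0) > 0


# Example payload as returned before a session is created
sample = '{"status":"healthy","gpu_count":8,"model":"","step":0}'
print(is_healthy(sample))  # → True
```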
Check that the server is running and the port is accessible:
```bash
# On GPU server
curl http://localhost:8000/v1/health

# From local machine (if using SSH tunnel)
curl http://localhost:8000/v1/health
```

If the GPU server can't access HuggingFace, use a local model path:

```yaml
rl:
  model: /mnt/data/models/Qwen3-4B   # local path on GPU server
```

Reduce model size, LoRA rank, or adjust GPU allocation:

```bash
# Use more GPUs for training
--training-devices 0,1,2,3
```

```yaml
# Reduce LoRA rank
rl:
  lora_rank: 16
```

Switch from HuggingFace generate to vLLM:
```bash
--inference-backend vllm --inference-devices 1,2
```

```
┌─────────────────────────────────────────────┐
│              MetaClaw (local)               │
│                                             │
│  ┌──────────────┐      ┌──────────────┐     │
│  │ trainer.py   │─────►│ sdk_backend  │     │
│  │              │      │ .py          │     │
│  │ data_        │      │ backend=     │     │
│  │ formatter.py │      │ remote       │     │
│  └──────┬───────┘      └──────┬───────┘     │
│         │                     │             │
│  ┌──────▼─────────────────────▼──────┐      │
│  │     remote_backend/client.py      │      │
│  │  ServiceClient / TrainingClient   │      │
│  │          SamplingClient           │      │
│  └────────────────┬──────────────────┘      │
└────────────────────┼────────────────────────┘
                     │ HTTP (Bearer Token)
┌────────────────────▼────────────────────────┐
│                 GPU Server                  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │        metaclaw_training_server       │  │
│  │  ┌──────────────┐  ┌───────────────┐  │  │
│  │  │ training_    │  │ inference_    │  │  │
│  │  │ engine.py    │  │ engine.py     │  │  │
│  │  │ (PEFT/LoRA)  │  │ (vLLM/HF)     │  │  │
│  │  │    GPU 0     │  │     GPU 1     │  │  │
│  │  └──────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
```
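Per training step, the client side of this diagram issues a forward/backward pass, an optimizer step, and an adapter save. A minimal sketch of that call sequence (the `train_step` generator and its operation names are shorthand for the operations documented above, not MetaClaw's actual client API or endpoint paths):

```python
def train_step(session_id: str, datums: list, step: int, lr: float = 1e-4):
    """Yield one training step as (operation, payload) pairs:
    forward/backward on a batch, then an optimizer step, then an
    adapter save (which triggers the vLLM hot-swap)."""
    yield "forward_backward", {
        "session_id": session_id,
        "datums": datums,
        "loss_fn": "importance_sampling",
    }
    yield "optim_step", {"session_id": session_id, "learning_rate": lr}
    yield "save_adapter", {"session_id": session_id, "name": f"step_{step:04d}"}


ops = [op for op, _ in train_step("a1b2c3d4", [], step=10)]
print(ops)  # → ['forward_backward', 'optim_step', 'save_adapter']
```

Because forward_backward and optim_step are separate calls, a client can accumulate gradients over several batches before applying a single optimizer step.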
| Feature | Tinker | Remote |
|---|---|---|
| Setup | API key only | Deploy server on GPU |
| Data location | Tinker cloud | Your server |
| GPU cost | Pay-per-use | Your hardware |
| Supported models | Tinker catalog | Any HuggingFace model |
| Inference | Tinker sampling | vLLM / HuggingFace |
| LoRA hot-swap | Built-in | vLLM native |
| Checkpoints | Tinker storage | Local filesystem |
| Network | Internet required | LAN / SSH tunnel |