One OpenAI-compatible endpoint in front of OpenAI, Anthropic, and GLM (Z.ai) — with prompt injection and a prompt-cache keepalive that most proxies don't have.
llm-bridge-cache is a tiny, zero-dependency local server. Point any app that speaks the OpenAI Chat Completions API (SillyTavern, OpenCode, Cline, your own scripts) at it, and it routes each request to the right backend by model name — translating formats on the fly. Bring your own keys; nothing is bundled.
```text
OpenAI-compatible client          LLM Bridge & Cache               upstream
(SillyTavern / OpenCode / …)  ──► POST /v1/chat/completions ──┬──► OpenAI    /v1/chat/completions
                                                              └──► Anthropic /v1/messages  (+ cache keepalive)
```
- One endpoint, many providers. Stop reconfiguring your client every time you switch models. `gpt-4o` goes to OpenAI, `claude-sonnet-4-5` to Anthropic, and `glm-4.6` to GLM (Z.ai), automatically. Any OpenAI-compatible backend can be added by dropping in a `baseUrl`.
- ⭐ Prompt-cache keepalive. Anthropic's prompt cache expires after ~5 minutes. Pause to think, and your next message re-pays full price to re-cache a huge system prompt. This bridge quietly replays the last request with `max_tokens=1` on a timer to keep the cache warm, saving up to ~90% on input tokens for long, stable prompts (RP cards, agent system prompts, big lorebooks).
- Prompt injection. Inject or override a system prompt, or insert messages at a chosen depth, without touching your client.
- Zero dependencies, BYO key. Pure Node standard library. Your keys live in your environment, never in the package.
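To make the injection modes concrete, here is a minimal sketch of what such a step could look like. It is not the bridge's source: `applyInjection` is a hypothetical name, the option names mirror the `injection` section of the example config, and it assumes string message contents and a SillyTavern-style depth convention (0 = after the last message, 1 = just before it).

```javascript
// Hypothetical sketch of system-prompt and depth injection; not the bridge's real code.
// Assumes OpenAI-style messages with string `content`.
function applyInjection(messages, injection) {
  if (!injection || !injection.enabled) return messages;

  // Collapse any existing system messages into one string.
  const existing = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n\n");
  const rest = messages.filter((m) => m.role !== "system");

  // prepend | append | replace, as in the `systemMode` config option.
  let system = existing;
  if (injection.system) {
    if (injection.systemMode === "replace" || !existing) {
      system = injection.system;
    } else if (injection.systemMode === "append") {
      system = `${existing}\n\n${injection.system}`;
    } else {
      system = `${injection.system}\n\n${existing}`; // default: prepend
    }
  }

  // Depth injection: depth N inserts N messages back from the end (assumed convention).
  const out = [...rest];
  for (const { role, content, depth = 0 } of injection.depth ?? []) {
    out.splice(Math.max(0, out.length - depth), 0, { role, content });
  }

  return system ? [{ role: "system", content: system }, ...out] : out;
}
```

Under that assumption, the example config's `(stay in character)` note with `depth: 1` lands just before the user's latest message.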
| Feature | Details |
|---|---|
| 🗣️ Three inbound formats | OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses; the reply comes back in the format you sent. |
| 🔀 Multi-provider routing | Model-name rules, explicit `provider/model` syntax, and per-provider model lists. |
| 🔑 Bring-your-own-key + rotation | Read keys from env or config; list several to round-robin across them. |
| ♻️ Prompt-cache keepalive | Per-conversation, multi-chat, auto-evicting. Anthropic-aware. |
| 💉 Prompt injection | System prepend/append/replace + depth injection. |
| 🌊 Streaming | Full SSE streaming for both providers, translated to OpenAI chunks. |
| 🧰 Tools & vision | Function-calling and image inputs converted between formats. |
| 📊 Status endpoint | `/status` shows providers, cache-refresh activity, and usage, never your keys. |
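Round-robin rotation over several keys is easy to picture. A minimal sketch, assuming the keys have already been read from the environment; `createKeyRotator` is a hypothetical helper, not part of the package. It exposes only a key count, in the same spirit as `/status` never showing key values.

```javascript
// Hypothetical round-robin key rotator; the bridge's real rotation may differ.
function createKeyRotator(keys) {
  if (!Array.isArray(keys) || keys.length === 0) {
    throw new Error("at least one API key is required");
  }
  const pool = [...keys]; // copy so later edits to the caller's array don't leak in
  let cursor = 0;

  return {
    // Hand out keys in order, wrapping around at the end of the list.
    next() {
      const key = pool[cursor];
      cursor = (cursor + 1) % pool.length;
      return key;
    },
    // Safe to report (e.g. on a status page): how many keys, never which ones.
    get count() {
      return pool.length;
    },
  };
}
```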
The bridge speaks three inbound API dialects, so almost any client works as-is. Each request is routed to a provider by model name; the reply comes back in the same format you sent.
| You send… | → Anthropic provider | → OpenAI provider |
|---|---|---|
| `/v1/chat/completions` (OpenAI Chat) | converted to Anthropic, response converted back | passthrough |
| `/v1/messages` (Anthropic Messages) | passthrough + prompt-cache breakpoints + keepalive | converted to OpenAI, response converted back |
| `/v1/responses` (OpenAI Responses) | not yet; returns a clear 400 | passthrough |
So Claude Code / the Anthropic SDK can keep speaking Anthropic on /v1/messages (and finally get the cache keepalive), Codex / the Agents SDK can speak Responses on /v1/responses, and everything else uses /v1/chat/completions. Responses→Anthropic translation is on the roadmap.
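The OpenAI Chat → Anthropic row is the busiest translation. The sketch below shows its core for plain text messages only (tools, images, and streaming are left out), and both function names are hypothetical. The field mappings follow the public APIs: Anthropic takes the system prompt as a top-level `system` field, requires `max_tokens` (defaulted here to 4096, the `defaultMaxTokens` value in the example config), caps `temperature` at 1, and reports a `stop_reason` that maps onto OpenAI's `finish_reason`.

```javascript
// Hypothetical sketch: OpenAI Chat Completions <-> Anthropic Messages (text only).
function chatToAnthropic(body, defaultMaxTokens = 4096) {
  // Anthropic has no "system" role inside messages; lift it to a top-level field.
  const system = body.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n\n");

  const out = {
    model: body.model,
    // max_tokens is required by Anthropic, optional in OpenAI.
    max_tokens: body.max_completion_tokens ?? body.max_tokens ?? defaultMaxTokens,
    messages: body.messages
      .filter((m) => m.role === "user" || m.role === "assistant")
      .map((m) => ({ role: m.role, content: m.content })),
  };
  if (system) out.system = system;
  // OpenAI allows temperature up to 2; Anthropic caps it at 1.
  if (body.temperature !== undefined) out.temperature = Math.min(body.temperature, 1);
  if (body.stop !== undefined) {
    out.stop_sequences = Array.isArray(body.stop) ? body.stop : [body.stop];
  }
  if (body.stream) out.stream = true;
  return out;
}

// Anthropic stop_reason -> OpenAI finish_reason.
const FINISH_REASONS = {
  end_turn: "stop",
  stop_sequence: "stop",
  max_tokens: "length",
  tool_use: "tool_calls",
};

function anthropicToChat(res) {
  const text = res.content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("");
  return {
    id: res.id,
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    model: res.model,
    choices: [
      {
        index: 0,
        message: { role: "assistant", content: text },
        finish_reason: FINISH_REASONS[res.stop_reason] ?? "stop",
      },
    ],
    usage: {
      prompt_tokens: res.usage.input_tokens,
      completion_tokens: res.usage.output_tokens,
      total_tokens: res.usage.input_tokens + res.usage.output_tokens,
    },
  };
}
```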
Requires Node ≥ 18.17.
```bash
# 1. set your keys (bring your own)
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# 2. run it (no install needed)
npx llm-bridge-cache
```

That's it. The bridge is now at `http://127.0.0.1:8787/v1`; point your client there.
Prefer a config file?
```bash
npx llm-bridge-cache init   # writes ./llm-bridge.config.json
# edit it, then:
npx llm-bridge-cache
```

Global install also works: `npm i -g llm-bridge-cache`, then run `llm-bridge-cache` (or the short alias `llmbc`).
Use base URL http://127.0.0.1:8787/v1 and any non-empty API key (unless you set a bridge key — see below).
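From your own scripts, those two settings are all you need. A minimal Node sketch using the built-in `fetch` (available in Node ≥ 18); `buildBridgeRequest` and `askBridge` are hypothetical helper names, and `local` is just a placeholder key for when no bridge key is set.

```javascript
// Hypothetical helpers for calling the bridge from a Node script (Node >= 18).
function buildBridgeRequest({ baseUrl = "http://127.0.0.1:8787/v1", apiKey = "local", model, content }) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Any non-empty value works unless a bridge key is configured.
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages: [{ role: "user", content }] }),
    },
  };
}

async function askBridge(options) {
  const { url, init } = buildBridgeRequest(options);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`bridge returned HTTP ${res.status}: ${await res.text()}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

// Usage (with the bridge running):
// askBridge({ model: "claude-sonnet-4-5", content: "hi" }).then(console.log);
```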
SillyTavern
API → Chat Completion → Custom (OpenAI-compatible).
- Custom Endpoint: `http://127.0.0.1:8787/v1`
- API Key: anything (e.g. `local`), unless you enabled a bridge key.
- Model: `claude-sonnet-4-5`, `gpt-4o`, etc.
OpenCode
Add a custom OpenAI-compatible provider pointing at the bridge:
```json
{
  "provider": {
    "bridge": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://127.0.0.1:8787/v1" },
      "models": { "claude-sonnet-4-5": {}, "gpt-4o": {} }
    }
  }
}
```

Claude Code / Anthropic SDK (via /v1/messages)
Point the Anthropic base URL at the bridge. It accepts the native Messages format and adds the cache keepalive:

```bash
export ANTHROPIC_BASE_URL=http://127.0.0.1:8787   # SDK appends /v1/messages
```

The bridge forwards to Anthropic untouched (plus cache breakpoints and keepalive), or cross-converts to OpenAI if you target a `gpt-*` model.
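"Cache breakpoints" are Anthropic `cache_control` markers on content blocks. Where exactly the bridge places them is an assumption in this sketch: it marks the end of the system prompt and the last block of the final message, so both the system prompt and the conversation so far become cacheable. `addCacheBreakpoints` is a hypothetical name; it leaves the request alone if the client already set `cache_control`, since Anthropic allows at most four breakpoints per request.

```javascript
// Hypothetical sketch of adding Anthropic prompt-cache breakpoints to a Messages request.
const ephemeral = () => ({ type: "ephemeral" });

// Mark the final block of a string-or-array content field; returns the new content.
function markLastBlock(content) {
  const blocks = typeof content === "string" ? [{ type: "text", text: content }] : content;
  if (Array.isArray(blocks) && blocks.length > 0) {
    blocks[blocks.length - 1] = { ...blocks[blocks.length - 1], cache_control: ephemeral() };
  }
  return blocks;
}

function addCacheBreakpoints(body) {
  // Respect breakpoints the client already placed (Anthropic allows at most 4).
  if (JSON.stringify(body).includes('"cache_control"')) return body;

  const out = structuredClone(body); // never mutate the caller's request
  // 1. End of the system prompt (a string must become a text block to carry cache_control).
  if (out.system) out.system = markLastBlock(out.system);
  // 2. End of the conversation, so the whole prefix can be reused on the next turn.
  const last = out.messages.at(-1);
  if (last && last.content) last.content = markLastBlock(last.content);
  return out;
}
```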
Codex / OpenAI Agents SDK (via /v1/responses)
Set the OpenAI base URL to the bridge; the Responses API is passed through to OpenAI:
```bash
export OPENAI_BASE_URL=http://127.0.0.1:8787/v1   # SDK appends /responses
```

Generic / curl
```bash
# OpenAI Chat Completions in
curl http://127.0.0.1:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-5","messages":[{"role":"user","content":"hi"}]}'

# Anthropic Messages in
curl http://127.0.0.1:8787/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":256,"messages":[{"role":"user","content":"hi"}]}'
```

`llm-bridge-cache init` writes an `llm-bridge.config.json` you can edit. Every field is optional and overlays the built-in defaults.
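"Overlays the built-in defaults" suggests a recursive merge. A minimal sketch of that idea with one assumed rule: nested objects merge key by key, while arrays and scalars from your file replace the default outright (so an `apiKeys` list replaces, rather than extends, the default one). `overlay` is a hypothetical name.

```javascript
// Hypothetical sketch of overlaying a user config on built-in defaults.
const isPlainObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

function overlay(defaults, user) {
  const out = { ...defaults };
  for (const [key, value] of Object.entries(user ?? {})) {
    // Recurse into objects present on both sides; anything else replaces the default.
    out[key] =
      isPlainObject(value) && isPlainObject(defaults[key]) ? overlay(defaults[key], value) : value;
  }
  return out;
}
```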
- Keys can be `"env:VAR_NAME"` (read from the environment; recommended) or a literal string.
- `baseUrl` is overridable, so any OpenAI- or Anthropic-compatible gateway works too.
- Routing precedence: explicit `provider/model` (e.g. `openai/o3`) → first matching rule → provider `models` list → `defaultProvider`.
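The key syntax and precedence list above translate almost line for line into code. A sketch under stated assumptions: `resolveKey` and `route` are hypothetical names, rule `match` strings are treated as JavaScript regular expressions (as the `^claude`-style patterns in the example config suggest), and a `provider/` prefix only counts when it names a configured provider.

```javascript
// Hypothetical sketch of key resolution and model routing; not the bridge's actual code.

// "env:VAR_NAME" reads from the environment; anything else is used literally.
function resolveKey(value, env = process.env) {
  if (typeof value === "string" && value.startsWith("env:")) {
    return env[value.slice("env:".length)] ?? "";
  }
  return value;
}

function route(model, config) {
  const providers = config.providers ?? {};

  // 1. Explicit "provider/model", e.g. "openai/o3".
  const slash = model.indexOf("/");
  if (slash > 0 && Object.hasOwn(providers, model.slice(0, slash))) {
    return { provider: model.slice(0, slash), model: model.slice(slash + 1) };
  }
  // 2. First routing rule whose pattern matches.
  for (const rule of config.routing?.rules ?? []) {
    if (new RegExp(rule.match).test(model)) return { provider: rule.provider, model };
  }
  // 3. A provider whose `models` list includes the name.
  for (const [name, provider] of Object.entries(providers)) {
    if ((provider.models ?? []).includes(model)) return { provider: name, model };
  }
  // 4. Fallback.
  return { provider: config.defaultProvider, model };
}
```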
After each real Anthropic turn, the bridge fingerprints the request (model + system prefix) and starts a per-conversation timer. Every intervalMinutes it replays that request with max_tokens=1, refreshing the cached prefix for pennies. It tracks several chats at once, stops after maxRefreshes pings of inactivity, and evicts stale conversations. OpenAI caches automatically (no keepalive needed), so the refresher is armed only for Anthropic.
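The keepalive loop described above fits in a small class. This is a sketch of the idea, not the bridge's implementation: `CacheKeepalive`, `touch`, and the injected `send` function (assumed to POST a Messages body upstream and return a Promise) are hypothetical, a SHA-256 of model + system prompt stands in for the bridge's fingerprint, and the timer functions are injectable so the logic can be exercised without waiting minutes. Defaults mirror the `cacheRefresh` section of the example config.

```javascript
// Hypothetical sketch of a per-conversation prompt-cache keepalive.
const { createHash } = require("node:crypto");

// Stand-in fingerprint: hash of model + system prompt identifies a conversation.
function fingerprint(body) {
  const system = typeof body.system === "string" ? body.system : JSON.stringify(body.system ?? "");
  return createHash("sha256").update(`${body.model}\n${system}`).digest("hex").slice(0, 16);
}

class CacheKeepalive {
  constructor({
    send, // async (body) => upstream response; assumed to return a Promise
    intervalMinutes = 4.5,
    maxRefreshes = 3,
    maxTokens = 1,
    maxChats = 5,
    setTimer = setTimeout,
    clearTimer = clearTimeout,
  }) {
    this.send = send;
    this.intervalMs = intervalMinutes * 60_000;
    this.maxRefreshes = maxRefreshes;
    this.maxTokens = maxTokens;
    this.maxChats = maxChats;
    this.setTimer = setTimer;
    this.clearTimer = clearTimer;
    this.chats = new Map(); // fingerprint -> { body, remaining, timer }, oldest first
  }

  // Call after every real Anthropic turn: (re)arm the timer for this conversation.
  touch(body) {
    const id = fingerprint(body);
    this.cancel(id); // re-inserting moves this chat to the "newest" end of the Map
    if (this.chats.size >= this.maxChats) {
      this.cancel(this.chats.keys().next().value); // evict the least recently active chat
    }
    const chat = {
      body: { ...body, stream: false, max_tokens: this.maxTokens },
      remaining: this.maxRefreshes,
      timer: null,
    };
    this.chats.set(id, chat);
    this.schedule(id, chat);
    return id;
  }

  schedule(id, chat) {
    chat.timer = this.setTimer(() => {
      chat.remaining -= 1;
      // Replay the last request cheaply; a failed ping must not crash the server.
      this.send(chat.body).catch(() => {});
      if (chat.remaining > 0) this.schedule(id, chat);
      else this.chats.delete(id); // idle too long: stop refreshing
    }, this.intervalMs);
  }

  cancel(id) {
    const chat = this.chats.get(id);
    if (!chat) return;
    this.clearTimer(chat.timer);
    this.chats.delete(id);
  }
}
```

Keying the Map by fingerprint and re-inserting on every turn keeps it ordered by recency, so eviction is just "drop the first key".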
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/v1/chat/completions` | OpenAI Chat Completions in (streaming + non-streaming). |
| `POST` | `/v1/messages` | Anthropic Messages in (streaming + non-streaming). |
| `POST` | `/v1/responses` | OpenAI Responses in (streaming + non-streaming). |
| `GET` | `/v1/models` | Lists configured models in OpenAI format. |
| `GET` | `/status` | Providers, cache-refresh activity, usage (no keys). |
| `GET` | `/health` | `{ "status": "ok" }` |
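Streaming across dialects means translating server-sent events on the fly. As an illustration of the Anthropic → OpenAI Chat direction, this sketch turns Anthropic stream events (`message_start`, `content_block_delta`, `message_delta`, `message_stop`) into OpenAI `chat.completion.chunk` lines. It handles text deltas only (tool-call deltas are omitted), and `parseSseFrame` and `createStreamTranslator` are hypothetical names.

```javascript
// Hypothetical sketch: Anthropic Messages stream events -> OpenAI chat.completion.chunk SSE lines.
const FINISH_REASONS = { end_turn: "stop", stop_sequence: "stop", max_tokens: "length", tool_use: "tool_calls" };

// Parse one SSE frame ("event: ...\ndata: {...}") into its JSON payload.
function parseSseFrame(frame) {
  const data = frame
    .split("\n")
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).trimStart())
    .join("\n");
  return data ? JSON.parse(data) : null;
}

function createStreamTranslator(created = Math.floor(Date.now() / 1000)) {
  let id = "";
  let model = "";
  const chunk = (delta, finishReason = null) =>
    `data: ${JSON.stringify({
      id,
      object: "chat.completion.chunk",
      created,
      model,
      choices: [{ index: 0, delta, finish_reason: finishReason }],
    })}\n\n`;

  // Returns zero or more OpenAI SSE lines for each Anthropic event.
  return function translate(event) {
    switch (event?.type) {
      case "message_start":
        id = event.message.id;
        model = event.message.model;
        return [chunk({ role: "assistant", content: "" })];
      case "content_block_delta":
        return event.delta.type === "text_delta" ? [chunk({ content: event.delta.text })] : [];
      case "message_delta":
        return event.delta.stop_reason
          ? [chunk({}, FINISH_REASONS[event.delta.stop_reason] ?? "stop")]
          : [];
      case "message_stop":
        return ["data: [DONE]\n\n"];
      default:
        return []; // ping, content_block_start/stop, etc.
    }
  };
}
```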
By default the bridge listens only on 127.0.0.1 and accepts any client key. To require auth, set apiKey (or LLM_BRIDGE_API_KEY); clients must then send Authorization: Bearer <key>. If you bind to 0.0.0.0, set a bridge key.
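Checking the bridge key comes down to comparing a bearer token. A minimal sketch with Node's built-in `crypto` module; `isAuthorized` is a hypothetical name, and it implements only the `Authorization: Bearer` scheme described above. Hashing both values first lets `timingSafeEqual` compare equal-length buffers, so response timing doesn't reveal how much of a guessed key was right.

```javascript
// Hypothetical sketch of the bridge-key check; not the package's actual code.
const { createHash, timingSafeEqual } = require("node:crypto");

// `headers` as Node exposes them on an IncomingMessage (lower-cased names).
function isAuthorized(headers, bridgeKey) {
  if (!bridgeKey) return true; // no bridge key configured: accept any client (the default)
  const match = /^Bearer\s+(.+)$/i.exec(headers.authorization ?? "");
  if (!match) return false;
  const sha = (s) => createHash("sha256").update(s).digest();
  return timingSafeEqual(sha(match[1]), sha(bridgeKey));
}

// Usage inside an http.createServer handler (sketch):
// if (!isAuthorized(req.headers, process.env.LLM_BRIDGE_API_KEY)) {
//   res.writeHead(401, { "Content-Type": "application/json" });
//   return res.end(JSON.stringify({ error: { message: "invalid bridge key" } }));
// }
```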
```text
llm-bridge-cache [options]
llm-bridge-cache init          # write a starter config

  -c, --config <path>      JSON config file
  -p, --port <n>           port (default 8787)
  -H, --host <addr>        host (default 127.0.0.1)
  -l, --log-level <lv>     trace|debug|info|warn|error
  -h, --help               help
  -v, --version            version
```
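These flags map naturally onto Node's built-in `util.parseArgs`, which is available in every Node version the bridge supports (≥ 18.17). The sketch below is not the package's `bin/cli.js`; `parseCli` and the implicit `serve` command name are hypothetical, used only to show how the documented short/long options and their defaults fit together.

```javascript
// Hypothetical sketch of parsing the documented flags with node:util's parseArgs.
const { parseArgs } = require("node:util");

function parseCli(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // for the `init` subcommand
    options: {
      config: { type: "string", short: "c" },
      port: { type: "string", short: "p", default: "8787" },
      host: { type: "string", short: "H", default: "127.0.0.1" },
      "log-level": { type: "string", short: "l" },
      help: { type: "boolean", short: "h", default: false },
      version: { type: "boolean", short: "v", default: false },
    },
  });

  const port = Number(values.port);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid --port: ${values.port}`);
  }
  const levels = ["trace", "debug", "info", "warn", "error"];
  if (values["log-level"] !== undefined && !levels.includes(values["log-level"])) {
    throw new Error(`invalid --log-level: ${values["log-level"]}`);
  }
  return { ...values, port, command: positionals[0] ?? "serve" };
}

// Usage: parseCli(process.argv.slice(2))
```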
```bash
git clone https://github.com/crossps/llm-bridge-cache
cd llm-bridge-cache
node --test       # runs the smoke + unit suite (no API keys needed)
node bin/cli.js   # run locally
```

The test suite spins up a mock upstream, so it verifies routing, format conversion, streaming, prompt-cache breakpoints, and the keepalive without any real credentials.
Does it store my keys? No. Keys are read from your environment (or a config file you control) at runtime and used only for outbound requests. /status reports key counts, never values.
Is this affiliated with OpenAI or Anthropic? No. It's an independent, unofficial format bridge.
Can I add more providers? Anything OpenAI-compatible works today by adding a provider with its baseUrl. Native adapters for more backends are on the roadmap — PRs welcome.
MIT © crossps
Annotated `llm-bridge.config.json` example:

```jsonc
{
  "port": 8787,
  "host": "127.0.0.1",
  "apiKey": "env:LLM_BRIDGE_API_KEY",    // require clients to send this; "" = open
  "defaultProvider": "anthropic",        // fallback when no rule matches
  "providers": {
    "openai": {
      "baseUrl": "https://api.openai.com/v1",
      "apiKeys": ["env:OPENAI_API_KEY"],   // list several to round-robin
      "models": ["gpt-4o", "o3"]
    },
    "anthropic": {
      "baseUrl": "https://api.anthropic.com/v1",
      "apiKeys": ["env:ANTHROPIC_API_KEY"],
      "version": "2023-06-01",
      "models": ["claude-opus-4-5", "claude-sonnet-4-5"]
    },
    "glm": {                               // Zhipu / Z.ai, OpenAI-compatible
      "baseUrl": "https://api.z.ai/api/paas/v4",
      "apiKeys": ["env:GLM_API_KEY"],
      "models": ["glm-4.6", "glm-4.5", "glm-4.5-flash"]
    }
  },
  "routing": {
    "rules": [
      { "match": "^(gpt-|o[0-9]|chatgpt)", "provider": "openai" },
      { "match": "^claude", "provider": "anthropic" },
      { "match": "^glm", "provider": "glm" }
    ]
  },
  "injection": {
    "enabled": false,
    "systemMode": "prepend",               // prepend | append | replace
    "system": "",
    "depth": [{ "role": "user", "content": "(stay in character)", "depth": 1 }]
  },
  "anthropic": { "promptCaching": true, "defaultMaxTokens": 4096 },
  "cacheRefresh": {
    "enabled": true,
    "intervalMinutes": 4.5,                // < Anthropic's 5-min TTL
    "maxRefreshes": 3,                     // pings per cycle after each real turn
    "maxTokens": 1,
    "maxChats": 5
  }
}
```