Skip to content

crossps/llm-bridge-cache

Repository files navigation

LLM Bridge & Cache

One OpenAI-compatible endpoint in front of OpenAI, Anthropic, and GLM (Z.ai) — with prompt injection and a prompt-cache keepalive that most proxies don't have.

npm version license: MIT node dependencies: none

llm-bridge-cache is a tiny, zero-dependency local server. Point any app that speaks the OpenAI Chat Completions API (SillyTavern, OpenCode, Cline, your own scripts) at it, and it routes each request to the right backend by model name — translating formats on the fly. Bring your own keys; nothing is bundled.

   OpenAI-compatible client                 LLM Bridge & Cache                upstream
  (SillyTavern / OpenCode / …) ──► POST /v1/chat/completions ──┬──► OpenAI   /v1/chat/completions
                                                               └──► Anthropic /v1/messages  (+ cache keepalive)

Why it exists

  • One endpoint, many providers. Stop reconfiguring your client every time you switch models. gpt-4o goes to OpenAI, claude-sonnet-4-5 to Anthropic, glm-4.6 to GLM (Z.ai) — automatically. Any OpenAI-compatible backend can be added by dropping in a baseUrl.
  • Prompt-cache keepalive ⭐ Anthropic's prompt cache expires after ~5 minutes. Pause to think, and your next message re-pays full price to re-cache a huge system prompt. This bridge quietly replays the last request with max_tokens=1 on a timer to keep the cache warm — saving up to ~90% on input tokens for long, stable prompts (RP cards, agent system prompts, big lorebooks).
  • Prompt injection. Inject or override a system prompt, or insert messages at a chosen depth — without touching your client.
  • Zero dependencies, BYO key. Pure Node standard library. Your keys live in your environment, never in the package.

Features

🗣️ Three inbound formats OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses — reply comes back in the format you sent.
🔀 Multi-provider routing Model-name rules, explicit provider/model syntax, and per-provider model lists.
🔑 Bring-your-own-key + rotation Read keys from env or config; list several to round-robin across them.
♻️ Prompt-cache keepalive Per-conversation, multi-chat, auto-evicting. Anthropic-aware.
💉 Prompt injection System prepend/append/replace + depth injection.
🌊 Streaming Full SSE streaming for both providers, translated to OpenAI chunks.
🧰 Tools & vision Function-calling and image inputs converted between formats.
📊 Status endpoint /status shows providers, cache-refresh activity, and usage — never your keys.

Accepted input formats

The bridge speaks three inbound API dialects, so almost any client works as-is. Each request is routed to a provider by model name; the reply comes back in the same format you sent.

You send… → Anthropic provider → OpenAI provider
/v1/chat/completions (OpenAI Chat) converted to Anthropic, response converted back passthrough
/v1/messages (Anthropic Messages) passthrough + prompt-cache breakpoints + keepalive converted to OpenAI, response converted back
/v1/responses (OpenAI Responses) not yet — returns a clear 400 passthrough

So Claude Code / the Anthropic SDK can keep speaking Anthropic on /v1/messages (and finally get the cache keepalive), Codex / the Agents SDK can speak Responses on /v1/responses, and everything else uses /v1/chat/completions. Responses→Anthropic translation is on the roadmap.

Quick start

Requires Node ≥ 18.17.

# 1. set your keys (bring your own)
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

# 2. run it (no install needed)
npx llm-bridge-cache

That's it — the bridge is now at http://127.0.0.1:8787/v1. Point your client there.

Prefer a config file?

npx llm-bridge-cache init       # writes ./llm-bridge.config.json
# edit it, then:
npx llm-bridge-cache

Global install also works: npm i -g llm-bridge-cache then llm-bridge-cache (or the short alias llmbc).

Connect your client

Use base URL http://127.0.0.1:8787/v1 and any non-empty API key (unless you set a bridge key — see below).

SillyTavern

API → Chat Completion → Custom (OpenAI-compatible).

  • Custom Endpoint: http://127.0.0.1:8787/v1
  • API Key: anything (e.g. local), unless you enabled a bridge key.
  • Model: claude-sonnet-4-5, gpt-4o, etc.
OpenCode

Add a custom OpenAI-compatible provider pointing at the bridge:

{
  "provider": {
    "bridge": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://127.0.0.1:8787/v1" },
      "models": { "claude-sonnet-4-5": {}, "gpt-4o": {} }
    }
  }
}
Claude Code / Anthropic SDK (via /v1/messages)

Point the Anthropic base URL at the bridge — it accepts the native Messages format and adds the cache keepalive:

export ANTHROPIC_BASE_URL=http://127.0.0.1:8787   # SDK appends /v1/messages

The bridge forwards to Anthropic untouched (plus cache breakpoints + keepalive), or cross-converts to OpenAI if you target a gpt-* model.

Codex / OpenAI Agents SDK (via /v1/responses)

Set the OpenAI base URL to the bridge; the Responses API is passed through to OpenAI:

export OPENAI_BASE_URL=http://127.0.0.1:8787/v1   # SDK appends /responses
Generic / curl
# OpenAI Chat Completions in
curl http://127.0.0.1:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-5","messages":[{"role":"user","content":"hi"}]}'

# Anthropic Messages in
curl http://127.0.0.1:8787/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":256,"messages":[{"role":"user","content":"hi"}]}'

Configuration

llm-bridge-cache init writes a llm-bridge.config.json you can edit. Every field is optional and overlays the built-in defaults.

{
  "port": 8787,
  "host": "127.0.0.1",
  "apiKey": "env:LLM_BRIDGE_API_KEY",   // require clients to send this; "" = open
  "defaultProvider": "anthropic",        // fallback when no rule matches

  "providers": {
    "openai": {
      "baseUrl": "https://api.openai.com/v1",
      "apiKeys": ["env:OPENAI_API_KEY"], // list several to round-robin
      "models": ["gpt-4o", "o3"]
    },
    "anthropic": {
      "baseUrl": "https://api.anthropic.com/v1",
      "apiKeys": ["env:ANTHROPIC_API_KEY"],
      "version": "2023-06-01",
      "models": ["claude-opus-4-5", "claude-sonnet-4-5"]
    },
    "glm": {                                  // Zhipu / Z.ai — OpenAI-compatible
      "baseUrl": "https://api.z.ai/api/paas/v4",
      "apiKeys": ["env:GLM_API_KEY"],
      "models": ["glm-4.6", "glm-4.5", "glm-4.5-flash"]
    }
  },

  "routing": {
    "rules": [
      { "match": "^(gpt-|o[0-9]|chatgpt)", "provider": "openai" },
      { "match": "^claude", "provider": "anthropic" },
      { "match": "^glm", "provider": "glm" }
    ]
  },

  "injection": {
    "enabled": false,
    "systemMode": "prepend",             // prepend | append | replace
    "system": "",
    "depth": [{ "role": "user", "content": "(stay in character)", "depth": 1 }]
  },

  "anthropic": { "promptCaching": true, "defaultMaxTokens": 4096 },

  "cacheRefresh": {
    "enabled": true,
    "intervalMinutes": 4.5,              // < Anthropic's 5-min TTL
    "maxRefreshes": 3,                   // pings per cycle after each real turn
    "maxTokens": 1,
    "maxChats": 5
  }
}
  • Keys can be "env:VAR_NAME" (read from the environment — recommended) or a literal string.
  • baseUrl is overridable, so any OpenAI- or Anthropic-compatible gateway works too.
  • Routing precedence: explicit provider/model (e.g. openai/o3) → first matching rule → provider models list → defaultProvider.

How the cache keepalive works

After each real Anthropic turn, the bridge fingerprints the request (model + system prefix) and starts a per-conversation timer. Every intervalMinutes it replays that request with max_tokens=1, refreshing the cached prefix for pennies. It tracks several chats at once, stops after maxRefreshes pings of inactivity, and evicts stale conversations. OpenAI caches automatically (no keepalive needed), so the refresher is armed only for Anthropic.

Endpoints

Method Path Purpose
POST /v1/chat/completions OpenAI Chat Completions in (streaming + non-streaming).
POST /v1/messages Anthropic Messages in (streaming + non-streaming).
POST /v1/responses OpenAI Responses in (streaming + non-streaming).
GET /v1/models Lists configured models in OpenAI format.
GET /status Providers, cache-refresh activity, usage (no keys).
GET /health { "status": "ok" }.

Securing the bridge

By default the bridge listens only on 127.0.0.1 and accepts any client key. To require auth, set apiKey (or LLM_BRIDGE_API_KEY); clients must then send Authorization: Bearer <key>. If you bind to 0.0.0.0, set a bridge key.

CLI

llm-bridge-cache [options]
llm-bridge-cache init                 # write a starter config

  -c, --config <path>    JSON config file
  -p, --port <n>         port (default 8787)
  -H, --host <addr>      host (default 127.0.0.1)
  -l, --log-level <lv>   trace|debug|info|warn|error
  -h, --help             help
  -v, --version          version

Development

git clone https://github.com/crossps/llm-bridge-cache
cd llm-bridge-cache
node --test            # runs the smoke + unit suite (no API keys needed)
node bin/cli.js        # run locally

The test suite spins up a mock upstream, so it verifies routing, format conversion, streaming, prompt-cache breakpoints, and the keepalive without any real credentials.

FAQ

Does it store my keys? No. Keys are read from your environment (or a config file you control) at runtime and used only for outbound requests. /status reports key counts, never values.

Is this affiliated with OpenAI or Anthropic? No. It's an independent, unofficial format bridge.

Can I add more providers? Anything OpenAI-compatible works today by adding a provider with its baseUrl. Native adapters for more backends are on the roadmap — PRs welcome.

License

MIT © crossps

About

Zero-dependency OpenAI-compatible API bridge for OpenAI & Anthropic, with prompt injection and an Anthropic prompt-cache keepalive. Bring your own key.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors