add ai gateway #1
Merged
…ve smoke

A centralized internal egress L7 proxy to LLM providers (Pingora + tokio). Apps
point their stock OpenAI/Anthropic SDK at it; the gateway authenticates, swaps
in the real provider key, relays the response untouched, and emits token-usage
facts for billing. Self-contained: no path deps into the beyond repo.

Auth branches on key format: bai_… is a stateless Ed25519-signed virtual key
(verify → deny-set check → swap to the pool key); anything else is BYO, the
user's own provider token, passed through unchanged.

Providers are data: a row in route::KNOWN_PROVIDERS (name, authority, base path,
auth scheme) or a config entry. Adding an OpenAI-wire provider is one line, no
new code paths. Ships 10 known providers (openai, anthropic, openrouter,
fireworks, groq, deepseek, together, cerebras, mistral, xai), each with its
connection facts verified against the provider's official docs (cited inline in
route.rs). The client's /v1 prefix is rewritten to each provider's real mount
point (Groq /openai/v1, Fireworks /inference/v1, OpenRouter /api/v1) so a
verbatim passthrough can't 404.

Hardening: per-key rate guardrail (count-min, fixed memory), gap-free deny-set
seeding (resume-from-revision), optional on-disk snapshot for
restart-before-NATS enforcement, chunked-safe body-size cap, redacting/zeroizing
Secret newtype, TTL-cached async DNS, NATS-independent auth (fail-open
deny-set).

Verification:
- 45 unit tests; e2e suite (real beyond-ai binary + real nats-server + mock
  upstream) covering managed key-swap, BYO passthrough, both dialects, usage
  metering, deny-set propagation, rate limiting, snapshot restart.
- Live smoke suite (tests/smoke.rs, mise run test:smoke): exercises the full
  managed path (Ed25519 verify → deny-check → key-swap → real TLS) against real
  providers, gated per API key (#[ignore] + key-presence). The Anthropic managed
  path is verified green against production.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
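For readers unfamiliar with the technique named under Hardening, here is a minimal sketch of a count-min sketch used as a per-key rate guardrail with fixed memory. It is not the gateway's code: `CountMin`, `DEPTH`, `WIDTH`, `allow`, and `reset` are hypothetical names, the dimensions are illustrative, and hashing with std's `DefaultHasher` is an assumption made only to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative dimensions (assumptions): memory is DEPTH * WIDTH counters,
// no matter how many distinct keys are seen.
const DEPTH: usize = 4;
const WIDTH: usize = 1024;

/// Hypothetical fixed-memory counter table; not the gateway's actual type.
pub struct CountMin {
    rows: [[u32; WIDTH]; DEPTH],
}

impl CountMin {
    pub fn new() -> Self {
        Self { rows: [[0; WIDTH]; DEPTH] }
    }

    /// Column for `key` in `row`; the row index is mixed into the hash so
    /// each row behaves like an independent hash function.
    fn slot(row: usize, key: &str) -> usize {
        let mut h = DefaultHasher::new();
        row.hash(&mut h);
        key.hash(&mut h);
        (h.finish() % WIDTH as u64) as usize
    }

    /// Count one request for `key` and return the new estimate.
    /// The estimate is the minimum across rows, so collisions can only
    /// inflate it: a key is never under-counted.
    pub fn record(&mut self, key: &str) -> u32 {
        let mut est = u32::MAX;
        for row in 0..DEPTH {
            let i = Self::slot(row, key);
            self.rows[row][i] = self.rows[row][i].saturating_add(1);
            est = est.min(self.rows[row][i]);
        }
        est
    }

    /// Read the current estimate without counting a request.
    pub fn estimate(&self, key: &str) -> u32 {
        (0..DEPTH)
            .map(|row| self.rows[row][Self::slot(row, key)])
            .min()
            .unwrap_or(0)
    }

    /// Guardrail check: count the request, then compare against `limit`.
    pub fn allow(&mut self, key: &str, limit: u32) -> bool {
        self.record(key) <= limit
    }

    /// Start a new window by zeroing every counter (hypothetical policy).
    pub fn reset(&mut self) {
        self.rows = [[0; WIDTH]; DEPTH];
    }
}
```

Because the estimate can only err upward, a guardrail built this way may throttle a key slightly early under heavy collisions but never lets one exceed the limit unnoticed, which is a reasonable trade for bounded memory.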
The provider is now the request's first path segment (`/{provider}/…`); the rest
of the path is forwarded to the upstream verbatim (native passthrough — the
gateway holds no per-provider mount knowledge). A bare path with no provider
prefix that starts with `/v1` is the drop-in default: dialect picks
openai/anthropic, so those two are a host-only swap. An unknown first segment is
a 404.
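
The routing rule above can be sketched as follows. This is an illustration only, not the code in route.rs: `Route`, `KNOWN`, and `route` are hypothetical names, the provider list is copied from the first commit's description, and two details are assumptions (a bare path counts as the default only when it is exactly `/v1` or begins with `/v1/`, and a prefix with nothing after it forwards `/`).

```rust
/// Outcome of routing a request path (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Known provider prefix: `rest` is forwarded upstream verbatim.
    Prefixed { provider: &'a str, rest: &'a str },
    /// Bare `/v1` path: the provider is picked later from the dialect.
    BareDefault { path: &'a str },
    /// Unknown first segment: the gateway answers 404.
    NotFound,
}

// Provider names as listed in the first commit message.
const KNOWN: &[&str] = &[
    "openai", "anthropic", "openrouter", "fireworks", "groq",
    "deepseek", "together", "cerebras", "mistral", "xai",
];

pub fn route(path: &str) -> Route<'_> {
    // Drop-in default: no provider prefix, path starts at /v1 (assumed match rule).
    if path == "/v1" || path.starts_with("/v1/") {
        return Route::BareDefault { path };
    }
    let trimmed = match path.strip_prefix('/') {
        Some(t) => t,
        None => return Route::NotFound,
    };
    // Split into the first segment and the remainder (leading slash kept).
    let (first, rest) = match trimmed.find('/') {
        Some(i) => (&trimmed[..i], &trimmed[i..]),
        None => (trimmed, "/"), // assumption: empty remainder forwards "/"
    };
    match KNOWN.iter().copied().find(|p| *p == first) {
        Some(provider) => Route::Prefixed { provider, rest },
        None => Route::NotFound,
    }
}
```

Under these assumptions, `/groq/openai/v1/chat/completions` yields provider `groq` with `/openai/v1/chat/completions` forwarded untouched, which is why the gateway no longer needs a per-provider mount table.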
This makes the gateway a true drop-in for a provider's base URL with any stock
tool (Codex, Cursor, the OpenAI/Anthropic SDKs) — provider is a base-URL concern,
not a per-request header that tools can't set. Removes the `x-beyond-provider`
header, the per-provider `base_path` rewrite table, `CLIENT_PREFIX`, and
`Provider::upstream_path` — a net simplification. `dialect` becomes a provider
attribute (drives usage parsing + stream_options injection eligibility, now a
prefix-agnostic suffix check); `dialect_for_path` survives only for the bare-path
default.
Auth swap / BYO passthrough, deny-set, rate limits, usage metering, model
provenance, and stream_options injection are unchanged.
Swept end to end per the plan: route/proxy/state/config + all inline comments;
e2e (reworked Fireworks → prefix-strip assertion; added `/openai`-prefix ==
bare-default and unknown→404 tests); all smoke tests (per-provider native paths,
Anthropic via `/anthropic/v1/messages`); ARCHITECTURE, README, config.example.
Verified: 68 lib + 18 e2e + 10 smoke pass, clippy clean across all targets, no
remaining x-beyond-provider/base_path/upstream_path/CLIENT_PREFIX references, and
the live Anthropic prefix route returns 200 (`mise run test:smoke`).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The repo was a Cargo workspace with exactly one member (`crates/gateway`) — a
remnant of a planned multi-crate layout (SDK, control-plane) that was dropped.
Over-structured: a workspace's only leverage (`[workspace.dependencies]`,
`[workspace.package]`, shared lints across crates) needs 2+ members to pay off.
Flattened to a single crate at the repo root:
- `crates/gateway/{src,tests,benches}` → `{src,tests,benches}` (git-tracked moves,
history preserved).
- The two manifests merged into one root `Cargo.toml`: `[workspace.*]` tables
inlined as `[package]`/`[dependencies]`/`[dev-dependencies]`, dep versions
resolved from `[workspace.dependencies]`. Dropped the unused `async-nats`
workspace dep (pulled transitively via slipstream; never named directly).
Rigor preserved exactly — same `[lints.rust]` (forbid unsafe, deny
unused_must_use) and `[lints.clippy]` panic-surface denies (unwrap/expect/panic/
todo/unimplemented), same `[profile.release]` overflow-checks. `[lints]` at the
package level still binds every target (lib/bin/tests/benches), so the bin-root
gap stays closed.
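
For orientation, the lint and profile tables described above might look roughly like this in the root `Cargo.toml`. This is a sketch reconstructed from the lint names in the text, not a copy of the actual manifest; the lint keys are the standard rustc and Clippy lint names, and the levels are inferred from "forbid" and "deny" above.

```toml
# Sketch only: inferred from the commit message, not the real manifest.
[lints.rust]
unsafe_code = "forbid"
unused_must_use = "deny"

[lints.clippy]
unwrap_used = "deny"
expect_used = "deny"
panic = "deny"
todo = "deny"
unimplemented = "deny"

[profile.release]
overflow-checks = true
```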
`mise check:rs` drops the now-meaningless `--workspace`; CI/ci.yml comments
updated `[workspace.lints]` → `[lints]`.
All CI steps pass locally (RUSTFLAGS=-D warnings): dprint check, cargo fmt
--check, clippy --all-targets -D warnings, 68 unit + 18 e2e (10 smoke ignored),
release build. Live Anthropic smoke green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>