sglang
Here are 80 public repositories matching this topic.
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
Updated Mar 28, 2026 - Python
OpenClaw-RL: Train any agent simply by talking
Updated Mar 28, 2026 - Python
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enabling zero-shot voice cloning from short audio references.
Updated Mar 23, 2026 - Python
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.
Updated Mar 28, 2026 - Python
SOTA rounding-based quantization for high-accuracy low-bit LLM inference, seamlessly optimized for CPU, Intel GPU, and CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
Updated Mar 28, 2026 - Python
MOVA: Towards Scalable and Synchronized Video–Audio Generation
Updated Mar 14, 2026 - Python
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Updated Mar 27, 2026 - Python
High-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.
Updated May 18, 2025 - Python
Open Model Engine (OME): a Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton.
Updated Mar 28, 2026 - Go
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Updated Jan 26, 2026 - Go
High-performance, lightweight proxy and load balancer for LLM infrastructure. Intelligent routing, automatic failover, and unified model discovery across local and remote inference backends.
Updated Mar 13, 2026 - Go
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.
Updated Mar 28, 2026 - Rust
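Several entries above (the Rust gateway, the Kubernetes operator, the proxy) advertise OpenAI API compatibility across SGLang, vLLM, and other engines. A minimal sketch of the request shape such endpoints accept is below; the base URL, port, and model id are placeholder assumptions, not values from any listed project.

```python
import json

# Assumed local OpenAI-compatible endpoint (e.g. an SGLang or vLLM
# server, or a gateway fronting several engines). Placeholder URL.
BASE_URL = "http://localhost:30000/v1"

# Standard OpenAI-style chat completion payload; the model id is an
# illustrative example and must match whatever the server has loaded.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 64,
    "temperature": 0.7,
}

# The request body is plain JSON; any HTTP client can POST it to
# BASE_URL + "/chat/completions", adding an Authorization header
# if the gateway enforces multi-tenant auth.
body = json.dumps(payload)
print(body)
```

Because the wire format is shared, the same payload works unchanged whether it is sent to a single engine or routed by a gateway to one of many backends.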
Efficient LLM inference on Slurm clusters.
Updated Mar 27, 2026 - Python