sglang

Star

Here are 80 public repositories matching this topic...

kvcache-ai / Mooncake

Star

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

inference rdma disaggregation llm vllm sglang kvcache

Updated Mar 28, 2026
C++

gpustack / gpustack

Star

A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.

cuda inference openai llama maas rocm ascend llm llm-serving vllm genai llm-inference qwen deepseek sglang distributed-inference high-performance-inference mindie

Updated Mar 28, 2026
Python

Gen-Verse / OpenClaw-RL

Star

OpenClaw-RL: Train any agent simply by talking

async gui-application coding slime tinker memory-systems skill-learning rlhf sglang grpo on-policy-distillation openclaw-skills open-claw

Updated Mar 28, 2026
Python

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enabling zero-shot voice cloning from short audio references.

streaming finetune text-to-speeh large-language-models sglang speech-dialogue-generation

Updated Mar 23, 2026
Python

ModelCloud / GPTQModel

Star

LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

transformers quantization optimum peft vllm gptq sglang

Updated Mar 28, 2026
Python

intel / auto-round

Star

SOTA rounding-based quantization for high-accuracy low-bit LLM inference, seamlessly optimized for CPU, Intel GPU, and CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

transformers rounding quantization int4 llms vllm gguf vlms sglang mxfp4 nvfp4

Updated Mar 28, 2026
Python

OpenMOSS / MOVA

Star

MOVA: Towards Scalable and Synchronized Video–Audio Generation

multimodal diffusion-models sglang video-audio-generation

Updated Mar 14, 2026
Python

ovg-project / kvcached

Star

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

serverless inference-engine llm llm-serving vllm llm-inference ollama llm-framework sglang kvcache gpu-sharing kvcached gpu-mutiplexing kvcache-optimization elastic-kvcache online-offline-coserve

Updated Mar 27, 2026
Python

sgl-project / SpecForge

Star

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

training eagle pytorch llm fsdp sglang eagle3

Updated Mar 25, 2026
Python

SemiAnalysisAI / InferenceX

Star

Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3

benchmark ai amd cuda pytorch nvidia rocm llm vllm sglang gb200

Updated Mar 28, 2026
Python

HuiResearch / FlashTTS

Star

基于SparkTTS、OrpheusTTS等模型，提供高质量中文语音合成与声音克隆服务。

vllm sglang llamacpp-python sparktts spark-tts orpheus-tts megatts3 flashtts

Updated May 18, 2025
Python

sgl-project / ome

Star

Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton

k8s llama oracle-cloud model-serving model-as-a-service multi-node-kubernetes llm vllm llm-inference qwen deepseek sglang kimi-k2 pd-disaggregation

Updated Mar 28, 2026
Go

0xSero / vllm-studio

Star

Control panel for VLLM, Sglang, llama.cpp, exllamav3

ai local hosting self llamacpp vllm exllama local-ai sglang

Updated Mar 23, 2026
TypeScript

InftyAI / llmaz

Star

☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

kubernetes inference huggingface llm modelscope llamacpp vllm text-generation-inference ollama sglang inference-platform

Updated Jan 26, 2026
Go

shell-nlp / gpt_server

Star

gpt_server是一个用于生产级部署LLMs、Embedding、Reranker、ASR、TTS、文生图、图片编辑和文生视频的开源框架。

tts openai llama gpt infinity embedding asr text-moderation llm prompt-injection vllm fastchat function-calling rerank sglang lmdeploy

Updated Mar 22, 2026
Python

sgl-project / rbg

Star

A workload for deploying LLM inference services on Kubernetes

k8s llm sglang pd-disagg

Updated Mar 27, 2026
Go

thushan / olla

Sponsor

Star

High-performance lightweight proxy and load balancer for LLM infrastructure. Intelligent routing, automatic failover and unified model discovery across local and remote inference backends.

Updated Mar 13, 2026
Go

lightseekorg / smg

Star

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

chat mcp routing gemini openai claude llm anthropic vllm sglang anthropic-api inference-gateway responses-api tensorrtllm trtllm lightseek

Updated Mar 28, 2026
Rust

VectorInstitute / vector-inference

Star

Efficient LLM inference on Slurm clusters.

inference speech-to-text vlm text-embedding multimodal audio-transcription llm vllm reward-model llm-infernece sglang llm-infrastructure

Updated Mar 27, 2026
Python

spark-arena / sparkrun

Star

sparkrun - launch, manage, and stop LLM inference workloads on NVIDIA DGX Spark systems

inference llama-cpp vllm sglang dgx-spark

Updated Mar 28, 2026
Python

Improve this page

Add a description, image, and links to the sglang topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the sglang topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sglang

Here are 80 public repositories matching this topic...

kvcache-ai / Mooncake

gpustack / gpustack

Gen-Verse / OpenClaw-RL

OpenMOSS / MOSS-TTSD

ModelCloud / GPTQModel

intel / auto-round

OpenMOSS / MOVA

ovg-project / kvcached

sgl-project / SpecForge

SemiAnalysisAI / InferenceX

HuiResearch / FlashTTS

sgl-project / ome

0xSero / vllm-studio

InftyAI / llmaz

shell-nlp / gpt_server

sgl-project / rbg

thushan / olla

lightseekorg / smg

VectorInstitute / vector-inference

spark-arena / sparkrun

Improve this page

Add this topic to your repo