Efficient LLM inference on Slurm clusters.
Updated
Apr 22, 2026 - Python
Layered prefill shifts the scheduling axis from tokens to layers, removing redundant MoE weight reloads while keeping decode stall-free. The result is lower TTFT, lower end-to-end latency, and lower energy per token without hurting TBT stability.
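The layer-axis scheduling idea above can be sketched as follows. This is a minimal illustrative toy, not the project's actual implementation: the `ToyLayer` class, `load_weights`, and `layered_prefill` names are assumptions introduced here to show why loading each layer's weights once, then sweeping it over all queued prefill requests, amortizes MoE weight loads across the batch.

```python
class ToyLayer:
    """Stand-in for a transformer/MoE layer; counts weight (re)loads."""
    load_count = 0  # total weight loads across all layers

    def __init__(self, scale):
        self.scale = scale

    def load_weights(self):
        # Pretend to fetch this layer's (expert) weights into fast memory.
        ToyLayer.load_count += 1

    def forward(self, hidden):
        return [h * self.scale for h in hidden]


def layered_prefill(layers, requests):
    """Schedule prefill along the layer axis: load each layer's weights
    once, then run it over every queued request, instead of reloading
    all layers once per request (the token-axis schedule)."""
    states = dict(requests)  # request id -> hidden state after current layer
    for layer in layers:
        layer.load_weights()          # one load per layer, amortized
        for rid in states:
            states[rid] = layer.forward(states[rid])
    return states


layers = [ToyLayer(2), ToyLayer(3)]
requests = {"a": [1, 1], "b": [2]}
out = layered_prefill(layers, requests)
# Weight loads = number of layers (2), independent of the number of
# queued requests; a token-axis schedule would pay 2 loads per request.
```

Under this toy model, decode can still be interleaved between layer sweeps, which is how the stall-free property is preserved.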
eLLM can run LLM inference on CPUs faster than on GPUs.