Popular repositories
-
vLLM-2080Ti-Definitive (Public)
The definitive vLLM runtime for dual RTX 2080 Ti 22GB + NVLink, delivering 27B/31B local inference with 100+ tok/s single-request decode and native 262K context.
-
2080Ti-LLM-Toolbox (Public)
Single-request LLM serving recipes, patches, and benchmarks for modified RTX 2080 Ti 22GB / SM75 systems.
-
rdna1-gfx101x-rocm-llama-fix (Public)
Makes RDNA1 / Navi1x / gfx101x GPUs (5700XT, W5500, etc.) run modern large language models on ROCm 6 and ROCm 7 through llama.cpp, compatible with modern LLMs like Gemma 4 and Qwen3.5, even on a host …
Shell 3
-
ai-proxy-hub (Public)
AI Proxy Hub is a cross-platform local gateway for AI clients and upstream APIs. It unifies multiple upstream endpoints behind one local control plane, adds protocol-aware routing and failover, and…
Python 3
-
FlashQLA-SM70-SM75 (Public, forked from QwenLM/FlashQLA)
High-performance linear attention kernel library built on TileLang
Python 3


