weicj

Popular repositories

  1. vLLM-2080Ti-Definitive Public

    The definitive vLLM runtime for dual RTX 2080 Ti 22GB + NVLink, delivering 27B/31B local inference with 100+ tok/s single-request decode and native 262K context.

    Python 29 5

  2. 2080Ti-LLM-Toolbox Public

    Single-request LLM serving recipes, patches, and benchmarks for modified RTX 2080 Ti 22GB / SM75 systems

    Shell 15 2

  3. rdna1-gfx101x-rocm-llama-fix Public

    Make RDNA1 / Navi1x / gfx101x GPUs (5700XT, W5500, etc.) run modern large language models on ROCm 6 and ROCm 7 through llama.cpp, compatible with modern LLMs such as Gemma 4 and Qwen3.5, even on a host …

    Shell 3

  4. ai-proxy-hub Public

    AI Proxy Hub is a cross-platform local gateway for AI clients and upstream APIs. It unifies multiple upstream endpoints behind one local control plane, adds protocol-aware routing and failover, and…

    Python 3

  5. FlashQLA-SM70-SM75 Public

    Forked from QwenLM/FlashQLA

    High-performance linear attention kernel library built on TileLang

    Python 3

  6. Ragent6 Public

    A benchmark for quantized LLM variants in agentic coding scenarios

    Python 2