Pinned
- dvmazur/mixtral-offloading (Public): Run Mixtral-8x7B models in Colab or consumer desktops
- mini-flash-attention (Public): Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch
- dropbox/hqq (Public): Official implementation of Half-Quadratic Quantization (HQQ)



