tensor-parallel

Here are 2 public repositories matching this topic...

theogravity / dual-rtx-6000-blackwell-Gemma-4-31B-IT-NVFP4

Optimized vLLM and Docker setup for Gemma 4 31B NVFP4 with MTP on dual RTX PRO 6000 Blackwell GPUs: native FP4 Tensor Cores, Multi-Token Prediction (96.5% acceptance rate), and prefix caching. Includes benchmark results and replication scripts.

docker amd cuda gemma blackwell vllm llm-inference am5 speculative-decoding fp4 prefix-caching multi-token-prediction nvfp4 rtx-6000 gemma4 tensor-parallel

Updated May 10, 2026
Shell

idonati / spark-vllm-docker-festr2

Patches + recipe to deploy festr2/MiMo-V2.5-Pro-NVFP4-MXFP8-attn-TP8 on 8-node DGX Spark sm_121 (Ray + vLLM, TP=8). Fixes the fused-qkv loader bug that mis-slotted Q values as K/V on 7 of 8 ranks.

moe ray quantization mimo huggingface vllm gb10 nvfp4 dgx-spark mxfp8 sm121 tensor-parallel

Updated May 19, 2026
Python
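
The spark-vllm-docker-festr2 description mentions a fused-qkv loader bug in which Q values ended up in the K/V slots on most ranks. The repository's actual fix is not shown on this page, so the following is only a minimal sketch of what correct per-rank slicing of a fused QKV weight looks like under tensor parallelism. It assumes the fused weight stacks Q, then K, then V along the row axis; the function name `shard_fused_qkv` and the row counts are hypothetical and are not taken from vLLM or from the repository.

```python
def shard_fused_qkv(fused_rows, q_rows, kv_rows, tp_size, rank):
    """Return the (q, k, v) row slices owned by `rank`.

    Assumption: `fused_rows` is laid out as [Q rows | K rows | V rows].
    """
    assert len(fused_rows) == q_rows + 2 * kv_rows
    assert q_rows % tp_size == 0 and kv_rows % tp_size == 0

    q_per = q_rows // tp_size    # Q rows owned by each rank
    kv_per = kv_rows // tp_size  # K (and V) rows owned by each rank

    # Each section has its own base offset; slicing K or V without
    # adding that offset would read from the Q section instead.
    k_base = q_rows
    v_base = q_rows + kv_rows

    q = fused_rows[rank * q_per:(rank + 1) * q_per]
    k = fused_rows[k_base + rank * kv_per:k_base + (rank + 1) * kv_per]
    v = fused_rows[v_base + rank * kv_per:v_base + (rank + 1) * kv_per]
    return q, k, v


# Label every row so that a mis-slotted slice is easy to spot.
Q_ROWS, KV_ROWS, TP = 16, 8, 8  # illustrative sizes only
fused = (
    [("q", i) for i in range(Q_ROWS)]
    + [("k", i) for i in range(KV_ROWS)]
    + [("v", i) for i in range(KV_ROWS)]
)

for rank in range(TP):
    q, k, v = shard_fused_qkv(fused, Q_ROWS, KV_ROWS, TP, rank)
    # Every rank must see only rows from the matching section.
    assert {name for name, _ in q} == {"q"}
    assert {name for name, _ in k} == {"k"}
    assert {name for name, _ in v} == {"v"}
```

Labelling rows by section turns a silent numerical error into an immediate assertion failure, which is one way a loader for a new checkpoint layout could be checked on every rank rather than only on rank 0.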
