Pull requests: vllm-project/vllm
Pull requests list
[Bugfix] Fix load balancer waiting count
bug
Something isn't working
documentation
Improvements or additions to documentation
nvidia
performance
Performance-related issues
v1
#40634
opened Apr 22, 2026 by
Himan-D
[Feature] Triton INT4 / INT2 per-token-head KV cache quantization
documentation
Improvements or additions to documentation
v1
#40633
opened Apr 22, 2026 by
JartX
Contributor
[Refactor] Unify 2D/3D kernels in triton_unified_attention
v1
#40631
opened Apr 22, 2026 by
JartX
Contributor
[Bugfix][CI] Fix wrong residual shape in TestFusedAddRMSNorm.example_inputs that causes flaky test
bug
Something isn't working
#40629
opened Apr 22, 2026 by
zhangj1an
3 of 4 tasks
[Bugfix] Include inductor and functorch configs in compilation cache key
bug
Something isn't working
#40627
opened Apr 22, 2026 by
zou3519
Collaborator
Pr 39921
documentation
Improvements or additions to documentation
needs-rebase
nvidia
performance
Performance-related issues
#40626
opened Apr 22, 2026 by
Himan-D
4 tasks
[kv_offload] Decouple store policy and request lifecycle from the scheduler
kv-connector
v1
#40625
opened Apr 22, 2026 by
hickeyma
Contributor
2 tasks done
[CI] Split disaggregated tests into own test-area
ci/build
ready
ONLY add when PR is ready to merge/full CI is needed
#40623
opened Apr 22, 2026 by
NickLucche
Collaborator
Fix FireRedASR2 hallucination on non-speech audio
#40619
opened Apr 22, 2026 by
Virtuoso461
4 tasks
Revert "[MoE Refactor] Add more MoE layer tests" (#39349)
#40618
opened Apr 22, 2026 by
vllm-agent
Draft
[Bugfix] Quiet weight prefetch logs when executor is shutting down
bug
Something isn't working
#40615
opened Apr 22, 2026 by
zxuhan
[Attention] Tune TRITON_MLA for SM120 + FP8 decode
v1
#40614
opened Apr 22, 2026 by
voipmonitor
Contributor
Draft
[SpecDecode] Add seq-length gate for speculative decode
speculative-decoding
v1
#40613
opened Apr 22, 2026 by
voipmonitor
Contributor
Draft
[SpecDecode] Add local argmax helper for Llama Eagle3 drafts
llama
Related to Llama models
speculative-decoding
#40612
opened Apr 22, 2026 by
voipmonitor
Contributor
Draft
[SpecDecode] Allow draft-specific attention backend and KV dtype
nvidia
speculative-decoding
v1
#40611
opened Apr 22, 2026 by
voipmonitor
Contributor
Draft
[SpecDecode] Fix async proposer synchronization
v1
#40610
opened Apr 22, 2026 by
voipmonitor
Contributor
Draft
[Core] Enable FP8 KV cache with DCP for MLA
#40609
opened Apr 22, 2026 by
voipmonitor
Contributor
Draft
[Bugfix] Gemma-4: Add bnb QuantState alias hook on k_eq_v to load Gemma-4 BNB 4-bit weights
bug
Something isn't working
#40606
opened Apr 22, 2026 by
zhangj1an
4 tasks
docs: remove outdated LD_PRELOAD instructions [CPU-Backend]
cpu
Related to CPU backends
documentation
Improvements or additions to documentation
#40603
opened Apr 22, 2026 by
specapoorv
Refactor INC quantization into package with INCScheme orchestrator
#40601
opened Apr 22, 2026 by
yiliu30
Contributor
[WIP] Support ViT full CUDA graph for Kimi K2.5
nvidia
v1
#40600
opened Apr 22, 2026 by
gty111
Contributor
4 tasks
[MM][Gemma4] Support max_soft_tokens="auto" to avoid wasting the vision budget on small images
multi-modality
Related to multi-modality (#4194)
#40599
opened Apr 22, 2026 by
developer0hye
fix: correct typo 'Hyrbid' to 'Hybrid' in test-amd.yaml
ci/build
rocm
Related to AMD ROCm
#40598
opened Apr 22, 2026 by
TheodorePTP