Pull requests: deepspeedai/DeepSpeed
Normalize ZeRO-3 DeepCompile grad dtype before reduction
#8038, opened May 30, 2026 by tohtana (Collaborator)
Enable bf16 check_grad_overflow by default (matching fp16)
#8035, opened May 29, 2026 by yongzhe-wang; 2 tasks done
Stop obsolete CI jobs on workflow cancellation
#8034, opened May 28, 2026 by tohtana (Collaborator)
Fix DeepCompile ZeRO-3 release parameter lifetime
#8032, opened May 28, 2026 by tohtana (Collaborator)
[Draft] Add ZeRO-3 elastic checkpoint save/load support
#8031, opened May 28, 2026 by nathon-lee (Contributor) [Draft]
[Draft] Add On-Policy Distillation (OPSD) Trainer in DeepSpeed
#8027, opened May 26, 2026 by PKUWZP (Collaborator); 3 of 5 tasks
Refactor/torch autocast encapsulate global state
#7946, opened Apr 2, 2026 by nathon-lee (Contributor)
Fix ZeRO-3 optimizer initialization validation (#7844)
#7929, opened Mar 28, 2026 by amadhan882
doc: Remove suggestion to build extensions in parallel
#7899, opened Mar 12, 2026 by Flamefire (Contributor)
Fix Stage 0 + Ulysses crash: make bwc_tensor_model_parallel_rank() resilient to MP API absence
#7888, opened Mar 6, 2026 by nathon-lee (Contributor)
fix(zero): Ensure full gradient reduction for Muon optimizer with reduce_scatter
#7878, opened Feb 27, 2026 by nathon-lee (Contributor)
fix: correct DistributedAttention output shape and pad uneven sequence lengths (#7842)
#7868, opened Feb 22, 2026 by harshang03 [Draft]
fix: keep fp32-pinned parameters out of the bf16 cast path in ZeRO-3 (#7747)
#7867, opened Feb 22, 2026 by harshang03 [Draft]
Revert "fix: remove premature MPI environment variable check in OpenMPIRunner"
#7864, opened Feb 21, 2026 by mikloorbi-sys [Draft]
Fix global .cuh ignore and enforce tracked CUDA headers
#7858, opened Feb 18, 2026 by harshang03 [Draft]
Fix ZeRO legacy grad-hook crash when next_functions is missing
#7857, opened Feb 17, 2026 by harshang03 [Draft]
Reject non-finite fp16 loss_scale across config and ZeRO paths
#7856, opened Feb 17, 2026 by harshang03 [Draft]
Fix zero/division safety gaps in utility and inference paths
#7855, opened Feb 17, 2026 by harshang03 [Draft]
Fix count_used_parameters_in_backward crash on PyTorch < 2.3 (#7756)
#7849, opened Feb 12, 2026 by harshang03 [Draft]
[BUG] Fix: Fix gradient norm calculation and dynamic shape blocking in PP+ZeRO1 collective communication
#7847, opened Feb 12, 2026 by Thinksky5124