Pull requests: NVIDIA/TransformerEngine

Pull requests list

Streamline group Hadamard ComputeKernel loads
#2810 opened Mar 29, 2026 by cael-ling
5 of 13 tasks
Single __syncthreads per stage in GroupHadamardAmaxTmaKernel
#2809 opened Mar 29, 2026 by cael-ling
8 of 13 tasks
Precomputed swizzle_idx into group Hadamard ComputeKernel
#2808 opened Mar 29, 2026 by cael-ling
8 of 13 tasks
[PyTorch][Flash Attn] Add fallback import for FA3
#2806 opened Mar 26, 2026 by eattia-nvidia
7 of 13 tasks
[PyT] Fix FSDP2 memory leaks for FP8 weight workspaces and transpose caches
#2805 opened Mar 26, 2026 by pstjohn
3 tasks done
Fix empty CUDA_ARCHITECTURES when SM120 is the only arch
#2804 opened Mar 26, 2026 by sudhakarsingh27
13 tasks
[PyT][Test] Add xfailing FSDP2 memory leak detection tests
#2803 opened Mar 25, 2026 by pstjohn
4 tasks done
adds NVFP4 Fused Adam support
#2797 opened Mar 24, 2026 by jomitchellnv
2 of 13 tasks
[JAX] Add warning if using BSHD and max_segments_per_seq > 1
#2796 opened Mar 24, 2026 by jberchtold-nvidia
8 of 13 tasks
[JAX] TE GMM v2 enforcement Env Var
#2794 opened Mar 23, 2026 by jberchtold-nvidia Draft
13 tasks
Avoid CPU offload wait_event for validation
#2793 opened Mar 23, 2026 by vasunvidia
13 tasks
Optimize fp8 block scaling Allgather for FSDP2
#2789 opened Mar 23, 2026 by vthumbe1503
1 of 13 tasks
[Common][JAX] Add CUB TopK MaxPairs interface
#2784 opened Mar 20, 2026 by huanghua1994
8 of 13 tasks
Optimize naive top-k masking in fused router
#2783 opened Mar 19, 2026 by yosh20004
3 of 13 tasks
add mark_not_offload() interface for cpu_offload_v1
#2770 opened Mar 17, 2026 by lhb8125
13 tasks
GEMM + Swiglu fused Grouped MLP for MXFP8 2.14.0 MoE
#2769 opened Mar 17, 2026 by ksivaman
13 tasks