v26.03

Latest

shijieliu released this 14 Apr 10:05

c7b9ea2

What's Changed

Features & Enhancements

Add Torch export for HSTU model by @jensenhwa in #327
[Feature] dynamicemb table fusion and expansion by @jiashuy in #343
feat(benchmark): HSTU E2E training benchmark suite with progressive optimizations by @JacoCheung in #340
Add HSTU inference benchmark results on B200 by @geoffreyQiu in #338
Relax alignment requirements (remove power-of-2 constraint) in dynamicemb by @jiashuy in #312
perf: avoid D2H sync in _Split2DJaggedFunction by precomputing split lengths by @JacoCheung in #318
refactor: migrate to fbgemm_gpu_hstu, remove legacy HSTU compat layer by @JacoCheung in #321
Optimize balancer and set up debug logger by @JacoCheung in #308
fix: align DynamicEmb capacity to bucket_capacity instead of DEMB_TABLE_ALIGN_SIZE by @JacoCheung in #329

Bug Fixes

fix missing import by @gameofdimension in #320
refactor: remove redundant apply_optimizer_in_backward in sharding.py by @ShaobinChen-AH in #330
error handling for empty kv list by @gameofdimension in #331
Fix docker, cmake and imports after torch export support by @geoffreyQiu in #358
Make table_ptrs_dev persistent by @jiashuy in #356
Create DynamicEmbStorage when local HBM is zero; reset _prefetch_outstanding_keys only in reset_cache_states by @jiashuy in #354
Fix empty batch hang fundamentally by @jiashuy in #349
[bugfix] fix hang issue when fed empty batch by @gameofdimension in #342
Fix optimizer states dim(ckpt) of rowwise adagrad by @jiashuy in #305
Refactor test for alignment; add get_sharded_table_capacity by @jiashuy in #348

Misc

fix(pipeline): drain eval pipeline naturally to prevent batch leak by @JacoCheung in #314
Fix NVE dependency by @geoffreyQiu in #323
refactor: move HSTU build to devel stage by @shijieliu in #325
Upgrade to Torch 2.11 with CUDA 13.1 by @geoffreyQiu in #347
Update HSTU inference README file by @geoffreyQiu in #360

New Contributors

@jensenhwa made their first contribution in #327

Full Changelog: v26.01...v26.03

Contributors

geoffreyQiu, shijieliu, and 5 other contributors
