v26.03

@shijieliu released this 14 Apr 10:05
· 3 commits to main since this release
c7b9ea2

What's Changed

Features & Enhancements

  • Add Torch export for HSTU model by @jensenhwa in #327
  • [Feature] dynamicemb table fusion and expansion by @jiashuy in #343
  • feat(benchmark): HSTU E2E training benchmark suite with progressive optimizations by @JacoCheung in #340
  • Add HSTU inference benchmark results on B200 by @geoffreyQiu in #338
  • Relax alignment requirements (remove power-of-2 constraint) in dynamicemb by @jiashuy in #312
  • perf: avoid D2H sync in _Split2DJaggedFunction by precomputing split lengths by @JacoCheung in #318
  • refactor: migrate to fbgemm_gpu_hstu, remove legacy HSTU compat layer by @JacoCheung in #321
  • Optimize balancer and set up debug logger by @JacoCheung in #308
  • fix: align DynamicEmb capacity to bucket_capacity instead of DEMB_TABLE_ALIGN_SIZE by @JacoCheung in #329

Bug Fixes

Misc

New Contributors

Full Changelog: v26.01...v26.03