We support both retrieval and ranking models whose backbones are HSTU layers. In this example collection, users can specify the model structure via a gin-config file. Supported datasets are listed below. For the gin-config interface, please refer to the inline comments.
To facilitate large embedding tables and the scaling laws of the HSTU dense part, this example integrates TorchRec, which shards the embedding tables, and Megatron-LM, which enables dense parallelism (e.g. data, tensor, sequence, pipeline, and context parallelism).
This integration ensures efficient training by coordinating sparse (embedding) and dense (context/data) parallelisms within a single model.
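To make the sparse side of this coordination concrete, here is a minimal, illustrative sketch of row-wise sharding, the kind of table partitioning TorchRec applies to embedding tables. The function name is hypothetical and is not TorchRec's API; it only shows how a table's rows can be split evenly across ranks.

```python
# Illustrative only: how row-wise sharding splits an embedding table's rows
# across ranks. TorchRec implements this (and other sharding plans) internally;
# `row_wise_shard_ranges` is a hypothetical name used here for explanation.

def row_wise_shard_ranges(num_rows: int, world_size: int) -> list[tuple[int, int]]:
    """Return the [start, end) row range each rank owns."""
    base, rem = divmod(num_rows, world_size)
    ranges, start = [], 0
    for rank in range(world_size):
        end = start + base + (1 if rank < rem else 0)  # spread the remainder
        ranges.append((start, end))
        start = end
    return ranges

# A 10-row table sharded over 4 ranks: ranks 0-1 own 3 rows, ranks 2-3 own 2.
print(row_wise_shard_ranges(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each rank then holds only its slice of the table, while the dense part of the model is replicated or partitioned by Megatron-LM's parallelism schemes.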

We provide a Dockerfile for users to build the environment:
```shell
git clone --recursive https://github.com/NVIDIA/recsys-examples.git && cd recsys-examples
docker build -f docker/Dockerfile --platform linux/amd64 -t recsys-examples:latest .
```
If you want to build an image for Grace, you can use:
```shell
git clone --recursive https://github.com/NVIDIA/recsys-examples.git && cd recsys-examples
docker build -f docker/Dockerfile --platform linux/arm64 -t recsys-examples:latest .
```
Note: The `--recursive` flag is required to fetch submodules (e.g. `third_party/FBGEMM` for the HSTU attention kernels). If you already cloned without it, run `git submodule update --init --recursive`. You can also set your own base image with `--build-arg BASE_IMAGE=<image>`.
Before running the examples, build and install the libraries following the instructions below:
HSTU attention kernels are provided by the `fbgemm_gpu_hstu` package (import name: `hstu`), included as a git submodule at `third_party/FBGEMM`. Install it from source:
```shell
git submodule update --init --recursive
cd third_party/FBGEMM/fbgemm_gpu/experimental/hstu && pip install .
```
On top of these core libs, Megatron-Core along with other libraries is required. You can install them via PyPI:
```shell
pip install torchx gin-config torchmetrics==1.0.3 typing-extensions iopath megatron-core==0.12.1
```
If installation of the megatron-core package fails, usually due to Python version incompatibility, try cloning the repository and installing from source:
```shell
git clone -b core_v0.12.1 https://github.com/NVIDIA/Megatron-LM.git megatron-lm && \
pip install -e ./megatron-lm
```
We provide custom HSTU CUDA operators for enhanced performance. Install them using the following command:
```shell
cd /workspace/recsys-examples/examples/hstu && \
python setup.py install
```
We support several datasets, as listed in the following sections:
Refer to MovieLens 1M and MovieLens 20M for details.
| dataset | # users | seqlen max | seqlen min | seqlen mean | seqlen median | # items |
|---|---|---|---|---|---|---|
| kuairand_pure | 27285 | 910 | 1 | 1 | 39 | 7551 |
| kuairand_1k | 1000 | 49332 | 10 | 5038 | 3379 | 4369953 |
| kuairand_27k | 27285 | 228000 | 100 | 11796 | 8591 | 32038725 |
Refer to KuaiRand for details.
Before getting started, please make sure that all prerequisites are fulfilled. You can refer to the Get Started section in the root directory of the repo to set up the environment.
To prepare a dataset for training, use `hstu_data_preprocessor.py` under the `commons` folder of the project:
```shell
cd <root-to-repo>/examples/commons && \
mkdir -p ./tmp_data && python3 ./hstu_data_preprocessor.py --dataset_name <"ml-1m"|"ml-20m"|"kuairand-pure"|"kuairand-1k"|"kuairand-27k">
```
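The actual output format is defined in `hstu_data_preprocessor.py`; the sketch below only illustrates the general shape of such preprocessing, i.e. turning variable-length per-user interaction histories into fixed-length training sequences. The function name, padding convention, and item IDs here are hypothetical.

```python
# Hypothetical sketch of the core transform a sequence preprocessor performs:
# keep the most recent `max_seqlen` interactions and left-pad with `pad_id`.
# See hstu_data_preprocessor.py for the real on-disk format.

def to_fixed_length(history: list[int], max_seqlen: int, pad_id: int = 0) -> list[int]:
    recent = history[-max_seqlen:]  # truncate to the most recent interactions
    return [pad_id] * (max_seqlen - len(recent)) + recent

print(to_fixed_length([5, 8, 13, 21], max_seqlen=6))       # [0, 0, 5, 8, 13, 21]
print(to_fixed_length([1, 2, 3, 4, 5, 6, 7], max_seqlen=6))  # [2, 3, 4, 5, 6, 7]
```

Datasets with long histories (see the seqlen columns in the table above) make this truncation/padding choice an important knob for both memory use and model quality.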
The entry points for training are `pretrain_gr_retrieval.py` and `pretrain_gr_ranking.py`. We use gin-config to specify the model structure, training arguments, hyperparameters, etc.
Command to run the retrieval task with the MovieLens 20M dataset:
```shell
# Before running `pretrain_gr_retrieval.py`, make sure that the current working directory is `hstu`
cd <root-to-project>/examples/hstu
PYTHONPATH=${PYTHONPATH}:$(realpath ../) torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000 ./training/pretrain_gr_retrieval.py --gin-config-file ./training/configs/movielen_retrieval.gin
```
To run the ranking task with the MovieLens 20M dataset:
```shell
# Before running `pretrain_gr_ranking.py`, make sure that the current working directory is `hstu`
cd <root-to-project>/examples/hstu
PYTHONPATH=${PYTHONPATH}:$(realpath ../) torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000 ./training/pretrain_gr_ranking.py --gin-config-file ./training/configs/movielen_ranking.gin
```