HSTU Training example

We support both retrieval and ranking models whose backbones are HSTU layers. In this example collection, users can specify the model structure via a gin-config file. Supported datasets are listed below. Regarding the gin-config interface, please refer to the inline comments.

Parallelism Introduction

To accommodate large embedding tables and the scaling laws of the HSTU dense network, this example integrates TorchRec, which shards embedding tables, and Megatron-LM, which enables dense parallelism (e.g. data, tensor, sequence, pipeline, and context parallelism). This integration ensures efficient training by coordinating sparse (embedding) and dense (context/data) parallelisms within a single model.
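To give a rough intuition for the sparse side, row-wise sharding splits an embedding table's rows across ranks. The sketch below is plain Python, not the actual TorchRec API; it only illustrates how contiguous row ranges might be assigned per rank:

```python
def row_wise_shard(num_rows: int, world_size: int) -> list[tuple[int, int]]:
    """Assign contiguous (start, end) row ranges of an embedding table to ranks.

    Simplified illustration of row-wise sharding; TorchRec's planner does
    far more (balancing by size, bandwidth, and memory).
    """
    base, rem = divmod(num_rows, world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < rem else 0)  # spread the remainder over leading ranks
        shards.append((start, start + size))
        start += size
    return shards

print(row_wise_shard(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```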

Environment Setup

Start from dockerfile

We provide a Dockerfile for users to build the environment.

git clone --recursive https://github.com/NVIDIA/recsys-examples.git && cd recsys-examples
docker build -f docker/Dockerfile --platform linux/amd64 -t recsys-examples:latest .

If you want to build an image for Grace, you can use

git clone --recursive https://github.com/NVIDIA/recsys-examples.git && cd recsys-examples
docker build -f docker/Dockerfile --platform linux/arm64 -t recsys-examples:latest .

Note: The --recursive flag is required to fetch submodules (e.g. third_party/FBGEMM for HSTU attention kernels). If you already cloned without it, run git submodule update --init --recursive. You can also set your own base image with args --build-arg <BASE_IMAGE>.

Start from source file

Before running the examples, build and install the required libraries following the instructions below:

HSTU attention kernels are provided by the fbgemm_gpu_hstu package (import name: hstu), included as a git submodule at third_party/FBGEMM. Install it from source:

git submodule update --init --recursive
cd third_party/FBGEMM/fbgemm_gpu/experimental/hstu && pip install .

On top of the core libraries above, Megatron-Core and several other dependencies are required. You can install them via PyPI packages:

pip install torchx gin-config torchmetrics==1.0.3 typing-extensions iopath megatron-core==0.12.1

If installation of the megatron-core package fails, usually due to a Python version incompatibility, try cloning and installing from source:

git clone -b core_v0.12.1 https://github.com/NVIDIA/Megatron-LM.git megatron-lm && \
pip install -e ./megatron-lm
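To fail fast before a lengthy source build, you can check the interpreter version up front. The minimum version below is an assumption, not a documented requirement; consult Megatron-LM's setup.py for the authoritative floor:

```python
import sys

def python_at_least(min_version=(3, 10)):
    """Return True if the running interpreter meets min_version.

    (3, 10) is an assumed floor for megatron-core 0.12.x, not a documented
    one; check Megatron-LM's setup.py before relying on it.
    """
    return sys.version_info[:2] >= tuple(min_version)

print(python_at_least())
```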

We provide custom HSTU CUDA operators for enhanced performance. Install them using the following command:

cd /workspace/recsys-examples/examples/hstu && \
python setup.py install

Dataset Introduction

We support several datasets, listed in the following sections:

Dataset Information

MovieLens

Refer to MovieLens 1M and MovieLens 20M for details.

KuaiRand

| dataset | # users | seqlen max | seqlen min | seqlen mean | seqlen median | # items |
|---|---|---|---|---|---|---|
| kuairand_pure | 27285 | 910 | 1 | 1 | 39 | 7551 |
| kuairand_1k | 1000 | 49332 | 10 | 5038 | 3379 | 4369953 |
| kuairand_27k | 27285 | 228000 | 100 | 11796 | 8591 | 32038725 |

Refer to KuaiRand for details.

Running the examples

Before getting started, please make sure that all prerequisites are met. You can refer to the Get Started section in the root directory of the repo to set up the environment.

Dataset preprocessing

To prepare the dataset for training, use hstu_data_preprocessor.py under the commons folder of the project.

cd <root-to-repo>/examples/commons && 
mkdir -p ./tmp_data && python3 ./hstu_data_preprocessor.py --dataset_name <"ml-1m"|"ml-20m"|"kuairand-pure"|"kuairand-1k"|"kuairand-27k">
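Conceptually, preprocessing for sequential models like HSTU typically truncates each user's interaction history to a maximum length and pads short sequences. The helper below is an illustrative sketch only; the actual behavior is defined in hstu_data_preprocessor.py:

```python
def pad_or_truncate(seq, max_len, pad_id=0):
    """Keep the most recent max_len interactions, then right-pad with pad_id.

    Illustrative sketch; hstu_data_preprocessor.py defines the real logic.
    """
    seq = list(seq)[-max_len:]  # left-truncate: keep the newest interactions
    return seq + [pad_id] * (max_len - len(seq))

print(pad_or_truncate([5, 7, 9, 11], 6))             # [5, 7, 9, 11, 0, 0]
print(pad_or_truncate([1, 2, 3, 4, 5, 6, 7, 8], 6))  # [3, 4, 5, 6, 7, 8]
```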

Start training

The entrypoints for training are pretrain_gr_retrieval.py and pretrain_gr_ranking.py. We use gin-config to specify the model structure, training arguments, hyper-parameters, etc.
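For orientation, a gin file binds values to the parameters of configurable functions by name. The fragment below is purely hypothetical; the parameter and function names are invented for illustration, so see the shipped files under ./training/configs for the real bindings:

```
# Hypothetical gin bindings for illustration only; actual names differ.
make_network.hidden_size = 512
make_network.num_layers = 8
make_optimizer.learning_rate = 1e-3
```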

Command to run the retrieval task with the MovieLens 20M dataset:

# Before running the `pretrain_gr_retrieval.py`, make sure that current working directory is `hstu`
cd <root-to-project>/examples/hstu
PYTHONPATH=${PYTHONPATH}:$(realpath ../) torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000  ./training/pretrain_gr_retrieval.py --gin-config-file ./training/configs/movielen_retrieval.gin

To run the ranking task with the MovieLens 20M dataset:

# Before running the `pretrain_gr_ranking.py`, make sure that current working directory is `hstu`
cd <root-to-project>/examples/hstu
PYTHONPATH=${PYTHONPATH}:$(realpath ../) torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000  ./training/pretrain_gr_ranking.py --gin-config-file ./training/configs/movielen_ranking.gin