HSTU Training example

We support both retrieval and ranking models whose backbones are HSTU layers. In this example collection, users can specify the model structure via a gin-config file. Supported datasets are listed below. Regarding the gin-config interface, please refer to the inline comments.

Parallelism Introduction

To accommodate large embedding tables and the scaling laws of the HSTU dense network, this example integrates TorchRec, which shards embedding tables, and Megatron-LM, which enables dense parallelism (e.g. data, tensor, sequence, pipeline, and context parallelism). This integration ensures efficient training by coordinating sparse (embedding) and dense (context/data) parallelisms within a single model.
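To give a rough intuition for the sparse side, row-wise sharding splits an embedding table's rows across ranks. The sketch below is plain Python, not the actual TorchRec API; it only illustrates how contiguous row ranges might be assigned per rank:

```python
def row_wise_shard(num_rows: int, world_size: int) -> list[tuple[int, int]]:
    """Assign contiguous (start, end) row ranges of an embedding table to ranks.

    Simplified illustration of row-wise sharding; TorchRec's planner does
    far more (balancing by size, bandwidth, and memory).
    """
    base, rem = divmod(num_rows, world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < rem else 0)  # spread the remainder over leading ranks
        shards.append((start, start + size))
        start += size
    return shards

print(row_wise_shard(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```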

Environment Setup

Start from dockerfile

We provide a Dockerfile for users to build the environment.

git clone --recursive https://github.com/NVIDIA/recsys-examples.git && cd recsys-examples
docker build -f docker/Dockerfile --platform linux/amd64 -t recsys-examples:latest .

If you want to build an image for Grace, you can use

git clone --recursive https://github.com/NVIDIA/recsys-examples.git && cd recsys-examples
docker build -f docker/Dockerfile --platform linux/arm64 -t recsys-examples:latest .

Note: The --recursive flag is required to fetch submodules (e.g. third_party/FBGEMM for HSTU attention kernels). If you already cloned without it, run git submodule update --init --recursive. You can also set your own base image with args --build-arg <BASE_IMAGE>.

Start from source file

Before running the examples, build and install the required libraries following the instructions below:

HSTU attention kernels are provided by the fbgemm_gpu_hstu package (import name: hstu), included as a git submodule at third_party/FBGEMM. Install it from source:

git submodule update --init --recursive
cd third_party/FBGEMM/fbgemm_gpu/experimental/hstu && pip install .

On top of the core libraries above, Megatron-Core and several other dependencies are required. You can install them via PyPI packages:

pip install torchx gin-config torchmetrics==1.0.3 typing-extensions iopath megatron-core==0.12.1

If installation of the megatron-core package fails, usually due to a Python version incompatibility, try cloning and installing from source:

git clone -b core_v0.12.1 https://github.com/NVIDIA/Megatron-LM.git megatron-lm && \
pip install -e ./megatron-lm
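To fail fast before a lengthy source build, you can check the interpreter version up front. The minimum version below is an assumption, not a documented requirement; consult Megatron-LM's setup.py for the authoritative floor:

```python
import sys

def python_at_least(min_version=(3, 10)):
    """Return True if the running interpreter meets min_version.

    (3, 10) is an assumed floor for megatron-core 0.12.x, not a documented
    one; check Megatron-LM's setup.py before relying on it.
    """
    return sys.version_info[:2] >= tuple(min_version)

print(python_at_least())
```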

We provide custom HSTU CUDA operators for enhanced performance. Install them using the following command:

cd /workspace/recsys-examples/examples/hstu && \
python setup.py install

Dataset Introduction

We support several datasets, listed in the following sections:

Dataset Information

MovieLens

Refer to MovieLens 1M and MovieLens 20M for details.

KuaiRand

| dataset | # users | seqlen max | seqlen min | seqlen mean | seqlen median | # items |
|---|---|---|---|---|---|---|
| kuairand_pure | 27285 | 910 | 1 | 1 | 39 | 7551 |
| kuairand_1k | 1000 | 49332 | 10 | 5038 | 3379 | 4369953 |
| kuairand_27k | 27285 | 228000 | 100 | 11796 | 8591 | 32038725 |

Refer to KuaiRand for details.

Running the examples

Before getting started, please make sure that all prerequisites are met. You can refer to the Get Started section in the root directory of the repo to set up the environment.

Dataset preprocessing

To prepare the dataset for training, use hstu_data_preprocessor.py under the commons folder of the project.

cd <root-to-repo>/examples/commons && 
mkdir -p ./tmp_data && python3 ./hstu_data_preprocessor.py --dataset_name <"ml-1m"|"ml-20m"|"kuairand-pure"|"kuairand-1k"|"kuairand-27k">
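Conceptually, preprocessing for sequential models like HSTU typically truncates each user's interaction history to a maximum length and pads short sequences. The helper below is an illustrative sketch only; the actual behavior is defined in hstu_data_preprocessor.py:

```python
def pad_or_truncate(seq, max_len, pad_id=0):
    """Keep the most recent max_len interactions, then right-pad with pad_id.

    Illustrative sketch; hstu_data_preprocessor.py defines the real logic.
    """
    seq = list(seq)[-max_len:]  # left-truncate: keep the newest interactions
    return seq + [pad_id] * (max_len - len(seq))

print(pad_or_truncate([5, 7, 9, 11], 6))             # [5, 7, 9, 11, 0, 0]
print(pad_or_truncate([1, 2, 3, 4, 5, 6, 7, 8], 6))  # [3, 4, 5, 6, 7, 8]
```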

Start training

The entrypoints for training are pretrain_gr_retrieval.py and pretrain_gr_ranking.py. We use gin-config to specify the model structure, training arguments, hyper-parameters, etc.
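For orientation, a gin file binds values to the parameters of configurable functions by name. The fragment below is purely hypothetical; the parameter and function names are invented for illustration, so see the shipped files under ./training/configs for the real bindings:

```
# Hypothetical gin bindings for illustration only; actual names differ.
make_network.hidden_size = 512
make_network.num_layers = 8
make_optimizer.learning_rate = 1e-3
```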

Command to run the retrieval task with the MovieLens 20M dataset:

# Before running the `pretrain_gr_retrieval.py`, make sure that current working directory is `hstu`
cd <root-to-project>/examples/hstu
PYTHONPATH=${PYTHONPATH}:$(realpath ../) torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000  ./training/pretrain_gr_retrieval.py --gin-config-file ./training/configs/movielen_retrieval.gin

To run the ranking task with the MovieLens 20M dataset:

# Before running the `pretrain_gr_ranking.py`, make sure that current working directory is `hstu`
cd <root-to-project>/examples/hstu
PYTHONPATH=${PYTHONPATH}:$(realpath ../) torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000  ./training/pretrain_gr_ranking.py --gin-config-file ./training/configs/movielen_ranking.gin