DynamicEmb is a Python package that provides model-parallel dynamic embedding tables and embedding lookup functionalities for TorchREC, specifically targeting the sparse training aspects of recommendation systems. DynamicEmb uses a GPU-optimized scored hash table backend to store key-value (feature-embedding) pairs in the high-bandwidth memory (HBM) of GPUs as well as in host memory.
The lookup kernels in DynamicEmb primarily build on algorithms from the EMBark paper (Embedding Optimization for Training Large-scale Deep Learning Recommendation Systems with EMBark).
- Features
- Pre-requisites
- Installation
- DynamicEmb APIs
- Usage Notes
- Getting Started
- Future Plans
- Acknowledgements
## Features

- **Dynamic Embedding Table Support**: DynamicEmb supports embedding tables backed by hash tables, allowing optimal utilization of both GPU memory and host memory within the system. Hash tables accept keys from the entire `int64` range, unlike static tables, which only support indices within a fixed range.
- **Seamless Integration with TorchREC**: DynamicEmb inherits TorchREC's API, so its usage is largely consistent with TorchREC; users can modify existing code with minimal changes to run recommendation system models with dynamic embedding tables. DynamicEmb provides a high-performance hash table to support dynamic embeddings and leverages TorchREC to implement the sharding logic across multiple GPUs, which is why it largely reuses TorchREC's user interface while adding a few configuration options specific to dynamic embeddings.
- **Embedded in the DistributedGR Repository, Supporting Generative-Recommenders (GR) Models**: DynamicEmb is currently integrated into the DistributedGR repository, where it serves as an embedding backend for GR models.
- Support for creating dynamic embedding tables within `EmbeddingBagCollection` and `EmbeddingCollection` in TorchREC, allowing embedding storage and lookup, and enabling coexistence with native Torch embedding tables within Torch models (see the sketch after this list).
- **Pooling Mode Support**: DynamicEmb supports the `SUM`, `MEAN`, and `NONE` (sequence) pooling modes with fused CUDA kernels for both the forward and backward passes. Tables with different embedding dimensions (mixed-D) are fully supported in pooling mode.
- Support for the optimizer types `EXACT_SGD`, `ADAM`, `EXACT_ADAGRAD`, and `EXACT_ROWWISE_ADAGRAD`.
- Support for automatic parallel `dump`/`load` of embedding weights in dynamic embedding tables.
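To make the declarations above concrete, here is a minimal sketch of the TorchREC-side table definitions that DynamicEmb consumes unchanged: one pooled table (`EmbeddingBagCollection`) and one sequence table (`EmbeddingCollection`). The table names and sizes are ours for illustration; the step that actually marks a table as dynamic happens at sharding time (see the pseudocode under Getting Started below).

```python
import torch
import torchrec

# Pooled table: SUM/MEAN pooling modes are supported by DynamicEmb.
ebc = torchrec.EmbeddingBagCollection(
    tables=[
        torchrec.EmbeddingBagConfig(
            name="t_user",
            embedding_dim=128,
            num_embeddings=1_000_000,  # aligned up to DEMB_TABLE_ALIGN_SIZE internally
            feature_names=["user_id"],
            pooling=torchrec.PoolingType.SUM,
        )
    ],
    device=torch.device("meta"),
)

# Sequence table (NONE pooling): declared via EmbeddingCollection.
ec = torchrec.EmbeddingCollection(
    tables=[
        torchrec.EmbeddingConfig(
            name="t_item_seq",
            embedding_dim=64,
            num_embeddings=1_000_000,
            feature_names=["item_history"],
        )
    ],
    device=torch.device("meta"),
)
```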
## Pre-requisites

DynamicEmb currently targets the latest TorchRec main branch, which in turn requires the FBGEMM_GPU main branch; neither is packaged yet, so for now both must be installed from source. Before installing these two libraries, make sure you have the CUDA version of PyTorch installed (refer to the PyTorch documentation).
### FBGEMM_GPU

Please follow the instructions below to build fbgemm_gpu from source. The build may take several minutes.

```bash
# install setup tools
pip install --no-cache setuptools==69.5.1 setuptools-git-versioning scikit-build
git clone --recursive -b main https://github.com/pytorch/FBGEMM.git fbgemm
cd fbgemm/fbgemm_gpu
git checkout 642ccb980d05aa1be00ccd131c5991b0914e2e64
# please specify the proper TORCH_CUDA_ARCH_LIST for your environment
python setup.py bdist_wheel --package_variant=cuda -DTORCH_CUDA_ARCH_LIST="8.0 9.0"
python setup.py install --package_variant=cuda -DTORCH_CUDA_ARCH_LIST="8.0 9.0"
```

Once the above steps are done, run `python -c 'import fbgemm_gpu'` to make sure it is properly installed.
### TorchRec

torchrec >= v1.2.0 is required. Thanks to the TorchRec team for their support: TorchRec v1.2.0 added support for custom embedding lookup modules.

After fbgemm_gpu is installed, you can install TorchRec with the commands below.

```bash
# torchrec depends on the following 2 libraries
pip install --no-deps tensordict orjson
git clone --recursive -b main https://github.com/pytorch/torchrec.git torchrec
cd torchrec && git checkout 6aaf1fa72e884642f39c49ef232162fa3772055e
# use --no-deps to avoid pulling in other dependencies
pip install --no-deps .
```

Once the above steps are done, run `python -c 'import torchrec'` to make sure it is properly installed.
## Installation

To install DynamicEmb, use the following command:

```bash
python setup.py install
```

## DynamicEmb APIs

For details on how to use the DynamicEmb APIs and their parameters, please refer to the DynamicEmb_APIs.md file in the same folder as this document.
## Usage Notes

- Only the following optimizer types are supported: `EXACT_SGD`, `ADAM`, `EXACT_ADAGRAD`, and `EXACT_ROWWISE_ADAGRAD`. This restriction maintains consistency with TorchREC.
- Dynamic embedding tables are always sharded `row-wise` and distributed evenly across all GPUs within the TorchREC scope, unlike `table-wise` and the other sharding methods in TorchREC.
- The memory allocated for a dynamic embedding table may differ slightly from the specified `num_embeddings`, because each dynamic embedding table aligns its capacity to `DEMB_TABLE_ALIGN_SIZE` (= 16). This is calculated automatically by the code.
- The lookup for each dynamic embedding table incurs additional overhead from unique or radix-sort operations, so looking up a large number of small dynamic embedding tables performs poorly. Since the index range of dynamic embedding tables is very large (the entire range of `int64_t`), it is recommended to create one large embedding table and perform a fused lookup for multiple features (see the sketch after this list).
- Although dynamic embedding tables can be trained together with TorchREC tables, they cannot be fused together for embedding lookup. It is therefore recommended to use dynamic embedding tables for all model-parallel tables during training.
- DynamicEmb supports training with TorchREC's `EmbeddingBagCollection` (pooling modes `SUM`/`MEAN`) and `EmbeddingCollection` (sequence mode). Both modes use fused CUDA kernels for embedding lookup and gradient reduction. Tables with different embedding dimensions are supported in pooling mode.
- (New) DynamicEmb supports the Torch-exportable embedding table `InferenceEmbeddingTable` for inference. It combines the hashing mechanism of DynamicEmb's `ScoredHashTable` (frozen at export and inference time) with the `LinearUVMEmbedding` from NVEmbedding, and supports both sequence mode and pooling mode (`SUM` and `MEAN` only). It takes a `DynamicEmbTableOptions` for initialization and loads from DynamicEmb-dumped embedding files.
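To make the "one large table, fused lookup" recommendation above concrete, below is a small self-contained sketch of one way to map several features into disjoint sub-ranges of the `int64` key space so that they can share a single dynamic embedding table. The helper names (`FEATURE_OFFSETS`, `to_fused_keys`) are ours for illustration, not DynamicEmb APIs.

```python
import torch

# Hypothetical per-feature partition of the int64 key space: each feature
# gets its own 2^40-sized sub-range, so keys from different features never collide.
FEATURE_OFFSETS = {
    "user_id": 0 << 40,
    "item_id": 1 << 40,
    "shop_id": 2 << 40,
}

def to_fused_keys(feature_name: str, raw_ids: torch.Tensor) -> torch.Tensor:
    """Shift raw ids (assumed < 2^40) into the feature's reserved sub-range."""
    return raw_ids.to(torch.int64) + FEATURE_OFFSETS[feature_name]

user_keys = to_fused_keys("user_id", torch.tensor([3, 17, 42]))
item_keys = to_fused_keys("item_id", torch.tensor([3, 99]))
# One large dynamic table plus one fused lookup over these keys can now
# replace several small per-feature tables.
```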
DynamicEmb uses a hash table as its backend. If the embedding table capacity is small and the number of indices in a single feature is large, too many indices can land in the same hash table bucket during one lookup, making it impossible to insert all of them into the hash table. DynamicEmb resolves this by setting the lookup results of indices that could not be inserted to 0.
Fortunately, in a hash table with a large capacity such insertion failures are very rare and almost never occur. The issue is more frequent in hash tables with small capacities, where it can affect training accuracy. We therefore do not recommend using dynamic embedding tables for very small embedding tables.
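To see why capacity matters here, the following back-of-envelope simulation (ours, not DynamicEmb code; it models plain balls-into-bins and ignores the score-based eviction of the real hash table) counts the fraction of keys that overflow fixed-size buckets at a high versus a low per-bucket load:

```python
import random
from collections import Counter

def overflow_fraction(num_keys: int, num_buckets: int, bucket_size: int) -> float:
    """Fraction of randomly hashed keys that exceed their bucket's capacity."""
    counts = Counter(random.randrange(num_buckets) for _ in range(num_keys))
    overflow = sum(max(0, c - bucket_size) for c in counts.values())
    return overflow / num_keys

random.seed(0)
# Small table filled close to capacity: a noticeable share of keys overflow.
print(overflow_fraction(num_keys=12_000, num_buckets=100, bucket_size=128))
# Larger table at a low per-bucket load: overflow is essentially zero.
print(overflow_fraction(num_keys=120_000, num_buckets=10_000, bucket_size=128))
```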
To prevent insertion failures from silently affecting training, DynamicEmb provides a safe check mode. Users can enable safe check when configuring `DynamicEmbTableOptions`. Enabling it adds some overhead, but it reports whether the hash table frequently fails to insert indices. If the number of insertion failures is high and the proportion of affected indices is large, it is recommended to either increase the dynamic embedding capacity or avoid using dynamic embedding tables for small embedding tables.
```python
from dynamicemb import DynamicEmbTableOptions, DynamicEmbCheckMode

# Configure the DynamicEmbTableOptions with safe check mode enabled
table_options = DynamicEmbTableOptions(
    safe_check_mode=DynamicEmbCheckMode.WARNING
)
# Use the table_options in your dynamic embedding setup
# ...
```

## Getting Started

We provide benchmark and unit test code to demonstrate how to use DynamicEmb; please visit the benchmark and test folders. Below is a pseudocode example demonstrating how to convert TorchREC code to use DynamicEmb.
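The sketch below is pseudocode under stated assumptions: only `DynamicEmbTableOptions` and `DynamicEmbCheckMode` are documented above, while the constraint and planner names (`DynamicEmbParameterConstraints`, `DynamicEmbeddingShardingPlanner`) are hypothetical placeholders for the DynamicEmb-side hooks; see `example.py` and DynamicEmb_APIs.md for the working API. The point it illustrates is that typically only the sharding plan changes, while model code and lookup call sites stay the same.

```python
# --- before: plain TorchREC sharding plan ---------------------------------
# planner = EmbeddingShardingPlanner(topology=topology)
# plan = planner.collective_plan(model, sharders, dist.GroupMember.WORLD)

# --- after: flag tables as dynamic when planning (names are hypothetical) -
# from dynamicemb import DynamicEmbTableOptions, DynamicEmbCheckMode
# constraints = {
#     "t_user": DynamicEmbParameterConstraints(      # hypothetical name
#         use_dynamicemb=True,
#         dynamicemb_options=DynamicEmbTableOptions(
#             safe_check_mode=DynamicEmbCheckMode.WARNING,
#         ),
#     ),
# }
# planner = DynamicEmbeddingShardingPlanner(         # hypothetical name
#     topology=topology,
#     constraints=constraints,
# )
# plan = planner.collective_plan(model, sharders, dist.GroupMember.WORLD)
```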
To get started with DynamicEmb, we highly recommend checking out `example.py`. It walks you through the entire process of modifying your code and setting up a training script with model parallelism, so you can quickly experiment with DynamicEmb and see its benefits in a practical setting.
## Future Plans

- Support the latest version of TorchREC and continuously track TorchREC's version updates.
- Support the separation of the backward pass from the optimizer update (required by certain large language model frameworks such as Megatron) to better support large-scale GR training.
- Add more sharding types for dynamic embedding tables, including `table-wise`, `table-row-wise`, and `column-wise`.
## Acknowledgements

We would like to thank the Meta team, and especially Huanyu He, for their support with TorchRec.

We also acknowledge the HierarchicalKV project, which inspired the scored hash table design used in DynamicEmb.