DynamicEmb is a Python package that provides model-parallel dynamic embedding tables and embedding lookup functionalities for TorchREC, specifically targeting the sparse training aspects of recommendation systems. DynamicEmb uses a GPU-optimized scored hash table backend to store key-value (feature-embedding) pairs in the high-bandwidth memory (HBM) of GPUs as well as in host memory.
The lookup kernels in DynamicEmb primarily build on algorithms from the EMBark paper (Embedding Optimization for Training Large-scale Deep Learning Recommendation Systems with EMBark).
- Features
- Pre-requisites
- Installation
- DynamicEmb APIs
- Usage Notes
- Getting Started
- Future Plans
- Acknowledgements
## Features

- **Dynamic Embedding Table Support**: DynamicEmb supports embedding tables backed by hash tables, allowing optimal utilization of both GPU memory and host memory within the system. Hash tables accept keys from the entire `int64` range, unlike static tables, which only support indices within a fixed range.
- **Seamless Integration with TorchREC**: DynamicEmb inherits TorchREC's API, so its usage is largely consistent with TorchREC; users can modify existing code with minimal changes to run recommendation system models with dynamic embedding tables. DynamicEmb provides a high-performance hash table to support dynamic embeddings and leverages TorchREC to implement the sharding logic across multiple GPUs, which is why it largely reuses TorchREC's user interface while adding a few configuration options specific to dynamic embeddings.
- **Embedded in the DistributedGR Repository, Supporting Generative-Recommenders (GR) Models**: DynamicEmb is currently integrated into the DistributedGR repository, where it serves as an embedding backend for GR models.
- Support for creating dynamic embedding tables within `EmbeddingBagCollection` and `EmbeddingCollection` in TorchREC, allowing embedding storage and lookup, and enabling coexistence with native Torch embedding tables within Torch models (see the sketch after this list).
- **Pooling Mode Support**: DynamicEmb supports the `SUM`, `MEAN`, and `NONE` (sequence) pooling modes with fused CUDA kernels for both the forward and backward passes. Tables with different embedding dimensions (mixed-D) are fully supported in pooling mode.
- Support for the optimizer types `EXACT_SGD`, `ADAM`, `EXACT_ADAGRAD`, and `EXACT_ROWWISE_ADAGRAD`.
- Support for automatic parallel `dump`/`load` of embedding weights in dynamic embedding tables.
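To make the declarations above concrete, here is a minimal sketch of the TorchREC-side table definitions that DynamicEmb consumes unchanged: one pooled table (`EmbeddingBagCollection`) and one sequence table (`EmbeddingCollection`). The table names and sizes are ours for illustration; the step that actually marks a table as dynamic happens at sharding time (see the pseudocode under Getting Started below).

```python
import torch
import torchrec

# Pooled table: SUM/MEAN pooling modes are supported by DynamicEmb.
ebc = torchrec.EmbeddingBagCollection(
    tables=[
        torchrec.EmbeddingBagConfig(
            name="t_user",
            embedding_dim=128,
            num_embeddings=1_000_000,  # aligned up to DEMB_TABLE_ALIGN_SIZE internally
            feature_names=["user_id"],
            pooling=torchrec.PoolingType.SUM,
        )
    ],
    device=torch.device("meta"),
)

# Sequence table (NONE pooling): declared via EmbeddingCollection.
ec = torchrec.EmbeddingCollection(
    tables=[
        torchrec.EmbeddingConfig(
            name="t_item_seq",
            embedding_dim=64,
            num_embeddings=1_000_000,
            feature_names=["item_history"],
        )
    ],
    device=torch.device("meta"),
)
```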
## Pre-requisites

DynamicEmb currently targets the latest TorchRec main branch, which in turn requires the FBGEMM_GPU main branch; neither is packaged yet, so for now both must be installed from source. Before installing these two libraries, make sure you have the CUDA version of PyTorch installed (refer to the PyTorch documentation).
### FBGEMM_GPU

Please follow the instructions below to build fbgemm_gpu from source. The build may take several minutes.

```bash
# install setup tools
pip install --no-cache setuptools==69.5.1 setuptools-git-versioning scikit-build
git clone --recursive -b main https://github.com/pytorch/FBGEMM.git fbgemm
cd fbgemm/fbgemm_gpu
git checkout 642ccb980d05aa1be00ccd131c5991b0914e2e64
# please specify the proper TORCH_CUDA_ARCH_LIST for your environment
python setup.py bdist_wheel --package_variant=cuda -DTORCH_CUDA_ARCH_LIST="8.0 9.0"
python setup.py install --package_variant=cuda -DTORCH_CUDA_ARCH_LIST="8.0 9.0"
```

Once the above steps are done, run `python -c 'import fbgemm_gpu'` to make sure it is properly installed.
### TorchRec

torchrec >= v1.2.0 is required. Thanks to the TorchRec team for their support: TorchRec v1.2.0 added support for custom embedding lookup modules.

After fbgemm_gpu is installed, you can install TorchRec with the commands below.

```bash
# torchrec depends on the following 2 libraries
pip install --no-deps tensordict orjson
git clone --recursive -b main https://github.com/pytorch/torchrec.git torchrec
cd torchrec && git checkout 6aaf1fa72e884642f39c49ef232162fa3772055e
# use --no-deps to avoid pulling in other dependencies
pip install --no-deps .
```

Once the above steps are done, run `python -c 'import torchrec'` to make sure it is properly installed.
## Installation

To install DynamicEmb, use the following command:

```bash
python setup.py install
```

## DynamicEmb APIs

For details on how to use the DynamicEmb APIs and their parameters, please refer to the DynamicEmb_APIs.md file in the same folder as this document.
## Usage Notes

- Only the following optimizer types are supported: `EXACT_SGD`, `ADAM`, `EXACT_ADAGRAD`, and `EXACT_ROWWISE_ADAGRAD`. This restriction maintains consistency with TorchREC.
- Dynamic embedding tables are always sharded `row-wise` and distributed evenly across all GPUs within the TorchREC scope, unlike `table-wise` and the other sharding methods in TorchREC.
- The memory allocated for a dynamic embedding table may differ slightly from the specified `num_embeddings`, because each dynamic embedding table aligns its capacity to `DEMB_TABLE_ALIGN_SIZE` (= 16). This is calculated automatically by the code.
- The lookup for each dynamic embedding table incurs additional overhead from unique or radix-sort operations, so looking up a large number of small dynamic embedding tables performs poorly. Since the index range of dynamic embedding tables is very large (the entire range of `int64_t`), it is recommended to create one large embedding table and perform a fused lookup for multiple features (see the sketch after this list).
- Although dynamic embedding tables can be trained together with TorchREC tables, they cannot be fused together for embedding lookup. It is therefore recommended to use dynamic embedding tables for all model-parallel tables during training.
- DynamicEmb supports training with TorchREC's `EmbeddingBagCollection` (pooling modes `SUM`/`MEAN`) and `EmbeddingCollection` (sequence mode). Both modes use fused CUDA kernels for embedding lookup and gradient reduction. Tables with different embedding dimensions are supported in pooling mode.
- (New) DynamicEmb supports the Torch-exportable embedding table `InferenceEmbeddingTable` for inference. It combines the hashing mechanism of DynamicEmb's `ScoredHashTable` (frozen at export and inference time) with the `LinearUVMEmbedding` from NVEmbedding, and supports both sequence mode and pooling mode (`SUM` and `MEAN` only). It takes a `DynamicEmbTableOptions` for initialization and loads from DynamicEmb-dumped embedding files.
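To make the "one large table, fused lookup" recommendation above concrete, below is a small self-contained sketch of one way to map several features into disjoint sub-ranges of the `int64` key space so that they can share a single dynamic embedding table. The helper names (`FEATURE_OFFSETS`, `to_fused_keys`) are ours for illustration, not DynamicEmb APIs.

```python
import torch

# Hypothetical per-feature partition of the int64 key space: each feature
# gets its own 2^40-sized sub-range, so keys from different features never collide.
FEATURE_OFFSETS = {
    "user_id": 0 << 40,
    "item_id": 1 << 40,
    "shop_id": 2 << 40,
}

def to_fused_keys(feature_name: str, raw_ids: torch.Tensor) -> torch.Tensor:
    """Shift raw ids (assumed < 2^40) into the feature's reserved sub-range."""
    return raw_ids.to(torch.int64) + FEATURE_OFFSETS[feature_name]

user_keys = to_fused_keys("user_id", torch.tensor([3, 17, 42]))
item_keys = to_fused_keys("item_id", torch.tensor([3, 99]))
# One large dynamic table plus one fused lookup over these keys can now
# replace several small per-feature tables.
```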
DynamicEmb uses a hash table as its backend. If the embedding table capacity is small and the number of indices in a single feature is large, too many indices can land in the same hash table bucket during one lookup, making it impossible to insert all of them into the hash table. DynamicEmb resolves this by setting the lookup results of indices that could not be inserted to 0.
Fortunately, in a hash table with a large capacity such insertion failures are very rare and almost never occur. The issue is more frequent in hash tables with small capacities, where it can affect training accuracy. We therefore do not recommend using dynamic embedding tables for very small embedding tables.
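To see why capacity matters here, the following back-of-envelope simulation (ours, not DynamicEmb code; it models plain balls-into-bins and ignores the score-based eviction of the real hash table) counts the fraction of keys that overflow fixed-size buckets at a high versus a low per-bucket load:

```python
import random
from collections import Counter

def overflow_fraction(num_keys: int, num_buckets: int, bucket_size: int) -> float:
    """Fraction of randomly hashed keys that exceed their bucket's capacity."""
    counts = Counter(random.randrange(num_buckets) for _ in range(num_keys))
    overflow = sum(max(0, c - bucket_size) for c in counts.values())
    return overflow / num_keys

random.seed(0)
# Small table filled close to capacity: a noticeable share of keys overflow.
print(overflow_fraction(num_keys=12_000, num_buckets=100, bucket_size=128))
# Larger table at a low per-bucket load: overflow is essentially zero.
print(overflow_fraction(num_keys=120_000, num_buckets=10_000, bucket_size=128))
```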
To prevent insertion failures from silently affecting training, DynamicEmb provides a safe check mode. Users can enable safe check when configuring `DynamicEmbTableOptions`. Enabling it adds some overhead, but it reports whether the hash table frequently fails to insert indices. If the number of insertion failures is high and the proportion of affected indices is large, it is recommended to either increase the dynamic embedding capacity or avoid using dynamic embedding tables for small embedding tables.
```python
from dynamicemb import DynamicEmbTableOptions, DynamicEmbCheckMode

# Configure the DynamicEmbTableOptions with safe check mode enabled
table_options = DynamicEmbTableOptions(
    safe_check_mode=DynamicEmbCheckMode.WARNING
)
# Use the table_options in your dynamic embedding setup
# ...
```

## Getting Started

We provide benchmark and unit test code to demonstrate how to use DynamicEmb; please visit the benchmark and test folders. Below is a pseudocode example demonstrating how to convert TorchREC code to use DynamicEmb.
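The sketch below is pseudocode under stated assumptions: only `DynamicEmbTableOptions` and `DynamicEmbCheckMode` are documented above, while the constraint and planner names (`DynamicEmbParameterConstraints`, `DynamicEmbeddingShardingPlanner`) are hypothetical placeholders for the DynamicEmb-side hooks; see `example.py` and DynamicEmb_APIs.md for the working API. The point it illustrates is that typically only the sharding plan changes, while model code and lookup call sites stay the same.

```python
# --- before: plain TorchREC sharding plan ---------------------------------
# planner = EmbeddingShardingPlanner(topology=topology)
# plan = planner.collective_plan(model, sharders, dist.GroupMember.WORLD)

# --- after: flag tables as dynamic when planning (names are hypothetical) -
# from dynamicemb import DynamicEmbTableOptions, DynamicEmbCheckMode
# constraints = {
#     "t_user": DynamicEmbParameterConstraints(      # hypothetical name
#         use_dynamicemb=True,
#         dynamicemb_options=DynamicEmbTableOptions(
#             safe_check_mode=DynamicEmbCheckMode.WARNING,
#         ),
#     ),
# }
# planner = DynamicEmbeddingShardingPlanner(         # hypothetical name
#     topology=topology,
#     constraints=constraints,
# )
# plan = planner.collective_plan(model, sharders, dist.GroupMember.WORLD)
```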
To get started with DynamicEmb, we highly recommend checking out `example.py`. It walks you through the entire process of modifying your code and setting up a training script with model parallelism, so you can quickly experiment with DynamicEmb and see its benefits in a practical setting.
## Future Plans

- Support the latest version of TorchREC and continuously track TorchREC's version updates.
- Support the separation of the backward pass from the optimizer update (required by certain large language model frameworks such as Megatron) to better support large-scale GR training.
- Add more sharding types for dynamic embedding tables, including `table-wise`, `table-row-wise`, and `column-wise`.
## Acknowledgements

We would like to thank the Meta team, and especially Huanyu He, for their support with TorchRec.

We also acknowledge the HierarchicalKV project, which inspired the scored hash table design used in DynamicEmb.