Add INT8, INT16, and UINT8 type support for CUDA TopK operator #27862
tianleiwu merged 3 commits into microsoft:main
Conversation
Add type constraints and dispatch cases for `int8_t`, `int16_t`, and `uint8_t` in the CUDA TopK kernel (opset 1-9, 10, 11-23, 24), along with three new `.cu` template instantiation files. This is the CUDA counterpart to the CPU support added in microsoft#27860. Fixes microsoft#27859
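For readers unfamiliar with the pattern, adding a type to a kernel like this mostly means extending a per-element-type dispatch switch. A minimal stand-alone sketch of that shape follows; the enum, function names, and use of `sizeof` as a placeholder kernel body are all illustrative, not onnxruntime's actual macros or API:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical element-type tag standing in for ONNX tensor element types.
enum class ElemType { kFloat, kInt8, kInt16, kUInt8 };

// Placeholder for the real templated TopK kernel launcher.
template <typename T>
std::size_t TopKElemSize() { return sizeof(T); }

// ComputeInternal-style dispatch: a PR like this one adds the new cases.
std::size_t Dispatch(ElemType t) {
  switch (t) {
    case ElemType::kFloat:  return TopKElemSize<float>();
    case ElemType::kInt8:   return TopKElemSize<std::int8_t>();   // newly added
    case ElemType::kInt16:  return TopKElemSize<std::int16_t>();  // newly added
    case ElemType::kUInt8:  return TopKElemSize<std::uint8_t>();  // newly added
  }
  throw std::invalid_argument("unsupported element type");
}
```

The registration side (type constraints in the kernel definition) must agree with the dispatch side, which is why the PR touches both in `topk.cc`.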
Please also update

/azp run Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).
Remove these types from opset 1–10 as they are only supported starting from opset 11.
Commenter does not have sufficient privileges for PR 27862 in repo microsoft/onnxruntime

@tianleiwu Hi, I've addressed the review feedback in the latest commit (restricting int8/int16/uint8 TopK CUDA kernels to opset 11+). Could you please trigger the CI pipeline so I can download the updated OperatorKernels.md from the artifact? Thanks!

/azp run Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).

@tianleiwu Done, updated docs/OperatorKernels.md with the artifact from the Windows GPU Doc Gen CI Pipeline. Thanks for the review!

@tianleiwu Friendly ping: I've addressed all the feedback and updated OperatorKernels.md. The CPU counterpart (#27860) has already been merged. Is there anything else needed for this to move forward?

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Web CI Pipeline, ONNX Runtime WebGPU Builds

Azure Pipelines successfully started running 4 pipeline(s).

@tianleiwu Hi, it looks like the CI failure was due to a self-hosted runner losing connection, not related to the code changes. Could you re-trigger the pipeline when you get a chance? Thanks!
tianleiwu left a comment:

Thanks for tightening the opset gating after the earlier review. One remaining blocker is test coverage: this PR enables new CUDA TopK runtime paths, but currently only compilation is covered. Please add CUDA-covered TopK test cases for the newly registered int8/int16/uint8 types before merging.
@tianleiwu Hi! Thanks for the review and approval!
Description
Add CUDA kernel type dispatch and template specializations for `int8_t`, `int16_t`, and `uint8_t` types in the CUDA TopK operator.

Changed files:
- onnxruntime/core/providers/cuda/math/topk.cc: type constraints + dispatch cases for int8/int16/uint8
- onnxruntime/core/providers/cuda/math/topk_impl_i8.cu: new template instantiation for int8_t
- onnxruntime/core/providers/cuda/math/topk_impl_u8.cu: new template instantiation for uint8_t
- onnxruntime/core/providers/cuda/math/topk_impl_i16.cu: new template instantiation for int16_t

Motivation and Context
This is the CUDA counterpart to #27860 (CPU TopK INT8/INT16/UINT8 support).
The ONNX specification (opset 11+) lists INT8, INT16, and UINT8 as valid input types for the TopK operator. After #27860 added CPU support, the CUDA execution provider still lacked kernels for these types, causing models to fall back to CPU or fail when using `CUDAExecutionProvider`.

The existing CUDA TopK implementation uses a split-compilation pattern (one `.cu` file per type) with a `ToCudaType<T>` mapping. Since the default template maps integer types to themselves and `NumericLimits<T>` uses `std::numeric_limits<T>`, no algorithmic changes were needed, only:
- new dispatch cases in `ComputeInternal`
- new `.cu` files for template instantiation

All 64 TopK tests pass (including 8 tests for the new types, running on both CPU and CUDA providers).
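The per-type `.cu` split described above is ordinary explicit template instantiation: the template is defined once, and each translation unit instantiates it for exactly one type. A minimal host-side C++ sketch of the pattern (a toy largest-k TopK stands in for the real CUDA kernel; the file names in the comment mirror the PR's):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Toy stand-in for the templated TopK kernel: returns the k largest
// elements in descending order.
template <typename T>
std::vector<T> TopK(std::vector<T> v, std::size_t k) {
  std::partial_sort(v.begin(), v.begin() + k, v.end(), std::greater<T>());
  v.resize(k);
  return v;
}

// In the PR, each explicit instantiation below would live in its own .cu
// file (topk_impl_i8.cu, topk_impl_u8.cu, topk_impl_i16.cu), so per-type
// compilation happens in separate translation units.
template std::vector<std::int8_t>  TopK<std::int8_t>(std::vector<std::int8_t>, std::size_t);
template std::vector<std::uint8_t> TopK<std::uint8_t>(std::vector<std::uint8_t>, std::size_t);
template std::vector<std::int16_t> TopK<std::int16_t>(std::vector<std::int16_t>, std::size_t);
```

Splitting instantiations this way keeps any single compiler invocation small and lets the build parallelize across types, which is why adding a type means adding a file rather than editing one monolithic kernel source.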
Test Results