GPU NMS kernel and refactor of NMS operator#4893
Conversation
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 170 to 187 in 5ca611f
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 191 to 192 in 5ca611f
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/verify/test_nms.cpp
Lines 54 to 57 in 5ca611f
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/verify/test_nms.cpp
Line 201 in 5ca611f
beacuse of operator::flatten same name clashing.
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 220 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 299 to 302 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 346 to 348 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 378 to 384 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 388 to 397 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 405 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 433 to 449 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 451 to 455 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 463 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 481 to 487 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 491 to 557 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 559 to 575 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 583 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 601 to 607 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 611 to 620 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 622 to 629 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 637 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 655 to 661 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 665 to 798 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 800 to 866 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 874 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in c2ddb73
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 168 to 182 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 220 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 299 to 302 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 346 to 348 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 379 to 385 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 389 to 398 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 406 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 435 to 451 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 453 to 457 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 465 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 484 to 490 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 494 to 560 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 562 to 578 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 586 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 605 to 611 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 615 to 624 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 626 to 633 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 641 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 660 to 666 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 670 to 803 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 805 to 871 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in b5c1e77
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 168 to 182 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 220 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 299 to 302 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 346 to 348 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 379 to 385 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 389 to 398 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 406 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 435 to 451 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 453 to 457 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 465 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 484 to 490 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 494 to 560 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 562 to 578 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 586 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 605 to 611 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 615 to 624 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 626 to 633 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 641 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 660 to 666 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 670 to 803 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 805 to 871 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in 1011256
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 453 to 457 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 465 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 484 to 490 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 494 to 560 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 562 to 578 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 586 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 605 to 611 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 615 to 624 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 626 to 633 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 641 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 660 to 666 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 670 to 803 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 805 to 871 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in 32c779d
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #4893 +/- ##
===========================================
- Coverage 92.88% 92.67% -0.20%
===========================================
Files 587 588 +1
Lines 30348 30414 +66
===========================================
- Hits 28187 28186 -1
- Misses 2161 2228 +67
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds a GPU implementation of ONNX NonMaxSuppression (NMS) and refactors the core nonmaxsuppression op to better represent “dynamic” output sizing in MIGraphX IR by returning a tuple {indices, num_selected} (with optional slicing applied by the ONNX parser).
Changes:
- Introduces a 3-stage GPU NMS pipeline (
gpu::nms_sort→gpu::nms_filter→gpu::nms_compact) with new kernels and JIT compilers, and updates GPU lowering to select the pipeline for static shapes (ref fallback for dynamic inputs). - Refactors
op::nonmaxsuppressionto always return a tuple (raw indices + selected count) and updates ONNX parsing/tests to extract (and optionally slice) the indices output. - Renames shape tuple flattening helper to
flatten_shapesand updates GPU JIT paths to use it; adds/updates documentation and tests (including new GPU NMS tests).
Reviewed changes
Copilot reviewed 33 out of 35 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/verify/test_nms.cpp | Removes legacy verify-based NMS tests. |
| test/ref/nonmaxsuppression.cpp | Updates ref tests to slice indices using the tuple’s num_selected. |
| test/op_shape_test.cpp | Updates NMS shape expectations to a tuple {indices, num_selected}. |
| test/onnx/parse/nms_use_dyn_output_false_test.cpp | Removes test tied to removed use_dyn_output option/behavior. |
| test/onnx/parse/nms_test.cpp | Updates ONNX parse test to extract tuple elem 0 (indices). |
| test/onnx/parse/nms_dynamic_classes_test.cpp | Updates dynamic-classes parse test for tuple-return NMS. |
| test/onnx/parse/nms_dynamic_boxes_test.cpp | Updates dynamic-boxes parse test for tuple-return NMS. |
| test/onnx/parse/nms_dynamic_batch_test.cpp | Updates dynamic-batch parse test for tuple-return NMS. |
| test/onnx/gen_onnx.py | Removes custom use_dyn_output attribute generation in NMS ONNX test models. |
| test/multi_target/multitarget_test.cpp | Updates multitarget test to slice indices using tuple count. |
| test/gpu/nonmaxsuppression.cpp | Adds extensive GPU NMS functional tests (plus a dynamic-shape fallback test). |
| src/targets/gpu/nms_ops.cpp | Adds new internal GPU ops (gpu::nms_sort/filter/compact) and their shapes. |
| src/targets/gpu/lowering.cpp | Lowers nonmaxsuppression to the GPU pipeline for static shapes; ref fallback for dynamic. |
| src/targets/gpu/kernels/include/migraphx/kernels/sort.hpp | Adds block-wide bitonic sort helpers used by NMS kernels. |
| src/targets/gpu/kernels/include/migraphx/kernels/nonmaxsuppression.hpp | Adds GPU kernel implementations for sort/filter/compact stages. |
| src/targets/gpu/jit/topk.cpp | Switches to flatten_shapes when building HIP compile inputs. |
| src/targets/gpu/jit/reduce.cpp | Switches to flatten_shapes when building HIP compile inputs. |
| src/targets/gpu/jit/pointwise.cpp | Switches to flatten_shapes when building HIP compile inputs. |
| src/targets/gpu/jit/nonmaxsuppression.cpp | Adds JIT compilers for gpu::nms_sort/filter/compact kernels. |
| src/targets/gpu/include/migraphx/gpu/compile_hip_code_object.hpp | Clarifies meaning of global/local fields in compile options. |
| src/targets/gpu/device/include/migraphx/gpu/device/scan.hpp | Adds/clarifies block scan documentation/overload. |
| src/targets/gpu/compile_hip_code_object.cpp | Documents compute_block_size intent. |
| src/targets/gpu/code_object_op.cpp | Uses flatten_shapes for shape compatibility checks with tuple inputs. |
| src/targets/gpu/CMakeLists.txt | Adds nms_ops.cpp to GPU target build. |
| src/shape.cpp | Renames shape flattening helper to flatten_shapes. |
| src/onnx/parse_nonmaxsuppression.cpp | Parses NMS as tuple-return op and optionally slices indices under MIGRAPHX_USE_DYNAMIC_NMS. |
| src/onnx/onnx.cpp | Removes propagation of removed use_dyn_output option into parser. |
| src/onnx/include/migraphx/onnx/onnx_parser.hpp | Removes onnx_parser::use_dyn_output member. |
| src/include/migraphx/shape.hpp | Renames public API flatten → flatten_shapes. |
| src/include/migraphx/op/nonmaxsuppression.hpp | Refactors NMS op to return tuple {indices, num_selected} and adds runtime shape validation. |
| src/include/migraphx/onnx.hpp | Removes public onnx_options::use_dyn_output. |
| docs/reference/MIGraphX-dev-env-vars.rst | Documents MIGRAPHX_USE_DYNAMIC_NMS env var behavior. |
| docs/dev/onnx_operators.rst | Updates operator matrix entry for NonMaxSuppression. |
| const auto iou_packed = num_boxes * (num_boxes - 1) / 2; | ||
|
|
| // num_batches/num_classes/num_boxes are kept as op attributes because the filter inputs | ||
| // is a scratch buffer from which these can't be recovered. | ||
| // inputs = {sorted_boxes, sorted_scores, sorted_box_indices, output_indices, output_bc_counts} |
| const auto& boxes_s = inputs[0]; | ||
| const auto& scores_s = inputs[1]; | ||
| const auto num_batches = boxes_s.lens()[0]; | ||
| const auto num_boxes = boxes_s.lens()[1]; | ||
| const auto num_classes = scores_s.lens()[1]; |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| const auto& boxes_s = inputs[0]; | |
| const auto& scores_s = inputs[1]; | |
| const auto num_batches = boxes_s.lens()[0]; | |
| const auto num_boxes = boxes_s.lens()[1]; | |
| const auto num_classes = scores_s.lens()[1]; | |
| const auto& boxes_s = inputs[0]; | |
| const auto& scores_s = inputs[1]; | |
| const auto num_batches = boxes_s.lens()[0]; | |
| const auto num_boxes = boxes_s.lens()[1]; | |
| const auto num_classes = scores_s.lens()[1]; |
| const auto num_batches = v.at("num_batches").to<std::size_t>(); | ||
| const auto num_classes = v.at("num_classes").to<std::size_t>(); | ||
| const auto num_boxes = v.at("num_boxes").to<std::size_t>(); |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| const auto num_batches = v.at("num_batches").to<std::size_t>(); | |
| const auto num_classes = v.at("num_classes").to<std::size_t>(); | |
| const auto num_boxes = v.at("num_boxes").to<std::size_t>(); | |
| const auto num_batches = v.at("num_batches").to<std::size_t>(); | |
| const auto num_classes = v.at("num_classes").to<std::size_t>(); | |
| const auto num_boxes = v.at("num_boxes").to<std::size_t>(); |
| __device__ void block_sort(index idx, T& buf) const | ||
| { | ||
| static_assert(is_power_of_2(N), "N must be a power of 2"); | ||
| //NOLINTNEXTLINE(hicpp-signed-bitwise) |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | |
| // NOLINTNEXTLINE(hicpp-signed-bitwise) |
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | ||
| for(index_int k = 2; k <= N; k <<= 1) | ||
| { | ||
| //NOLINTNEXTLINE(hicpp-signed-bitwise) |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | |
| // NOLINTNEXTLINE(hicpp-signed-bitwise) |
| __device__ void block_sort_indexed(index idx, CompareAt compare_at, SwapAt swap_at) const | ||
| { | ||
| static_assert(is_power_of_2(N), "N must be a power of 2"); | ||
| //NOLINTNEXTLINE(hicpp-signed-bitwise) |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | |
| // NOLINTNEXTLINE(hicpp-signed-bitwise) |
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | ||
| for(index_int k = 2; k <= N; k <<= 1) | ||
| { | ||
| //NOLINTNEXTLINE(hicpp-signed-bitwise) |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | |
| // NOLINTNEXTLINE(hicpp-signed-bitwise) |
TedThemistokleous
left a comment
There was a problem hiding this comment.
Few review comments
More some additional place to add asserts and questions about removal of some other legacy tests. Mainly to ensure num_boxes doesn't resolve to zero somewhere.
I think you can simplify one of the kernels to avoid always doing a bunch of extra max, and mul operations for area calculations by adding an early return if either height/width is zero.
| double iou_threshold, | ||
| double score_threshold) const | ||
| { | ||
| // NOTE: should not need to fill with 0 |
There was a problem hiding this comment.
Why don't we just remove this then?
There was a problem hiding this comment.
It's to preserve the previous behavior for now. Technically the operator after NMS should never be reading the values after num_selected_indices.
| // `init`: initializer | ||
| // `fs`: striding function for thread work distribution. | ||
| // `input`: input with input(index_int). | ||
| // `output`: output with output(index_int, inclusive_scan_value_at_index_int). |
There was a problem hiding this comment.
Appreciate the added comments here.
| const auto num_batches = boxes_s.lens()[0]; | ||
| const auto num_boxes = boxes_s.lens()[1]; | ||
| const auto num_classes = scores_s.lens()[1]; | ||
| const auto iou_packed = num_boxes * (num_boxes - 1) / 2; |
There was a problem hiding this comment.
Add an assert for num_boxes > 0 so we if there is a weird issue when num_boxes = 0 we'll catch in debug build
| mod->insert_instruction(ins, make_op("hip::copy_to_gpu"), cpu_sub, gpu_alloc)); | ||
| } | ||
|
|
||
| // TODO: this needs cleanup |
There was a problem hiding this comment.
Make this a separate issue task to handle after then
| const auto& cnt_s = inputs[0]; | ||
| const auto& indices_s = inputs[1]; | ||
| const auto num_batch_class = cnt_s.elements(); | ||
| const auto num_boxes = indices_s.elements() / (num_batch_class * std::size_t{3}); |
There was a problem hiding this comment.
Add an assert here to ensure that this doesn't become zero if this operation results in 0 with remainder since this is size_t. If inidices_s < (num_batch_class / 3) that could pose an issue.
| const auto area_a = max(a[2] - a[0], 0.f) * max(a[3] - a[1], 0.f); | ||
| const auto area_b = max(b[2] - b[0], 0.f) * max(b[3] - b[1], 0.f); | ||
| const auto un = area_a + area_b - inter; | ||
| if(area_a <= 0.f or area_b <= 0.f or un <= 0.f) |
There was a problem hiding this comment.
Why not do the check after each calculation here wouldn't that save compute/calculations since NMS you're typically looking over many items?
You don't need to do another calculation of un if either area_a or area_b fails,
Also if inter == 0 you can skip all the calculations meaning if w or h results in zero the rest of the calculations don't matter. There can be an early return you can do as well.
|
|
||
| idx.local_stride(half, [&](auto i) { | ||
| fill_row(i); | ||
| fill_row(NumBoxes - 1 - i); |
There was a problem hiding this comment.
Are we not concerned if this is negative on the fill? Shouldn't we add a max here?
max(NumBoxes - 1 -i, 0)
There was a problem hiding this comment.
Why remove this test?
There was a problem hiding this comment.
No more dyn_output flag in NMS
There was a problem hiding this comment.
same as above - shouldn't we have the same verify tests as before or was there an error here/some non relevant piece with this test now?
There was a problem hiding this comment.
NMS verify between ref and GPU doesn't make sense with the random data generated by verify.
There was a problem hiding this comment.
are we tracking this? adding static input to verify
There was a problem hiding this comment.
Yeah, it's an internal ticket on our JIRA assigned to Paul.
| | | | | ``use_dyn_output`` | | ||
| | | | | set | | ||
| | NonMaxSuppression | ✅ | FP8, FP16, | | | ||
| | | | FP32, FP64 | | |
There was a problem hiding this comment.
is there a reason the notes column is empty for NMS? I know you intend to remove the env var in a follow-up PR but might be nice to have notes for posterity's sake, especially since the default doesn't match the onnx spec
There was a problem hiding this comment.
Good point, I'll add a note
| if(enabled(MIGRAPHX_USE_DYNAMIC_NMS{})) | ||
| { | ||
| // TODO: planning to make this the default behavior and removing the env var. | ||
| auto num_selected = |
There was a problem hiding this comment.
this path matches the onnx spec, correct? when you remove the env var, will you be allowing the other path still to be taken? is the other path, without the slicing, used by us anywhere or important to have somehow?
There was a problem hiding this comment.
When I remove the env var in a followup, the static shape output path will be allowed after I create a matcher. The matcher will have to figure out that the static output shape path is allowable and produces valid results in the part of the model after the NMS instruction.
| max_out_l, | ||
| iou_threshold, | ||
| score_threshold); | ||
| auto idx = gpu_mod->add_instruction(migraphx::make_op("get_tuple_elem", {{"index", 0}}), nms); |
There was a problem hiding this comment.
so the get_tuple_elems are a new IR expectation, you update our tests but do we know of any external users/uses that this will be a breaking change for? when you change the default behavior, will the IR change again or is this the standard from here on out?
| // [batch_index, class_index, box_index]. The maximum possible | ||
| // [batch_index, class_index, box_index]. The maximum possible | ||
| // num_selected_indices = num_batches * num_classes * spatial_dimension. | ||
| // TODO: can also be limited by max_output_boxes_per_class |
| { | ||
| hip_compile_options options; | ||
| options.inputs = flatten(inputs); | ||
| options.inputs = flatten_shapes(inputs); |
There was a problem hiding this comment.
are we tracking this? adding static input to verify
| const auto un = area_a + area_b - inter; | ||
| if(area_a <= 0.f or area_b <= 0.f or un <= 0.f) | ||
| return false; | ||
| return (inter / un) > threshold; |
There was a problem hiding this comment.
for sanity, can you add a test case that specifically tests this? the onnx spec specifically states that boxes w threshold == IOU should be kept. even though the code here implements that, it isn't explicit and I think it'd benefit future readers if it were
| scores_s = {migraphx::shape::float_type, {{1, 3}, {1, 3}, {6, 6}}}; | ||
| output_s = {migraphx::shape::int64_type, {{0, 18}, {3, 3}}}; | ||
| expect_shape(output_s, | ||
| migraphx::make_op("nonmaxsuppression", |
There was a problem hiding this comment.
might be worth mentioning that I do not see a test specifically related to the env var. I know you plan on removing that soon and making it the default path, but could potentially be useful to have a test in the meantime
| return std::make_pair(sc, box_idx - 1); | ||
| }); | ||
| } | ||
| // Sort by the higher score; or if equal then the early (i.e. lower) index of the box |
There was a problem hiding this comment.
for my sanity, is this the same behavior as the gpu impl? it seems like gpu only uses scores, and doesn't break ties like this
| // Block-wide bitonic sort of an N-element buffer (N must be a power of 2; | ||
| // pad with sentinels when the logical length is smaller). | ||
| template <index_int N, class T> | ||
| __device__ void block_sort(index idx, T& buf) const |
There was a problem hiding this comment.
are there any tests for this in isolation?
Motivation
Technical Details
Changelog Category
Add a
CHANGELOG.mdentry for any option other thanNot Applicable