Rotated bounding box NMS implementation for GPU #9478

Draft
zy1git wants to merge 26 commits into pytorch:main from zy1git:rotated-NMS-GPU

@zy1git zy1git commented Apr 16, 2026

Summary:
Enables rotated box NMS on CUDA, building on the CPU implementation from #9450. The CUDA kernel is adapted from Detectron2's nms_rotated_cuda_kernel, with two changes: it calls single_box_iou_rotated instead of devIoU to compute IoU, and it stores 5 values per box in shared memory instead of 4. Result gathering reuses TorchVision's existing gather_keep_from_mask GPU kernel, which avoids the GPU→CPU→GPU transfer that Detectron2's implementation requires.
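For readers unfamiliar with the mask-based NMS layout, here is a minimal pure-Python sketch of the two stages described above: a pairwise suppression bitmask, followed by a gather pass over boxes in score order. It is not the kernel code from this PR. The box layout `(cx, cy, w, h, angle_deg)`, the use of degrees, the polygon-clipping IoU, and every function name below are assumptions made for illustration, and the real kernel builds the mask in parallel rather than in nested loops.

```python
import math

def box_corners(box):
    """Corners of a (cx, cy, w, h, angle_deg) box, in counter-clockwise order."""
    cx, cy, w, h, angle_deg = box  # assumed layout: the "5 values per box"
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]

def polygon_area(poly):
    """Shoelace formula; returns 0.0 for fewer than three vertices."""
    if len(poly) < 3:
        return 0.0
    acc = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        acc += x1 * y2 - x2 * y1
    return abs(acc) / 2

def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of a convex polygon by a convex CCW polygon."""
    out = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, out = out, []
        if not inp:
            break

        def side(p):
            # >= 0 when p lies on or to the left of the directed edge a -> b
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]
            sp, sq = side(p), side(q)
            if (sp >= 0) != (sq >= 0):  # edge p -> q crosses the clip line
                t = sp / (sp - sq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if sq >= 0:
                out.append(q)
    return out

def rotated_iou(box_a, box_b):
    """IoU of two rotated boxes via convex polygon intersection."""
    inter = polygon_area(clip_convex(box_corners(box_a), box_corners(box_b)))
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def nms_rotated_sketch(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    # Stage 1: bit j of mask[i] is set when the i-th ranked box overlaps the
    # lower-ranked j-th box by more than the threshold.
    mask = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rotated_iou(boxes[order[i]], boxes[order[j]]) > iou_threshold:
                mask[i] |= 1 << j
    # Stage 2: walk boxes in score order, keep each box that is not yet
    # suppressed, and OR its mask row into the running suppression set.
    removed, keep = 0, []
    for i in range(n):
        if not (removed >> i) & 1:
            keep.append(order[i])
            removed |= mask[i]
    return keep

boxes = [(0, 0, 4, 2, 0), (0, 0, 4, 2, 5), (0, 0, 4, 2, 90), (10, 10, 4, 2, 0)]
scores = [0.9, 0.8, 0.7, 0.6]
keep = nms_rotated_sketch(boxes, scores, iou_threshold=0.5)  # keep == [0, 2, 3]
```

In this example the box rotated by 5 degrees is suppressed by the top-scoring box, while the box rotated by 90 degrees survives because its IoU with the top box is only 1/3. An axis-aligned IoU on the same centers and sizes would not distinguish those two cases, which is why the angle has to travel with each box.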

Test Plan:

Added device parametrization (cpu_and_cuda()) to all rotated NMS tests.

Run pytest test/test_ops.py::TestNMS -v -k "rotated". All 110 tests pass locally (CPU and CUDA).

Zhitao Yu added 26 commits March 22, 2026 21:12
…gorithm is standard TorchVision code; attribution already in box_iou_rotated_utils.h
…ve boxes in test_nms_rotated_different_angles function

pytorch-bot Bot commented Apr 16, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9478

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures

As of commit 2d22de0 with merge base d7400a3:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zy1git zy1git marked this pull request as draft April 16, 2026 07:56
@meta-cla meta-cla Bot added the cla signed label Apr 16, 2026