Summary
migraphx-driver compile --gpu crashes when the graph contains a Reduce* (Max/Sum/Mean) whose reduced axes all have length 1. The op is mathematically an identity, but it reaches GPU lowering as a fused_reduce<{N, 1}>, for which the HIP JIT has no valid kernel configuration.
Real-world scenario: any YOLO26-pose export from Ultralytics. The post-processing emits a ReduceMax over an axis of length 1. These models have num_classes = 1 (person, animal, etc.).
Repro A: Ultralytics → ONNX → migraphx
pip install ultralytics onnx
python -c "from ultralytics import YOLO; YOLO('yolo26s-pose.pt').export(format='onnx', imgsz=640, simplify=False, device='cpu')"
migraphx-driver compile --onnx yolo26s-pose.onnx --gpu
Crashes during HIP codegen of fused_reduce<{8400, 1}> (the ReduceMax in the pose head reduces over an axis of length 1).
Repro B: minimal standalone graph (ONNX Python API)
import onnx
from onnx import helper, TensorProto
x = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 21, 56])
y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 21, 1])
starts = helper.make_tensor("starts", TensorProto.INT64, [1], [55])
ends = helper.make_tensor("ends", TensorProto.INT64, [1], [56])
axes_t = helper.make_tensor("axes", TensorProto.INT64, [1], [2])
slice_n = helper.make_node("Slice", ["x", "starts", "ends", "axes"], ["sliced"])
reduce_n = helper.make_node("ReduceMax", ["sliced"], ["y"], axes=[-1], keepdims=1)
onnx.save(
    helper.make_model(
        helper.make_graph([slice_n, reduce_n], "repro", [x], [y],
                          initializer=[starts, ends, axes_t]),
        opset_imports=[helper.make_opsetid("", 13)],
    ),
    "repro.onnx",
)
migraphx-driver compile --onnx repro.onnx --gpu
Actual
Module:
x0 = @param:x0 -> float_type, {1, 21, 1}
@1 = reduce_max[axes={2}](x0) -> float_type, {1, 21, 1}
@2 = @return(@1)
Error gpu::compile_ops: ...:215: benchmark: No valid tuned compilation for fused_reduce with <no problem key>
terminate called after throwing an instance of 'migraphx::exception'
Expected
Reduce* over axes of length 1 is an identity. Compile should succeed: either the op is folded away, or fused_reduce handles this shape without crashing.
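The identity claim can be checked with a small, dependency-free sketch. It does not use MIGraphX or ONNX. The helper `reduce_last_axis_keepdims` is a hypothetical name introduced here for illustration, and the input simply mirrors the {1, 21, 1} shape from the module dump, using plain nested lists.

```python
import statistics

def reduce_last_axis_keepdims(tensor, fn):
    """Reduce the last axis of a 3-D nested list with fn, keeping it as length 1.

    Hypothetical helper for illustration only; it mimics Reduce*(axes=[-1], keepdims=1).
    """
    return [[[fn(row)] for row in plane] for plane in tensor]

# Same shape as the sliced tensor in Repro B: [1, 21, 1]
sliced = [[[float(i) - 10.0] for i in range(21)]]

reducers = {"max": max, "sum": sum, "mean": statistics.fmean}
results = {name: reduce_last_axis_keepdims(sliced, fn) for name, fn in reducers.items()}

for name, out in results.items():
    # Each reduction over a length-1 axis returns the input unchanged.
    print(name, out == sliced)
```

Since every reduced row holds exactly one element, max, sum, and mean all return that element, so the output equals the input. This is why folding the op away would be a valid fix.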
Environment
- MIGraphX 2.15.0.20250912-17-201-g0c6368a2e (current develop reproduces too)
- ROCm 7.2.2, GPU gfx1201