
[Issue]: Segfault during GPU compile on Reduce* over axes of length 1 (YOLO-pose models) #4884

@itikhono

Description

Summary

migraphx-driver compile --gpu crashes when the graph contains a Reduce* (Max/Sum/Mean) whose reduced axes all have length 1. With keepdims=1 the op is mathematically an identity. It still reaches GPU lowering as a fused_reduce<{N, 1}>, and the HIP JIT has no valid kernel configuration for that shape.
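Here is a quick NumPy check of the identity claim. It is an illustration only, not MIGraphX code. Reducing over an axis of length 1 returns the input unchanged when keepdims is kept, and only drops that axis otherwise. The (8400, 1) shape mirrors the fused_reduce<{8400, 1}> from Repro A. The random values are arbitrary.

```python
import numpy as np

# Arbitrary data shaped like fused_reduce<{8400, 1}>: the reduced (last) axis has length 1.
x = np.random.default_rng(0).standard_normal((8400, 1)).astype(np.float32)

for name, reduce in [("max", np.max), ("sum", np.sum), ("mean", np.mean)]:
    kept = reduce(x, axis=-1, keepdims=True)      # keepdims=1: same shape and values as x
    dropped = reduce(x, axis=-1, keepdims=False)  # keepdims=0: same as squeezing the axis
    assert kept.shape == x.shape and np.array_equal(kept, x), name
    assert np.array_equal(dropped, np.squeeze(x, axis=-1)), name

print("max/sum/mean over a length-1 axis are identities (or squeezes with keepdims=0)")
```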

Real-world scenario: any YOLO26-pose model exported from Ultralytics. These models have num_classes = 1 (a single class such as person or animal). As a result, the ReduceMax emitted by the post-processing reduces over an axis of length 1.

Repro A: Ultralytics → ONNX → migraphx

pip install ultralytics onnx
python -c "from ultralytics import YOLO; YOLO('yolo26s-pose.pt').export(format='onnx', imgsz=640, simplify=False, device='cpu')"

migraphx-driver compile --onnx yolo26s-pose.onnx --gpu

Crashes during HIP codegen of fused_reduce<{8400, 1}> (the ReduceMax in the pose head reduces over an axis of length 1).

Repro B: minimal standalone graph (ONNX Python API)

import onnx
from onnx import helper, TensorProto

# Input is [1, 21, 56]; the output keeps a single element along the last axis.
x = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 21, 56])
y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 21, 1])

# Slice x[:, :, 55:56] -> shape [1, 21, 1]
starts = helper.make_tensor("starts", TensorProto.INT64, [1], [55])
ends   = helper.make_tensor("ends",   TensorProto.INT64, [1], [56])
axes_t = helper.make_tensor("axes",   TensorProto.INT64, [1], [2])

slice_n  = helper.make_node("Slice", ["x", "starts", "ends", "axes"], ["sliced"])
# ReduceMax over the last axis, which now has length 1 (opset 13: axes is an attribute)
reduce_n = helper.make_node("ReduceMax", ["sliced"], ["y"], axes=[-1], keepdims=1)

onnx.save(
    helper.make_model(
        helper.make_graph([slice_n, reduce_n], "repro", [x], [y],
                          initializer=[starts, ends, axes_t]),
        opset_imports=[helper.make_opsetid("", 13)],
    ),
    "repro.onnx",
)
migraphx-driver compile --onnx repro.onnx --gpu

Actual

Module:
  x0 = @param:x0 -> float_type, {1, 21, 1}
  @1 = reduce_max[axes={2}](x0) -> float_type, {1, 21, 1}
  @2 = @return(@1)

Error gpu::compile_ops: ...:215: benchmark: No valid tuned compilation for fused_reduce with <no problem key>
terminate called after throwing an instance of 'migraphx::exception'

Expected

A Reduce* over axes of length 1 is an identity (or a squeeze if keepdims=0). Compile should succeed. Either the op should be folded away, or fused_reduce should handle this shape without crashing.

Environment

  • MIGraphX 2.15.0.20250912-17-201-g0c6368a2e (also reproduces on current develop)
  • ROCm 7.2.2, GPU gfx1201
