[release/2.11] Align Radeon targets for supported and preferred hipBLASLt backend with release/2.10#3135

Open
mstankov-amd wants to merge 1 commit into release/2.11 from align_radeon_targets_preffered_blas_backend

@mstankov-amd

Motivation

Align the lists of supported and preferred hipBLASLt targets for Radeon GPUs with the release/2.10 branch. Also update the ROCm versions from which each target is added to the supported or preferred hipBLASLt list.
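
For readers unfamiliar with version-gated target lists, here is a minimal sketch of the idea, not the code changed by this PR. The names `GatedTarget`, `is_listed`, and `example_preferred_table` are hypothetical, and the version thresholds in the table are placeholders chosen only for illustration. The version encoding follows the `-DROCM_VERSION=70201` flag seen with ROCm 7.2.1 in the build log, and the sketch assumes the reported architecture name may carry a feature suffix after a colon.

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>
#include <vector>

// Encode a ROCm version as major * 10000 + minor * 100 + patch,
// so that 7.2.1 becomes 70201.
constexpr int encode_rocm_version(int major, int minor, int patch) {
  return major * 10000 + minor * 100 + patch;
}

// Hypothetical entry: an architecture name plus the first ROCm version
// from which the architecture counts as listed.
struct GatedTarget {
  std::string_view arch;
  int min_rocm_version;
};

// Drop anything after the first ':' (for example "gfx942:sramecc+:xnack-").
inline std::string_view base_arch(std::string_view reported) {
  return reported.substr(0, reported.find(':'));
}

// True if the architecture is in the table and the ROCm version is new enough.
inline bool is_listed(const std::vector<GatedTarget>& table,
                      std::string_view reported_arch, int rocm_version) {
  const std::string_view arch = base_arch(reported_arch);
  return std::any_of(table.begin(), table.end(), [&](const GatedTarget& t) {
    return t.arch == arch && rocm_version >= t.min_rocm_version;
  });
}

// PLACEHOLDER table: the thresholds below are invented for this example
// and are NOT the values used in release/2.10 or release/2.11.
inline std::vector<GatedTarget> example_preferred_table() {
  return {
      {"gfx1100", encode_rocm_version(7, 0, 0)},
      {"gfx1101", encode_rocm_version(7, 0, 0)},
      {"gfx1151", encode_rocm_version(7, 2, 0)},
  };
}

// Small usage example under the placeholder thresholds above.
inline void self_check() {
  const auto table = example_preferred_table();
  assert(is_listed(table, "gfx1100", encode_rocm_version(7, 2, 1)));
  assert(!is_listed(table, "gfx1151", encode_rocm_version(7, 1, 0)));
}
```

Keeping the minimum version next to each architecture makes it easy to compare the table between release branches, which is the kind of alignment this PR is about.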

@rocm-repo-management-api

rocm-repo-management-api Bot commented Apr 7, 2026

Jenkins build for c1f312dfd0059c24e3e63b64216b2dadab17b47e commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

Detected error during Pytorch building:

      |     ^
/var/lib/jenkins/pytorch/third_party/mslk/external//composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp:65:5: warning: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_ZN2ck16tensor_operation6device34kernel_grouped_gemm_multiple_d_xdlINS_34GridwiseGemmMultiD_xdl_cshuffle_v3INS_13tensor_layout4gemm8RowMajorENS5_11ColumnMajorENS_5TupleIJS6_S7_EEES6_DB8_SA_ffNS8_IJffEEEtNS0_12element_wise11PassThroughESD_NSC_16MultiplyMultiplyELNS1_18GemmSpecializationE0ELi256ELi224ELi256ELi128ELi16ELi16ELi16ELi16ELi7ELi8ENS_8SequenceIJLi8ELi32ELi1EEEENSG_IJLi1ELi0ELi2EEEESI_Li2ELi16ELi16ELb0ELi0ESH_SI_SI_Li2ELi16ELi16ELb0ELi0ELi1ELi2ENSG_IJLi1ELi32ELi1ELi8EEEENSG_IJLi8ELi8ELi1EEEELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE2ESA_SA_SA_SA_Lb0EEENS1_25GroupedGemmKernelArgumentILi2EEELSF_0ESA_SA_SB_tS6_S7_S9_S6_Li128ENS_25OffsettedBlockToCTileMap2INS_39BlockToCTileMap_Grouped_M00_N0_M01AdaptILi8ELi224ELi256EEEEESS_SD_SD_SE_LSL_0ELSM_2EEEvPU3AS4KviT13_T14_T15_': desired occupancy was 2, final occupancy is 1 [-Wpass-failed]
2 warnings generated when compiling for gfx942.
[7268/8165] Linking CXX static library lib/libmslk.a
[7269/8165] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ck_gemm_float.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ck_gemm_float.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ck_gemm_float.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip && /opt/conda/envs/py_3.12/lib/python3.12/site-packages/cmake/data/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/. && /opt/conda/envs/py_3.12/lib/python3.12/site-packages/cmake/data/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/./torch_hip_generated_ck_gemm_float.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ck_gemm_float.hip.o.cmake
sccache: encountered fatal error
sccache: error: Failed to parse included file path
sccache: caused by: Failed to parse included file path
failed to execute:/opt/rocm/llvm/bin/clang++  --offload-arch=gfx90a --offload-arch=gfx908 --offload-arch=gfx942 -O3  -c -x hip /var/lib/jenkins/pytorch/aten/src/ATen/native/hip/ck_gemm_float.hip -o "/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/./torch_hip_generated_ck_gemm_float.hip.o" --offload-compress -std=c++17 --rocm-device-lib-path=/opt/rocm/amdgcn/bitcode -fclang-abi-compat=17 -DUSE_NCCL -DUSE_ROCM -D__HIP_PLATFORM_AMD__ -DUSE_FLASH_ATTENTION -DFLASHATTENTION_DISABLE_ALIBI -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASH_NAMESPACE=pytorch_flash -DUNFUSE_FMA -DUSE_MEM_EFF_ATTENTION -DUSE_C10D_NCCL -DTORCH_CUDA_BUILD_MAIN_LIB -DROCM_VERSION=70201 -DTORCH_HIP_VERSION=702 -DUSE_LAYERNORM_FAST_RECIPROCAL -DONNX_ML=1 -DONNXIFI_ENABLE_EXT=1 -DONNX_NAMESPACE=onnx_torch -DIDEEP_USE_MKL -DHAVE_MMAP=1 -D_FILE_OFFSET_BITS=64 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_POSIX_FALLOCATE=1 -DUSE_EXTERNAL_MZCRC -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -D__HIP_PLATFORM_AMD__=1 -DUSE_PROF_API=1 -DAT_PER_OPERATOR_HEADERS -DUSE_DISTRIBUTED -DUSE_C10D_GLOO -DUSE_RPC -DUSE_TENSORPIPE -D__HIP_PLATFORM_AMD__ -DHIPBLASLT_USE_ROCROLLER -DFMT_HEADER_ONLY=1 -fPIC -D__HIP_PLATFORM_AMD__=1 -DCUDA_HAS_FP16=1 -DUSE_ROCM -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_HIP_VERSION=702 -Wno-shift-count-negative -Wno-shift-count-overflow -DCAFFE2_USE_MIOPEN -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP -DHIPBLAS_V2 -DHIP_ENABLE_WARP_SYNC_BUILTINS -DHIPBLASLT_OUTER_VEC -DUSE_ROCM_CK_GEMM -fno-gpu-rdc -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch -I/opt/rocm-7.2.1/include -I/var/lib/jenkins/pytorch/build/third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include 
-I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -I/var/lib/jenkins/pytorch/third_party/protobuf/src -I/opt/conda/envs/py_3.12/include -I/var/lib/jenkins/pytorch/third_party/XNNPACK/include -I/var/lib/jenkins/pytorch/third_party/ittapi/include -I/var/lib/jenkins/pytorch/cmake/../third_party/eigen -I/opt/rocm/include -I/opt/rocm-7.2.1/include -I/var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/var/lib/jenkins/pytorch/third_party/ideep/include -I/var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/opt/conda/envs/py_3.12/include -I/var/lib/jenkins/pytorch/nlohmann -I/var/lib/jenkins/pytorch/INTERFACE -I/var/lib/jenkins/pytorch/third_party/nlohmann/include -I/var/lib/jenkins/pytorch/moodycamel -I/var/lib/jenkins/pytorch/INTERFACE -I/var/lib/jenkins/pytorch/third_party/concurrentqueue -I/var/lib/jenkins/pytorch/aten/src/THH -I/var/lib/jenkins/pytorch/third_party/mslk/include/ -I/var/lib/jenkins/pytorch/aten/src/ATen/hip -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/include -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/library/include -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/example/ck_tile/01_fmha -I/var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/composable_kernel -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/aiter/csrc/include -I/var/lib/jenkins/pytorch/third_party/fmt/include -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build/caffe2/aten/src -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/aten/src/ATen/.. -I/var/lib/jenkins/pytorch/torch/include -I/opt/rocm-7.2.1/include -I/opt/rocm/include -I/var/lib/jenkins/pytorch/c10/hip/../.. 
-I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch/c10/../ -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch/torch/csrc/api -I/var/lib/jenkins/pytorch/torch/csrc/api/include -I/var/lib/jenkins/pytorch/third_party/protobuf/src -I/opt/conda/envs/py_3.12/include -I/opt/rocm-7.2.1/include -I/opt/rocm/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include/hiprand -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include/rocrand -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm-7.2.1/include -I/opt/rocm/include -I/var/lib/jenkins/pytorch/build/third_party/gloo/hip -I/opt/rocm/magma/include -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch -I/opt/rocm-7.2.1/include -I/var/lib/jenkins/pytorch/build/third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/gloo -I/var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -I/var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -I/var/lib/jenkins/pytorch/third_party/protobuf/src -I/opt/conda/envs/py_3.12/include -I/var/lib/jenkins/pytorch/third_party/XNNPACK/include -I/var/lib/jenkins/pytorch/third_party/ittapi/include -I/var/lib/jenkins/pytorch/cmake/../third_party/eigen -I/opt/rocm/include -I/var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/var/lib/jenkins/pytorch/third_party/ideep/include -I/var/lib/jenkins/pytorch/nlohmann -I/var/lib/jenkins/pytorch/INTERFACE 
-I/var/lib/jenkins/pytorch/third_party/nlohmann/include -I/var/lib/jenkins/pytorch/moodycamel -I/var/lib/jenkins/pytorch/third_party/concurrentqueue

@mstankov-amd changed the title from "Align Radeon targets for supported and preferred hipBLASLt backend with release/2.10" to "[release/2.11] Align Radeon targets for supported and preferred hipBLASLt backend with release/2.10" on Apr 29, 2026
@mgehre-amd

@mstankov-amd @jeffdaily, will this get merged?

@mgehre-amd

And are you working on adding this to release/2.12 and the upstream PyTorch main branch?

@fjankovi

> And are you working on adding this to release/2.12 and the upstream PyTorch main branch?

Yes, we should now add gfx110x and gfx1151 to the preferred hipBLASLt list on PyTorch main.

@mgehre-amd

> > And are you working on adding this to release/2.12 and the upstream PyTorch main branch?
>
> Yes, we should now add gfx110x and gfx1151 to the preferred hipBLASLt list on PyTorch main.

Great! @fjankovi, is there a ticket or PR to track that work?

@mstankov-amd
Author

Created an upstream PR to add gfx1100, gfx1101, and gfx1151 to the hipBLASLt preferred backend list.
