Fix CUDA build with contrib ops disabled by Copilot · Pull Request #28554 · microsoft/onnxruntime

Copilot · 2026-05-19T03:52:07Z

Description

The CUDA Attention kernel (core/providers/cuda/llm/attention.cc) depends on contrib_ops internals (flash attention, memory efficient attention, unfused attention helpers) but was compiled unconditionally. When building with --disable_contrib_ops, GetAttentionKernelOptions() is unavailable (guarded by #ifndef DISABLE_CONTRIB_OPS in cuda_kernel.h), causing a compile error.

Changes:

cmake/onnxruntime_providers_cuda.cmake — When contrib ops are disabled (and not in CUDA minimal mode), include the contrib_ops/cuda/bert/ attention infrastructure files (flash attention, memory efficient attention, unfused attention helpers, etc.) so the ONNX domain Attention kernel can compile and link. Uses elseif(onnxruntime_DISABLE_CONTRIB_OPS AND NOT onnxruntime_CUDA_MINIMAL) to avoid including these files in CUDA minimal builds where llm/attention.cc isn't compiled and cudnn_frontend.h isn't available.
onnxruntime/core/providers/cuda/cuda_execution_provider.h — Remove #ifndef DISABLE_CONTRIB_OPS guards from the AttentionKernelOptions include, GetAttentionKernelOptions() method, and attention_kernel_options_ member variable
onnxruntime/core/providers/cuda/cuda_kernel.h — Remove #ifndef DISABLE_CONTRIB_OPS guard from GetAttentionKernelOptions()

The CUDA Attention kernel and its underlying attention backends (flash, memory efficient, unfused) are now always available in full CUDA builds regardless of whether contrib ops are enabled. No changes are needed in cuda_execution_provider.cc since the Attention kernel registrations remain unconditional.
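To make the guard removal concrete, here is a minimal C++ sketch of the pattern, not the actual onnxruntime sources. Everything in the `sketch` namespace is a hypothetical stand-in: the class bodies, the `use_flash_attention` field, and the `UsesFlashAttention` helper are assumptions for illustration only. It shows why a kernel that calls `GetAttentionKernelOptions()` only compiles in every configuration once the accessor is declared unconditionally.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical options struct; the real AttentionKernelOptions is larger.
struct AttentionKernelOptions {
  bool use_flash_attention = true;
};

// Stand-in for the execution provider that owns the options.
class FakeCudaProvider {
 public:
  // Before the change, an accessor like this sat inside
  //   #ifndef DISABLE_CONTRIB_OPS ... #endif
  // so it disappeared from the class whenever the macro was defined.
  // Declaring it unconditionally keeps it visible in both configurations.
  const AttentionKernelOptions* GetAttentionKernelOptions() const {
    return &attention_kernel_options_;
  }

 private:
  AttentionKernelOptions attention_kernel_options_;
};

// Stand-in for the kernel base class that forwards to the provider.
class FakeCudaKernel {
 public:
  explicit FakeCudaKernel(const FakeCudaProvider* provider) : provider_(provider) {
    assert(provider_ != nullptr);
  }

 protected:
  const AttentionKernelOptions* GetAttentionKernelOptions() const {
    return provider_->GetAttentionKernelOptions();
  }

 private:
  const FakeCudaProvider* provider_;
};

// Stand-in for the ONNX domain Attention kernel. It is compiled
// unconditionally, so every member it calls must also exist unconditionally.
template <typename T>
class Attention : public FakeCudaKernel {
 public:
  using FakeCudaKernel::FakeCudaKernel;

  bool UsesFlashAttention() const {
    return GetAttentionKernelOptions()->use_flash_attention;
  }
};

}  // namespace sketch
```

If the forwarding accessor in `FakeCudaKernel` were wrapped in `#ifndef DISABLE_CONTRIB_OPS` and the translation unit compiled with `-DDISABLE_CONTRIB_OPS`, instantiating `sketch::Attention<float>::UsesFlashAttention` would fail with a "not a member" diagnostic of the same kind as the C2039 error reported in this PR.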

Motivation and Context

Building onnxruntime with CUDA enabled and --disable_contrib_ops fails:

error C2039: 'GetAttentionKernelOptions': is not a member of 'onnxruntime::cuda::Attention<float>'

This is a valid build configuration (useful for reducing compile time) that should be supported. Rather than excluding the CUDA Attention kernel when contrib ops are disabled, the necessary attention infrastructure from contrib_ops/cuda/bert/ is included in the build so the ONNX domain Attention op retains full CUDA acceleration. The fix is scoped to non-minimal CUDA builds only, since CUDA minimal builds use a non-recursive glob that doesn't include llm/attention.cc and don't have cudnn_frontend available.

The CUDA Attention kernel implementation (core/providers/cuda/llm/attention.cc) depends on contrib ops (flash attention, memory efficient attention, unfused attention helpers from contrib_ops/cuda/bert/). When DISABLE_CONTRIB_OPS is defined, these dependencies are unavailable, causing compilation failures.

Fix by:
1. Excluding attention.h/attention.cc from the CUDA provider build when contrib ops are disabled (cmake change).
2. Guarding the Attention kernel class declarations and registrations in cuda_execution_provider.cc with #ifndef DISABLE_CONTRIB_OPS.

The CPU EP still provides the standard ONNX domain Attention kernel as fallback when the CUDA implementation is unavailable.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/4bbef367-4e58-49e5-9bca-8d5a2c8ee872
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>

github-actions

You can commit the suggested changes from lintrunner.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Instead of removing the CUDA Attention kernel when contrib ops are disabled, include the necessary attention infrastructure files from contrib_ops/cuda/bert/ (flash attention, memory efficient attention, unfused attention helpers, etc.) so the ONNX Attention kernel can compile and link.

Also removes #ifndef DISABLE_CONTRIB_OPS guards from GetAttentionKernelOptions() in cuda_kernel.h and cuda_execution_provider.h since AttentionKernelOptions is now always available.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/4e3d0a4a-58f7-4e16-81d1-54e4f2f282e9
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>

The #ifndef DISABLE_CONTRIB_OPS guards around Attention kernel registrations are unnecessary since the necessary contrib_ops/cuda/bert/ files are now always included in the build.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/fac2d4d5-add5-4431-95f1-c4f38205994e
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>

tianleiwu

There is a CI build failure for Build Linux TensorRT CUDA Minimal x64 Release.

@copilot, please update cmake/onnxruntime_providers_cuda.cmake to handle the CUDA minimal build correctly.

…and not CUDA minimal

The CUDA minimal build doesn't compile llm/attention.cc (non-recursive glob) so the bert/ infrastructure isn't needed. The cudnn_fmha files require cudnn_frontend.h which isn't available in minimal builds.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/508d31a5-92d6-4418-94ad-ac0ca6d78b68
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>

Copilot · 2026-05-21T05:46:49Z

There is CI build failure for Build Linux TensorRT CUDA Minimal x64 Release

python3 tools/ci_build/...

Fixed in 8b79ac7. The issue was that the else() block also triggered for CUDA minimal builds (where cudnn_frontend.h isn't available). Changed to elseif(onnxruntime_DISABLE_CONTRIB_OPS AND NOT onnxruntime_CUDA_MINIMAL) — since CUDA minimal uses a non-recursive glob that doesn't include llm/attention.cc, the bert/ infrastructure isn't needed there.
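The effect of the new condition can be read as a truth table over the two build options. The following is a small compile-time C++ model of only the quoted `elseif` predicate. `CudaBuildConfig` and `AddsBertAttentionSources` are hypothetical names, and the note about the contrib-enabled case is an assumption about the preceding branch, which is not shown in this thread.

```cpp
// Hypothetical model of the two CMake options that the new branch tests.
struct CudaBuildConfig {
  bool disable_contrib_ops;  // onnxruntime_DISABLE_CONTRIB_OPS
  bool cuda_minimal;         // onnxruntime_CUDA_MINIMAL
};

// Mirrors: elseif(onnxruntime_DISABLE_CONTRIB_OPS AND NOT onnxruntime_CUDA_MINIMAL)
// Returns true when the new branch would add the contrib_ops/cuda/bert/ files.
constexpr bool AddsBertAttentionSources(CudaBuildConfig config) {
  return config.disable_contrib_ops && !config.cuda_minimal;
}

// Full CUDA build with contrib ops disabled: the case this PR fixes.
static_assert(AddsBertAttentionSources({true, false}),
              "bert/ attention files are added");

// CUDA minimal build with contrib ops disabled: cudnn_frontend.h is not
// available and llm/attention.cc is not compiled, so nothing is added.
static_assert(!AddsBertAttentionSources({true, true}),
              "minimal builds skip the bert/ files");

// Contrib ops enabled: this branch is not taken. Assumption: the files are
// then handled by the earlier branch of the same if/elseif chain.
static_assert(!AddsBertAttentionSources({false, false}),
              "new branch not taken when contrib ops are enabled");
static_assert(!AddsBertAttentionSources({false, true}),
              "new branch not taken when contrib ops are enabled");
```

A plain `else()` corresponds to dropping the `!config.cuda_minimal` term from the fallback, which is what pulled the cudnn-dependent files into the minimal build and broke CI.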

Initial plan

fabb553

Copilot AI assigned Copilot and tianleiwu May 19, 2026

Copilot started work on behalf of tianleiwu May 19, 2026 03:52

Copilot AI linked an issue May 19, 2026 that may be closed by this pull request

[Build] Cannot build onnxruntime with CUDA enabled and contrib ops disabled #28537

Open

Copilot AI changed the title ~~[WIP] Fix onnxruntime build with CUDA enabled and contrib ops disabled~~ Fix CUDA build with contrib ops disabled May 19, 2026

Copilot finished work on behalf of tianleiwu May 19, 2026 03:59

Copilot AI requested a review from tianleiwu May 19, 2026 03:59

github-actions Bot reviewed May 19, 2026

Comment thread onnxruntime/core/providers/cuda/cuda_execution_provider.cc

Comment thread onnxruntime/core/providers/cuda/cuda_execution_provider.cc

Apply suggestions from code review

875f5b1

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

tianleiwu reviewed May 20, 2026

Comment thread cmake/onnxruntime_providers_cuda.cmake Outdated

tianleiwu requested changes May 20, 2026

Copilot started work on behalf of tianleiwu May 20, 2026 23:53

Copilot AI requested a review from tianleiwu May 21, 2026 00:15

Copilot finished work on behalf of tianleiwu May 21, 2026 00:15

tianleiwu reviewed May 21, 2026

Comment thread onnxruntime/core/providers/cuda/cuda_execution_provider.cc Outdated

tianleiwu requested changes May 21, 2026

Copilot started work on behalf of tianleiwu May 21, 2026 00:18

Copilot finished work on behalf of tianleiwu May 21, 2026 00:21

Copilot AI requested a review from tianleiwu May 21, 2026 00:21

tianleiwu requested changes May 21, 2026

Copilot started work on behalf of tianleiwu May 21, 2026 05:43

Copilot finished work on behalf of tianleiwu May 21, 2026 05:47

Copilot AI requested a review from tianleiwu May 21, 2026 05:47

tianleiwu approved these changes May 21, 2026

tianleiwu marked this pull request as ready for review May 21, 2026 05:59

Copilot wants to merge 6 commits into main from copilot/fix-onnxruntime-build-cuda
