
Fix CUDA build with contrib ops disabled#28554

Open
Copilot wants to merge 6 commits into main from copilot/fix-onnxruntime-build-cuda

Contributor

Copilot AI commented May 19, 2026

Description

The CUDA Attention kernel (core/providers/cuda/llm/attention.cc) depends on contrib_ops internals (flash attention, memory efficient attention, unfused attention helpers) but was compiled unconditionally. When building with --disable_contrib_ops, GetAttentionKernelOptions() is unavailable (guarded by #ifndef DISABLE_CONTRIB_OPS in cuda_kernel.h), causing a compile error.

Changes:

  • cmake/onnxruntime_providers_cuda.cmake — When contrib ops are disabled (and not in CUDA minimal mode), include the contrib_ops/cuda/bert/ attention infrastructure files (flash attention, memory efficient attention, unfused attention helpers, etc.) so the ONNX domain Attention kernel can compile and link. Uses elseif(onnxruntime_DISABLE_CONTRIB_OPS AND NOT onnxruntime_CUDA_MINIMAL) to avoid including these files in CUDA minimal builds where llm/attention.cc isn't compiled and cudnn_frontend.h isn't available.
  • onnxruntime/core/providers/cuda/cuda_execution_provider.h — Remove #ifndef DISABLE_CONTRIB_OPS guards from the AttentionKernelOptions include, GetAttentionKernelOptions() method, and attention_kernel_options_ member variable.
  • onnxruntime/core/providers/cuda/cuda_kernel.h — Remove #ifndef DISABLE_CONTRIB_OPS guard from GetAttentionKernelOptions().

The CUDA Attention kernel and its underlying attention backends (flash, memory efficient, unfused) are now always available in full CUDA builds regardless of whether contrib ops are enabled. No changes are needed in cuda_execution_provider.cc since the Attention kernel registrations remain unconditional.
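A minimal sketch of the header change, assuming heavily simplified stand-ins for the real classes: the option fields, the FlashAllowed() helper, and the constructor wiring are hypothetical and not ONNX Runtime's actual API. It shows why dropping the #ifndef DISABLE_CONTRIB_OPS guards fixes the error: GetAttentionKernelOptions() now exists even when the macro is defined, so a kernel such as Attention<float> that calls it compiles.

```cpp
#include <cassert>

// Simulate a --disable_contrib_ops build. Nothing below is guarded by this
// macro anymore, so everything still compiles with it defined.
#define DISABLE_CONTRIB_OPS

namespace attention_guard_sketch {

// Hypothetical stand-in; the real AttentionKernelOptions has a different API.
struct AttentionKernelOptions {
  bool use_flash_attention = true;
  bool use_efficient_attention = true;
  bool use_unfused_attention = true;
};

class CUDAExecutionProvider {
 public:
  // Previously wrapped in #ifndef DISABLE_CONTRIB_OPS together with the
  // member below, so both disappeared in contrib-ops-disabled builds.
  const AttentionKernelOptions* GetAttentionKernelOptions() const {
    return &attention_kernel_options_;
  }

 private:
  AttentionKernelOptions attention_kernel_options_;
};

class CudaKernel {
 public:
  explicit CudaKernel(const CUDAExecutionProvider* provider) : provider_(provider) {
    assert(provider_ != nullptr);
  }

 protected:
  // Also unguarded now: every CUDA kernel can reach the options.
  const AttentionKernelOptions* GetAttentionKernelOptions() const {
    return provider_->GetAttentionKernelOptions();
  }

 private:
  const CUDAExecutionProvider* provider_;
};

// ONNX-domain Attention kernel; with the old guards, this call site failed
// with "'GetAttentionKernelOptions': is not a member of ...".
template <typename T>
class Attention final : public CudaKernel {
 public:
  using CudaKernel::CudaKernel;

  // Hypothetical helper standing in for the real backend selection.
  bool FlashAllowed() const {
    return GetAttentionKernelOptions()->use_flash_attention;
  }
};

}  // namespace attention_guard_sketch
```

In this sketch, removing the guard is enough for compilation; the real fix additionally needs the CMake change so that the bert/ sources behind those backends are linked.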

Motivation and Context

Building onnxruntime with CUDA enabled and --disable_contrib_ops fails:

error C2039: 'GetAttentionKernelOptions': is not a member of 'onnxruntime::cuda::Attention<float>'

This is a valid build configuration (useful for reducing compile time) that should be supported. Rather than excluding the CUDA Attention kernel when contrib ops are disabled, the necessary attention infrastructure from contrib_ops/cuda/bert/ is included in the build so the ONNX domain Attention op retains full CUDA acceleration. The fix is scoped to non-minimal CUDA builds only, since CUDA minimal builds use a non-recursive glob that doesn't include llm/attention.cc and don't have cudnn_frontend available.

The CUDA Attention kernel implementation (core/providers/cuda/llm/attention.cc)
depends on contrib ops (flash attention, memory efficient attention, unfused
attention helpers from contrib_ops/cuda/bert/). When DISABLE_CONTRIB_OPS is
defined, these dependencies are unavailable, causing compilation failures.

Fix by:
1. Excluding attention.h/attention.cc from the CUDA provider build when
   contrib ops are disabled (cmake change).
2. Guarding the Attention kernel class declarations and registrations in
   cuda_execution_provider.cc with #ifndef DISABLE_CONTRIB_OPS.

The CPU EP still provides the standard ONNX domain Attention kernel as
fallback when the CUDA implementation is unavailable.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/4bbef367-4e58-49e5-9bca-8d5a2c8ee872

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix onnxruntime build with CUDA enabled and contrib ops disabled" to "Fix CUDA build with contrib ops disabled" on May 19, 2026
Copilot AI requested a review from tianleiwu May 19, 2026 03:59
Contributor

github-actions Bot left a comment

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/core/providers/cuda/cuda_execution_provider.cc
Comment thread onnxruntime/core/providers/cuda/cuda_execution_provider.cc
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Comment thread cmake/onnxruntime_providers_cuda.cmake Outdated
Instead of removing the CUDA Attention kernel when contrib ops are
disabled, include the necessary attention infrastructure files from
contrib_ops/cuda/bert/ (flash attention, memory efficient attention,
unfused attention helpers, etc.) so the ONNX Attention kernel can
compile and link.

Also removes #ifndef DISABLE_CONTRIB_OPS guards from
GetAttentionKernelOptions() in cuda_kernel.h and
cuda_execution_provider.h since AttentionKernelOptions is now always
available.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/4e3d0a4a-58f7-4e16-81d1-54e4f2f282e9

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Copilot AI requested a review from tianleiwu May 21, 2026 00:15
Comment thread onnxruntime/core/providers/cuda/cuda_execution_provider.cc Outdated
The #ifndef DISABLE_CONTRIB_OPS guards around Attention kernel
registrations are unnecessary since the necessary contrib_ops/cuda/bert/
files are now always included in the build.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/fac2d4d5-add5-4431-95f1-c4f38205994e

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Copilot AI requested a review from tianleiwu May 21, 2026 00:21
Contributor

tianleiwu left a comment

There is a CI build failure for Build Linux TensorRT CUDA Minimal x64 Release.

@copilot, please update cmake/onnxruntime_providers_cuda.cmake to handle the CUDA minimal build nicely.

…and not CUDA minimal

The CUDA minimal build doesn't compile llm/attention.cc (non-recursive glob)
so the bert/ infrastructure isn't needed. The cudnn_fmha files require
cudnn_frontend.h which isn't available in minimal builds.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/508d31a5-92d6-4418-94ad-ac0ca6d78b68

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Contributor Author

Copilot AI commented May 21, 2026

There is a CI build failure for Build Linux TensorRT CUDA Minimal x64 Release.

python3 tools/ci_build/...

Fixed in 8b79ac7. The issue was that the else() block also triggered for CUDA minimal builds (where cudnn_frontend.h isn't available). Changed to elseif(onnxruntime_DISABLE_CONTRIB_OPS AND NOT onnxruntime_CUDA_MINIMAL) — since CUDA minimal uses a non-recursive glob that doesn't include llm/attention.cc, the bert/ infrastructure isn't needed there.
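To make the branch logic concrete, here is a C++ truth-table sketch of the CMake condition. It assumes that the if() preceding the elseif handles the contrib-ops-enabled case (the PR text does not show that branch), so the old else() effectively fired whenever onnxruntime_DISABLE_CONTRIB_OPS was ON, including in CUDA minimal builds. The function names are hypothetical; the two booleans mirror onnxruntime_DISABLE_CONTRIB_OPS and onnxruntime_CUDA_MINIMAL.

```cpp
// Hypothetical model of the source-selection branch in
// cmake/onnxruntime_providers_cuda.cmake. Only the two option names come
// from the PR; everything else is illustrative.
namespace cmake_branch_sketch {

struct BuildConfig {
  bool disable_contrib_ops;  // mirrors onnxruntime_DISABLE_CONTRIB_OPS
  bool cuda_minimal;         // mirrors onnxruntime_CUDA_MINIMAL
};

// Assumed first branch: contrib ops enabled, so the bert/ files already
// come in with the rest of contrib_ops/cuda.
constexpr bool FirstBranchTaken(const BuildConfig& c) {
  return !c.disable_contrib_ops;
}

// Before 8b79ac7: a plain else() fired for every config that skipped the
// first branch, CUDA minimal builds included.
constexpr bool OldFallbackAddsBertSources(const BuildConfig& c) {
  return !FirstBranchTaken(c);
}

// After 8b79ac7:
// elseif(onnxruntime_DISABLE_CONTRIB_OPS AND NOT onnxruntime_CUDA_MINIMAL)
constexpr bool NewFallbackAddsBertSources(const BuildConfig& c) {
  return !FirstBranchTaken(c) && c.disable_contrib_ops && !c.cuda_minimal;
}

// The cudnn_fmha sources among the bert/ files need cudnn_frontend.h,
// which CUDA minimal builds do not provide.
constexpr bool MissingCudnnFrontend(const BuildConfig& c, bool adds_bert_sources) {
  return adds_bert_sources && c.cuda_minimal;
}

}  // namespace cmake_branch_sketch
```

Because the functions are constexpr, each configuration can be checked at compile time with static_assert. Under the stated assumption the DISABLE_CONTRIB_OPS term of the elseif is redundant, but it keeps the condition readable on its own.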

Copilot AI requested a review from tianleiwu May 21, 2026 05:47
tianleiwu marked this pull request as ready for review May 21, 2026 05:59

Development

Successfully merging this pull request may close these issues.

[Build] Cannot build onnxruntime with CUDA enabled and contrib ops disabled

2 participants