add support for injecting /dev/dri* devices nodes for gfx MIGs #1818

Merged
cdesiniotis merged 1 commit into main from support-mig-gfx
May 15, 2026

Conversation

@tariq1890
Contributor

@tariq1890 tariq1890 commented May 12, 2026

Currently, the container toolkit skips adding the /dev/dri device nodes for a device if it is found to be a MIG device. With the availability of GPUs like the RTX Pro 6000D Blackwell, however, MIG slices with graphics capabilities can now be created.

In this PR, we update the NVSandboxutils and NVML discoverers to check if the detected MIG device supports graphics and inject the /dev/dri device nodes if true.
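The gating described above can be illustrated with a minimal sketch. This is not the code from this PR: the `migDevice` type, its fields, and `deviceNodesFor` are hypothetical names chosen for illustration, and the sketch assumes the graphics check has already been resolved to a boolean by the discoverer.

```go
package main

import "fmt"

// migDevice is a hypothetical, simplified view of a discovered MIG device.
// The real discoverers in the toolkit use different types.
type migDevice struct {
	UUID             string
	SupportsGraphics bool     // assumption: set by the discoverer after querying the device
	DRMNodes         []string // e.g. /dev/dri/card1, /dev/dri/renderD129
}

// deviceNodesFor returns the device nodes to inject for a MIG device.
// The base nodes are always included; the /dev/dri nodes are added
// only when the MIG device supports graphics.
func deviceNodesFor(d migDevice, baseNodes []string) []string {
	nodes := append([]string{}, baseNodes...)
	if d.SupportsGraphics {
		nodes = append(nodes, d.DRMNodes...)
	}
	return nodes
}

func main() {
	d := migDevice{
		UUID:             "MIG-example",
		SupportsGraphics: true,
		DRMNodes:         []string{"/dev/dri/card1", "/dev/dri/renderD129"},
	}
	for _, n := range deviceNodesFor(d, []string{"/dev/nvidia0"}) {
		fmt.Println(n)
	}
}
```

With `SupportsGraphics` set to false, the same call returns only the base nodes, which matches the previous behaviour for all MIG devices.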

@coveralls

coveralls commented May 12, 2026

Coverage Report for CI Build 25753963182

Coverage decreased (-0.05%) to 43.296%

Details

  • Coverage decreased (-0.05%) from the base build.
  • Patch coverage: 34 uncovered changes across 3 files (17 of 51 lines covered, 33.33%).
  • 21 coverage regressions across 1 file.

Uncovered Changes

| File | Changed | Covered | % |
|---|---|---|---|
| internal/platform-support/dgpu/nvml.go | 41 | 11 | 26.83% |
| internal/platform-support/dgpu/dgpu.go | 5 | 3 | 60.0% |
| internal/platform-support/dgpu/nvsandboxutils.go | 5 | 3 | 60.0% |
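
The per-file figures above add up to the patch coverage quoted in the report (17 of 51 lines, 33.33%). A small sketch of that arithmetic follows; the `fileCoverage` type and helper names are illustrative only and are not part of Coveralls or the toolkit.

```go
package main

import "fmt"

// fileCoverage holds one row of the "Uncovered Changes" table.
type fileCoverage struct {
	File    string
	Changed int
	Covered int
}

// percent returns covered/total as a percentage, or 0 when total is 0.
func percent(covered, total int) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(covered) / float64(total)
}

// patchTotals sums the changed and covered line counts across files.
func patchTotals(files []fileCoverage) (changed, covered int) {
	for _, f := range files {
		changed += f.Changed
		covered += f.Covered
	}
	return changed, covered
}

func main() {
	files := []fileCoverage{
		{"internal/platform-support/dgpu/nvml.go", 41, 11},
		{"internal/platform-support/dgpu/dgpu.go", 5, 3},
		{"internal/platform-support/dgpu/nvsandboxutils.go", 5, 3},
	}
	changed, covered := patchTotals(files)
	// Prints: 17 of 51 changed lines covered (33.33%)
	fmt.Printf("%d of %d changed lines covered (%.2f%%)\n",
		covered, changed, percent(covered, changed))
}
```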

Coverage Regressions

21 previously-covered lines in 1 file lost coverage.

| File | Lines Losing Coverage | Coverage |
|---|---|---|
| cmd/nvidia-cdi-hook/cudacompat/cuda-elf-header.go | 21 | 55.22% |

Coverage Stats

Relevant Lines: 14902
Covered Lines: 6452
Line Coverage: 43.3%
Coverage Strength: 0.48 hits per line

💛 - Coveralls

@tariq1890 tariq1890 requested review from cdesiniotis and elezar May 12, 2026 02:52
Member

@elezar elezar left a comment

This looks reasonable @tariq1890. Could we link the related go-nvlib PRs that will be required for this in the description?

One other thing to note. This handles the injection of these devices when nvidiasandbox-utils is used, but not when pure nvml is used. Is this a concern?

@tariq1890
Contributor Author

Thanks for the review @elezar. There are no go-nvlib or go-nvml changes required for this PR. These toolkit changes just work as they are.

Regarding the NVML discoverer, from reading the code I thought the changes would just work. I can test that with the nvsandboxutils discoverer disabled.

@tariq1890
Contributor Author

I stand corrected :). Changes are indeed needed in the NVML MIG discoverer. I'll make those changes.

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
@tariq1890
Contributor Author

tariq1890 commented May 12, 2026

This handles the injection of these devices when nvidiasandbox-utils is used, but not when pure nvml is used. Is this a concern?

This is now addressed. I was able to confirm through local testing that the pure NVML discoverer will inject the /dev/dri* device nodes if the MIG slices support graphics.
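
For anyone repeating this kind of local check, one way to see which DRM nodes are visible inside a container is to list `/dev/dri` and filter for the usual Linux DRM names (`card<N>` and `renderD<N>`). The following is a rough sketch under that assumption, not the procedure used in this PR; `isDRMNode` and `listDRMNodes` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// allDigits reports whether s is non-empty and consists only of ASCII digits.
func allDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// isDRMNode reports whether name looks like a DRM device node name,
// i.e. "card<N>" or "renderD<N>".
func isDRMNode(name string) bool {
	for _, prefix := range []string{"card", "renderD"} {
		if strings.HasPrefix(name, prefix) && allDigits(strings.TrimPrefix(name, prefix)) {
			return true
		}
	}
	return false
}

// listDRMNodes returns the full paths of DRM device nodes found in dir.
func listDRMNodes(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var nodes []string
	for _, e := range entries {
		if isDRMNode(e.Name()) {
			nodes = append(nodes, filepath.Join(dir, e.Name()))
		}
	}
	return nodes, nil
}

func main() {
	nodes, err := listDRMNodes("/dev/dri")
	if err != nil {
		// A missing /dev/dri means no DRM nodes were injected.
		fmt.Println("no DRM nodes visible:", err)
		return
	}
	for _, n := range nodes {
		fmt.Println(n)
	}
}
```

Run inside a container that was given a graphics-capable MIG slice, this should list the injected nodes; in a container without them, it reports that `/dev/dri` is not readable.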

@tariq1890 tariq1890 self-assigned this May 13, 2026
@cdesiniotis cdesiniotis added this to the next-minor milestone May 13, 2026
@cdesiniotis cdesiniotis modified the milestones: next-minor, v1.19.1 May 15, 2026
@cdesiniotis
Contributor

/cherry-pick release-1.19

@cdesiniotis cdesiniotis merged commit 7be2f73 into main May 15, 2026
20 checks passed
@cdesiniotis cdesiniotis deleted the support-mig-gfx branch May 15, 2026 19:08
@github-actions

🤖 Backport PR created for release-1.19: #1831
