
Support GPU Passthrough to VMs #106

Description

@MalteJ

Summary

This work item enables FeOS to attach one or more physical GPUs directly to a FeOS-managed Virtual Machine (VM) using PCIe passthrough.

This functionality is critical for supporting GPU-accelerated workloads such as Artificial Intelligence (AI), Machine Learning (ML), scientific computing, and high-performance graphics within VMs. The implementation will extend the VM API to allow specifying GPUs by their host PCIe address.


Scope

✅ In Scope

  • Extend the FeOS VM API to allow specifying one or more GPUs via their host PCIe address for attachment to a VM.
  • Implement the backend logic for PCIe passthrough of a complete physical GPU (e.g., using IOMMU / vfio-pci).
  • Ensure the guest VM can recognize the attached GPU and that appropriate vendor drivers (e.g., NVIDIA, AMD) can be installed and utilized.
  • Support passing through multiple GPUs to a single VM.
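
As a rough illustration of the first item, the sketch below parses and validates a list of host PCIe addresses in the `DDDD:BB:DD.F` form. The names `PciAddress` and `parse_gpu_list` are hypothetical and not part of the current FeOS API. Requiring the full domain prefix and rejecting duplicates are assumptions, not settled design.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical type: a host PCIe address in `DDDD:BB:DD.F` form.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PciAddress {
    pub domain: u16,
    pub bus: u8,
    pub device: u8,   // 5 bits: 0x00..=0x1f
    pub function: u8, // 3 bits: 0..=7
}

impl FromStr for PciAddress {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let bad = || format!("invalid PCIe address: {:?}", s);
        // Only hex digits and the two separators are allowed (no signs, no spaces).
        if !s.bytes().all(|b| b.is_ascii_hexdigit() || b == b':' || b == b'.') {
            return Err(bad());
        }
        // Split "DDDD:BB:DD.F" into its four fields.
        let (domain, rest) = s.split_once(':').ok_or_else(bad)?;
        let (bus, rest) = rest.split_once(':').ok_or_else(bad)?;
        let (device, function) = rest.split_once('.').ok_or_else(bad)?;
        if domain.len() != 4 || bus.len() != 2 || device.len() != 2 || function.len() != 1 {
            return Err(bad());
        }
        let addr = PciAddress {
            domain: u16::from_str_radix(domain, 16).map_err(|_| bad())?,
            bus: u8::from_str_radix(bus, 16).map_err(|_| bad())?,
            device: u8::from_str_radix(device, 16).map_err(|_| bad())?,
            function: u8::from_str_radix(function, 16).map_err(|_| bad())?,
        };
        if addr.device > 0x1f || addr.function > 7 {
            return Err(bad());
        }
        Ok(addr)
    }
}

impl fmt::Display for PciAddress {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{:04x}:{:02x}:{:02x}.{:x}",
            self.domain, self.bus, self.device, self.function
        )
    }
}

/// Hypothetical helper: parse the GPU list of a VM spec, rejecting duplicates.
pub fn parse_gpu_list(raw: &[&str]) -> Result<Vec<PciAddress>, String> {
    let mut out = Vec::with_capacity(raw.len());
    for s in raw {
        let addr: PciAddress = s.parse()?;
        if out.contains(&addr) {
            return Err(format!("duplicate PCIe address: {}", addr));
        }
        out.push(addr);
    }
    Ok(out)
}
```

Parsing into a typed address up front means later steps (sysfs lookups, hypervisor configuration) never handle unvalidated strings.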

❌ Out of Scope

  • GPU virtualization technologies like NVIDIA vGPU or AMD MxGPU (SR-IOV). This issue focuses exclusively on full device passthrough.
  • Live migration of VMs with attached GPUs.
  • Dynamic hot-plugging of GPUs. GPUs must be attached when the VM is created or started.
  • Host-side GPU driver installation and configuration. This issue assumes the host is correctly prepared for passthrough.

Responsible Areas

  • FeOS VM Management
  • FeOS API

Contributors


Acceptance Criteria

  • API

    • The VM API is extended to accept a list of PCIe addresses for GPUs in the VM specification.
    • The API validates that the specified PCIe devices exist and are available for passthrough.
  • VM Runtime & Guest OS

    • A VM can be successfully launched with one or more GPUs passed through to it.
    • The guest operating system correctly identifies the hardware of the passed-through GPU(s) (e.g., visible in lspci).
    • Vendor-specific drivers (e.g., NVIDIA driver) can be installed successfully inside the guest OS.
    • A GPU-accelerated application or utility (e.g., nvidia-smi, a CUDA/OpenCL sample) runs successfully within the VM and can access the GPU's capabilities.
    • The FeOS host correctly isolates the device, preventing host-level drivers from claiming it while it is assigned to a VM.
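
The validation and isolation criteria above could be checked against Linux sysfs roughly as follows. This is a minimal sketch under stated assumptions: "available for passthrough" is taken to mean that the device exists, reports a display-controller class code (`0x03....`), and belongs to an IOMMU group, and "isolated" is taken to mean that it is bound to `vfio-pci`. The function and type names are hypothetical. The device directory (normally `/sys/bus/pci/devices`) is passed in so the logic can be exercised against a fake tree.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical error type for the pre-launch checks.
#[derive(Debug, PartialEq, Eq)]
pub enum GpuCheckError {
    /// No such device directory, or its `class` attribute is unreadable.
    NotFound,
    /// The device exists but its class code is not 0x03.... (display controller).
    NotDisplayController(String),
    /// No `iommu_group` link: IOMMU is probably disabled on the host.
    NoIommuGroup,
}

/// Check that `addr` exists under `devices_dir` and looks like a GPU
/// that can be handed to a VM.
pub fn check_gpu_candidate(devices_dir: &Path, addr: &str) -> Result<(), GpuCheckError> {
    let dev = devices_dir.join(addr);
    if !dev.is_dir() {
        return Err(GpuCheckError::NotFound);
    }
    // sysfs exposes the class code as text, e.g. "0x030000" or "0x030200".
    let class = fs::read_to_string(dev.join("class")).map_err(|_| GpuCheckError::NotFound)?;
    let class = class.trim();
    if !class.starts_with("0x03") {
        return Err(GpuCheckError::NotDisplayController(class.to_string()));
    }
    if !dev.join("iommu_group").exists() {
        return Err(GpuCheckError::NoIommuGroup);
    }
    Ok(())
}

/// Name of the driver currently bound to `addr`, read from the `driver` symlink.
/// Returns `None` when no driver is bound.
pub fn bound_driver(devices_dir: &Path, addr: &str) -> Option<String> {
    let target = fs::read_link(devices_dir.join(addr).join("driver")).ok()?;
    target.file_name().map(|n| n.to_string_lossy().into_owned())
}

/// Assumption: a device counts as isolated from host drivers when it is bound to vfio-pci.
pub fn is_isolated(devices_dir: &Path, addr: &str) -> bool {
    bound_driver(devices_dir, addr).as_deref() == Some("vfio-pci")
}
```

Whether a device already assigned to another running VM should also count as unavailable is left open here; that would need state from FeOS VM management rather than sysfs.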

Action Items

  • Design the API extension in the VM model for specifying GPU devices.
  • Implement the backend logic to configure the hypervisor for GPU passthrough (e.g., managing IOMMU groups, binding to vfio-pci).
  • Ensure that all functions of a GPU (e.g., graphics and audio components on the same PCIe card) are passed through together.
  • Add robust validation and error handling for cases where a GPU is unavailable or passthrough fails.
  • Create integration tests that:
    • Launch a VM with a single GPU and verify its functionality in the guest.
    • Launch a VM with multiple GPUs and verify their functionality in the guest.
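
The backend items above (IOMMU groups, binding to vfio-pci, passing all functions of a card together) might be approached along these lines. This is a sketch rather than a proposed implementation: the function names are hypothetical, it assumes the `vfio-pci` kernel module is already loaded, and it only computes the sysfs writes for the usual `driver_override` / `unbind` / `drivers_probe` sequence instead of performing them.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// All PCI addresses that share the IOMMU group of `addr`, sorted.
/// `devices_dir` is normally `/sys/bus/pci/devices`.
pub fn iommu_group_members(devices_dir: &Path, addr: &str) -> io::Result<Vec<String>> {
    let dir = devices_dir.join(addr).join("iommu_group").join("devices");
    let mut members = Vec::new();
    for entry in fs::read_dir(dir)? {
        members.push(entry?.file_name().to_string_lossy().into_owned());
    }
    members.sort();
    Ok(members)
}

/// Entries of `candidates` in the same domain:bus:device slot as `addr`
/// (for example a GPU at `.0` and its audio function at `.1`), including `addr` itself.
pub fn slot_functions<'a>(addr: &str, candidates: &'a [String]) -> Vec<&'a str> {
    // "0000:65:00.0" -> slot "0000:65:00"
    let slot = match addr.rsplit_once('.') {
        Some((slot, _)) => slot,
        None => return Vec::new(),
    };
    candidates
        .iter()
        .map(String::as_str)
        .filter(|c| c.rsplit_once('.').map(|(s, _)| s) == Some(slot))
        .collect()
}

/// Ordered `(file, value)` sysfs writes that would hand `addr` to vfio-pci.
/// `pci_dir` is normally `/sys/bus/pci`; `current_driver` is the driver bound now, if any.
pub fn vfio_rebind_plan(
    pci_dir: &Path,
    addr: &str,
    current_driver: Option<&str>,
) -> Vec<(PathBuf, String)> {
    if current_driver == Some("vfio-pci") {
        return Vec::new(); // already bound, nothing to do
    }
    let dev = pci_dir.join("devices").join(addr);
    // 1. Make vfio-pci the only driver allowed to claim this device.
    let mut plan = vec![(dev.join("driver_override"), "vfio-pci".to_string())];
    // 2. Detach the host driver that currently owns it, if there is one.
    if current_driver.is_some() {
        plan.push((dev.join("driver").join("unbind"), addr.to_string()));
    }
    // 3. Ask the PCI core to re-probe, which now selects vfio-pci.
    plan.push((pci_dir.join("drivers_probe"), addr.to_string()));
    plan
}
```

Returning a plan instead of writing to sysfs directly keeps the ordering testable without root privileges or real hardware. A caller would apply the plan to every member of the IOMMU group, since a group whose members are split between host drivers and vfio-pci cannot be assigned to a VM.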
