[BUG] NULL pointer dereference in xe_display_flush_cleanup_work during runtime suspend (Alder Lake + Arc A370M) #82

@ExtremeLiquidCxr

System Environment:

OS: Arch Linux (UKI boot, strict lockdown/security policies enabled)

Kernel: 7.0.3-arch1-2 (PREEMPT_DYNAMIC)

CPU/iGPU: Intel Core i7-1260P (Alder Lake-P) | ID: 46a6

dGPU: Intel Arc A370M (DG2) | ID: 5693

Issue Description:
The xe driver encounters a fatal NULL pointer dereference (address: 00000000000005d8) when the power management subsystem attempts to put the display into a runtime suspend state.

The crash occurs in xe_display_flush_cleanup_work, reached from xe_display_pm_runtime_suspend. It appears the driver dereferences a display-related pointer that is NULL (never set, or already cleared) by the time runtime suspend runs. The CPU faults on the invalid access, and the kernel reports the dereference and panics.
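
One detail that may help triage: the fault address is 0x5d8, not 0. An address inside the first page usually means the code read a field at that offset through a NULL base pointer, not that it jumped to a random bad address. The sketch below only illustrates that reading. The helper name `classify_fault` and the 4096-byte page size are assumptions for illustration, and it says nothing about which structure or field is involved.

```python
import re

PAGE_SIZE = 4096  # assumed x86-64 base page size


def classify_fault(oops_line: str):
    """Return (address, near_null) for a kernel oops header line.

    near_null is True when the address falls inside the first page,
    which is typical of "NULL base pointer + field offset" accesses.
    """
    match = re.search(r"address:\s*([0-9a-fA-F]+)", oops_line)
    if match is None:
        raise ValueError("no fault address found in line")
    addr = int(match.group(1), 16)
    return addr, addr < PAGE_SIZE


line = "BUG: kernel NULL pointer dereference, address: 00000000000005d8"
addr, near_null = classify_fault(line)
print(hex(addr), addr, near_null)  # 0x5d8 1496 True
```

If that reading is right, the faulting instruction accessed a member located 1496 bytes into some structure whose pointer was NULL.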

Steps to Reproduce:

1. Boot the system with parameters that bind both GPUs to xe:
   xe.force_probe=46a6,5693 i915.force_probe=!46a6,!5693

2. Allow the system to initialize the DRM devices (both devices probe and initialize successfully).

3. Wait for the PM subsystem to trigger a runtime suspend (e.g., leave the system idle shortly after boot).

4. The system triggers a kernel panic.
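
To check the preconditions of these steps before the panic hits, the standard PCI sysfs attributes can be read. The sketch below lists every device bound to xe together with its `power/control` and `power/runtime_status` values. The function name `xe_runtime_pm` and the `sysfs_root` parameter are hypothetical and were not part of the original test setup.

```python
from pathlib import Path


def xe_runtime_pm(sysfs_root="/sys"):
    """Map each PCI device bound to xe to its runtime PM attributes."""
    driver_dir = Path(sysfs_root) / "bus/pci/drivers/xe"
    result = {}
    if not driver_dir.is_dir():
        return result  # xe not loaded, or no device bound to it
    for entry in sorted(driver_dir.iterdir()):
        power = entry / "power"
        status_file = power / "runtime_status"
        # Bound devices appear as links named by PCI address (e.g. 0000:00:02.0);
        # other entries such as bind/unbind/module have no runtime_status file.
        if not status_file.is_file():
            continue
        control_file = power / "control"
        result[entry.name] = {
            "control": control_file.read_text().strip() if control_file.is_file() else None,
            "runtime_status": status_file.read_text().strip(),
        }
    return result


if __name__ == "__main__":
    for bdf, attrs in xe_runtime_pm().items():
        print(bdf, attrs["control"], attrs["runtime_status"])
```

With both GPUs bound, two PCI addresses should be listed. Writing `on` to a device's `power/control` file keeps it from runtime suspending, which may help confirm that the panic is tied to the suspend transition. Writing `auto` allows runtime suspend again.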

Key Technical Observations:

Targeted Failure: The crash is isolated to the xe driver's display power management path. Other PCI devices (such as the Intel AX211 Wi-Fi and Bluetooth) are unaffected and remain fully initialized and functional in the background.

Consistency: The behavior is persistent and identical across multiple kernels (tested from 6.17.x up to 7.0.3), which suggests a logic error in the driver rather than a regression in a single kernel release.

Hardware State: The system is otherwise completely stable. Thermal states are normal (no fan spin-up or throttling), and security modules (AppArmor, Lockdown) do not interfere with the driver initialization prior to the suspend trigger.

Relevant Call Trace:
BUG: kernel NULL pointer dereference, address: 00000000000005d8
...
xe_display_flush_cleanup_work+0x96/0x140 [xe]
xe_display_pm_runtime_suspend+0x4b/0x90 [xe]
xe_pm_runtime_suspend+0x147/0x300 [xe]
xe_pci_runtime_suspend+0x2a/0xe0 [xe]
pci_pm_runtime_suspend+0x78/0x210

Full dmesg log attached below.
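
For anyone resolving the trace to source lines, the frames above can be split into symbol, offset, function size, and module. The sketch below does only that parsing. `FRAME_RE` and `parse_frames` are hypothetical names, and the pattern assumes the `symbol+0xoff/0xsize [module]` layout shown in this report.

```python
import re

# Matches lines such as: xe_pm_runtime_suspend+0x147/0x300 [xe]
FRAME_RE = re.compile(
    r"^\s*(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_frames(trace: str):
    """Return one dict per stack frame found in a kernel call trace."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue  # skip headers, "..." and other non-frame lines
        frames.append({
            "symbol": match["symbol"],
            "offset": int(match["offset"], 16),
            "size": int(match["size"], 16),
            "module": match["module"],  # None for built-in kernel code
        })
    return frames


TRACE = """\
BUG: kernel NULL pointer dereference, address: 00000000000005d8
...
xe_display_flush_cleanup_work+0x96/0x140 [xe]
xe_display_pm_runtime_suspend+0x4b/0x90 [xe]
xe_pm_runtime_suspend+0x147/0x300 [xe]
xe_pci_runtime_suspend+0x2a/0xe0 [xe]
pci_pm_runtime_suspend+0x78/0x210
"""

top = parse_frames(TRACE)[0]
print(top["symbol"], top["offset"], top["size"])  # xe_display_flush_cleanup_work 150 320
```

The top frame places the fault 150 bytes into the 320-byte xe_display_flush_cleanup_work. Given an xe.ko built with debug info for the same kernel, the `func+offset/size` string can be passed to the kernel tree's `scripts/faddr2line` to map it to a source line. That step has not been done for this report.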
