
d3d12: dump DRED breadcrumbs when atlas creation fails with device-removed #217

Open
dfattal wants to merge 1 commit into main from feat/d3d12-dred-breadcrumbs


@dfattal dfattal commented May 11, 2026

Summary

Adds an `ID3D12DeviceRemovedExtendedData` query at the `create_atlas_texture` device-removed branch. When DRED is enabled at device creation, the runtime now logs the last GPU commands recorded by the driver before the device went away — useful for diagnosing the next device-removed crash without having to instrument from scratch.

Background

During the #216 / displayxr-unity#82 investigation, I needed to know which GPU command was removing the device. The standard approach is DRED auto-breadcrumbs, but they're only useful if (a) DRED was enabled before `D3D12CreateDevice` and (b) something on the host side actually queries them when a device-removed surfaces. (a) is on the app/system side; (b) is on us. This PR is (b).

The actual #216 / #82 bug ended up being a plugin-side format mismatch in `CopyTextureRegion`, which DRED couldn't see directly (the driver rejected the command before queuing it as a breadcrumb-tracked op). But the breadcrumb hook is generally useful, and `create_atlas_texture` is the first D3D12 API call after each frame's queue work — so it's the natural place for the runtime to first detect a device-removed and report context.

What's in the PR

  • Adds `static void log_dred_state(ID3D12Device *, const char *context)` to `comp_d3d12_renderer.cpp`. Queries `GetAutoBreadcrumbsOutput` + `GetPageFaultAllocationOutput`, logs the last ~16 GPU commands around each node's last-completed marker.
  • Calls it from `create_atlas_texture`'s `DXGI_ERROR_DEVICE_REMOVED` branch.
  • When DRED isn't enabled, the query returns `DXGI_ERROR_NOT_FOUND` and we log how to turn it on rather than erroring.
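
As a rough illustration of the "last ~16 commands around each node's last-completed marker" step, here is a minimal sketch. It is not the code in `comp_d3d12_renderer.cpp`: `BreadcrumbNode` is a hypothetical, portable stand-in for `D3D12_AUTO_BREADCRUMB_NODE` (op codes are plain `int`s), and `collect_breadcrumb_window` and `BreadcrumbLine` are made-up names. It assumes the usual DRED reading that the last-breadcrumb value is the number of ops the GPU completed on that command list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for D3D12_AUTO_BREADCRUMB_NODE so the sketch builds anywhere.
struct BreadcrumbNode {
    const char *command_list_name;          // debug name, may be null
    uint32_t breadcrumb_count;              // number of recorded ops
    const uint32_t *last_breadcrumb_value;  // number of ops the GPU completed
    const int *command_history;             // op codes, breadcrumb_count entries
    const BreadcrumbNode *next;             // next node in the linked list
};

// One line of output: which list, which op index, and whether it completed.
struct BreadcrumbLine {
    std::string list_name;
    uint32_t index;
    int op;
    bool completed;
};

// Walk every node and keep up to `window` ops centred on the last-completed marker.
std::vector<BreadcrumbLine> collect_breadcrumb_window(const BreadcrumbNode *head,
                                                      uint32_t window = 16)
{
    assert(window > 0);
    std::vector<BreadcrumbLine> out;
    for (const BreadcrumbNode *n = head; n != nullptr; n = n->next) {
        if (n->breadcrumb_count == 0 || n->command_history == nullptr ||
            n->last_breadcrumb_value == nullptr) {
            continue; // nothing recorded for this node
        }
        // Clamp in case the marker is past the end of the history.
        const uint32_t last = std::min(*n->last_breadcrumb_value, n->breadcrumb_count);
        const uint32_t half = window / 2;
        const uint32_t begin = last > half ? last - half : 0;
        const uint32_t end = std::min(n->breadcrumb_count, begin + window);
        for (uint32_t i = begin; i < end; ++i) {
            out.push_back({n->command_list_name ? n->command_list_name : "<unnamed>",
                           i, n->command_history[i], i < last});
        }
    }
    return out;
}
```

With this layout, the first entry whose `completed` flag is false is the op the GPU had not finished when the device was removed, which is usually the first one worth looking at.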

How to enable DRED for a given app

Documented in the helper's comment. Three options:

  1. Per-process (recommended for an app under active investigation):
    ```cpp
    ID3D12DeviceRemovedExtendedDataSettings *dred_settings = nullptr;
    // D3D12GetDebugInterface can fail (e.g. DRED unavailable), so check before use.
    if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&dred_settings)))) {
        dred_settings->SetAutoBreadcrumbsEnablement(D3D12_DRED_ENABLEMENT_FORCED_ON);
        dred_settings->SetPageFaultEnablement(D3D12_DRED_ENABLEMENT_FORCED_ON);
        dred_settings->Release();
    }
    // ... then D3D12CreateDevice()
    ```
  2. Per-app (registry, no code changes):
    `HKLM\SOFTWARE\Microsoft\Direct3D\AppCompat\<exe>.exe` with REG_DWORD `AutoBreadcrumbsEnablement = 1` and `PageFaultEnablement = 1`.
  3. Globally: Windows Development Build or the DirectX Control Panel.

Test plan

  • Builds cleanly on Windows (`scripts/build_windows.bat build`).
  • No regression in the opaque or transparent rendering paths in a casual smoke test (`cube_handle_d3d12_win` still runs).
  • Future: ideally a runtime unit test that intentionally provokes device-removed (out-of-bounds dispatch?) and asserts that breadcrumb output is logged. Not in scope here.
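
One way such a test could be structured without a real GPU is to put a seam between the DRED query and the logging. The sketch below is only an assumption about how that might look: `FakeDred`, `DredQuery`, `LogFn` and `log_dred_state_for_test` are hypothetical names, not part of this PR, and the log strings are illustrative rather than the runtime's actual output.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical query outcome: either breadcrumbs are available or DRED is off.
enum class DredQuery { Ok, NotEnabled };

// Hypothetical fake standing in for what a real DRED query would return.
struct FakeDred {
    DredQuery status;
    std::vector<std::string> ops; // recorded op names, in submission order
    unsigned completed;           // number of ops the GPU finished
};

using LogFn = std::function<void(const std::string &)>;

// Logs either an enablement hint or one line per breadcrumb; never fails.
void log_dred_state_for_test(const FakeDred &dred, const char *context, const LogFn &log)
{
    assert(context != nullptr);
    if (dred.status == DredQuery::NotEnabled) {
        log(std::string(context) + ": DRED not enabled; enable it before D3D12CreateDevice");
        return;
    }
    for (std::size_t i = 0; i < dred.ops.size(); ++i) {
        const char *mark = i < dred.completed ? "done" : "pending";
        log(std::string(context) + ": [" + std::to_string(i) + "] " + dred.ops[i] +
            " (" + mark + ")");
    }
}
```

A test can then pass a lambda that appends to a `std::vector<std::string>` and assert on the captured lines for both the enabled and not-enabled cases. Provoking a genuine device-removed would still need a separate integration test.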

🤖 Generated with Claude Code

…moved

Add a log_dred_state() helper that queries
ID3D12DeviceRemovedExtendedData for auto-breadcrumbs +
page-fault info and dumps the last ~16 GPU commands around
the failure point. Wire it into create_atlas_texture()'s
DXGI_ERROR_DEVICE_REMOVED branch — this is the first place
the runtime typically detects a device-removed state, since
CreateCommittedResource is the next D3D12 API call after
each frame's queue work.

DRED itself has to be enabled before device creation (we
don't own that — Unity / the app creates the device).
Helper documents the three enablement paths in a comment so
users debugging a device-removed know how to turn DRED on.

When DRED isn't enabled, QueryInterface returns
DXGI_ERROR_NOT_FOUND and we log how to enable it but don't
error.

Was used during DisplayXR/displayxr-unity#82 / #216
investigation; the actual bug ended up being plugin-side
(format mismatch in CopyTextureRegion under DComp
swapchain) but the breadcrumb hook stays as defensive
infrastructure for the next device-removed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>