Commit a45d635

guoqzhan authored and gregkh committed
drm/amdgpu: fix gpu page fault after hibernation on PF passthrough
[ Upstream commit eb6e7f5 ]

On PF passthrough environment, after hibernate and then resume, coralgemm
will cause gpu page fault.

Mode1 reset happens during hibernate, but partition mode is not restored
on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right
after resume. When CP access the MQD BO, wrong stride size is used,
this will cause out of bound access on the MQD BO, resulting page fault.

The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resume from a hibernation.
KFD resume is called separately during a reset recovery or resume from
suspend sequence. Hence it's not required to be called as part of
partition switch.

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5d1b32c)
Signed-off-by: Sasha Levin <sashal@kernel.org>
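The failure mode described in the commit message (a stale stride turning a valid index into an out-of-bounds offset) can be illustrated with a toy model. This is only a sketch: `mqd_bo_model`, `mqd_access_in_bounds`, and the sizes used are hypothetical and are not taken from the amdgpu driver or from the actual register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer object that holds fixed-size entries. */
struct mqd_bo_model {
	size_t size; /* total bytes the driver allocated */
};

/*
 * Returns true when entry `index`, located with `stride` and spanning
 * `entry_size` bytes, lies entirely inside the buffer.
 */
static bool mqd_access_in_bounds(const struct mqd_bo_model *bo, size_t index,
				 size_t stride, size_t entry_size)
{
	assert(bo != NULL);

	size_t offset = index * stride;

	/* Written to avoid unsigned underflow in `size - offset`. */
	return offset <= bo->size && entry_size <= bo->size - offset;
}
```

With a buffer sized for four entries at a 0x1000 stride, index 3 is in bounds; if the consumer instead assumes a 0x2000 stride, the same index resolves to offset 0x6000, past the end of the allocation.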
1 parent 2e62822 commit a45d635

2 files changed

Lines changed: 5 additions & 2 deletions

drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c

Lines changed: 2 additions & 1 deletion
@@ -555,7 +555,8 @@ static int aqua_vanjaram_switch_partition_mode(struct amdgpu_xcp_mgr *xcp_mgr,
 		return -EINVAL;
 	}
 
-	if (adev->kfd.init_complete && !amdgpu_in_reset(adev))
+	if (adev->kfd.init_complete && !amdgpu_in_reset(adev) &&
+	    !adev->in_suspend)
 		flags |= AMDGPU_XCP_OPS_KFD;
 
 	if (flags & AMDGPU_XCP_OPS_KFD) {
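The new gating condition in this hunk can be read as a small pure function. The sketch below mirrors only the condition visible in the diff; `dev_state`, `partition_switch_flags`, and the numeric value of `XCP_OPS_KFD` are stand-ins chosen for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for AMDGPU_XCP_OPS_KFD; the bit value here is illustrative. */
#define XCP_OPS_KFD (1u << 0)

/* Hypothetical snapshot of the three device states the condition reads. */
struct dev_state {
	bool kfd_init_complete;
	bool in_reset;
	bool in_suspend;
};

/*
 * Adds the KFD flag only when KFD is initialised and the device is
 * neither in reset nor in a suspend/resume sequence.
 */
static unsigned int partition_switch_flags(const struct dev_state *s,
					   unsigned int flags)
{
	assert(s != NULL);

	if (s->kfd_init_complete && !s->in_reset && !s->in_suspend)
		flags |= XCP_OPS_KFD;

	return flags;
}
```

Under this model, a partition switch during resume leaves the KFD flag clear, which matches the commit message's statement that KFD resume is handled separately in that sequence.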

drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c

Lines changed: 3 additions & 1 deletion
@@ -2297,7 +2297,9 @@ static int gfx_v9_4_3_cp_resume(struct amdgpu_device *adev)
 			r = amdgpu_xcp_init(adev->xcp_mgr, num_xcp, mode);
 
 	} else {
-		if (amdgpu_xcp_query_partition_mode(adev->xcp_mgr,
+		if (adev->in_suspend)
+			amdgpu_xcp_restore_partition_mode(adev->xcp_mgr);
+		else if (amdgpu_xcp_query_partition_mode(adev->xcp_mgr,
 						    AMDGPU_XCP_FL_NONE) ==
 		    AMDGPU_UNKNOWN_COMPUTE_PARTITION_MODE)
 			r = amdgpu_xcp_switch_partition_mode(
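The branch order introduced here (restore on resume first, then fall back to the unknown-mode check) can be summarised as a decision function. This is a sketch under assumptions: `resume_action` and `choose_resume_action` are hypothetical names, and the "keep current" outcome is inferred from the absence of a further branch in the visible lines of the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the outcomes of the else-branch in cp_resume. */
enum resume_action {
	RESUME_RESTORE_SAVED_MODE, /* models amdgpu_xcp_restore_partition_mode() */
	RESUME_SWITCH_MODE,        /* models amdgpu_xcp_switch_partition_mode() */
	RESUME_KEEP_CURRENT        /* assumed: neither call is made */
};

/*
 * in_suspend takes priority: after hibernation the saved mode is
 * restored without first querying what the hardware reports.
 */
static enum resume_action choose_resume_action(bool in_suspend,
					       bool mode_unknown)
{
	if (in_suspend)
		return RESUME_RESTORE_SAVED_MODE;
	if (mode_unknown)
		return RESUME_SWITCH_MODE;
	return RESUME_KEEP_CURRENT;
}
```

Checking `in_suspend` first means the restore path is taken on resume regardless of what the partition mode query would return.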
