drm/amdgpu: break driver init process when it's bad GPU(v5)

When the bad GPU tag is retrieved from EEPROM, GPU init should fail, as
the GPU needs to be retired for further checking.

v2: Fix spelling typo, correct the condition for detecting the bad GPU
    tag, and refine the error message.
v3: Refine function argument name.
v4: Fix missing check of the return value in the I2C initialization
    error case.
v5: Use dev_err to print PCI information in dmesg instead of DRM_ERROR.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
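The error policy this patch relies on (only the bad-GPU case is reported upward; any other recovery-init failure cleans up, logs, and lets init continue) can be sketched in plain C. This is a minimal userspace illustration, not the amdgpu code: `struct fake_eeprom`, `sketch_recovery_init`, and the use of `-EINVAL` for the bad-GPU case are all hypothetical assumptions chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the RAS EEPROM state; not an amdgpu struct. */
struct fake_eeprom {
	bool i2c_ok;      /* did I2C/EEPROM initialization succeed? */
	bool bad_gpu_tag; /* was the bad GPU tag read from EEPROM? */
};

/*
 * Sketch of the policy: return non-zero only when the bad GPU tag is
 * found; every other failure frees what was allocated, prints an error
 * and returns 0 so the caller keeps initializing.
 * Returning -EINVAL for the bad-GPU case is an assumption.
 */
static int sketch_recovery_init(const struct fake_eeprom *ee)
{
	int ret = 0;
	/* Illustrative allocation that must be released on failure paths. */
	int *bad_pages = calloc(16, sizeof(*bad_pages));

	assert(ee != NULL);

	if (!bad_pages)
		goto out;          /* allocation failure: not fatal */

	if (!ee->i2c_ok)
		goto free_pages;   /* I2C init failed: not fatal */

	if (ee->bad_gpu_tag) {
		ret = -EINVAL;     /* bad GPU: caller must stop init */
		goto free_pages;
	}

	free(bad_pages);           /* sketch only; a driver would keep it */
	return 0;

free_pages:
	free(bad_pages);
out:
	fprintf(stderr, "recovery init failed%s\n",
		ret ? ": bad GPU, stopping init" : ", continuing init");
	return ret;
}
```

An I2C failure yields 0 here because the tag could not be read at all; only a successfully read bad tag makes the function fail.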
author: Guchun Chen <guchun.chen@amd.com> 2020-07-23 15:42:19 +0800
committer: Alex Deucher <alexander.deucher@amd.com> 2020-08-04 17:26:31 -0400
commit: b82e65a93510465cb4c203c938245f137a4e95dc
tree: 82187ca76db2ab385bf2369fe162b4c3f020be59 /drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
parent: 1d6a9d122d2c7817ad1bae0e59c8a29450c2b14d
1 file changed, 9 insertions, 3 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 40caa7437ce2..905c5ab486a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2055,13 +2055,19 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	 * it should be called after amdgpu_device_ip_hw_init_phase2  since
 	 * for some ASICs the RAS EEPROM code relies on SMU fully functioning
 	 * for I2C communication which only true at this point.
-	 * recovery_init may fail, but it can free all resources allocated by
-	 * itself and its failure should not stop amdgpu init process.
+	 *
+	 * amdgpu_ras_recovery_init may fail, but the upper level only cares
+	 * about failure caused by a bad GPU and stops the amdgpu init process
+	 * accordingly. For other failure cases, it still releases all
+	 * resources and prints an error message rather than returning a
+	 * negative value to the upper level.
 	 *
 	 * Note: theoretically, this should be called before all vram allocations
 	 * to protect retired page from abusing
 	 */
-	amdgpu_ras_recovery_init(adev);
+	r = amdgpu_ras_recovery_init(adev);
+	if (r)
+		goto init_failed;
 
 	if (adev->gmc.xgmi.num_physical_nodes > 1)
 		amdgpu_xgmi_add_device(adev);
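On the caller side, the hunk turns a fire-and-forget call into the kernel's usual centralized-exit idiom: capture the return value and `goto init_failed` so the remaining init steps (such as adding the XGMI device) are skipped. The sketch below mirrors that control flow in userspace; `struct sketch_dev`, `sketch_ras_recovery_init`, `sketch_xgmi_add_device`, and `sketch_ip_init` are hypothetical names, not the real amdgpu functions or fields.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical device state standing in for struct amdgpu_device. */
struct sketch_dev {
	int recovery_result; /* what the recovery step will return */
	int xgmi_nodes;      /* number of physical XGMI nodes */
	int xgmi_added;      /* set to 1 when the XGMI step runs */
};

/* Returns non-zero only for the bad-GPU case, as in the patch. */
static int sketch_ras_recovery_init(struct sketch_dev *dev)
{
	return dev->recovery_result;
}

static void sketch_xgmi_add_device(struct sketch_dev *dev)
{
	dev->xgmi_added = 1;
}

/* Mirrors the patched flow: propagate the error, skip later steps. */
static int sketch_ip_init(struct sketch_dev *dev)
{
	int r;

	assert(dev != NULL);

	r = sketch_ras_recovery_init(dev);
	if (r)
		goto init_failed;

	if (dev->xgmi_nodes > 1)
		sketch_xgmi_add_device(dev);

init_failed:
	/* Success also falls through here; only report real failures. */
	if (r)
		fprintf(stderr, "ip init failed: %d\n", r);
	return r;
}
```

Because the label is reached on the success path too, any shared cleanup placed under it runs in both cases, which is why the idiom scales as more init steps are added.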