drm/amdgpu: handle IH ring1 overflow

IH ring1 is used to process GPU retry fault, overflow is enabled to drain retry fault because we want receive other interrupts while handling retry fault to recover range. There is no overflow flag set when wptr pass rptr. Use timestamp of rptr and wptr to handle overflow and drain retry fault. If fault timestamp goes backward, the fault is filtered and should not be processed. Drain fault is finished if processed_timestamp is equal to or larger than checkpoint timestamp. Add amdgpu_ih_functions interface decode_iv_ts for different chips to get timestamp from IV entry with different iv size and timestamp offset. amdgpu_ih_decode_iv_ts_helper is used for vega10, vega20, navi10. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
author: Philip Yang <Philip.Yang@amd.com> 2021-11-18 15:24:55 -0500
committer: Alex Deucher <alexander.deucher@amd.com> 2021-12-01 16:03:34 -0500
commit: 3c2d6ea27955cfac8590884d207353eece8c2cee (patch)
tree: 63a0a9fe1f8a640c6299a418a0a250e01046fe6b /drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
parent: 232d1d43b522b64266a16606e918ce92a8a0b244 (diff)
1 files changed, 7 insertions, 1 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 08478fce00f2..2430d6223c2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -350,6 +350,7 @@ static inline uint64_t amdgpu_gmc_fault_key(uint64_t addr, uint16_t pasid)
  * amdgpu_gmc_filter_faults - filter VM faults
  *
  * @adev: amdgpu device structure
+ * @ih: interrupt ring that the fault received from
  * @addr: address of the VM fault
  * @pasid: PASID of the process causing the fault
  * @timestamp: timestamp of the fault
@@ -358,7 +359,8 @@ static inline uint64_t amdgpu_gmc_fault_key(uint64_t addr, uint16_t pasid)
  * True if the fault was filtered and should not be processed further.
  * False if the fault is a new one and needs to be handled.
  */
-bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev, uint64_t addr,
+bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev,
+			      struct amdgpu_ih_ring *ih, uint64_t addr,
 			      uint16_t pasid, uint64_t timestamp)
 {
 	struct amdgpu_gmc *gmc = &adev->gmc;
@@ -366,6 +368,10 @@ bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev, uint64_t addr,
 	struct amdgpu_gmc_fault *fault;
 	uint32_t hash;
 
+	/* Stale retry fault if timestamp goes backward */
+	if (amdgpu_ih_ts_after(timestamp, ih->processed_timestamp))
+		return true;
+
 	/* If we don't have space left in the ring buffer return immediately */
 	stamp = max(timestamp, AMDGPU_GMC_FAULT_TIMEOUT + 1) -
 		AMDGPU_GMC_FAULT_TIMEOUT;
author	Philip Yang <Philip.Yang@amd.com>	2021-11-18 15:24:55 -0500
committer	Alex Deucher <alexander.deucher@amd.com>	2021-12-01 16:03:34 -0500
commit	3c2d6ea27955cfac8590884d207353eece8c2cee (patch)
tree	63a0a9fe1f8a640c6299a418a0a250e01046fe6b /drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
parent	232d1d43b522b64266a16606e918ce92a8a0b244 (diff)