Age | Commit message | Author |
|
As Petri noticed, there is a signal delivery race which can occur when
two children send SIGALRM to the master process at different times.
The signal from the first child just wakes the master process, which
reinstalls the previous signal handler. The second signal is then
handled by the default handler, so the process exits and the test fails.
Let's move the signal handler reinstall to after the children have exited.
The allocator warning about an existing IPC queue is then an effect,
not the cause. Unfortunately multiprocess start/stop were called in
the test, not in the fixture, so a test failure doesn't call
multiprocess stop at all (and we see a dangling queue).
A deeper look at the client() implementation allows us to remove the
multiprocess allocator dependency. The function runs in its own separate
context, so we don't need to arbitrate and can just call
intel_allocator_init() to become a standalone allocator.
Fixes: https://gitlab.freedesktop.org/drm/intel/-/issues/4055
Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
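The ordering described above can be illustrated with a minimal,
single-process sketch. It is not the IGT code: collect_alarms() and
on_alarm() are hypothetical names, and raise() stands in for a child
sending SIGALRM to its parent. The only point shown is that the previous
disposition is restored after the last expected signal, not after the
first wakeup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical counter, bumped once per delivered SIGALRM. */
static volatile sig_atomic_t alarms_seen;

static void on_alarm(int sig)
{
	(void)sig;
	alarms_seen++;
}

/*
 * Install a temporary SIGALRM handler, let `nchildren` notifications
 * arrive, and only then restore the previous disposition.
 * Returns the number of signals handled, or -1 on error.
 */
static int collect_alarms(int nchildren)
{
	struct sigaction sa, old;
	int i;

	assert(nchildren >= 0);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, &old) < 0)
		return -1;

	alarms_seen = 0;
	/* Stand-in for each child doing kill(getppid(), SIGALRM). */
	for (i = 0; i < nchildren; i++)
		raise(SIGALRM);

	/*
	 * Restoring here is safe. Restoring after the first signal would
	 * leave the second one to the default action, which terminates
	 * the process.
	 */
	if (sigaction(SIGALRM, &old, NULL) < 0)
		return -1;

	return alarms_seen;
}
```

In a real multiprocess setup the restore point would be after the
children have been reaped, since signal arrival is asynchronous there.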
|
|
For newer gens we're not able to rely on relocations. Adapt to use
offsets acquired from the allocator.
Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
|
|
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
|
The slow skl took 300ms just to print the ring/context state to dmesg
upon detecting a hang. It appears we need to allow even more leeway
before declaring the system broken.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Since
  commit 488dac0c9237647e9b8f788b6a342595bfa40bda
  Author: Yicong Yang <yangyicong@hisilicon.com>
  Date: Sat Nov 21 22:17:19 2020 -0800
      libfs: fix error cast of negative value in simple_attr_write()
the kernel now rejects any negative values written to debugfs, rather
than casting them to u64. Since we are accustomed to having the -1 mean
U64_MAX, perform that conversion ourselves.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
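A minimal sketch of doing that conversion in userspace, assuming the
value is held as a signed 64-bit integer before being formatted for the
debugfs file. The helper names to_debugfs_u64() and format_debugfs_u64()
are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Conversion from a signed to an unsigned 64-bit integer is defined
 * modulo 2^64 in C, so -1 maps to UINT64_MAX.
 */
static uint64_t to_debugfs_u64(int64_t value)
{
	return (uint64_t)value;
}

/*
 * Format the value as an unsigned decimal string, so the kernel never
 * sees a leading minus sign. Returns what snprintf() returns.
 */
static int format_debugfs_u64(char *buf, size_t len, int64_t value)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "%" PRIu64, to_debugfs_u64(value));
}
```

The resulting string is what would be written to the debugfs file in
place of "-1".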
|
|
If our CPU client is very slow to notice that the GPU spinner has
started, we may consume the full heartbeat interval without noticing.
This is bad if we are trying to test that a client that yields within the
heartbeat interval is not selected for termination.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1879
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
igt_require_gem() is a peculiarity of i915/, move it out of the core.
Similar opportunistic move of gem_reopen_driver() and
gem_quiescent_gpu().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Be more lenient in waiting for resets as the CPU scheduler may be
pre-occupied.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
|
|
Be much more lenient in the face of scheduler woes that under duress
cannot give us the millisecond accuracy we require for distinguishing
good vs evil, every time.
In particular, we set the preempt timeout to greater than our 'good'
client so that when we do trigger the heartbeat check upon it, it can
always preempt in time to satisfy us.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
|
|
Make sure our request durations are precise, as the scheduler tries hard
to cause trouble.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
|
|
Refactor the set-attribute and reuse it for setting the
preempt_timeout_ms.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
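One way such a shared setter could look, as a sketch only: a generic
helper that writes an unsigned value to a named attribute file inside a
directory, plus a thin wrapper for preempt_timeout_ms. The function
names and the plain stdio approach are assumptions for illustration;
the actual test may use different helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical generic setter: write `value` to <dir>/<attr>. */
static int set_attr_u32(const char *dir, const char *attr, unsigned int value)
{
	char path[4096];
	FILE *f;
	int n, ok;

	assert(dir && attr);

	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%u", value) > 0;
	/* fclose() flushes, so a rejected write may only show up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/* Thin wrapper reusing the generic setter for one attribute. */
static int set_preempt_timeout_ms(const char *engine_dir, unsigned int ms)
{
	return set_attr_u32(engine_dir, "preempt_timeout_ms", ms);
}
```

Other per-engine attributes would get the same kind of one-line wrapper
around set_attr_u32().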
|
|
To turn off preemption, we now require per-engine resets.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
|
|
We [will] expose various per-engine scheduling controls. One of which,
'heartbeat_duration_ms', defines how often we send a heartbeat down the
engine to check upon the health of the engine. If a heartbeat does not
complete within the interval (or two), the engine is declared hung.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
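The "interval (or two)" rule can be written down as a small predicate.
This is only an illustration of the arithmetic, not the kernel's logic:
heartbeat_missed() and its `grace` parameter (how many intervals are
tolerated) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Report whether a heartbeat that has been outstanding for `elapsed_ms`
 * has exceeded `grace` heartbeat intervals. The multiplication is done
 * in 64 bits so that large intervals cannot overflow.
 */
static bool heartbeat_missed(uint64_t elapsed_ms, uint32_t interval_ms,
			     unsigned int grace)
{
	assert(grace >= 1);
	return elapsed_ms > (uint64_t)interval_ms * grace;
}
```

With a 100ms interval and a grace of two intervals, a heartbeat still
pending after 201ms would count as missed, while one pending for 200ms
would not.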
|