Age | Commit message | Author |
|
In 4d9396e67930 we have started storing the opts with which the spin was
created as part of igt_spin_t. The ahnd stored as part of igt_spin_t is
therefore redundant. We can get ahnd from opts.ahnd.
Cc: Zbigniew Kempczynski <zbigniew.kempczynski@intel.com>
Cc: Jasmine Newsome <jasmine.newsome@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
|
|
Just like commit 74fc362b425c ("i915/gem_busy: Prevent context
ban with right ctx id"), some code passes the constant ctx id 0
to igt_allow_hang(), which may cause test failures.
This patch corrects those calls to use the right ctx id in the
tests below:
tests/i915/prime_busy
tests/i915/gem_ctx_persistence
tests/i915/gem_exec_schedule
tests/i915/gem_wait
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
|
|
The gem_ctx_persistence test in general has support for twiddling the
scheduling parameters via sysfs to tune timeouts. For some reason,
this was not being applied to the saturated_hostile test. The test was
also broken for platforms with dependent engine resets.
The test submits requests to all engines, kills one and expects the
rest to survive. However, the other engine requests were all marked as
not pre-emptible. On recent platforms, there is a reset dependency
across RCS and CCS engines. That is, if one of those engines is reset
then all engines must be reset. If a context executing on one of those
engines does not pre-empt first then it will be killed.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Arjun Melkaveri <arjun.melkaveri@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
|
|
On newer gens we cannot rely on relocations. The change mostly
touches spinner creation, where the allocator handle is now mandatory
on gens where relocations are disabled.
Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
|
|
The previous gem_has_cmdparser helper took an engine and did nothing
with it. We delete the engine parameter and use the general helper for
the ALL_ENGINES cases. For cases where we really do care about
something more precise, we add a version which takes an intel_ctx_cfg_t
and an engine specifier and is able to say whether or not that
particular engine has the command parser enabled.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Lakshminarayana Vudum <lakshminarayana.vudum@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
We need this for proper cmdparser detection.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
|
We're going to start disallowing non-trivial uses of setparam for
engines precisely to make races like this impossible. It'll also make
these test cases invalid.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
|
The entire CONTEXT_CLONE_* API is being removed from upstream i915.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
|
Just reset the spinner once before launching and killing many
non-persistent contexts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Check that closing many contexts does not cause a ban.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Improve the differentiation between the legacy ring selector ABI and the
more recent engine selection API.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
|
|
The timeout for a context to be killed / banned with GuC submission is a
bit noisier than with execlist submission so reset the timeout to the
default value on each engine in the hostile / hang sections rather than
using the timeout from the previous engine.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Fix the engine-hang subtest to set up the context with the available engines.
Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Verify that the wait for userptr cleanup is after we have cancelled the
non-persistent hanging context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
For the legacy execbuf engine selection, we have to be careful in
handling vcs if there is more than one engine, and specify which one we
actually want.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
We expose the heartbeat interval on each engine, allowing the sysadmin to
disable them if they prefer avoiding any interruption for their GPU
tasks. A caveat to allowing the contexts to run without checks is that
we require such contexts to be non-persistent and so cleaned up on
closure (including abnormal process termination). However, we also need
to flush any persistent contexts that are still inflight at that time,
lest they continue to run unchecked.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
The debugfs modparams are more picky and refuse to do the implicit
unsigned conversion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
|
|
We need each test in an isolated context, so that bad results from one
test do not interfere with the next. In particular, we want to clean up
the device and reset it to the defaults so that they are known for the
next test, and the test can focus on behaviour it wants to control.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Add a small delay before we wait on the rcu barrier to allow slower
machines to flush the process tables first.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1528
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
We have generic helpers for sysfs access in igt_sysfs.c, but we also
have a number of module parameter access specific helpers scattered here
and there. Start gathering the latter into a file of its own.
For i915, the long-term goal is to migrate from module parameters to
device specific debugfs parameters. With all igt module param access
centralized in one place, we can make the transition much easier.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
Beware of using gem_ring_measure_inflight() as it takes a ring
identifier and not the engine, should you overwrite the defaults.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1848
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
igt_require_gem() is a peculiarity of i915/, move it out of the core.
Similar opportunistic move of gem_reopen_driver() and
gem_quiescent_gpu().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Since the introduction of dynamic subtests, we run multiple subtests in
one binary and encounter situations where a bug in one subtest percolates
into the next subtest. Explicitly clean up before each test to disarm our
own shotgun.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Since we execute a few smokers in parallel, at worst we may have to wait
for all smokers to be reset before we ourselves are. We need to increase
our leniency for the smoketest and allow a longer timeout to accommodate
the parallelism.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
To set a property on an engine, we need to use its canonical name
(%class%instance).
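
As an illustration of the %class%instance form, a minimal sketch that
builds such a name. The class-name table and engine_canonical_name() are
assumptions made for this example, not IGT API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed short names, indexed by engine class; illustrative only. */
static const char * const class_names[] = {
	"rcs", "bcs", "vcs", "vecs", "ccs",
};

/*
 * Hypothetical helper: write "<class><instance>" (e.g. "vcs1") into buf.
 * Returns 0 on success, -1 if the class is unknown or buf is too small.
 */
static int engine_canonical_name(char *buf, size_t len,
				 unsigned int engine_class,
				 unsigned int instance)
{
	int n;

	assert(buf != NULL);
	if (engine_class >= sizeof(class_names) / sizeof(class_names[0]))
		return -1;

	n = snprintf(buf, len, "%s%u", class_names[engine_class], instance);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would then be used to pick the per-engine directory
when writing a property, rather than a legacy ring selector.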
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
When we can control the preempt_timeout_ms property on an engine, we can
specify a much faster timeout and so expect our tests to run much
faster.
Then we can also avoid the embarrassment if the preempt reset is disabled
and the tests start failing because we are not waiting 10+s for the
hangcheck.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1440
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
The default preempt_timeout_ms is a shocking 640ms. To be resilient
against false positives, we should include an engineering safety factor
of about 2x into our fail criteria, so that we only cry foul when we are
truly unresponsive.
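
A minimal sketch of the arithmetic, assuming the 640ms default and a 2x
factor as stated above. fail_threshold_ns() is a hypothetical helper
written for this example, not the test's actual code:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL

/*
 * Hypothetical helper: how long to wait before declaring failure,
 * given the engine's preempt timeout and a safety factor.
 */
static int64_t fail_threshold_ns(unsigned int preempt_timeout_ms,
				 unsigned int safety_factor)
{
	assert(safety_factor >= 1);
	return (int64_t)preempt_timeout_ms * safety_factor * NSEC_PER_MSEC;
}
```

With these inputs, fail_threshold_ns(640, 2) gives 1280000000ns, i.e.
1.28s before the test cries foul.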
Closes: https://gitlab.freedesktop.org/drm/intel/issues/679
Closes: https://gitlab.freedesktop.org/drm/intel/issues/570
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Check that if we have to remove a hostile request from a non-persistent
context, we do so without harming any other concurrent users.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Since the library call of igt_spin_new() asserts if it spots an error,
we must protect it from the races we are imposing upon ourselves.
However, to keep those races active, delegate the potentially failing
calls to the children.
References: https://gitlab.freedesktop.org/drm/intel/issues/1241
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Tvrtko spotted a race condition between replacing a set of hanging
engines and closing the context. So exercise it.
5s is not much time to hit the small window, but a little bit of testing
several times a day is better than nothing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
As the kernel loses track of the context's old engines, if we request
that the context is non-persistent then any request on the untracked
engines must be cancelled.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Since gem_ctx_persistence requires and insists upon having working reset,
the test will not run on a system without one. If a previous test has
clobbered i915.reset, we need to restore the modparam for ourselves.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1099
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
If using a cmdparser, it may be intelligent enough to not execute the
invalid batch leading to an unwritten breadcrumb and
igt_spin_busywait_until_started() in an infinite loop.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
As we abuse hangcheck within some of the tests, we then need to make
sure we restore hangcheck on exit, in case we detect a failure and
abort.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1082
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Converts all per-engine tests into dynamic subtests and in the process:
* Put back I915_EXEC_BSD legacy coverage.
* Remove one added static engine list usage.
* Compact code by driving two groups of the name/func table.
v2:
* Convert smoketest to proper all engines.
v3:
* Undo subgroup mistake. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Update the context with supported engines on the platform with set_property
I915_CONTEXT_PARAM_ENGINES to make sure the work load is submitted to
the available engines only.
Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[ickle: fix the flailing around]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Just in case the user submits an invalid batch, check we can clean up
afterwards.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Some spinners are used with the intent of never ending and being
declared hung by the kernel. In some cases, these are being used to
simulate invalid payloads and so we can use an invalid command to
trigger a GPU hang. (In other cases, they are simulating infinite workloads
that truly never end, but which we still need to be able to curtail to
provide multi-tasking.) This patch adds IGT_SPIN_INVALID_CS to request the
injection of 0xdeadbeef into the command stream that should trigger a
GPU hang.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
gem_ctx_isolation/persistence"
This reverts commit 343aae776a58a67fa153825385e6fe90e3185c5b.
__for_each_physical_engine() reprograms the context, invalidating the
use of e->flags to select engines, necessitating e->index instead.
Without also fixing up the engine selection, the result is that random
engines were being used to read registers from the intended engine.
This does not end well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
|
|
Align with gem_exec_basic and other tests using the newer
engine query interface into i915 to enumerate active engines.
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
As the default preempt-reset timeout has been increased from 100ms to
640ms, we need a corresponding increase in our own timeout so that we
allow enough time for the preempt-reset to occur and close the hung
contexts.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112401
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Fixes: 3fa72891269b ("i915/gem_ctx_persistence: Double the fput hammer!")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Deferred rcu work is tricky to pin down and encourage to run, so try
again...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112277
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
|
|
Casting an unsigned char pointer to an int pointer makes gcc unhappy,
with the warning: "warning: dereferencing type-punned pointer will break
strict-aliasing rules"
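
One common way to avoid this warning, shown as a minimal sketch and not
necessarily what this patch does, is to copy the bytes out with memcpy()
instead of dereferencing a cast pointer. load_u32() is a hypothetical
helper for this example:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit value from a byte buffer without
 * type-punning. memcpy() is well defined for any source alignment and
 * does not violate strict aliasing.
 */
static uint32_t load_u32(const unsigned char *buf)
{
	uint32_t value;

	assert(buf != NULL);
	memcpy(&value, buf, sizeof(value));
	return value;
}
```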
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
After any process termination, use an rcu-barrier to be sure that any
deferred struct file cleanup has been performed. Being consistent in
our paranoia here means that we can rule out more false positives and so
focus on what remains.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
|
|
It seems the first rcu barrier may race with the addition of the file to
the rcu task list; so wait again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
|
|
And initialise fence to -1 to avoid closing stdin (fd:0)!
The delayed fput is first queued with schedule (task_work) before being
rcu freed.
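
The stdin hazard comes from treating a zero-initialised fd as valid. A
minimal sketch of the guard, assuming a hypothetical close_if_valid()
helper that is not part of the test itself:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: close *fd only if it refers to an open file,
 * then mark it invalid. Returns 1 if a close was attempted, else 0.
 * With "int fence = 0;" this would close stdin; with -1 it is a no-op.
 */
static int close_if_valid(int *fd)
{
	assert(fd != NULL);
	if (*fd < 0)
		return 0;

	close(*fd);
	*fd = -1;
	return 1;
}
```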
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
|
|
Sanity test existing persistence and new exciting non-persistent context
behaviour.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
|