Age | Commit message (Collapse) | Author |
|
A missing newline at the end of an assert, so add it for clarity in the
error message.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Check we can reset the GPU before running the reset test.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Remaining tests that have been overlooked and don't need any
invasive changes to limit the skipping to only the relevant parts.
v2: [A rebase gone wrong]
v3: Move the misplaced hunk to the right patch (Antonio).
v4: Rebase, kms_content_protection is new.
v5: Rebase - need to adjust kms_lease.c too.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> (v3)
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-By: Antonio Argenziano <antonio.argenziano@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
If KMS is not supported on the device, drmModeGetResources() will return
NULL, often this is an indication that we should not attempt to run the
test. Although it would be preferred to use something like
igt_require_display() as the canonical check and assert that
drmModeGetResources() did not hit an error, it is not always practical
as the tests do not utilize the common igt_display abstraction.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
|
|
Normalize the variance to stddev, and remove some redundant steps in
computing the time from itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Our observation is that the systematic error is proportional to the
number of iterations we perform; the suspicion is that it directly
correlates with the number of sleeps. Reduce the number of iterations,
to try and keep the error in check.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Avoid running subsequent subtests in non-default setup by restoring the
runtime PM config in one particular subtest.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In order to make adding more options easier, expose the full set of
options to the caller.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
There is a chance new kernel or new firmware fixed the CPU0 hotplug hang
issue. Remove the skip to check if that's true.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Jani Saarinen <jani.saarinen@intel.com>
|
|
We may not idle immediately after a hang, and indeed may send a pulse
down the pipeline periodically to become idle. Rather than make a flimsy
assumption about how long we need to sleep before the system idles,
wait for the system to declare itself idle; flushing it to idle in the
process!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Antonio Argenziano <antonio.argenziano@intel.com>
|
|
Realtime scheduling interferes with execlists submission (tasklet) so try
to simplify the PWM loop in a few ways:
* Drop RT.
* Longer batches for smaller systematic error.
* More truthful test duration calculation.
* Less clock queries.
* No self-adjust - instead just report the achieved cycle and let the
parent check against it.
* Report absolute cycle error.
v2:
* Bring back self-adjust. (Chris Wilson)
(But slightly fixed version with no overflow.)
v3:
* Log average and mean calibration for each pass.
v4:
* Eliminate development leftovers.
* Fix variance logging.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Wrong file descriptor was passed to the iterator. This had currently no
effect, since it wasn't used in the macro, but needs to be fixed.
At the same time make the macro consistent by checking for engine presence
like the other iterators do.
Added __for_each_engine_class_instance which does not check for engine
presence and so is useful for enumerating all possible engines - like for
instance for subtest enumeration.
And another 'wrong fd used' fixlet in the render node subtests.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Michel Thierry <michel.thierry@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
|
|
The test is whether with all but one engine busy we record the correct
load on each engine. If we only have one engine, this test degenerates
into all-idle/all-busy, so we can skip to avoid crashing on the
assumption that we have a busy spinner.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
More than one test assumes that the spinner is running pretty much
immediately after we have create or submitted it.
In actuality there is a variable delay, especially on execlists platforms,
between submission and spin batch starting to run on the hardware.
To enable tests which care about this level of timing to account for this,
we add a new spin batch constructor which provides an output field which
can be polled to determine when the batch actually started running.
This is implemented via MI_STOREDW_IMM from the spin batch, writing into
memory mapped page shared with userspace.
Using this facility from perf_pmu, where applicable, should improve very
occasional test fails across the set and platforms.
v2:
Chris Wilson:
* Use caching mapping if available.
* Handle old gens better.
* Use gem_can_store_dword.
* Cache exec obj array in spin_batch_t for easier resubmit.
v3:
* Forgot I915_EXEC_NO_RELOC. (Chris Wilson)
v4:
* Mask out all non-engine flags in gem_can_store_dword.
* Added some debug logging.
v5:
* Fix relocs and batch munmap. (Chris)
* Added assert idle spinner batch looks as expected.
v6:
* Skip accuracy tests when !gem_can_store_dword.
v7:
* Fix batch recursion reloc address.
v8:
Chris Wilson:
* Pull up gem_can_store_dword check before we start submitting.
* Build spinner batch in a way we can skip store dword when not
needed so we can run on SandyBridge.
v9:
* Fix wait on spinner.
* More tweaks to accuracy test.
v10:
* Dropped accuracy subtest changes due problems with RT thread and
tasklet submission.
v11:
* Use READ_ONCE.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # IRC
|
|
We need to use absolute tolerance when asserting on percentages. Relative
tolerance in this case is unfair and inaccurate since it's strictness
varies with relative target busyness.
v2:
* Do not include spin batch edit and submit into measured time.
* Open PMU before child is in test PWM phase.
* No need to emit test PWM for twice as long with the new explicit
synchroniazation via pipe.
* Log test duration in ms for better readability.
* Drop inverse assert. (Chris Wilson)
v3: Explain tasklet delay. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
CPU hotplug, especially CPU0, can be flaky on commodity hardware.
To improve test reliability and reponse times when testing larger runs we
need to handle those cases better.
Handle failures to off-line a CPU by immediately skipping the test, and
failures to on-line a CPU by immediately rebooting the machine.
This patch includes igt_sysrq_reboot implementation from Chris Wilson.
v2: Halt by default, reboot if env variable IGT_REBOOT_ON_FATAL_ERROR is
set. (Petri Latvala)
v3: Add missign docs and update stale comment. (Petri Latvala)
v4: Use pause instead of sleep. (Chris Wilson)
v5: Newlines! (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Petri Latvala <petri.latvala@intel.com>
Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
Verify that the reported busyness is in line with what would we expect
from a batch which causes a hang and gets kicked out from the engine.
v2: Change to explicit igt_force_gpu_reset instead of guessing when a spin
batch will hang. (Chris Wilson)
v3: Assert and comment test expectations. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
My bad,
perf_pmu.c: In function ‘accuracy’:
perf_pmu.c:1533:4: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘uint64_t’ [-Wformat]
perf_pmu.c:1533:4: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 6 has type ‘uint64_t’ [-Wformat]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Apollolake machine in the shards cannot bring the CPU0 back online so
skip the test on all Broxtons for now.
v2: Fix inverted check.
v3: igt_skip_on. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Convert the busy pwm from using a single calibration pass with a fixed
target into a self-correcting pwm that tries to adjust how long to sleep
on each pwm in order to converge at the target busy %%.
Being self-correcting, it should fare better against the more variable
systems CI presents.
v2: Be fair and equally strict for low/high busy %%
v3: target_idle_us and calculate expected from timing of each individual pass
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105157
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Correct printf format for uint64_t and one "may be uninitialized".
v2: Fix one more "may be uninitialized". (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Since the spin batch contains a relocation to itself, when we resubmit
the spinner, we must ensure that it is executed at the same location.
While the spinner is busy, resubmitting will reuse the same location,
but if it is idle, the kernel may move it between execution. In this
case, we need to record the previous location (in obj.offset) and then
demand the kernel reuse the location using EXEC_OBJECT_PINNED.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
A subtest to verify that the engine busyness is reported with expected
accuracy on platforms where the feature is available.
We test three patterns: 2%, 50% and 98% load per engine.
v2:
* Use spin batch instead of nop calibration.
* Various tweaks.
v3:
* Change loops to be time based.
* Use __igt_spin_batch_new inside timing sensitive loops.
* Fixed PWM sleep handling.
v4:
* Use restarting spin batch.
* Calibrate more carefully by looking at the real PWM loop.
v5:
* Made standalone.
* Better info messages.
* Tweak sleep compensation.
v6:
* Some final tweaks. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We need more data to debug sporadic test failures.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We get occasional errors like:
(perf_pmu:21315) CRITICAL: Test assertion failure function sema_wait, file perf_pmu.c:631:
(perf_pmu:21315) CRITICAL: Failed assertion: (double)(val[1] - val[0]) <= (1.0 + (tolerance)) * (double)(slept) && (double)(val[1] - val[0]) >= (1.0 - (tolerance)) * (double)(slept)
(perf_pmu:21315) CRITICAL: 'val[1] - val[0]' != 'slept' (450000000.000000 not within 5.000000% tolerance of 500129618.000000)
Suggesting a time disagreement between userspace and the PMU.
At the moment I got no better ideas than fiddling with delays to see if it
improves things.
v2: Wait for sampling to start instead of hardcoded sleep. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Some systems cannot reach the advertised maximum frequency due throttling.
Handle them by considering a 100MHz lower limit.
v2: Use more relaxed tolerance only in the downward direction.
(Chris Wilson)
v3: Improved assert message. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use perf timestamps in more places where possible.
v2: Log measure_usleep vs perf timestamps. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We don't expect to be able to open the I915_SAMPLE_SEMA on gen5 and
earlier as the HW doesn't support semaphores.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Test to check that the RC6 counter works as expected during and after
runtime suspend.
v2:
* Use correct sysfs root by using IGT helpers.
* Turn off display to allow runtime suspend. (Imre)
* Two subtest flavours.
v3:
* drmModeFreeResources. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We can use perf reported timestamps to potentially get a more accurate
frequency average.
Lets see if this improves the situation for sporadic failures like on APL:
Frequency: min=100, max=750, boost=750 MHz
Min frequency: requested 90.0, actual 90.0
Max frequency: requested 749.8, actual 647.9
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This patch adds a context creation ioctl wrapper that returns the error
for the caller to consume. Multiple tests that implemented this already,
have been changed to use the new library function.
v2:
- Add gem_require_contexts() to check for contexts support (Chris)
v3:
- Add gem_has_contexts to check for contexts support and change
gem_require_contexts to skip if contests support is not available.
(Chris)
v4:
- Cosmetic changes and use lib function in gem_ctx_create where
possible. (Michal)
v5:
- Use gem_contexts_require() in tests and fixtures. (Chris)
Signed-off-by: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This test emits a spin batch which runs roughly for N CPU cores seconds
As such these can be declared as GPU hangs, so work around that by looping
with shorter batches.
v2:
* Use overlapping spinners. (Chris Wilson)
* Go back to igt_fork. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Test will succeed if present engine can be opened, or if the missing
engine reports the correct error code.
v2:
* Use the right errno.
* Close fd only on success. (Chris Wilson)
v3:
* Only sample errno on failure. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Always skip missing engines to make tests skips very early and avoid
losing time in tests which need to do setups or waits before they would
otherwise detect this.
To ensure PMU is rejecting opening missing engines we will add an explicit
test later.
v2: Use subtest groups for engine checking. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Test that the PMU can be safely enabled in face of interrupt-heavy load on
an engine.
The test is probabilistic so run it for ten seconds in a loop to increase
the odds of hitting the race.
v2: Repeat the test a few times, until a timeout. (Chris Wilson)
v3: Added note in code and commit about probabilistic nature of the test.
(Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Additional set of tests which stops the batch and sleeps for a bit
before sampling the counter in order to test that the busyness stop
being recorded correctly.
v2: Reorganize end_spin and guards a bit. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Will need more modes soon.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Stop relying on timers to end spin batches but use measured sleep, which
was established to work better, in all time based tests.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Where we use measured sleeps, take PMU samples immediately before and
after and look at their delta in order to minimize the effect of any
test setup delays.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In cases where we manually terminate the busy batch, we always want to
sample busyness while the batch is running, just before we will
terminate it, and not the other way around. This way we make the window
for unwated idleness getting sampled smaller.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
While developing a fix for an accounting hole in busy stats we realized
lite-restore is a potential edge case which would be interesting to check
is properly handled.
It is unfortnately quite timing sensitive to hit lite-restore in the
fashion test needs, so downside of this test is that it sufferes from a
high rate of false negatives.
v2:
* Make the sleep unconditional and use scientific notiation for large
constants. (Chris Wilson)
* Use gem_quiscent_gpu instead of gem_sync+usleep to ensure context
complete was received under execlists. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Make sure busyness is correctly reported when PMU is enabled after the
engine is already busy with a single long batch.
v2:
* Make the sleep unconditional and use scientific notiation for large
constants. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Each call to igt_spin_batch_new_fence will do a stalling check to verify
that GEM is functional before submitting the spinning batch. In a loop,
this means that we may end up waiting for our earlier spinning
batches...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
|
|
Rather than calibrate and emit nop batches, use a manually signalled chain
of spinners to generate the desired interrupts.
v2: Two flavours of interrupt generation. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use the normal batch_duration_ns and display the sampled frequency:
Frequency: min=100, max=750, boost=750 MHz
Min frequency: requested 100.0, actual 100.0
Max frequency: requested 755.6, actual 755.6
v2: Remove the early spin_batch_end and assert the measured frequencies
are within tolerance of our target.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Don't rely on the timer being precise when we can sleep for a known
duration.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Don't rely on the timer being precise when we can sleep for a known
duration.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Create all the spinners before starting the sampler and then measure how
long we sleep.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104160
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Sleep for a known duration. In particular, CI once saw a measurement for
busyness greater than the intended batch_duration!
v2: Go back to starting pmu sampling outside of spinner; the GPU should
be idle.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104241
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Still CI remains obstinate that RC6 is not smoothly incrementing during
the sample period. Tweak the wait_for_rc6() to first wait for the
initial Evaluation Interval before polling.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|