|
Don't rely on the timer being precise when we can sleep for a known
duration.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
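This is a minimal C sketch of "sleep for a known duration", assuming POSIX clock_gettime() and nanosleep(). The helper returns how long we actually slept, so counter deltas can be divided by the measured time rather than by the requested one. The name measured_sleep_ns() is hypothetical and is not claimed to be the helper used in perf_pmu.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: sleep for about usec microseconds and report how
 * long we really slept, so an imprecise timer or a late wakeup does not
 * skew rates computed from PMU counter deltas.
 */
static uint64_t measured_sleep_ns(unsigned int usec)
{
	struct timespec req = {
		.tv_sec = usec / 1000000,
		.tv_nsec = (long)(usec % 1000000) * 1000,
	};
	uint64_t start = now_ns(), end;

	/* On EINTR, nanosleep() stores the time left back into req. */
	while (nanosleep(&req, &req) == -1 && errno == EINTR)
		;

	end = now_ns();
	assert(end >= start);	/* CLOCK_MONOTONIC never goes backwards */
	return end - start;
}
```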
|
|
Create all the spinners before starting the sampler and then measure how
long we sleep.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104160
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Sleep for a known duration. In particular, CI once saw a measurement for
busyness greater than the intended batch_duration!
v2: Go back to starting pmu sampling outside of spinner; the GPU should
be idle.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104241
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
CI still insists that RC6 is not incrementing smoothly during the
sample period. Tweak wait_for_rc6() to first wait for the initial
Evaluation Interval before polling.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
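The approach in this commit and the next one (poll the rc6 residency, but only after one Evaluation Interval has passed) can be sketched as follows. This is an illustration only. The reader callback, the fake_rc6 stub, wait_for_rc6_sketch() and all interval values are hypothetical, and the real wait_for_rc6() may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stub standing in for an RC6 residency counter. */
struct fake_rc6 {
	uint64_t value;	/* current residency */
	uint64_t step;	/* growth per read; 0 models a GPU that never enters rc6 */
};

static uint64_t fake_rc6_read(void *ctx)
{
	struct fake_rc6 *f = ctx;

	f->value += f->step;
	return f->value;
}

static void sleep_us(unsigned int usec)
{
	struct timespec ts = {
		.tv_sec = usec / 1000000,
		.tv_nsec = (long)(usec % 1000000) * 1000,
	};

	nanosleep(&ts, NULL);
}

/*
 * Let one evaluation interval (ei_us) pass first, since rc6 is not
 * expected before then, then poll until the residency counter moves or
 * timeout_us of polling has elapsed.
 */
static bool wait_for_rc6_sketch(uint64_t (*read)(void *), void *ctx,
				unsigned int ei_us, unsigned int poll_us,
				unsigned int timeout_us)
{
	unsigned int waited = 0;
	uint64_t start;

	assert(poll_us > 0);
	sleep_us(ei_us);	/* initial evaluation interval */

	start = read(ctx);
	while (waited < timeout_us) {
		sleep_us(poll_us);
		waited += poll_us;
		if (read(ctx) > start)
			return true;	/* residency is incrementing */
	}
	return false;
}
```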
|
|
Instead of sleeping for 2 evaluation intervals and then assuming that
rc6 is working, poll the rc6 residency.
v2: dce (dead-code elimination)
References: https://bugs.freedesktop.org/show_bug.cgi?id=103929
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Record the before/after semaphore-wait values around the sleep to
reduce the inaccuracy caused by scheduler delays. Previously, the
samples were taken before submitting the batch and after synchronising
on its completion. The measurement was then the total time for which the
semaphore was sampled but, with the extra intervening syscalls, it may
have drifted from the sleep duration. To further reduce the disparity,
wait for the batch to start executing before taking our samples.
References: https://bugs.freedesktop.org/show_bug.cgi?id=104013
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
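Here is a sketch of sampling immediately around the sleep, assuming the caller has already waited for the batch to start executing. The counter is abstracted as a callback. fake_sema_wait_ns() is a hypothetical stub standing in for a semaphore-wait PMU counter on an engine that waits for the whole period. sample_around_sleep() is also a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Stub: an engine waiting on a semaphore the whole time, so the
 * counter advances exactly with the monotonic clock. */
static uint64_t fake_sema_wait_ns(void)
{
	return now_ns();
}

/*
 * Read the counter immediately before and after the sleep itself, not
 * around submission and synchronisation, so no extra syscalls fall
 * inside the sampled window. Returns the counter delta as a fraction
 * of the measured sleep.
 */
static double sample_around_sleep(uint64_t (*read)(void), unsigned int usec)
{
	struct timespec req = {
		.tv_sec = usec / 1000000,
		.tv_nsec = (long)(usec % 1000000) * 1000,
	};
	uint64_t c0, c1, t0, t1;

	c0 = read();
	t0 = now_ns();
	nanosleep(&req, NULL);
	t1 = now_ns();
	c1 = read();

	assert(t1 > t0);
	return (double)(c1 - c0) / (double)(t1 - t0);
}
```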
|
|
I was thinking of the RC6 threshold parameter, but needed to consider
the RC6 evaluation interval instead. RC6 doesn't enable until activity
is below the threshold within an evaluation interval; therefore we need
to wait at least 2 EI after idling before we can expect RC6 to be
enabled.
Fixes: 55a17bc2d040 ("igt/perf_pmu: Reduce arbitrary delays before rc6")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
i915 started returning -EINVAL for incorrect CPU.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We have chosen batch_duration_ns to be the minimum duration we need to
meet our accuracy requirements for legacy ringbuffer PMU sampling. As
such, we need to be careful to use multiples of it during tests, and not
split it into different phases within a test, like multi_client does.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Since the i915 PMU is removing the separate RC6 counters and now
aggregates them all under a single one, update the test and
intel-gpu-overlay to match those changes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We have to be careful in our calibration loop: too slow and we time out,
too fast and we don't emit an interrupt! On fast legacy devices, we
would overflow the calibration calculation...
v2: Give the time constants a name.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
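One way to keep a calibration loop inside both limits is to scale the measured nop count in 64-bit arithmetic and clamp the result between named bounds. The sketch below is hypothetical. calibrate_nops(), the constants and their values are illustrative and are not taken from the test. The overflow it avoids is only one way such an overflow can arise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical named time constants; the values are illustrative only. */
#define CALIBRATE_TARGET_NS	(100ull * 1000 * 1000)	/* aim for a 100ms batch */
#define MIN_NOPS		1024ull			/* too fast: no interrupt */
#define MAX_NOPS		(1ull << 30)		/* too slow: risk of timeout */

/*
 * Scale a nop count that took elapsed_ns up to the target duration.
 * Multiplying in 64 bits avoids overflowing nops * target on fast
 * devices, and the result is clamped to a usable range.
 */
static uint64_t calibrate_nops(uint64_t nops, uint64_t elapsed_ns)
{
	uint64_t scaled;

	assert(elapsed_ns > 0);
	scaled = nops * CALIBRATE_TARGET_NS / elapsed_ns;
	if (scaled < MIN_NOPS)
		scaled = MIN_NOPS;
	if (scaled > MAX_NOPS)
		scaled = MAX_NOPS;
	return scaled;
}
```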
|
|
Program the MI_WAIT_FOR_EVENT without reference to DERRMR, relying on its
state being ~0u when not in use; it is only in use when userspace
requires it. By not touching intel_register_access we completely
eliminate the risk that we leak the forcewake ref, which can cause later
rc6 to fail.
At the same time, note that vlv/chv use a different mechanism (read
none) for coupling between the render engine and display.
v2: Note that we assume DERRMR should be ~0u when not in use. For
futureproofing one might like to do SRM/LRM (but I believe that if the
HW changes that much, we are likely to need a bigger boat).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
gem_quiescent_gpu() is supposed to ensure that the HW is idle, and in
the process kick the GPU into rc6, so we should not need a long delay
afterwards to ensure that we are indeed in rc6. We do, however, need a
small delay in order to be sure that the rc6 cycle counter has started
and stopped.
v2: Apply to rc6p as well.
v3: The longest rc6 timeout (before the HW kicks in and enables rc6 on
an idle GPU) is 50ms, so make sure that at least that time has passed
since we were busy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
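This sketch shows one way to "make sure at least 50ms has passed since we were busy". It assumes the caller records a CLOCK_MONOTONIC timestamp when the GPU was last busy. busy_end_ns and wait_since_busy() are hypothetical names. Sleeping to an absolute deadline with clock_nanosleep() returns immediately if the window has already elapsed, so there is no remaining-time subtraction to get wrong.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Longest rc6 timeout quoted in the commit message (50ms). */
#define RC6_TIMEOUT_NS (50ull * 1000 * 1000)

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Sleep until at least RC6_TIMEOUT_NS has passed since busy_end_ns. */
static void wait_since_busy(uint64_t busy_end_ns)
{
	uint64_t deadline = busy_end_ns + RC6_TIMEOUT_NS;
	struct timespec until = {
		.tv_sec = (time_t)(deadline / 1000000000ull),
		.tv_nsec = (long)(deadline % 1000000000ull),
	};
	int err;

	/* clock_nanosleep() returns the error number directly. */
	do {
		err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &until, NULL);
	} while (err == EINTR);
	assert(err == 0);
}
```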
|
|
As Chris has discovered, 100ms is not long enough to cover the sampling
error in general, so fix the semaphore subtest as well to measure for
500ms.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Put the values into the assert so we know which step failed.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Since the legacy ringbuffer uses a sampling technique, it is limited to
an accuracy based on a 200Hz timer, i.e. 5ms. We assert that
measurements are within 5%, so a 100ms duration leaves us no room for
the systematic error in our sampling. Bump the duration to 500ms to give
us plenty of safety margin (a back-of-the-envelope estimate is that we
need 300ms to provide the accuracy we request for legacy submission);
if it then fails, it should not be due to the sampling!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
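The arithmetic behind the 5ms/5% argument can be written as a small integer sketch. It assumes, as a simplification, that the worst-case sampling error is one full 5ms sample period. The constant and function names are hypothetical.

```c
#include <assert.h>

#define SAMPLE_PERIOD_US	5000	/* 200Hz sampling timer: 5ms per sample */
#define TOLERANCE_PCT		5	/* measurements asserted within 5% */

/* Sampling error (one period) as a percentage of the measured duration. */
static int sampling_error_pct(int duration_us)
{
	assert(duration_us > 0);
	return SAMPLE_PERIOD_US * 100 / duration_us;
}

/* Tolerance left over for every other error source, in microseconds. */
static int margin_us(int duration_us)
{
	return duration_us * TOLERANCE_PCT / 100 - SAMPLE_PERIOD_US;
}
```

At 100ms the 5% tolerance (5ms) is consumed entirely by one sample period. At 500ms the sampling error drops to 1% and leaves 20ms of margin.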
|
|
Be careful not to underflow into a very large positive usec value and so
sleep forever^W until boredom kicks in.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
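This sketch shows the underflow and its fix. It assumes the remaining sleep is computed as target minus elapsed in unsigned microseconds. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: once elapsed_us > target_us, the subtraction wraps to a huge value. */
static uint64_t remaining_us_buggy(uint64_t target_us, uint64_t elapsed_us)
{
	return target_us - elapsed_us;
}

/* Fixed: clamp to zero when the target has already been reached. */
static uint64_t remaining_us(uint64_t target_us, uint64_t elapsed_us)
{
	return elapsed_us >= target_us ? 0 : target_us - elapsed_us;
}
```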
|
|
In particular, we fix the evaluation of "(double)x" when x is a compound
expression.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
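This is a minimal illustration of the "(double)x" problem with a compound argument. The macro names are hypothetical. The point is only that the macro argument must be parenthesised before it is cast.

```c
#include <assert.h>

/* Unparenthesised: the cast binds to the first operand only,
 * so AS_DOUBLE_BAD(a / b) expands to ((double)a / b). */
#define AS_DOUBLE_BAD(x) ((double)x)

/* Parenthesised: the whole compound expression is cast. */
#define AS_DOUBLE(x) ((double)(x))
```

With int a = 7 and b = 2, AS_DOUBLE_BAD(a / b) yields 3.5, whereas AS_DOUBLE(a / b) yields 3.0.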
|
|
The kernel doesn't like it when you set max_freq < min_freq and
complains bitterly. So before setting max_freq = min_freq, first set
min_freq to min_freq!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
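The ordering can be sketched with a hypothetical in-memory stand-in for the frequency knobs. The sketch reads "first set min_freq to min_freq" as lowering the min_freq setting back to the minimum frequency before writing max_freq, for example after an earlier subtest raised it. That interpretation is an assumption. So are the stub's checks (it rejects any write that would leave max_freq below min_freq) and all of the names.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the sysfs min/max frequency knobs (MHz). */
struct freq_knobs {
	int min_freq;
	int max_freq;
};

static int set_min_freq(struct freq_knobs *k, int mhz)
{
	if (mhz > k->max_freq)
		return -EINVAL;
	k->min_freq = mhz;
	return 0;
}

static int set_max_freq(struct freq_knobs *k, int mhz)
{
	if (mhz < k->min_freq)
		return -EINVAL;	/* the kernel complains about max < min */
	k->max_freq = mhz;
	return 0;
}

/* Pin to the lowest frequency: lower min_freq first, then max_freq. */
static int pin_to_min(struct freq_knobs *k, int min_mhz)
{
	int err = set_min_freq(k, min_mhz);

	if (err == 0)
		err = set_max_freq(k, min_mhz);
	return err;
}
```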
|
|
Reset the errno after testing each invalid attr so that it is not
carried over into a false positive.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
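This sketch shows why errno must be reset for each attr. It uses a hypothetical probe, try_open(), in place of the real perf_event_open() call. Without the reset, the wrongly accepted attr 5 below would inherit the EINVAL left by the previous probe and be counted as rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical probe: rejects negative attrs with EINVAL and accepts
 * everything else without touching errno. */
static int try_open(int attr)
{
	if (attr < 0) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/* Count attrs rejected with EINVAL. errno is cleared before each probe
 * so a stale value cannot make an accepted attr look rejected. */
static int count_rejected(const int *attrs, size_t n)
{
	int rejected = 0;

	for (size_t i = 0; i < n; i++) {
		errno = 0;
		try_open(attrs[i]);
		if (errno == EINVAL)
			rejected++;
	}
	return rejected;
}
```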
|
|
Use a more compact expression to detect when the kernel doesn't support
perf.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Check for the perf event prior to calibrating the nop batch, so that we
don't spend 4.5s before skipping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
A bunch of tests for the new i915 PMU feature.
Parts of the code were initially sketched by Dmitry Rogozhkin.
v2: (Most suggestions by Chris Wilson)
* Add new class/instance based engine list.
* Add gem_has_engine/gem_require_engine to work with class/instance.
* Use the above two throughout the test.
* Shorten tests to 100ms busy batches, seems enough.
* Add queued counter sanity checks.
* Use igt_nsec_elapsed.
* Skip on perf -ENODEV in some tests instead of embedding knowledge locally.
* Fix multi ordering for busy accounting.
* Use new guaranteed_usleep when sleep time is asserted on.
* Check for no queued when idle/busy.
* Add queued counter init test.
* Add queued tests.
* Consolidate and increase multiple busy engines tests to most-busy and
all-busy tests.
* Guarantee interrupts by using fences.
* Test RC6 via forcewake.
v3:
* Tweak assert in interrupts subtest.
* Sprinkle of comments.
* Fix multi-client test which got broken in v2.
v4:
* Measured instead of guaranteed sleep.
* Missing sync in no_sema.
* Log busyness before asserts for debug.
* access(2) instead of open(2) to determine if cpu0 is hotpluggable.
* Test frequency reporting via min/max setting instead of assuming.
^^ All above suggested by Chris Wilson. ^^
* Drop queued subtests to match i915.
* Use long batches with fences to ensure interrupts.
* Test render node as well.
v5:
* Add to meson build. (Petri Latvala)
* Use 1eN constants. (Chris Wilson)
* Add tests for semaphore and event waiting.
v6:
* Fix interrupts subtest by polling the fence from the "outside".
(Chris Wilson)
v7:
* Assert number of initialized engines matches the expectation.
(Chris Wilson)
* Warn instead of skipping if we couldn't restore the initial
frequency. (Chris Wilson)
* Move all asserts to after the test cleanup (just a tidy).
* More 1eN notation for timeouts.
* Bump the tolerance to 5% since I saw a few noisy runs with
sampling counters.
* Always start the PMU before submitting batches to lower
reliance on i915 doing the delayed engine busy stats disable.
v8:
* Update for upstream engine class enum.
v9:
* Add meson build support.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|