path: root/tests/perf_pmu.c
Age | Commit message | Author
2017-12-22 | igt/perf_pmu: Speed up frequency measurement | Chris Wilson
    Use the normal batch_duration_ns and display the sampled frequency:

        Frequency: min=100, max=750, boost=750 MHz
        Min frequency: requested 100.0, actual 100.0
        Max frequency: requested 755.6, actual 755.6

    v2: Remove the early spin_batch_end and assert the measured frequencies are within tolerance of our target.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-12-22 | igt/perf_pmu: Measure the reference batch for all-busy-check-all | Chris Wilson
    Don't rely on the timer being precise when we can sleep for a known duration.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-12-22 | igt/perf_pmu: Measure the reference batch for busy-check-all | Chris Wilson
    Don't rely on the timer being precise when we can sleep for a known duration.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-12-22 | igt/perf_pmu: Tighten measurements for most-busy | Chris Wilson
    Create all the spinners before starting the sampler and then measure how long we sleep.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104160
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-12-22 | igt/perf_pmu: Tighten busy measurement | Chris Wilson
    Sleep for a known duration. In particular, CI once saw a measurement for busyness greater than the intended batch_duration!

    v2: Go back to starting pmu sampling outside of spinner; the GPU should be idle.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104241
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
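The idea of sleeping for a known duration can be sketched as follows. This is a minimal illustration and not the test's actual code: `now_ns`, `measured_sleep_ns` and `busy_ratio` are hypothetical names, and the sketch assumes a POSIX system providing `clock_gettime(CLOCK_MONOTONIC)`. The point is that busyness is compared against the time that really elapsed rather than the time that was requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds (hypothetical helper). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Sleep for roughly target_ns and return how long we really slept. */
static uint64_t measured_sleep_ns(uint64_t target_ns)
{
	struct timespec req = {
		.tv_sec = (time_t)(target_ns / 1000000000ull),
		.tv_nsec = (long)(target_ns % 1000000000ull),
	};
	uint64_t start = now_ns();

	nanosleep(&req, NULL); /* may overshoot, or return early on a signal */
	return now_ns() - start;
}

/* Busyness relative to the measured, not the requested, duration. */
static double busy_ratio(uint64_t busy_ns, uint64_t slept_ns)
{
	return slept_ns ? (double)busy_ns / (double)slept_ns : 0.0;
}
```

A caller would sample the busy counter immediately before and after `measured_sleep_ns()` and pass the counter delta together with the returned duration to `busy_ratio()`.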
2017-12-07 | igt/perf_pmu: Tweak wait_for_rc6, yet again | Chris Wilson
    Still CI remains obstinate that RC6 is not smoothly incrementing during the sample period. Tweak the wait_for_rc6() to first wait for the initial Evaluation Interval before polling.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-12-05 | igt/perf_pmu: Replace hard-coded sleep before rc6 with a probe | Chris Wilson
    Instead of trying to sleep for 2 evaluation intervals and then assuming that rc6 is working, poll the rc6 residency instead.

    v2: dce

    References: https://bugs.freedesktop.org/show_bug.cgi?id=103929
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
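Replacing a fixed sleep with a probe amounts to a bounded polling loop. The sketch below is illustrative only: `wait_for_counter`, `read_counter_fn` and the `fake_rc6` stand-in are hypothetical names, and the real test reads the rc6 residency from the driver rather than through a callback.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Callback returning the current residency counter (hypothetical). */
typedef uint64_t (*read_counter_fn)(void *ctx);

/*
 * Poll until the counter moves away from its initial value, or give up
 * after max_polls reads. Returns true if the counter was seen to advance.
 * poll_interval_ns must be below one second.
 */
static bool wait_for_counter(read_counter_fn read_counter, void *ctx,
			     unsigned int max_polls, long poll_interval_ns)
{
	const struct timespec delay = { .tv_sec = 0, .tv_nsec = poll_interval_ns };
	const uint64_t start = read_counter(ctx);

	while (max_polls--) {
		if (read_counter(ctx) != start)
			return true;
		nanosleep(&delay, NULL);
	}
	return false;
}

/* Stand-in counter for demonstration: flat for a few reads, then ticking. */
struct fake_rc6 {
	unsigned int flat_reads;
	uint64_t value;
};

static uint64_t fake_rc6_read(void *ctx)
{
	struct fake_rc6 *rc6 = ctx;

	if (rc6->flat_reads)
		rc6->flat_reads--;
	else
		rc6->value++;
	return rc6->value;
}
```

Bounding the number of polls keeps a machine that never enters rc6 from hanging the test; the caller can then skip or fail with a clear message.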
2017-12-04 | igt/perf_pmu: Tighten semaphore-wait measurement | Chris Wilson
    Record the before/after semaphore-wait values around the sleep to try to reduce the inaccuracy from scheduler delays. Previously, the samples were taken before submitting the batch and then after synchronising its completion. The measurement would then be the total time that the semaphore was being sampled which, with the extra syscalls intervening, may have drifted from the sleep duration. To further reduce the disparity, wait for the batch to start executing before taking our samples.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=104013
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-11-30 | igt/perf_pmu: Increase delay for rc6 to start | Chris Wilson
    I was thinking of the RC6 threshold parameter, but needed to consider the RC6 evaluation interval instead. RC6 doesn't enable until activity is below the threshold inside an evaluation interval, therefore we need to wait at least 2 EI after idling before we can expect RC6 to be enabled.

    Fixes: 55a17bc2d040 ("igt/perf_pmu: Reduce arbitrary delays before rc6")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-11-28 | tests/perf_pmu: Sync invalid-init with i915 changes | Tvrtko Ursulin
    i915 started returning -EINVAL for incorrect CPU.

    Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-27 | igt/perf_pmu: Keep batch_duration_ns as the minimum measurement duration | Chris Wilson
    We have chosen batch_duration_ns to be the minimum duration we need to meet our accuracy requirements for legacy ringbuffer PMU sampling. As such, we need to be careful to use multiples of it during tests, and not split it into different phases within a test, like multi_client does.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-11-24 | intel/pmu: Catch-up with i915 RC6 aggregation changes | Tvrtko Ursulin
    Since i915 PMU is removing separate RC6 counters and now aggregates all under a single one, catch up the test and intel-gpu-overlay with those changes.

    Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-24 | igt/perf_pmu: Recalibrate interrupt loop. | Chris Wilson
    We have to be careful in our calibration loop: too slow and we time out, too fast and we don't emit an interrupt! On fast legacy devices, we would overflow the calibration calculation...

    v2: Give the time constants a name.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-11-24 | igt/perf_pmu: Stop peeking at intel_mmio registers | Chris Wilson
    Program the MI_WAIT_FOR_EVENT without reference to DERRMR by knowing its state is ~0u when not in use, and is only in use when userspace requires it. By not touching intel_register_access we completely eliminate the risk that we leak the forcewake ref, which can cause later rc6 to fail.

    At the same time, note that vlv/chv use a different mechanism (read none) for coupling between the render engine and display.

    v2: Note that we assume DERRMR should be ~0u when not in use. For futureproofing one might like to do SRM/LRM (but I believe that if the HW changes that much, we are likely to need a bigger boat).

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-11-23 | igt/perf_pmu: Reduce arbitrary delays before rc6 | Chris Wilson
    gem_quiescent_gpu() is supposed to ensure that the HW is idle, and in the process kick the GPU into rc6, so we should not need a long delay afterwards to ensure that we are indeed in rc6. We do however need a small delay in order to be sure that rc6 cycle counter has started and stopped.

    v2: Apply to rc6p as well.
    v3: The longest rc6 timeout (before the HW kicks in and enables rc6 on an idle GPU) is 50ms, so make sure that at least that time has passed since we were busy.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-11-23 | tests/perf_pmu: Bump measuring duration for semaphores as well | Tvrtko Ursulin
    As Chris has discovered, 100ms is not long enough to cover the sampling error in general, so fix the semaphore subtest as well to measure for 500ms.

    Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-23 | tests/perf_pmu: Make the invalid-init failure place more obvious | Tvrtko Ursulin
    Put the values into the assert so we know which step failed.

    Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-23 | igt/perf_pmu: Bump batch_duration for legacy sampling inaccuracy | Chris Wilson
    Since the legacy ringbuffer uses a sampling technique, it is limited to an accuracy based on a 200Hz timer, or 5ms. We assert that measurements are within 5%, so with a 100ms duration that gives us no room for the systematic error in our sampling. Bump the duration to 500ms to give us plenty of safety margin (back of the paper estimate is that we need 300ms to provide the accuracy we request for legacy submission), if it then fails, it should not be due to the sampling!

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
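The arithmetic behind the bump can be checked with a few lines of C. This is a rough sketch under a stated assumption: it models the error as a single sampling period, which may not be exactly how the commit's 300ms estimate was derived. `sampling_error` is a hypothetical helper.

```c
/*
 * Worst-case relative error if a measurement can be off by one sampling
 * period. The one-period error model is an assumption of this sketch.
 */
static double sampling_error(double timer_hz, double duration_s)
{
	double period_s = 1.0 / timer_hz; /* 200Hz gives a 5ms period */

	return period_s / duration_s;
}
```

With a 200Hz timer, `sampling_error(200.0, 0.1)` comes to about 0.05, which consumes the entire 5% tolerance, while `sampling_error(200.0, 0.5)` is about 0.01 and leaves margin for other noise.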
2017-11-23 | igt/perf_pmu: Avoid underflow in measured_sleep() | Chris Wilson
    Be careful not to underflow into a very large positive usec value and so sleep forever^W until boredom kicks in.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
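The hazard is ordinary unsigned arithmetic: if more time has already elapsed than was requested, `target - elapsed` wraps around to a huge value. A minimal guard might look like the following; `remaining_usec` is a hypothetical name and not the helper used in the test.

```c
#include <stdint.h>

/*
 * Time still left to sleep, clamped at zero. Without the comparison,
 * target_us - elapsed_us wraps to a very large value once we overshoot.
 */
static uint64_t remaining_usec(uint64_t target_us, uint64_t elapsed_us)
{
	return elapsed_us < target_us ? target_us - elapsed_us : 0;
}
```

A sleep loop can then terminate as soon as `remaining_usec()` returns 0 instead of passing a wrapped value to `usleep()`.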
2017-11-23 | igt/perf_pmu: Protect macro args | Chris Wilson
    In particular we fix the evaluation of "(double)x" when x is a compound.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
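To see why the parentheses matter, consider a cast inside a macro. The macros below are illustrative only and are not the ones in the test; `WITHIN_TOLERANCE` is a hypothetical tolerance check in the same spirit.

```c
/* Unprotected: the cast binds only to the first operand of a compound x. */
#define AS_DOUBLE_UNSAFE(x) ((double)x)

/* Protected: the whole argument is evaluated first, then converted. */
#define AS_DOUBLE(x) ((double)(x))

/*
 * Hypothetical tolerance check with every argument parenthesised.
 * Arguments are evaluated twice, so they should be free of side effects.
 */
#define WITHIN_TOLERANCE(x, ref, tol) \
	((double)(x) <= (1.0 + (tol)) * (double)(ref) && \
	 (double)(x) >= (1.0 - (tol)) * (double)(ref))
```

`AS_DOUBLE_UNSAFE(7 / 2)` expands to `((double)7 / 2)` and yields 3.5, whereas `AS_DOUBLE(7 / 2)` performs the integer division first and yields 3.0. The same difference appears with unsigned counter deltas, where casting only the first operand changes how wraparound behaves.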
2017-11-23 | igt/perf_pmu: Explicitly reset min_freq before max_freq | Chris Wilson
    The kernel doesn't like it when you set max_freq < min_freq and complains bitterly. So before setting max_freq = min_freq, first set min_freq to min_freq!

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
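The ordering constraint can be illustrated with a toy model. Everything here is an assumption made for illustration: the struct, the function names, and the choice of `-EINVAL` as the rejection are not taken from the kernel or from the test, which drives the real frequency controls instead.

```c
#include <errno.h>

/* Toy model of the min/max frequency limits; not the kernel's code. */
struct freq_limits {
	unsigned int min_mhz;
	unsigned int max_mhz;
};

/* Assumed rule: a new minimum may not exceed the current maximum. */
static int set_min_freq(struct freq_limits *f, unsigned int mhz)
{
	if (mhz > f->max_mhz)
		return -EINVAL;
	f->min_mhz = mhz;
	return 0;
}

/* Assumed rule: a new maximum may not drop below the current minimum. */
static int set_max_freq(struct freq_limits *f, unsigned int mhz)
{
	if (mhz < f->min_mhz)
		return -EINVAL;
	f->max_mhz = mhz;
	return 0;
}

/* Pin both limits to the lowest frequency: lower min first, then max. */
static int pin_to_min(struct freq_limits *f, unsigned int lowest_mhz)
{
	int err = set_min_freq(f, lowest_mhz);

	return err ? err : set_max_freq(f, lowest_mhz);
}
```

If an earlier step left both limits at 750 MHz, calling `set_max_freq(&f, 100)` directly is rejected in this model, while `pin_to_min(&f, 100)` succeeds because the minimum is lowered before the maximum.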
2017-11-23 | igt/perf_pmu: Clear errno in between invalid attr tests | Chris Wilson
    Reset the errno after testing each invalid attr so that it is not carried over on a false positive.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
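A stale errno is a general C pitfall: library calls that succeed leave errno untouched, so a value set by an earlier failure can satisfy a later check. The sketch below shows the pattern using `strtol` as a stand-in, since the perf open call used by the test is not reproduced here; `parse_overflows` is a hypothetical helper.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns true if parsing s as a long overflows (strtol reports ERANGE). */
static bool parse_overflows(const char *s)
{
	errno = 0; /* forget any failure left behind by a previous check */
	(void)strtol(s, NULL, 10);
	return errno == ERANGE;
}
```

Without the reset, checking `"42"` right after an overflowing input would still observe `ERANGE` from the earlier call and report a false positive.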
2017-11-23 | igt/perf_pmu: Tidy skip message for unsupported kernels | Chris Wilson
    Use a more compact expression to detect when the kernel doesn't support perf.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-11-23 | igt/perf_pmu: Test for supported perf before starting test_interrupts | Chris Wilson
    Check for the perf event prior to calibrating the nop batch, so that we don't spend 4.5s before skipping.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-11-22 | tests/perf_pmu: Tests for i915 PMU API | Tvrtko Ursulin
    A bunch of tests for the new i915 PMU feature.

    Parts of the code were initially sketched by Dmitry Rogozhkin.

    v2: (Most suggestions by Chris Wilson)
     * Add new class/instance based engine list.
     * Add gem_has_engine/gem_require_engine to work with class/instance.
     * Use the above two throughout the test.
     * Shorten tests to 100ms busy batches, seems enough.
     * Add queued counter sanity checks.
     * Use igt_nsec_elapsed.
     * Skip on perf -ENODEV in some tests instead of embedding knowledge locally.
     * Fix multi ordering for busy accounting.
     * Use new guaranteed_usleep when sleep time is asserted on.
     * Check for no queued when idle/busy.
     * Add queued counter init test.
     * Add queued tests.
     * Consolidate and increase multiple busy engines tests to most-busy and all-busy tests.
     * Guarantee interrupts by using fences.
     * Test RC6 via forcewake.

    v3:
     * Tweak assert in interrupts subtest.
     * Sprinkle of comments.
     * Fix multi-client test which got broken in v2.

    v4:
     * Measured instead of guaranteed sleep.
     * Missing sync in no_sema.
     * Log busyness before asserts for debug.
     * access(2) instead of open(2) to determine if cpu0 is hotpluggable.
     * Test frequency reporting via min/max setting instead of assuming.
       ^^ All above suggested by Chris Wilson. ^^
     * Drop queued subtests to match i915.
     * Use long batches with fences to ensure interrupts.
     * Test render node as well.

    v5:
     * Add to meson build. (Petri Latvala)
     * Use 1eN constants. (Chris Wilson)
     * Add tests for semaphore and event waiting.

    v6:
     * Fix interrupts subtest by polling the fence from the "outside". (Chris Wilson)

    v7:
     * Assert number of initialized engines matches the expectation. (Chris Wilson)
     * Warn instead of skipping if we couldn't restore the initial frequency. (Chris Wilson)
     * Move all asserts to after the test cleanup (just a tidy).
     * More 1eN notation for timeouts.
     * Bump the tolerance to 5% since I saw a few noisy runs with sampling counters.
     * Always start the PMU before submitting batches to lower reliance on i915 doing the delayed engine busy stats disable.

    v8:
     * Update for upstream engine class enum.

    v9:
     * Add meson build support.

    Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>