Age | Commit message | Author |
|
In the middle of a line of double calculations is (int * int) and a
potential overflow.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
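
A minimal sketch of this class of bug, not the benchmark's actual expression: in a chain of double arithmetic, an (int * int) subexpression is still evaluated in int and can overflow before it is promoted. The helper name and parameters below are hypothetical.

```c
/* Hypothetical helper: bytes moved per second when a width x height
 * surface of 4-byte pixels is copied `loops` times in `elapsed` seconds. */
static double bytes_per_second(int width, int height, int loops, double elapsed)
{
	/* Casting the first operand makes every multiply a double multiply.
	 * Written as `width * height * 4.0`, the leading `width * height`
	 * would be an int multiply and could overflow before promotion. */
	return (double)width * height * 4.0 * loops / elapsed;
}
```

With width = height = 65536 the int product would not fit in 32 bits, while the double form stays exact.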
|
|
Just a silly benchmark to stress prime_fd_to_handle and
prime_handle_to_fd.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
benchmarks/ doesn't have the same automagic .gitignore so add the
missing ones.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The benchmark was failing with:
gem_busy.c:158:8: error: implicit declaration of function 'intel_gen'
is invalid in C99 [-Werror,-Wimplicit-function-declaration]
gen = intel_gen(intel_get_drm_devid(fd));
The root cause was that the local lib directory was not specified in
benchmarks/Android.mk, so intel_chipset.h from drm was used instead.
This patch adds the lib path to LOCAL_C_INCLUDES.
Signed-off-by: Derek Morton <derek.j.morton@intel.com>
Signed-off-by: Marius Vlad <marius.c.vlad@intel.com>
|
|
Instead of causing each engine to execute serially (and so only testing
one fence underneath the sync_file), merge them all together (as we do
for the other interfaces).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Useful for comparing the cost of explicit fences versus implicit.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
And include poll(dmabuf) for comparison.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Makefile.sources
Replace the automake specific name of listings in Makefile.sources
with something not automake specific.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Use the HAS_INTEL automake flag to avoid building benchmarks that won't
compile unless libdrm_intel is available in the build system.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Primarily to check that we have the WC read/write disparity.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Currently, we look at the throughput for submitting a read batch to a
single engine or any. The kernel optimises for this by allowing multiple
engines to read at the same time, but writes are exclusive to a single
engine. So let's try to measure the impact of inserting the barriers
between writes on different engines.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Oops, because -nightly doesn't handle 64bit alignments correctly, this
was not working quite as expected.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If we specify an unobtainable alignment (e.g., 63 bits) the kernel will
evict the object from the GTT and fail to rebind it. We can use this to
measure how long it takes to move objects around in the GTT by running
execbuf followed by the unbind. For small objects, this will be
dominated by the nop execution time, but for larger objects this will be
rate-limited by how fast we can rewrite the PTEs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
v2: Initially added -Werror by default. Make it optional so it doesn't
break the Android build and (potentially) distros maintaining the package
(hinted by Damien Lespiau).
--enable-werror will enable the -Werror compiler flag.
Signed-off-by: Marius Vlad <marius.c.vlad@intel.com>
Acked-by: Damien Lespiau <damien.lespiau@intel.com>
|
|
In theory, we need to only worry about concurrent mmio writes to the
same cacheline. So far, disabling the spinlock hasn't hung the machine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
When waiting for the producers to start, use the cond/mutex of the
Nth producer and not always the first.
Spotted-by: "Goel, Akash" <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If the object is at offset 0, quite likely using full-ppgtt, then the
presumed_offset set also to 0 causes the relocation to be skipped.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Just a set of scripts to integrate these benchmarks with ezbench. They
need to be revised to plug into the latest version of ezbench.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
For sync, it really is just the average latency across all rings, but
for continuous we can expect to see the effect of concurrent dispatch
across rings. Hopefully.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If we listen to the uevents from the kernel, we can detect when the GPU
hangs. This requires us to fork a helper process to do so and send a
signal back to the parent.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
To simplify and speed up running interruptible tests, use a custom
ioctl() function that controls the signaling and detects when we need no
more iterations to trigger an interruption.
We use a realtime timer to inject the signal after a certain delay,
increasing the delay on every loop to try and exercise different code
paths within the function. The first delay is very short such that we
hopefully enter the kernel with a pending signal.
Clients should use
    struct igt_sigiter iter = {};
    while (igt_sigiter_repeat(&iter, enable_interrupts=true))
        do_test();
to automatically repeat the test until we can inject no more signals
into the ioctls. This is condensed into a macro
    igt_interruptible(enable_interrupts=true)
        do_test();
for convenience.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Android defines __USE_GNU but does not provide pthread_attr_setaffinity_np(),
so add an extra guard around pthread_attr_setaffinity_np().
Signed-off-by: Derek Morton <derek.j.morton@intel.com>
|
|
Only if the trial __gem_execbuf reports an error do we want to remove
the fancy LUT flags.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
pthread_setaffinity_np is a GNU extension, so add some __USE_GNU
ifdeffry and hope for the best if unavailable.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Since clock_gettime() should be a fixed overhead that adds to the
latency result, subtract it from the result.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
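
A rough sketch of the correction described above, assuming the overhead is estimated by timing many back-to-back clock_gettime() calls. The function names are hypothetical and the benchmark may calibrate differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Nanoseconds elapsed between two timestamps. */
static double elapsed_ns(const struct timespec *start, const struct timespec *end)
{
	return 1e9 * (double)(end->tv_sec - start->tv_sec) +
	       (double)(end->tv_nsec - start->tv_nsec);
}

/* Estimate the cost of one clock_gettime() call by timing `samples`
 * consecutive calls and dividing the total by the sample count. */
static double clock_overhead_ns(int samples)
{
	struct timespec start, end, scratch;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < samples; i++)
		clock_gettime(CLOCK_MONOTONIC, &scratch);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return elapsed_ns(&start, &end) / samples;
}

/* Remove the fixed timer overhead from a measured latency, clamping at
 * zero so that noise cannot produce a negative result. */
static double corrected_latency_ns(double measured_ns, double overhead_ns)
{
	double corrected = measured_ns - overhead_ns;

	return corrected > 0.0 ? corrected : 0.0;
}
```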
|
|
In order to keep the latency as low as possible for the idle load, we
need to keep the CPU awake. Otherwise we end up with the busy workload
having lower latency than the idle workload!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Also useful to know how much worse than baseline the latency is when the
gem load is applied. For slower systems, presenting in nanoseconds makes
it hard to read, so switch to microseconds for output.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Instead of measuring the wakeup latency of a GEM client, we turn the
tables here and ask what is the wakeup latency of a normal process
competing with GEM. In particular, a realtime process that expects
deterministic latency.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use a simpler statically allocated struct for computing the mean as
otherwise we may run out of memory!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
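
One way such a fixed-size accumulator could look, as a sketch only: it keeps a running sum and count rather than storing every sample, so memory use does not grow with the number of measurements. The struct and function names are hypothetical, not the ones used in the benchmark.

```c
/* Fixed-size accumulator: memory use is constant regardless of how many
 * samples are recorded. */
struct running_mean {
	double sum;
	unsigned long count;
};

/* Fold one more sample into the accumulator. */
static void running_mean_add(struct running_mean *m, double value)
{
	m->sum += value;
	m->count++;
}

/* Mean of all samples so far, or 0 if none have been recorded. */
static double running_mean_get(const struct running_mean *m)
{
	return m->count ? m->sum / (double)m->count : 0.0;
}
```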
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This patch moves the userptr definitions and helper implementations that were
local to gem_userptr_benchmark and gem_userptr_blits into the library, so other
tests can make use of them as well. There are no functional changes.
v2: added __ function to differentiate when errors want to be handled back in
the caller; bring gem_userptr_sync back to gem_userptr_blits; added gtkdoc.
v8: remove local_i915_gem_userptr from gem_concurrent_all.c to use the global
helpers instead.
Signed-off-by: Tiago Vignatti <tiago.vignatti@intel.com>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
|
|
An oft-repeated function to check EXECBUFFER2 for a particular fail
condition.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Well, 24000 years.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Like the previous patch to gem_exec_ctx, restrict gem_exec_nop to running
for a fixed length of time, rather than over a range of different
execution counts. In order to retain some measurement of that range,
allow measuring individual execution versus continuous dispatch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Rather than investigate the curve for dispatch latency, just run for a
fixed time and report an average latency. Instead offer two modes:
average single dispatch latency and average continuous dispatch latency.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If we autotune the workload to only take 0.1s and then repeat the
measurements over 2s, we can bound the benchmark runtime. (Roughly, of
course! Sometimes the disparity between main memory CPU bandwidth and
GPU execution bandwidth throws off the runtime, but that's the purpose
of the benchmark!)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
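
The autotune-then-repeat idea can be sketched as follows. This is an illustration under assumptions, not the benchmark's code: the workload here is a stand-in CPU spin loop, tuning is done by simple doubling, and all names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <time.h>

/* Hypothetical workload signature: perform `loops` iterations of work. */
typedef void (*workload_fn)(unsigned long loops);

/* Stand-in CPU workload so the sketch is self-contained. */
static void spin_workload(unsigned long loops)
{
	volatile unsigned long sink = 0;

	for (unsigned long i = 0; i < loops; i++)
		sink += i;
}

static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Double the loop count until a single pass takes at least `target`
 * seconds (or the count can no longer be doubled safely). */
static unsigned long autotune_loops(workload_fn fn, double target)
{
	unsigned long loops = 1;

	for (;;) {
		double start = now_seconds();

		fn(loops);
		if (now_seconds() - start >= target || loops > ULONG_MAX / 2)
			return loops;
		loops *= 2;
	}
}

/* Repeat tuned passes until `budget` seconds have elapsed and return the
 * mean time per pass. At least one pass is always run. */
static double mean_pass_seconds(workload_fn fn, unsigned long loops, double budget)
{
	double start = now_seconds();
	double elapsed;
	unsigned long passes = 0;

	do {
		fn(loops);
		passes++;
		elapsed = now_seconds() - start;
	} while (elapsed < budget);

	return elapsed / (double)passes;
}
```

With the figures from the message, a caller would tune with autotune_loops(spin_workload, 0.1) and then measure with mean_pass_seconds(spin_workload, loops, 2.0). The total runtime is only roughly bounded, since the final pass can overrun the budget.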
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The joy of our hardware; don't let two threads attempt to read the same
register at the same time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Try and gauge the amount of CPU time used for each dispatch/wait cycle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|