Age | Commit message (Collapse) | Author |
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The joy of our hardware; don't let two threads attempt to read the same
register at the same time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Try and gauge the amount of CPU time used for each dispatch/wait cycle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Allow the producers to be set with maximum RT priority to verify that
the waiters are not exhibiting priorty-inversion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Reading BCS_TIMESTAMP just returns 0...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Try a different pattern to cascade the cancellation from producers to
their consumers in order to avoid one potential deadlock.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Do the workload before the nop, so that if combining both, there is a
better chance for the spurious interrupts. Emit just one workload batch
(use the nops to generate spurious interrupts) and apply the factor to
the number of copies to make inside the workload - the intention is that
this gives sufficient time for all producers to run concurrently.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Just to make it easier to integrate into ezbench.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Split the distinct phases (generate interrupts, busywork, measure
latency) into separate batches for finer control.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Allow the user to choose a time to run for, default 10s
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Allow the user to select how many batches each producer submits before
waiting.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Knowing how long it takes to execute the workload (and how that scales)
is interesting to put the latency figures into perspective.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Late last night I forgot I had only added the llc CPU mmaping and not
the !llc GTT mapping for byt/bsw.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Superseded by gem_latency.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The goal is measure how long it takes for clients waiting on results to
wakeup after a buffer completes, and in doing so ensure scalibilty of
the kernel to large number of clients.
We spawn a number of producers. Each producer submits a busyload to the
system and records in the GPU the BCS timestamp of when the batch
completes. Then each producer spawns a number of waiters, who wait upon
the batch completion and measure the current BCS timestamp register and
compare against the recorded value.
By varying the number of producers and consumers, we can study different
aspects of the design, in particular how many wakeups the kernel does
for each interrupt (end of batch). The more wakeups on each batch, the
longer it takes for any one client to finish.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
wait-ioctl skips a couple of side-effects of retiring, so provoke them
using set-domain before we sleep.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Switching between fds also involves a context switch, include it amongst
the measurements.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Benchmark the overhead of changing from GTT to CPU domains and vice
versa. Effectively this measures the cost of a clflush, and how well the
driver can avoid them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
When extending the batch for multiple copies, we need to remember to
flag it as being in the CPU write domain so that the new values get
flushed out to main memory before execution. We also have to be careful
not to specify NO_RELOC for the extended batch as the execobjects will
have been updated but we write the wrong presumed offsets. Subsequent
iterations will be correct and we can tell the kernel then to skip the
relocations entirely.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
|
|
Add a README to introduce the ezbench.sh benchmark runner.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Report the highest throughput measured from a large set of runs to
improve sensitivity.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Apparently the pthread shim on Android doesn't have pthread cancellation,
so use the plain old volatile to terminate the CPU hogs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
One scenario under recent discussion is that of having a thundering herd
in i915_wait_request - where the overhead of waking up every waiter for
every batchbuffer was significantly impacting customer throughput. This
benchmark tries to replicate something to that effect by having a large
number of consumers generating a busy load (a large copy followed by
lots of small copies to generate lots of interrupts) and tries to wait
upon all the consumers concurrenctly (to reproduce the thundering herd
effect). To measure the overhead, we have a bunch of cpu hogs - less
kernel overhead in waiting should allow more CPU throughput.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
To fix a build error on android
Signed-off-by: Derek Morton <derek.j.morton@intel.com>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
|
|
gem_mmap__{cpu,gtt,wc}() already has the assert built in, so replace
__gem_mmap__{cpu,gtt,wc}() + igt_assert() with it.
Mostly done with coccinelle, with some manual help:
@@
identifier I;
expression E1, E2, E3, E4, E5, E6;
@@
(
- I = __gem_mmap__gtt(E1, E2, E3, E4);
+ I = gem_mmap__gtt(E1, E2, E3, E4);
...
- igt_assert(I);
|
- I = __gem_mmap__cpu(E1, E2, E3, E4, E5);
+ I = gem_mmap__cpu(E1, E2, E3, E4, E5);
...
- igt_assert(I);
|
- I = __gem_mmap__wc(E1, E2, E3, E4, E5);
+ I = gem_mmap__wc(E1, E2, E3, E4, E5);
...
- igt_assert(I);
)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Rename the current gem_mmap__{cpu,gtt,wc}() functions into
__gem_mmap__{cpu,gtt,wc}(), and add back wrappers with the original name
that assert that the pointer is valid. Most callers will expect a valid
pointer and shouldn't have to bother with failures.
To avoid changing anything (yet), sed 's/gem_mmap__/__gem_mmap__/g'
over the entire codebase.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Do the following
ptr = gem_mmap__{cpu,gtt,wc}()
+igt_assert(ptr);
whenever the code doesn't handle the NULL ptr in any kind of
specific way.
Makes it easier to move the assert into gem_mmap__{cpu,gtt,wc}() itself.
Mostly done with coccinelle, with some manual cleanups:
@@
identifier I;
@@
<... when != igt_assert(I)
when != igt_require(I)
when != igt_require_f(I, ...)
when != I != NULL
when != I == NULL
(
I = gem_mmap__gtt(...);
+ igt_assert(I);
|
I = gem_mmap__cpu(...);
+ igt_assert(I);
|
I = gem_mmap__wc(...);
+ igt_assert(I);
)
...>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Add an option to do more than one copy per batch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Execute N blits and time how long they complete to measure both GPU
limited bandwidth and submission overhead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Android M-Dessert treats implicit declaration of function warnings
as errors resulting in igt failing to build.
This patch fixes the errors by including missing header files as
required. Mostly this involved including igt.h in the benchmarks.
Signed-off-by: Derek Morton <derek.j.morton@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Apply the new API to all call sites within the test suite using the following
semantic patch:
// Semantic patch for replacing drm_open_any* with arch-specific drm_open_driver* calls
@@
identifier i =~ "\bdrm_open_any\b";
@@
- i()
+ drm_open_driver(DRIVER_INTEL)
@@
identifier i =~ "\bdrm_open_any_master\b";
@@
- i()
+ drm_open_driver_master(DRIVER_INTEL)
@@
identifier i =~ "\bdrm_open_any_render\b";
@@
- i()
+ drm_open_driver_render(DRIVER_INTEL)
@@
identifier i =~ "\b__drm_open_any\b";
@@
- i()
+ __drm_open_driver(DRIVER_INTEL)
Signed-off-by: Micah Fedke <micah.fedke@collabora.co.uk>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
|
|
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Avoid the globals and make the dispatch one huge function and hope GCC
works some magic.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
SNA starts by feeding in deliberately bad ioctls in order to detect the
kernel interface versions. A quick solution is to always feed it to the
ioctl and only record the trace if it is valid.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The commit "benchmarks: Do not install to system-wide bin/" changed
the benchmark file list from bin_PROGRAMS to benchmarks_PROGRAMS.
However Android.mk was not updated, resulting in IGT failing to
build for Android.
This commit adds that change. It also adds LOCAL_MODULE_PATH to
specify where the built benchmarks should be put.
v2: I discovered that the existing definitions of LOCAL_MODULE_PATH
were creating what should have been an invalid path. Not sure how it
was ever working previously, but fixed now.
Signed-off-by: Derek Morton <derek.j.morton@intel.com>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
|
|
Allow specification of the many different busyness modes and relocation
interfaces, along with the number of buffers to use and relocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Just on the off chance someone is replaying a bunch of traces, remember
to cleanup up.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use madvise(MADV_SEQUENTIAL) to let the kernel optimise for our
straightforward sequential read pattern.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Now that we actually install the benchmarks into a sane location,
slightly abuse it to put the tracer for gem_exec_trace alongside.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
When reallocing the bo array, remember to set the new entries to 0.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
These benchmarks are first-and-foremost development tools, not aimed at
general users. As such they should not be installed into the system-wide
bin/ directory, but installed into libexec/.
v2: Now actually install beneath ${libexec}
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This slightly idealises the behaviour of clients with the aim of
measuring the kernel overhead of different workloads. This test focuses
on the cost of relocating batchbuffers.
A trace file is generated with an LD_PRELOAD intercept around
execbuffer, which we can then replay at our leisure. The replay replaces
the real buffers with a set of empty ones so the only thing that the
kernel has to do is parse the relocations. but without a real workload
we lose the impact of having to rewrite active buffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Recently added tools / benckmarks have the same module name as
existing tests. Android does not allow duplicate modules. This
patch appends _benchmark and _tool to the module names used when
building benckmarks and tools to prevent clashes with tests of
the same name.
Signed-off-by: Derek Morton <derek.j.morton@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
A basic measurement, how fast can we create and populate an object with
backing storage?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Measure the overhead of execution when doing nothing, switching between
a pair of contexts, or creating a new context every time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|