Age | Commit message (Collapse) | Author |
|
Superseded by gem_latency.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The goal is measure how long it takes for clients waiting on results to
wakeup after a buffer completes, and in doing so ensure scalibilty of
the kernel to large number of clients.
We spawn a number of producers. Each producer submits a busyload to the
system and records in the GPU the BCS timestamp of when the batch
completes. Then each producer spawns a number of waiters, who wait upon
the batch completion and measure the current BCS timestamp register and
compare against the recorded value.
By varying the number of producers and consumers, we can study different
aspects of the design, in particular how many wakeups the kernel does
for each interrupt (end of batch). The more wakeups on each batch, the
longer it takes for any one client to finish.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Benchmark the overhead of changing from GTT to CPU domains and vice
versa. Effectively this measures the cost of a clflush, and how well the
driver can avoid them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
One scenario under recent discussion is that of having a thundering herd
in i915_wait_request - where the overhead of waking up every waiter for
every batchbuffer was significantly impacting customer throughput. This
benchmark tries to replicate something to that effect by having a large
number of consumers generating a busy load (a large copy followed by
lots of small copies to generate lots of interrupts) and tries to wait
upon all the consumers concurrenctly (to reproduce the thundering herd
effect). To measure the overhead, we have a bunch of cpu hogs - less
kernel overhead in waiting should allow more CPU throughput.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Execute N blits and time how long they complete to measure both GPU
limited bandwidth and submission overhead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Allow specification of the many different busyness modes and relocation
interfaces, along with the number of buffers to use and relocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
These benchmarks are first-and-foremost development tools, not aimed at
general users. As such they should not be installed into the system-wide
bin/ directory, but installed into libexec/.
v2: Now actually install beneath ${libexec}
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This slightly idealises the behaviour of clients with the aim of
measuring the kernel overhead of different workloads. This test focuses
on the cost of relocating batchbuffers.
A trace file is generated with an LD_PRELOAD intercept around
execbuffer, which we can then replay at our leisure. The replay replaces
the real buffers with a set of empty ones so the only thing that the
kernel has to do is parse the relocations. but without a real workload
we lose the impact of having to rewrite active buffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
A basic measurement, how fast can we create and populate an object with
backing storage?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Measure the overhead of execution when doing nothing, switching between
a pair of contexts, or creating a new context every time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
By measuring both the query and the event round trip time, we can make a
reasonable estimate of how long it takes for the query to send the
vblank following an interrupt.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This adds a small benchmark for the new userptr functionality.
Apart from basic surface creation and destruction, also tested is the
impact of having userptr surfaces in the process address space. Reason
for that is the impact of MMU notifiers on common address space
operations like munmap() which is per process.
v2:
* Moved to benchmarks.
* Added pointer read/write tests.
* Changed output to say iterations per second instead of
operations per second.
* Multiply result by batch size for multi-create* tests
for a more comparable number with create-destroy test.
v3:
* Use ALIGN macro.
* Catchup with big lib/ reorganization.
* Removed unused code and one global variable.
* Fixed up some warnings.
v4:
* Fixed feature test, does not matter here but makes it
consistent with gem_userptr_blits and clearer.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
They build fine so give them some exposure.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
|