Age | Commit message (Collapse) | Author |
|
|
|
Also re-sort the file.
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
|
|
benchmarks/ doesn't have the same automagic .gitignore so add the
missing ones.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Primarily to check that we have the WC read/write disparity.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In order to keep the latency as low as possible for the idle load, we
need to keep the CPU awake. Otherwise we end up with the busy workload
having lower latency than the idle workload!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Superseded by gem_latency.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The goal is measure how long it takes for clients waiting on results to
wakeup after a buffer completes, and in doing so ensure scalibilty of
the kernel to large number of clients.
We spawn a number of producers. Each producer submits a busyload to the
system and records in the GPU the BCS timestamp of when the batch
completes. Then each producer spawns a number of waiters, who wait upon
the batch completion and measure the current BCS timestamp register and
compare against the recorded value.
By varying the number of producers and consumers, we can study different
aspects of the design, in particular how many wakeups the kernel does
for each interrupt (end of batch). The more wakeups on each batch, the
longer it takes for any one client to finish.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Benchmark the overhead of changing from GTT to CPU domains and vice
versa. Effectively this measures the cost of a clflush, and how well the
driver can avoid them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
One scenario under recent discussion is that of having a thundering herd
in i915_wait_request - where the overhead of waking up every waiter for
every batchbuffer was significantly impacting customer throughput. This
benchmark tries to replicate something to that effect by having a large
number of consumers generating a busy load (a large copy followed by
lots of small copies to generate lots of interrupts) and tries to wait
upon all the consumers concurrenctly (to reproduce the thundering herd
effect). To measure the overhead, we have a bunch of cpu hogs - less
kernel overhead in waiting should allow more CPU throughput.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Execute N blits and time how long they complete to measure both GPU
limited bandwidth and submission overhead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Allow specification of the many different busyness modes and relocation
interfaces, along with the number of buffers to use and relocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This slightly idealises the behaviour of clients with the aim of
measuring the kernel overhead of different workloads. This test focuses
on the cost of relocating batchbuffers.
A trace file is generated with an LD_PRELOAD intercept around
execbuffer, which we can then replay at our leisure. The replay replaces
the real buffers with a set of empty ones so the only thing that the
kernel has to do is parse the relocations. but without a real workload
we lose the impact of having to rewrite active buffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
A basic measurement, how fast can we create and populate an object with
backing storage?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Measure the overhead of execution when doing nothing, switching between
a pair of contexts, or creating a new context every time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This adds a small benchmark for the new userptr functionality.
Apart from basic surface creation and destruction, also tested is the
impact of having userptr surfaces in the process address space. Reason
for that is the impact of MMU notifiers on common address space
operations like munmap() which is per process.
v2:
* Moved to benchmarks.
* Added pointer read/write tests.
* Changed output to say iterations per second instead of
operations per second.
* Multiply result by batch size for multi-create* tests
for a more comparable number with create-destroy test.
v3:
* Use ALIGN macro.
* Catchup with big lib/ reorganization.
* Removed unused code and one global variable.
* Fixed up some warnings.
v4:
* Fixed feature test, does not matter here but makes it
consistent with gem_userptr_blits and clearer.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Reusing xorg code saves maintenance in the long term.
Now that m4/.gitignore is removed, the -I m4 ${ACLOCAL_FLAGS}
must be removed to avoid build breakage as m4 is generated and not
part of the git source.
Acked-by: Cyril Brulebois <kibi@debian.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Gaetan Nadon <memsize@videotron.ca>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|