path: root/benchmarks
Age | Commit message | Author
2016-03-01 | benchmarks/gem_blt: Measure the throughput of synchronous copies | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-02-11 | lib: Add gem_userptr and __gem_userptr helpers | Tiago Vignatti
This patch moves the userptr definitions and helper implementations that were local to gem_userptr_benchmark and gem_userptr_blits into the library, so other tests can make use of them as well. There are no functional changes. v2: added __ function to differentiate when errors want to be handled back in the caller; bring gem_userptr_sync back to gem_userptr_blits; added gtkdoc. v8: remove local_i915_gem_userptr from gem_concurrent_all.c to use the global helpers instead. Signed-off-by: Tiago Vignatti <tiago.vignatti@intel.com> Reviewed-by: Stéphane Marchesin <marcheu@chromium.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-01-27 | lib: Share common __gem_execbuf() | Chris Wilson
An oft-repeated function to check EXECBUFFER2 for a particular fail condition. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-01-06 | benchmarks/gem_latency: Allow setting an infinite time | Chris Wilson
Well, 24000 years. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-01-02 | benchmarks/gem_mmap: Convert to run over a fixed period | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-01-02 | benchmarks/gem_exec_nop: Convert to running for a fixed time | Chris Wilson
Like the previous patch to gem_exec_ctx, restrict gem_exec_nop to running for a fixed length of time, rather than over a range of different execution counts. In order to retain some measurement of that range, allow measuring individual execution versus continuous dispatch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-01-02 | benchmarks/gem_exec_ctx: Run for a fixed time | Chris Wilson
Rather than investigate the curve for dispatch latency, just run for a fixed time and report an average latency. Instead offer two modes: average single-dispatch latency and average continuous-dispatch latency. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
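The fixed-time approach described above can be sketched as follows. This is a minimal illustration, not the code from gem_exec_ctx: `average_latency`, `elapsed_since`, and the stand-in `count_op` are hypothetical names, and the real benchmark submits GPU batches rather than calling a callback.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Seconds elapsed on the monotonic clock since *start. */
static double elapsed_since(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (double)(now.tv_sec - start->tv_sec) +
	       1e-9 * (double)(now.tv_nsec - start->tv_nsec);
}

/* Hypothetical stand-in for one dispatch: it only bumps a counter. */
static void count_op(void *data)
{
	++*(unsigned long *)data;
}

/*
 * Call op(data) repeatedly until at least `seconds` have elapsed and
 * return the average wall-clock time per call, in seconds.
 */
static double average_latency(void (*op)(void *), void *data, double seconds)
{
	struct timespec start;
	unsigned long count = 0;
	double elapsed;

	assert(seconds > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		op(data);
		count++;
		elapsed = elapsed_since(&start);
	} while (elapsed < seconds);

	return elapsed / (double)count;
}
```

Calling `average_latency(count_op, &calls, 2.0)` would run the stand-in for roughly two seconds. Reading the clock after every call adds overhead to each sample, so a real benchmark might check the clock only every N dispatches.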
2016-01-01 | benchmarks/gem_exec_ctx: Fix fd switching between default contexts | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-31 | benchmarks/gem_blt: Estimate memory bandwidth to improve test runtime | Chris Wilson
If we autotune the workload to only take 0.1s and then repeat the measurements over 2s, we can bound the benchmark runtime. (Roughly of course! Sometimes the disparity between main memory CPU bandwidth and GPU execution bandwidth throws off the runtime, but that's the purpose of the benchmark!) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
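One way to picture the autotune-then-repeat idea is the sketch below. It is an assumption-laden illustration: the commit estimates memory bandwidth to pick the workload size, whereas this sketch simply doubles an iteration count until one pass is long enough. The names `autotune`, `run_for`, `time_pass`, and the CPU-only `spin` workload are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <time.h>

/* Current monotonic time in seconds. */
static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Hypothetical workload: n iterations of trivial CPU work. */
static void spin(unsigned long n)
{
	volatile unsigned long sink = 0;
	unsigned long i;

	for (i = 0; i < n; i++)
		sink += i;
}

/* Time a single pass of work(n), in seconds. */
static double time_pass(void (*work)(unsigned long), unsigned long n)
{
	double start = now_seconds();

	work(n);
	return now_seconds() - start;
}

/* Double n until one pass lasts at least `target` seconds. */
static unsigned long autotune(void (*work)(unsigned long), double target)
{
	unsigned long n = 1;

	assert(target > 0);
	while (time_pass(work, n) < target && n <= ULONG_MAX / 2)
		n *= 2;
	return n;
}

/* Repeat tuned passes until `budget` seconds elapse; return the pass count. */
static unsigned long run_for(void (*work)(unsigned long), unsigned long n,
			     double budget)
{
	double start = now_seconds();
	unsigned long passes = 0;

	do {
		work(n);
		passes++;
	} while (now_seconds() - start < budget);
	return passes;
}
```

With the figures from the commit message, usage would look like `n = autotune(spin, 0.1); passes = run_for(spin, n, 2.0);`, which bounds the total runtime to roughly the tuning cost plus two seconds.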
2015-12-21 | benchmarks/gem_latency: Hide spinlocks for android | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-21 | benchmarks/gem_latency: Serialise mmio reads | Chris Wilson
The joy of our hardware; don't let two threads attempt to read the same register at the same time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
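Serialising register reads between threads can be sketched with a lock around the access. This is only an illustration under assumptions: `read_register` and `mmio_lock` are hypothetical names, a pthread mutex is used here although the benchmark itself may use a different lock type, and the register is modelled as a plain `volatile` word rather than a real MMIO mapping.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* One lock shared by every thread that reads the register (assumed name). */
static pthread_mutex_t mmio_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Read a 32-bit register while holding the lock, so that no two threads
 * touch the register at the same time.
 */
static uint32_t read_register(const volatile uint32_t *reg)
{
	uint32_t value;

	assert(reg != NULL);
	pthread_mutex_lock(&mmio_lock);
	value = *reg;
	pthread_mutex_unlock(&mmio_lock);
	return value;
}
```

The lock adds its own latency to every read, which matters in a latency benchmark, so the cost of the serialisation is part of what ends up being measured.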
2015-12-21 | benchmarks/gem_latency: Guard against inferior pthreads.h | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 | benchmarks/gem_latency: Measure CPU usage | Chris Wilson
Try and gauge the amount of CPU time used for each dispatch/wait cycle. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
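A rough way to gauge CPU time per cycle is to sample the process CPU clock before and after a batch of cycles. The sketch below uses `getrusage()`; whether gem_latency uses that call is an assumption, and `cpu_seconds`, `cpu_per_cycle`, and the stand-in `busy_cycle` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* User plus system CPU time consumed by this process so far, in seconds. */
static double cpu_seconds(void)
{
	struct rusage ru;

	getrusage(RUSAGE_SELF, &ru);
	return (double)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) +
	       1e-6 * (double)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec);
}

/* Hypothetical stand-in for one dispatch/wait cycle. */
static void busy_cycle(void)
{
	volatile unsigned int sink = 0;
	unsigned int i;

	for (i = 0; i < 1000; i++)
		sink += i;
}

/* Average CPU seconds consumed by each call to cycle(). */
static double cpu_per_cycle(void (*cycle)(void), unsigned long cycles)
{
	double before = cpu_seconds();
	unsigned long i;

	assert(cycles > 0);
	for (i = 0; i < cycles; i++)
		cycle();
	return (cpu_seconds() - before) / (double)cycles;
}
```

Comparing this figure against wall-clock time per cycle indicates how much of each cycle is spent on the CPU rather than sleeping in a wait.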
2015-12-20 | benchmarks/gem_latency: Measure effect of using RealTime priority | Chris Wilson
Allow the producers to be set with maximum RT priority to verify that the waiters are not exhibiting priority inversion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 | benchmarks/gem_latency: Use RCS on Sandybridge | Chris Wilson
Reading BCS_TIMESTAMP just returns 0... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 | benchmarks/gem_latency: Rearrange thread cancellation | Chris Wilson
Try a different pattern to cascade the cancellation from producers to their consumers in order to avoid one potential deadlock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-20 | benchmarks/gem_latency: Tweak workload | Chris Wilson
Do the workload before the nop, so that if combining both, there is a better chance for the spurious interrupts. Emit just one workload batch (use the nops to generate spurious interrupts) and apply the factor to the number of copies to make inside the workload - the intention is that this gives sufficient time for all producers to run concurrently. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 | benchmarks/gem_latency: Add output field specifier | Chris Wilson
Just to make it easier to integrate into ezbench. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 | benchmarks/gem_latency: Split the nop/work/latency measurement | Chris Wilson
Split the distinct phases (generate interrupts, busywork, measure latency) into separate batches for finer control. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 | benchmarks/gem_latency: Add time control | Chris Wilson
Allow the user to choose a time to run for; the default is 10s. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 | benchmarks/gem_latency: Add nop dispatch latency measurement | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 | benchmarks/gem_latency: Expose the workload factor | Chris Wilson
Allow the user to select how many batches each producer submits before waiting. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 | benchmarks/gem_latency: Measure whole execution throughput | Chris Wilson
Knowing how long it takes to execute the workload (and how that scales) is interesting to put the latency figures into perspective. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 | benchmarks/gem_latency: Fix for !LLC | Chris Wilson
Late last night I forgot I had only added the llc CPU mmapping and not the !llc GTT mapping for byt/bsw. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 | benchmarks: Remove gem_wait | Chris Wilson
Superseded by gem_latency. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-19 | benchmark: Measure of latency of producers -> consumers, gem_latency | Chris Wilson
The goal is to measure how long it takes for clients waiting on results to wake up after a buffer completes, and in doing so ensure scalability of the kernel to a large number of clients. We spawn a number of producers. Each producer submits a busyload to the system and records in the GPU the BCS timestamp of when the batch completes. Then each producer spawns a number of waiters, who wait upon the batch completion and measure the current BCS timestamp register and compare against the recorded value. By varying the number of producers and consumers, we can study different aspects of the design, in particular how many wakeups the kernel does for each interrupt (end of batch). The more wakeups on each batch, the longer it takes for any one client to finish. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-12-04 | benchmarks/gem_exec_nop: Flush retirement lists before executing | Chris Wilson
wait-ioctl skips a couple of side-effects of retiring, so provoke them using set-domain before we sleep. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-27 | benchmarks/gem_exec_ctx: Measure switching between fds | Chris Wilson
Switching between fds also involves a context switch, include it amongst the measurements. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-22 | benchmarks: Add a set-domain benchmark | Chris Wilson
Benchmark the overhead of changing from GTT to CPU domains and vice versa. Effectively this measures the cost of a clflush, and how well the driver can avoid them. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-12 | benchmarks/gem_blt: Fixup a couple of non-llc foibles | Chris Wilson
When extending the batch for multiple copies, we need to remember to flag it as being in the CPU write domain so that the new values get flushed out to main memory before execution. We also have to be careful not to specify NO_RELOC for the extended batch as the execobjects will have been updated but we write the wrong presumed offsets. Subsequent iterations will be correct and we can tell the kernel then to skip the relocations entirely. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-11 | Fix comparison of unsigned integers | Thomas Wood
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-11-10 | benchmarks: Add README | Chris Wilson
Add a README to introduce the ezbench.sh benchmark runner. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-10 | benchmarks/gem_blt: Report peak throughput | Chris Wilson
Report the highest throughput measured from a large set of runs to improve sensitivity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
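Reporting the peak rather than the mean amounts to taking the maximum over the per-run results, on the reasoning that noise from the rest of the system can only slow a run down. A minimal sketch, with the hypothetical name `peak_throughput` and an assumed unit of bytes per second:

```c
#include <assert.h>
#include <stddef.h>

/* Return the highest throughput among n measured runs (n must be > 0). */
static double peak_throughput(const double *bytes_per_sec, size_t n)
{
	double peak;
	size_t i;

	assert(bytes_per_sec != NULL && n > 0);
	peak = bytes_per_sec[0];
	for (i = 1; i < n; i++) {
		if (bytes_per_sec[i] > peak)
			peak = bytes_per_sec[i];
	}
	return peak;
}
```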
2015-10-30 | benchmarks/gem_wait: Remove pthread_cancel() | Chris Wilson
Apparently the pthread shim on Android doesn't have pthread cancellation, so use the plain old volatile to terminate the CPU hogs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
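Stopping worker threads with a shared flag instead of `pthread_cancel()` can be sketched as below. Note the swap: the commit mentions a plain `volatile`, while this sketch uses a C11 `atomic_bool`, which gives defined cross-thread visibility. The names `cpu_hog`, `run_hogs`, `done`, and `MAX_HOGS` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define MAX_HOGS 16

/* Set to true to ask every hog to exit its loop. */
static atomic_bool done;

/* Burn CPU, counting loops, until the stop flag is raised. */
static void *cpu_hog(void *arg)
{
	unsigned long *loops = arg;

	while (!atomic_load(&done))
		++*loops;
	return NULL;
}

/* Run n hogs for about `millis` ms, stop them, and return the total loops. */
static unsigned long run_hogs(unsigned int n, long millis)
{
	pthread_t threads[MAX_HOGS];
	unsigned long loops[MAX_HOGS] = { 0 };
	struct timespec delay = { millis / 1000, (millis % 1000) * 1000000L };
	unsigned long total = 0;
	unsigned int i;

	assert(n >= 1 && n <= MAX_HOGS && millis >= 0);
	atomic_store(&done, false);
	for (i = 0; i < n; i++) {
		int err = pthread_create(&threads[i], NULL, cpu_hog, &loops[i]);

		assert(err == 0);
		(void)err;
	}

	nanosleep(&delay, NULL);
	atomic_store(&done, true);	/* cooperative stop, no cancellation */

	for (i = 0; i < n; i++) {
		pthread_join(threads[i], NULL);
		total += loops[i];
	}
	return total;
}
```

The loop total doubles as the CPU throughput figure: the less time the kernel spends on waiter overhead, the more loops the hogs complete in the same interval.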
2015-10-30 | benchmark/gem_wait: poc for benchmarking i915_wait_request overhead | Chris Wilson
One scenario under recent discussion is that of having a thundering herd in i915_wait_request - where the overhead of waking up every waiter for every batchbuffer was significantly impacting customer throughput. This benchmark tries to replicate something to that effect by having a large number of consumers generating a busy load (a large copy followed by lots of small copies to generate lots of interrupts) and tries to wait upon all the consumers concurrently (to reproduce the thundering herd effect). To measure the overhead, we have a bunch of cpu hogs - less kernel overhead in waiting should allow more CPU throughput. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-15 | benchmarks/gem_blt: Include igt.h in gem_blt.c | Derek Morton
To fix a build error on Android. Signed-off-by: Derek Morton <derek.j.morton@intel.com> Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-10-12 | Replace __gem_mmap__{cpu,gtt,wc}() + igt_assert() with gem_mmap__{cpu,gtt,wc}() | Ville Syrjälä
gem_mmap__{cpu,gtt,wc}() already has the assert built in, so replace __gem_mmap__{cpu,gtt,wc}() + igt_assert() with it. Mostly done with coccinelle, with some manual help:

    @@
    identifier I;
    expression E1, E2, E3, E4, E5, E6;
    @@
    (
    - I = __gem_mmap__gtt(E1, E2, E3, E4);
    + I = gem_mmap__gtt(E1, E2, E3, E4);
    ...
    - igt_assert(I);
    |
    - I = __gem_mmap__cpu(E1, E2, E3, E4, E5);
    + I = gem_mmap__cpu(E1, E2, E3, E4, E5);
    ...
    - igt_assert(I);
    |
    - I = __gem_mmap__wc(E1, E2, E3, E4, E5);
    + I = gem_mmap__wc(E1, E2, E3, E4, E5);
    ...
    - igt_assert(I);
    )

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-09 | Make gem_mmap__{cpu,gtt,wc}() assert on failure | Ville Syrjälä
Rename the current gem_mmap__{cpu,gtt,wc}() functions into __gem_mmap__{cpu,gtt,wc}(), and add back wrappers with the original name that assert that the pointer is valid. Most callers will expect a valid pointer and shouldn't have to bother with failures. To avoid changing anything (yet), sed 's/gem_mmap__/__gem_mmap__/g' over the entire codebase. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>
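The checked/unchecked pairing described above can be sketched in isolation. The names here are hypothetical and avoid the `__` prefix that the library uses, since identifiers starting with a double underscore are reserved in standard C. `calloc()` stands in for the real mapping call, and `assert()` stands in for `igt_assert()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Unchecked variant: returns NULL on failure and leaves the decision to
 * the caller. Here a zero size or an allocation failure counts as failure.
 */
static void *example_map_unchecked(size_t size)
{
	if (size == 0)
		return NULL;
	return calloc(1, size);
}

/*
 * Checked variant: same operation, but asserts that it succeeded, so
 * callers that cannot handle failure do not need their own check.
 */
static void *example_map(size_t size)
{
	void *ptr = example_map_unchecked(size);

	assert(ptr != NULL);
	return ptr;
}
```

Callers that want to skip or fall back on failure would use the unchecked variant and test the pointer themselves; everyone else calls the checked one.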
2015-10-09 | Sprinkle igt_assert(ptr) after gem_mmap__{cpu,gtt,wc} | Ville Syrjälä
Do the following

      ptr = gem_mmap__{cpu,gtt,wc}()
    + igt_assert(ptr);

whenever the code doesn't handle the NULL ptr in any kind of specific way. Makes it easier to move the assert into gem_mmap__{cpu,gtt,wc}() itself. Mostly done with coccinelle, with some manual cleanups:

    @@
    identifier I;
    @@
    <... when != igt_assert(I)
         when != igt_require(I)
         when != igt_require_f(I, ...)
         when != I != NULL
         when != I == NULL
    (
      I = gem_mmap__gtt(...);
    + igt_assert(I);
    |
      I = gem_mmap__cpu(...);
    + igt_assert(I);
    |
      I = gem_mmap__wc(...);
    + igt_assert(I);
    )
    ...>

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Stochastically-reviwewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-06 | benchmarks/gem_blt: Fix compilation after rebase and add batch-size | Chris Wilson
Add an option to do more than one copy per batch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-06 | benchmarks: Measure BLT performance | Chris Wilson
Execute N blits and time how long they take to complete to measure both GPU limited bandwidth and submission overhead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-10-02 | benchmarks: Fix build errors on Android M-Dessert | Derek Morton
Android M-Dessert treats implicit declaration of function warnings as errors resulting in igt failing to build. This patch fixes the errors by including missing header files as required. Mostly this involved including igt.h in the benchmarks. Signed-off-by: Derek Morton <derek.j.morton@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-09-11 | convert drm_open_any*() calls to drm_open_driver*(DRIVER_INTEL) calls with cocci | Micah Fedke
Apply the new API to all call sites within the test suite using the following semantic patch:

    // Semantic patch for replacing drm_open_any* with arch-specific drm_open_driver* calls
    @@
    identifier i =~ "\bdrm_open_any\b";
    @@
    - i()
    + drm_open_driver(DRIVER_INTEL)

    @@
    identifier i =~ "\bdrm_open_any_master\b";
    @@
    - i()
    + drm_open_driver_master(DRIVER_INTEL)

    @@
    identifier i =~ "\bdrm_open_any_render\b";
    @@
    - i()
    + drm_open_driver_render(DRIVER_INTEL)

    @@
    identifier i =~ "\b__drm_open_any\b";
    @@
    - i()
    + __drm_open_driver(DRIVER_INTEL)

Signed-off-by: Micah Fedke <micah.fedke@collabora.co.uk> Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-09-08 | build: fix unused-result warnings | Thomas Wood
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-08-21 | benchmarks/gem_exec_reloc: Allow profiling 0 relocs | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-14 | benchmark/gem_exec_trace: Inline everything | Chris Wilson
Avoid the globals and make the dispatch one huge function and hope GCC works some magic. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-14 | benchmark/gem_exec_tracer: Tweak to handle SNA | Chris Wilson
SNA starts by feeding in deliberately bad ioctls in order to detect the kernel interface versions. A quick solution is to always feed it to the ioctl and only record the trace if it is valid. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-13 | benckmarks/Android.mk: Fix building benchmarks for Android | Derek Morton
The commit "benchmarks: Do not install to system-wide bin/" changed the benchmark file list from bin_PROGRAMS to benchmarks_PROGRAMS. However Android.mk was not updated, resulting in IGT failing to build for Android. This commit adds that change. It also adds LOCAL_MODULE_PATH to specify where the built benchmarks should be put. v2: I discovered that the existing definitions of LOCAL_MODULE_PATH were creating what should have been an invalid path. Not sure how it was ever working previously, but fixed now. Signed-off-by: Derek Morton <derek.j.morton@intel.com> Signed-off-by: Thomas Wood <thomas.wood@intel.com>
2015-08-11 | benchmarks: Add a microbenchmark for relocation overhead | Chris Wilson
Allow specification of the many different busyness modes and relocation interfaces, along with the number of buffers to use and relocations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-08-10 | benchmarks/gem_exec_trace: Unmap each trace after replay | Chris Wilson
Just on the off chance someone is replaying a bunch of traces, remember to clean up. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>