path: root/benchmarks
Age | Commit message | Author
2017-05-11 | gem_wsim: Minimize startup gap | Tvrtko Ursulin
Do the GPU quiesce as close as possible in time to the start of the workload in order to minimize the gap in engine utilization tracking. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-11 | wsim: Set the seqno/time stamp on each batch to every engine | Chris Wilson
We need to track queue depth not just when we are asked to load balance but for specific engine requests so that we include those in our balancing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-10 | wsim: Use a loop over engines to calculate RT | Chris Wilson
Simple code tidy to reduce some duplication. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-10 | wsim: Feed qd into ewma | Chris Wilson
Thinking of qd as an estimator for the speed/latency of execution, applying an average to produce a more stable estimation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
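As a rough illustration of the averaging idea in the message above, the following sketch folds queue-depth samples into an exponentially weighted moving average. It is not the ewma.h helper the benchmark ships; the struct, the function name and the shift-based weight are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical EWMA state; names and layout are illustrative only. */
struct qd_ewma {
	uint32_t avg;	/* current average, same units as the samples */
	int primed;	/* 0 until the first sample has been recorded */
};

/* Fold one queue-depth sample into the average with weight 1/2^shift. */
static void qd_ewma_add(struct qd_ewma *e, uint32_t sample, unsigned int shift)
{
	assert(shift < 32);

	if (!e->primed) {
		/* The first sample seeds the average directly. */
		e->avg = sample;
		e->primed = 1;
		return;
	}

	/* avg += (sample - avg) / 2^shift, computed in signed 64-bit. */
	int64_t delta = (int64_t)sample - (int64_t)e->avg;
	e->avg = (uint32_t)((int64_t)e->avg + delta / ((int64_t)1 << shift));
}
```

A larger shift gives a smoother but slower estimate. Because the division truncates, deltas smaller than 2^shift leave the average unchanged in this simplified form.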
2017-05-10 | wsim: Switch off heartbeat by default | Chris Wilson
For reasons I haven't completely fathomed, adding a heartbeat per frame is upsetting the qd-balancer when run with just a single client, e.g. ./gem_wsim -n 341724 -w wsim/media_mfe2_480p.wsim -c 1 -r 100 -b 1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-10 | wsim: Limit the information updated during the heartbeat | Chris Wilson
The heartbeat should be sent after a period of idleness; its sole purpose is to update our idea of the external load. As we are sending it currently in the middle of a busy period, we can reduce it to just bumping the queue-depth and reusing the most recent clocks, so that we do not perturb the RT timings greatly. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-10 | wsim: Limit heartbeats to sync points | Chris Wilson
We only need a heartbeat if we don't have any inflight batches to that engine (the idea is to try and keep track of external usage). It only needs to be periodic, but send it once per wait point for simplicity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-10 | wsim: Cache the heartbeat batch and location | Chris Wilson
Skip having to emit a nop batch to allocate a page and retrieve its address from the kernel by using a persistent object. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-10 | wsim: Avoid the workload_step being tracked simultaneously on multiple engines | Chris Wilson
If the workload is repeated without a full barrier, we may try to add the step onto a new request list before we remove it from its last. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-10 | wsim: Stop treating wrk->status_page as just a single uint32_t | Chris Wilson
Map the whole page since we actually do use it! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-10 | gem_wsim: Refactor balancer selection and help text | Tvrtko Ursulin
Moved it to a table of balancer structs for easier maintenance and less chance of partial updates. Also added qdr and rand balancers. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
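A table-driven selection of this kind might look roughly like the sketch below, which also accepts either a number or a name as the selector. The struct layout, the numeric ids and the description strings are assumptions for illustration; only the balancer names are taken from the commit messages in this log.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry; the real struct may carry callbacks as well. */
struct balancer {
	unsigned int id;	/* numeric selector (ids here are made up) */
	const char *name;	/* symbolic selector */
	const char *desc;	/* one-line help text */
};

static const struct balancer balancers[] = {
	{ 0, "rr",    "round robin" },
	{ 1, "qd",    "queue depth estimation" },
	{ 2, "rt",    "last measured execution latency" },
	{ 3, "rtavg", "averaged execution latency" },
	{ 4, "rand",  "random selection" },
};

/* Look a balancer up by number ("1") or by name ("qd"); NULL if unknown. */
static const struct balancer *find_balancer(const char *arg)
{
	char *end;
	unsigned long id = strtoul(arg, &end, 10);
	/* Treat the argument as numeric only if all of it parsed as digits. */
	int numeric = (*arg != '\0' && *end == '\0');

	for (size_t i = 0; i < sizeof(balancers) / sizeof(balancers[0]); i++) {
		if (numeric ? balancers[i].id == id
			    : strcmp(balancers[i].name, arg) == 0)
			return &balancers[i];
	}
	return NULL;
}
```

Keeping the id, name and help text in one row means the help output and the option parser can both iterate over the same array, which is the "less chance of partial updates" benefit the message describes.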
2017-05-10 | gem_wsim: Fix master workload handling and stats | Tvrtko Ursulin
Handling was broken in quiet mode, which got exposed by the conversion to verbosity. And I want to have the runtime stats for the master workload printed in the default verbosity as well. Also tidy help text, adding rtavg to id, and group cmd line options not taking any parameters together. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-10 | benchmarks: Make sure ewma.h and ilog2.h are bundled in dist | Petri Latvala
dist tarball doesn't build otherwise, as indicated by distcheck. Fixes: 9e55cca889cd ("wsim: Add rtavg balancer") Signed-off-by: Petri Latvala <petri.latvala@intel.com>
2017-05-10 | Revert "autotools requires headers to be listed" | Petri Latvala
This reverts commit b348107351c14cc7371ca65eea067d9a88ab7048. Signed-off-by: Petri Latvala <petri.latvala@intel.com>
2017-05-10 | autotools requires headers to be listed | Chris Wilson
Too used to kbuild where you only specify objects. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | wsim: Compact the per-engine heartbeat into a single buffer | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | wsim: Convert the RT multiple reads to a latched read | Chris Wilson
Use a latch value to detect read/write tearing of the individual reads. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | wsim: Fixup breaking the read loop after the seqno advances. | Chris Wilson
Oops. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | wsim: Loop over the multiple u32 reads from the status page | Chris Wilson
Use the seqno as a seqlock so that we hopefully only read a coherent set of results for a single update. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
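The retry idea can be sketched as follows: read the seqno, read the other values, then re-read the seqno and try again if it moved. The slot layout, field names and retry limit are assumptions for this example and not the benchmark's actual status page format. Note that this is weaker than a full seqlock: it only notices an update whose seqno write lands inside the read window, which fits the "hopefully" in the message above.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-engine slot written by the GPU; layout is illustrative. */
struct status_slot {
	volatile uint32_t seqno;
	volatile uint32_t start;
	volatile uint32_t end;
};

struct rt_sample {
	uint32_t seqno, start, end;
};

/*
 * Copy one slot, retrying while the seqno changes underneath us.
 * Returns 1 on a stable read, 0 if max_tries attempts all raced.
 */
static int read_slot(const struct status_slot *slot, struct rt_sample *out,
		     int max_tries)
{
	while (max_tries-- > 0) {
		uint32_t before = slot->seqno;

		atomic_thread_fence(memory_order_acquire);
		out->start = slot->start;
		out->end = slot->end;
		atomic_thread_fence(memory_order_acquire);

		/* Seqno unchanged across the reads: accept the values. */
		if (slot->seqno == before) {
			out->seqno = before;
			return 1;
		}
	}
	return 0;
}
```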
2017-05-09 | wsim: qd throttling now works independently of balancer->get_qd | Chris Wilson
Since we are tracking the request depth outside of the balancer, we can apply the qd throttle unconditionally. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | wsim: Introduce verbosity | Chris Wilson
Slightly more fine-grained than a single on/off switch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | wsim: Add rtavg balancer | Chris Wilson
An improved version of the rt balancer that tracks the average latency on each engine, rather than the last instantaneous execution latency. This makes it much less sensitive to rapid changes, which is both a positive and a negative. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | wsim: Improve rt balancer to use history across sync points | Chris Wilson
Include the next batch in our estimated run time (that is the time to execute the remaining queued requests plus ours). This is a better estimate of queue length and also provides a bit of history to guide judgement after a sync point. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | wsim: Add a small tolerance to rt balancing | Chris Wilson
Discard the low cycles when comparing queue length between engines as we expect a small amount of instability. The threshold is chosen to be around 50us. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
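One simple way to discard the low cycles is to shift both values right before comparing, as in this sketch. The function name and the shift parameter are assumptions; the log does not say how many bits correspond to the 50us threshold, so none is claimed here.

```c
#include <stdint.h>

/*
 * Compare two queue-length estimates (in timestamp cycles) after dropping
 * the low `shift` bits. Returns -1, 0 or 1 like a three-way compare.
 */
static int rt_compare(uint32_t a, uint32_t b, unsigned int shift)
{
	uint32_t qa = a >> shift;
	uint32_t qb = b >> shift;

	return (qa > qb) - (qa < qb);
}
```

Shifting buckets the values rather than comparing their difference, so two estimates that straddle a bucket boundary can still compare as different even when they are only a cycle apart.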
2017-05-09 | wsim: Send a periodic depth stamp down each queue | Chris Wilson
In order to measure external load (from other clients), we need a periodic marker along each queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | wsim: per-engine throttling | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | wsim: Per-client prng pool for miscellaneous randoms | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | gem_wsim: Slightly more robust workload parsing | Tvrtko Ursulin
Stronger checks for some of the fields like dependencies out of range and invalid values. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-09 | gem_wsim: Allow symbolic balancer selection | Tvrtko Ursulin
Instead of numbers passed to -b also allow strings. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-09 | gem_wsim: Seed random numbers per client | Tvrtko Ursulin
It makes more sense for the sequence of random batch durations to be different between different clients. Also add a new command line option (-S) to force the same sequence, which can be useful for testing. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-09 | gem_wsim: Enable initial per-workload round-robin VCS engine assignment | Tvrtko Ursulin
New command option -R enables the initial VCS engine assignment to be round-robinned across all clients. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-09 | gem_wsim: Add RTR balancer | Tvrtko Ursulin
This one flips a coin when it fails to decide where to schedule. It works well with the hd12 workload, either for one or multiple clients, on which other balancers fail at the moment. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
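The coin flip on an undecided comparison could be sketched like this for two engines. The function and its parameters are hypothetical; the caller supplies the random word so the example stays deterministic and does not depend on any particular PRNG.

```c
#include <stdint.h>

/*
 * Pick engine 0 or 1 by estimated load; when the loads are equal, let the
 * lowest bit of a caller-supplied random word decide.
 */
static int pick_engine(uint32_t load0, uint32_t load1, uint32_t random_word)
{
	if (load0 < load1)
		return 0;
	if (load1 < load0)
		return 1;

	/* No clear winner: flip a coin. */
	return (int)(random_word & 1);
}
```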
2017-05-08 | gem_wsim: More simulated transcoding workloads | Tvrtko Ursulin
hd06mp2, hd12: Want many parallel clients (20+) and lets itself be balanced.
fhd26u7, 4k12u7: Simulates either encoder or decoder with VCS1 (HEVC) dependency and some balancing VCS usage. Needs fewer clients (3-6).
hd01, hd17i4: Mostly RCS limited, targeting maximum execution speed for a single client. Must not be hampered by incorrect balancing decisions.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-08 | gem_wsim: Fix implicit sync on last workload step | Tvrtko Ursulin
Last step has to be a batch for the implicit sync to make sense. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-08 | gem_wsim: Support VCS2 remapping | Tvrtko Ursulin
This enables media balancing workloads to be run on GT2 SKUs. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-08 | gem_wsim: Support multiple dependencies | Tvrtko Ursulin
Multiple dependencies separated by forward slashes are now supported. Some media workloads also updated to use this for better efficiency. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
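Parsing a slash-separated dependency field might be done along these lines. This is a sketch only: the function name, the integer representation of a dependency and the error convention are assumptions, and the real parser may validate ranges differently.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a field such as "-1/-2" into deps[]. Returns the number of
 * dependencies found, or -1 on malformed input or too many entries.
 */
static int parse_deps(const char *field, int *deps, int max_deps)
{
	const char *p = field;
	int n = 0;

	for (;;) {
		char *end;
		long v;

		errno = 0;
		v = strtol(p, &end, 10);
		/* Reject empty items, overflow and values outside int. */
		if (end == p || errno || v < INT_MIN || v > INT_MAX)
			return -1;
		if (n == max_deps)
			return -1;
		deps[n++] = (int)v;

		if (*end == '\0')
			return n;
		if (*end != '/')
			return -1;
		p = end + 1;
	}
}
```

A trailing slash or any other separator is treated as an error here rather than silently ignored.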
2017-05-05 | gem_wsim: Simplify batch creation | Tvrtko Ursulin
Put the batch last in the exec object array instead of moving it after checking whether any dependencies have been requested. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-05 | gem_wsim: Add some generic media workloads | Tvrtko Ursulin
Workloads generated from a high level description of how things usually work in the transcoding world. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-04 | benchmarks: Add gem_wsim to .gitignore | Petri Latvala
2017-04-25 | benchmarks/gem_wsim: Fix no reloc handling | Tvrtko Ursulin
One copy and paste error sneaked into the no reloc changes. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-25 | benchmarks/gem_wsim: Command submission workload simulator | Tvrtko Ursulin
Tool which emits batch buffers to engines with configurable sequences, durations, contexts, dependencies and userspace waits.

Unfinished but shows promise so sending out for early feedback.

v2:
 * Load workload descriptors from files. (also -w)
 * Help text.
 * Calibration control if needed. (-t)
 * NORELOC | LUT to eb flags.
 * Added sample workload to wsim/workload1.

v3:
 * Multiple parallel different workloads (-w -w ...).
 * Multi-context workloads.
 * Variable (random) batch length.
 * Load balancing (round robin and queue depth estimation).
 * Workload delays and explicit sync steps.
 * Workload frequency (period) control.

v4:
 * Fixed queue-depth estimation by creating separate batches per engine when qd load balancing is on.
 * Dropped separate -s cmd line option. It can turn itself on automatically when needed.
 * Keep a single status page and lie about the write hazard as suggested by Chris.
 * Use batch_start_offset for controlling the batch duration. (Chris)
 * Set status page object cache level. (Chris)
 * Moved workload description to a README.
 * Tidied example workloads.
 * Some other cleanups and refactorings.

v5:
 * Master and background workloads (-W / -w).
 * Single batch per step is enough even when balancing. (Chris)
 * Use hars_petruska_f54_1_random IGT functions and seed to zero at start. (Chris)
 * Use WC cache domain when WC mapping. (Chris)
 * Keep seqnos 64-bytes apart in the status page. (Chris)
 * Add workload throttling and queue-depth throttling commands. (Chris)

v6:
 * Added two more workloads.
 * Merged RT balancer from Chris.

v7:
 * Merged NO_RELOC patch from Chris.
 * Added missing RT balancer to help text.

TODO list:
 * Fence support.
 * Batch buffer caching (re-use pool).
 * Better error handling.
 * Less 1980's workload parsing.
 * More workloads.
 * Threads?
 * ... ?

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
2017-04-19 | benchmarks/gem_latency: Provide LOCAL defines for old libdrm | Chris Wilson
In order to bend over backwards to keep supporting Android. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-29 | benchmarks/gem_exec_trace: Randomise batch duration | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-29 | benchmarks/gem_exec_trace: Calibrate dummy workload | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-28 | benchmarks/gem_exec_trace: Use a constant delay | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-28 | benchmarks/gem_exec_trace: Make the tracer threadsafe | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-28 | benchmarks/gem_exec_trace: Include client wait points | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-28 | benchmarks/gem_exec_trace: Enhanced multi-context capture | Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-21 | Restore "lib: Open debugfs files for the given DRM device" | Chris Wilson
This reverts commit 25fbae15262cf570e207e62f50e7c5233e06bc67, restoring

commit 301ad44cdf1b868b1ab89096721da91fa8541fdc
Author: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Date: Thu Mar 2 10:37:11 2017 +0100

lib: Open debugfs files for the given DRM device

with fixes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-21 | Revert "lib: Open debugfs files for the given DRM device" | Tomeu Vizoso
This reverts commit 301ad44cdf1b868b1ab89096721da91fa8541fdc. When a render-only device is opened and gem_quiescent_gpu is called, we need to use the debugfs dir for the master device instead. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>