|
Simple code tidy to reduce some duplication.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Think of qd as an estimator for the speed/latency of execution, and
apply an average to produce a more stable estimate.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
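A minimal sketch of smoothing a queue-depth estimate with a fixed-window moving average. The commit does not say which kind of average it applies, so the window size QD_WINDOW, struct qd_avg and its helpers are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define QD_WINDOW 8 /* hypothetical window size */

struct qd_avg {
	uint64_t samples[QD_WINDOW]; /* ring buffer of recent qd samples */
	uint64_t sum;                /* running sum of the samples held */
	unsigned int count;          /* number of valid samples (<= QD_WINDOW) */
	unsigned int next;           /* slot to overwrite next */
};

static void qd_avg_init(struct qd_avg *avg)
{
	memset(avg, 0, sizeof(*avg));
}

/* Add an instantaneous qd sample and return the smoothed estimate. */
static uint64_t qd_avg_add(struct qd_avg *avg, uint64_t qd)
{
	if (avg->count == QD_WINDOW)
		avg->sum -= avg->samples[avg->next]; /* drop the oldest sample */
	else
		avg->count++;

	avg->samples[avg->next] = qd;
	avg->sum += qd;
	avg->next = (avg->next + 1) % QD_WINDOW;

	return avg->sum / avg->count;
}
```

A single outlier now moves the estimate by at most 1/QD_WINDOW of its size, which is the stability the commit is after.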
|
|
For reasons I haven't completely fathomed, adding a heartbeat per frame
is upsetting the qd-balancer when run with just a single client, e.g.
./gem_wsim -n 341724 -w wsim/media_mfe2_480p.wsim -c 1 -r 100 -b 1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The heartbeat should be sent after a period of idleness; its sole
purpose is to update our idea of the external load. As we currently
send it in the middle of a busy period, we can reduce it to just
bumping the queue-depth and reusing the most recent clocks, so that we
do not greatly perturb the RT timings.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We only need a heartbeat if we don't have any inflight batches on that
engine (the idea is to try to keep track of external usage). It only
needs to be periodic, but for simplicity send one per wait point.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
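A sketch of the decision made at each wait point, assuming a per-engine count of inflight batches is tracked. The struct engine_state type and heartbeat_mask() are hypothetical names, not the tool's own.

```c
/* Hypothetical per-engine bookkeeping. */
struct engine_state {
	unsigned int inflight; /* batches submitted but not yet retired */
};

/*
 * Called at a wait point. An engine with batches in flight already yields
 * fresh timing samples, so only idle engines need a heartbeat to observe
 * external load. Returns a bitmask of the engines to ping.
 */
static unsigned int heartbeat_mask(const struct engine_state *engines,
				   unsigned int count)
{
	unsigned int mask = 0;

	for (unsigned int i = 0; i < count; i++)
		if (engines[i].inflight == 0)
			mask |= 1u << i;

	return mask;
}
```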
|
|
Skip having to emit a nop batch to allocate a page and retrieve its
address from the kernel, by using a persistent object.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If the workload is repeated without a full barrier, we may try to add
the step onto a new request list before we remove it from the previous one.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
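To illustrate the hazard, here is a minimal circular doubly-linked list in the style of the kernel's list_head (a hypothetical stand-in, not the tool's list helpers). Unlinking the step before linking it onto the new list keeps both lists consistent even if the step was still on its previous one; a bare add would leave the old list pointing at a node that now belongs to another list.

```c
#include <stdbool.h>

struct list {
	struct list *prev, *next;
};

static void list_init(struct list *l)
{
	l->prev = l->next = l;
}

static bool list_empty(const struct list *l)
{
	return l->next == l;
}

static void list_add_tail(struct list *item, struct list *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void list_del_init(struct list *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	list_init(item); /* a detached node points at itself */
}

/* Move a step onto a new request list; safe whether or not it is on one. */
static void step_move_tail(struct list *step, struct list *new_head)
{
	list_del_init(step);
	list_add_tail(step, new_head);
}
```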
|
|
Map the whole page since we actually do use it!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Moved it to a table of balancer structs for easier maintenance
and less chance of partial updates.
Also added qdr and rand balancers.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
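A sketch of what a table of balancer structs can look like. The struct layout and ids are hypothetical; the names follow the balancers mentioned in this log, except "rr" for round-robin, which is a guess. A fuller version would also carry each balancer's callbacks in the same entry, so adding a balancer touches one place rather than several switch statements.

```c
#include <stddef.h>
#include <string.h>

struct balancer {
	unsigned int id;  /* numeric id, illustrative */
	const char *name; /* name as given on the command line */
};

static const struct balancer all_balancers[] = {
	{ 0, "rr" },
	{ 1, "qd" },
	{ 2, "qdr" },
	{ 3, "rt" },
	{ 4, "rtavg" },
	{ 5, "rand" },
};

#define NUM_BALANCERS (sizeof(all_balancers) / sizeof(all_balancers[0]))

static const struct balancer *find_balancer_by_id(unsigned int id)
{
	for (size_t i = 0; i < NUM_BALANCERS; i++)
		if (all_balancers[i].id == id)
			return &all_balancers[i];
	return NULL;
}

static const struct balancer *find_balancer_by_name(const char *name)
{
	for (size_t i = 0; i < NUM_BALANCERS; i++)
		if (strcmp(all_balancers[i].name, name) == 0)
			return &all_balancers[i];
	return NULL;
}
```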
|
|
Handling was broken in quiet mode, which got exposed by the
conversion to verbosity levels.
I also want the runtime stats for the master workload
printed at the default verbosity.
Also tidy the help text, adding rtavg to it and grouping together the
command line options that take no parameters.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
dist tarball doesn't build otherwise, as indicated by distcheck.
Fixes: 9e55cca889cd ("wsim: Add rtavg balancer")
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
|
|
This reverts commit b348107351c14cc7371ca65eea067d9a88ab7048.
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
|
|
Too used to kbuild where you only specify objects.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use a latch value to detect read/write tearing of the individual reads.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Oops.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use the seqno as a seqlock so that we hopefully only read a coherent set
of results for a single update.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
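A sketch of a seqlock-style read of one status-page slot, assuming the writer bumps the seqno with each update. The slot layout and read_slot() are hypothetical. The reader samples the seqno, copies the results, and retries if the seqno moved in the meantime. It only catches updates whose seqno write lands before the second read, so it is best-effort rather than a strict seqlock, in line with the "hopefully" above.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical slot written by another agent (e.g. the GPU). */
struct status_slot {
	volatile uint32_t seqno; /* bumped by the writer for every update */
	volatile uint32_t start; /* results belonging to that update */
	volatile uint32_t end;
};

/* Copy start/end as a pair; returns the seqno the copy was taken under. */
static uint32_t read_slot(const struct status_slot *slot,
			  uint32_t *start, uint32_t *end)
{
	uint32_t seq;

	do {
		seq = slot->seqno;
		atomic_thread_fence(memory_order_acquire);
		*start = slot->start;
		*end = slot->end;
		atomic_thread_fence(memory_order_acquire);
	} while (slot->seqno != seq); /* an update raced with us: retry */

	return seq;
}
```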
|
|
Since we are tracking the request depth outside of the balancer, we can
apply the qd throttle unconditionally.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Slightly more fine-grained than a single on/off switch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
An improved version of the rt balancer that tracks the average latency
on each engine, rather than the last instantaneous execution latency.
This makes it much less sensitive to rapid changes, which is both a
positive and a negative.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
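A sketch of tracking an average rather than the last latency per engine. The commit does not say which average rtavg uses; this assumes an exponentially weighted moving average in fixed point, with a hypothetical EWMA_SHIFT, and picks the engine with the lowest average.

```c
#include <stdbool.h>
#include <stdint.h>

#define EWMA_SHIFT 3 /* each new sample contributes 1/8 (hypothetical) */

struct engine_latency {
	uint64_t avg; /* smoothed execution latency */
	bool valid;   /* false until the first sample arrives */
};

static void latency_update(struct engine_latency *e, uint64_t sample)
{
	if (!e->valid) {
		e->avg = sample; /* seed with the first observation */
		e->valid = true;
	} else {
		e->avg = e->avg - (e->avg >> EWMA_SHIFT) + (sample >> EWMA_SHIFT);
	}
}

/* Pick the engine with the lowest average latency. */
static unsigned int pick_engine(const struct engine_latency *e,
				unsigned int count)
{
	unsigned int best = 0;

	for (unsigned int i = 1; i < count; i++)
		if (e[i].avg < e[best].avg)
			best = i;

	return best;
}
```

After one sample of 800 followed by one of 0, the average is 700 rather than 0; that damping is both the positive and the negative the commit mentions.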
|
|
Include the next batch in our estimated run time (that is, the time to
execute the remaining queued requests plus ours). This is a better
estimate of queue length and also provides a bit of history to guide
judgement after a sync point.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
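A sketch of the estimate, assuming each engine keeps a count of queued requests and an average per-request time (struct engine_queue is hypothetical). Counting our own batch as "+1" matters after a sync point: every queue is then empty, and without it all engines would estimate zero; with it, each engine's history still separates them.

```c
#include <stdint.h>

struct engine_queue {
	unsigned int queued;   /* requests submitted but not yet completed */
	uint64_t avg_batch_ns; /* average execution time of one request */
};

/* Estimated time until our batch completes: drain the queue, then run ours. */
static uint64_t estimate_ns(const struct engine_queue *q)
{
	return ((uint64_t)q->queued + 1) * q->avg_batch_ns;
}
```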
|
|
Discard the low cycles when comparing queue length between engines, as we
expect a small amount of instability. The threshold is chosen to be
around 50us.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
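One way to read "discard the low cycles" is to right-shift both timestamps before comparing them. The shift below is an assumption: with a hypothetical 80 ns timestamp tick, dropping 9 bits ignores differences under 512 ticks, roughly 41us. Note that shifting quantizes into buckets, so two values just either side of a bucket boundary still compare as different.

```c
#include <stdint.h>

#define QD_IGNORE_SHIFT 9 /* hypothetical; pick to match the clock rate */

/* Returns <0, 0 or >0 as queue a looks shorter than, equal to or longer than b. */
static int compare_queue(uint64_t a_ticks, uint64_t b_ticks)
{
	uint64_t a = a_ticks >> QD_IGNORE_SHIFT;
	uint64_t b = b_ticks >> QD_IGNORE_SHIFT;

	return (a > b) - (a < b);
}
```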
|
|
In order to measure external load (from other clients), we need a
periodic marker along each queue.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
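A sketch of deciding when a periodic marker is due on a queue, assuming a monotonic nanosecond clock. The period and the struct queue_marker type are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEARTBEAT_PERIOD_NS (10ull * 1000 * 1000) /* hypothetical: 10 ms */

struct queue_marker {
	uint64_t last_ns; /* when the last marker was queued on this engine */
};

/* Returns true, and records the time, if a marker is due on this queue. */
static bool marker_due(struct queue_marker *m, uint64_t now_ns)
{
	if (now_ns - m->last_ns < HEARTBEAT_PERIOD_NS)
		return false;

	m->last_ns = now_ns;
	return true;
}
```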
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Stronger checks for some of the fields, such as out-of-range
dependencies and invalid values.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
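A sketch of an out-of-range dependency check. It assumes dependencies are written as negative offsets to earlier steps (e.g. -1 for the previous step), that 0 means "no dependency", that step indices start at 0 and that dependencies do not wrap across workload repetitions. All of these are assumptions, and dep_is_valid() is a hypothetical name.

```c
#include <stdbool.h>

static bool dep_is_valid(int step_idx, int dep)
{
	if (dep == 0)
		return true;  /* no dependency */
	if (dep > 0)
		return false; /* would point forwards */
	if (step_idx + dep < 0)
		return false; /* would point before the first step */
	return true;
}
```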
|
|
In addition to numbers, also allow strings to be passed to -b.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
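A sketch of accepting either form for -b: try to parse the whole argument as a number first, and fall back to a name lookup. The helper is hypothetical and assumes numeric ids are indices into the name table.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Returns the index into names[], or -1 if the argument matches nothing. */
static int parse_balancer_arg(const char *arg, const char *const names[],
			      int count)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(arg, &end, 0);
	if (end != arg && *end == '\0' && errno == 0) /* entirely numeric */
		return (val >= 0 && val < count) ? (int)val : -1;

	for (int i = 0; i < count; i++)
		if (strcmp(arg, names[i]) == 0)
			return i;

	return -1;
}
```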
|
|
It makes more sense for the sequence of random batch durations
to be different between different clients.
Also add a new command line option (-S) to force the same
sequence, which can be useful for testing.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
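A sketch of per-client random streams. The tool itself uses IGT's hars_petruska_f54_1_random helpers (per the v5 notes further down); here a plain linear congruential generator stands in, and client_seed() and random_duration() are hypothetical. Without -S each client's seed is offset by its index; with -S all clients share one seed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in PRNG: 32-bit LCG, returning the better-distributed upper bits. */
static uint32_t lcg_next(uint32_t *state)
{
	*state = *state * 1103515245u + 12345u;
	return *state >> 16;
}

static uint32_t client_seed(uint32_t base, unsigned int client, bool same_seq)
{
	return same_seq ? base : base + client;
}

/* Random batch duration in [min_us, max_us]; assumes max_us - min_us + 1 fits. */
static uint32_t random_duration(uint32_t *state, uint32_t min_us, uint32_t max_us)
{
	return min_us + lcg_next(state) % (max_us - min_us + 1);
}
```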
|
|
New command option -R enables the initial VCS engine assignment
to be round-robinned across all clients.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
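A sketch of the -R behaviour, assuming two video engines. The engine enum and initial_vcs() are hypothetical, and VCS1 as the default without -R is an assumption.

```c
#include <stdbool.h>

enum { VCS1 = 0, VCS2 = 1, NUM_VCS = 2 };

/* With -R, clients alternate between the VCS engines; otherwise use a fixed default. */
static int initial_vcs(unsigned int client, bool round_robin)
{
	return round_robin ? VCS1 + (int)(client % NUM_VCS) : VCS1;
}
```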
|
|
This one flips a coin when it fails to decide where to schedule.
It works well with the hd12 workload, either for one or multiple
clients, on which other balancers fail at the moment.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
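A sketch of the coin-flip fallback between two engines. The cost values are whatever estimate the balancer computes, and rand() stands in for the tool's own generator.

```c
#include <stdint.h>
#include <stdlib.h>

/* Prefer the cheaper engine; when the estimates tie, pick either at random. */
static int pick_or_flip(uint64_t cost0, uint64_t cost1)
{
	if (cost0 < cost1)
		return 0;
	if (cost1 < cost0)
		return 1;
	return rand() & 1;
}
```

Flipping a coin on ties avoids always favouring the same engine, which would otherwise pile work onto it whenever the estimates cannot tell the engines apart.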
|
|
hd06mp2, hd12: Want many parallel clients (20+) and let themselves
be balanced.
fhd26u7, 4k12u7: Simulate either an encoder or a decoder with VCS1
(HEVC) dependency and some balancing VCS usage. Need fewer
clients (3-6).
hd01, hd17i4: Mostly RCS limited, targeting maximum execution
speed for a single client. Must not be hampered by incorrect
balancing decisions.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Last step has to be a batch for the implicit sync to make sense.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
This enables media balancing workloads to be run on GT2 SKUs.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Multiple dependencies separated by forward slashes are now supported.
Some media workloads also updated to use this for better efficiency.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
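A sketch of parsing a slash-separated dependency field such as "-1/-3". MAX_DEPS, the 64-byte buffer and parse_deps() are hypothetical; strtok_r silently skips empty tokens, so "-1//-3" also yields two entries.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

#define MAX_DEPS 8 /* hypothetical limit */

/* Returns the number of dependencies parsed, or -1 on malformed input. */
static int parse_deps(const char *field, int deps[MAX_DEPS])
{
	char buf[64];
	char *save, *tok;
	int n = 0;

	if (strlen(field) >= sizeof(buf))
		return -1;
	strcpy(buf, field); /* strtok_r modifies its input */

	for (tok = strtok_r(buf, "/", &save); tok; tok = strtok_r(NULL, "/", &save)) {
		char *end;
		long v = strtol(tok, &end, 10);

		if (end == tok || *end != '\0' || n == MAX_DEPS)
			return -1;
		deps[n++] = (int)v;
	}

	return n;
}
```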
|
|
Put the batch last in the exec object array instead of moving it
there after checking whether any dependencies were requested.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
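A sketch of filling the object array with the batch in its final slot from the start. By default i915 treats the last object in the execbuf array as the batch. struct exec_obj is a hypothetical stand-in for struct drm_i915_gem_exec_object2 with only the handle kept.

```c
#include <stdint.h>

struct exec_obj {
	uint32_t handle;
};

/* Dependencies first, batch last; returns the number of objects written. */
static unsigned int fill_exec_objects(struct exec_obj *obj,
				      const uint32_t *dep_handles,
				      unsigned int num_deps,
				      uint32_t batch_handle)
{
	for (unsigned int i = 0; i < num_deps; i++)
		obj[i].handle = dep_handles[i];

	obj[num_deps].handle = batch_handle;
	return num_deps + 1;
}
```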
|
|
Workloads generated from a high level description of how
things usually work in the transcoding world.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
|
|
One copy-and-paste error sneaked into the no-reloc changes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Tool which emits batch buffers to engines with configurable
sequences, durations, contexts, dependencies and userspace waits.
Unfinished but shows promise so sending out for early feedback.
v2:
* Load workload descriptors from files. (also -w)
* Help text.
* Calibration control if needed. (-t)
* NORELOC | LUT to eb flags.
* Added sample workload to wsim/workload1.
v3:
* Multiple parallel different workloads (-w -w ...).
* Multi-context workloads.
* Variable (random) batch length.
* Load balancing (round robin and queue depth estimation).
* Workload delays and explicit sync steps.
* Workload frequency (period) control.
v4:
* Fixed queue-depth estimation by creating separate batches
per engine when qd load balancing is on.
* Dropped separate -s cmd line option. It can turn itself on
automatically when needed.
* Keep a single status page and lie about the write hazard
as suggested by Chris.
* Use batch_start_offset for controlling the batch duration.
(Chris)
* Set status page object cache level. (Chris)
* Moved workload description to a README.
* Tidied example workloads.
* Some other cleanups and refactorings.
v5:
* Master and background workloads (-W / -w).
* Single batch per step is enough even when balancing. (Chris)
* Use hars_petruska_f54_1_random IGT functions and seed to zero
at start. (Chris)
* Use WC cache domain when WC mapping. (Chris)
* Keep seqnos 64 bytes apart in the status page. (Chris)
* Add workload throttling and queue-depth throttling commands.
(Chris)
v6:
* Added two more workloads.
* Merged RT balancer from Chris.
v7:
* Merged NO_RELOC patch from Chris.
* Added missing RT balancer to help text.
TODO list:
* Fence support.
* Batch buffer caching (re-use pool).
* Better error handling.
* Less 1980s workload parsing.
* More workloads.
* Threads?
* ... ?
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
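A sketch of the "seqnos 64 bytes apart" layout from v5, assuming one seqno slot per engine in a status page viewed as an array of 32-bit words and a 64-byte cacheline. seqno_index() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64

/* Word index of an engine's seqno: one slot per cacheline, so no two share a line. */
static size_t seqno_index(unsigned int engine)
{
	return (size_t)engine * (CACHELINE_BYTES / sizeof(uint32_t));
}
```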
|
|
In order to bend over backwards to keep supporting Android.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This reverts commit 25fbae15262cf570e207e62f50e7c5233e06bc67, restoring
commit 301ad44cdf1b868b1ab89096721da91fa8541fdc
Author: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Date: Thu Mar 2 10:37:11 2017 +0100
lib: Open debugfs files for the given DRM device
with fixes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This reverts commit 301ad44cdf1b868b1ab89096721da91fa8541fdc.
When a render-only device is opened and gem_quiescent_gpu is called, we
need to use the debugfs dir for the master device instead.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
|
|
When opening a DRM debugfs file, locate the right path based on the
given DRM device FD.
This is needed so that, in setups with more than one DRM device,
operations on debugfs files affect the expected DRM device.
v2: - rebased and fixed new API additions
v3: - updated chamelium test, which was missed previously
- use the minor of the device for the debugfs path, not the major
- have a proper exit handler for calling igt_hpd_storm_reset with the
right device fd.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Robert Foss <robert.foss@collabora.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
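A sketch of locating the debugfs directory from a DRM fd via the device minor, as v3 describes. It assumes debugfs is mounted at /sys/kernel/debug; both helpers are hypothetical. As the revert above notes, a render-only device needs the master device's directory instead, which this sketch does not handle.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Directory named after the device minor, e.g. /sys/kernel/debug/dri/0. */
static int debugfs_dir_for_dev(dev_t rdev, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/kernel/debug/dri/%u", minor(rdev));

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Same, starting from an open DRM character-device fd. */
static int debugfs_dir_for_fd(int fd, char *buf, size_t len)
{
	struct stat st;

	if (fstat(fd, &st) || !S_ISCHR(st.st_mode))
		return -1;
	return debugfs_dir_for_dev(st.st_rdev, buf, len);
}
```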
|
|
Fix a compiler warning about the redefinition of I915_EXEC_FENCE_OUT,
which is already defined in libdrm/i915_drm.h:890:0
gem_latency.c:48:0: warning: "I915_EXEC_FENCE_OUT" redefined
#define I915_EXEC_FENCE_OUT (1 << 17)
^
In file included from ../lib/intel_batchbuffer.h:6:0,
from ../lib/drmtest.h:39,
from ../lib/igt.h:27,
from gem_latency.c:31:
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
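The commit text does not show its diff. One common way to avoid this kind of warning, sketched here as an assumption rather than what the patch did, is to define the flag locally only when the libdrm header has not already done so. The value (1 << 17) is taken from the warning output.

```c
/* Fall back to a local definition only for older headers that lack the flag. */
#ifndef I915_EXEC_FENCE_OUT
#define I915_EXEC_FENCE_OUT (1 << 17)
#endif
```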
|