|
MI_NOOP cannot be preempted, which means that up to now gem_wsim workloads
were preemptible at batch buffer granularity only.
Add MI_ARB_CHK every 100us so the new default is mid-batch preemption.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
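
As a rough illustration of the idea (not the actual gem_wsim code), a no-op batch can be made preemptible by replacing one dword per period with an arbitration check. The helper names and the "no-ops per millisecond" calibration unit are assumptions made for this sketch; the MI_ARB_CHK encoding follows the usual i915 MI instruction layout (opcode in bits 23 and up).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MI_NOOP    0u
#define MI_ARB_CHK (0x5u << 23) /* arbitration check, a preemption point */

/* Hypothetical helper: convert a duration into a number of no-op dwords,
 * assuming the calibration value is "no-ops executed per millisecond". */
static size_t nops_for_us(unsigned long duration_us, unsigned long nops_per_ms)
{
	return (size_t)((unsigned long long)duration_us * nops_per_ms / 1000);
}

/* Hypothetical helper: fill the batch with no-ops and place an arbitration
 * check every `period` dwords. Returns the number of checks written. */
static size_t fill_preemptible_nops(uint32_t *batch, size_t n, size_t period)
{
	size_t i, checks = 0;

	assert(period > 0);

	for (i = 0; i < n; i++)
		batch[i] = MI_NOOP;

	for (i = period; i < n; i += period) {
		batch[i] = MI_ARB_CHK;
		checks++;
	}

	return checks;
}
```

With a calibration of 341724 no-ops per millisecond, `nops_for_us(100, 341724)` gives the dword spacing that corresponds to roughly 100us between preemption points.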
|
|
A new workload command ('P') is added which enables per context dynamic
priority control.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We need the namespace for new commands, and since I never documented that
they are case insensitive, it is fine.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Need to check we actually are in VCS2 remapping mode!
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
It doesn't look like there should be a dependency there.
v2: s/intel_batchbuffer/intel_reg/
v3: One more s/intel_batchbuffer/intel_reg/ in benchmarks
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Katarzyna Dec <katarzyna.dec@intel.com>
|
|
Pollable spin batch exports a spin->running pointer which can be checked
by dereferencing it to see if the spinner is actually executing on the
GPU.
This is useful for tests which want to make sure they do not proceed with
their next step whilst the spinner is potentially only being processed by
the driver and not actually executing.
Pollable spinner can be created with igt_spin_batch_new_poll or
__igt_spin_batch_new_poll, after which igt_spin_busywait_until_running can
be used to busy wait until it is executing.
v2:
* Move READ_ONCE to igt_core.
* Add igt_spin_busywait_until_running. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
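
The polling pattern described above can be sketched without the IGT library. The functions below are hypothetical stand-ins, not igt_spin_busywait_until_running itself: they only show the idea of re-reading a flag that another agent (here, the GPU) writes, using a volatile access in the spirit of READ_ONCE, with a spin limit added so the sketch cannot hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for READ_ONCE: force a fresh load from memory on every call. */
static uint32_t read_once_u32(const uint32_t *p)
{
	return *(const volatile uint32_t *)p;
}

/* Hypothetical helper: spin until *running becomes non-zero or the spin
 * budget is exhausted. Returns true if the spinner was seen running. */
static bool busywait_until_running(const uint32_t *running,
				   unsigned long max_spins)
{
	assert(running);

	while (max_spins--) {
		if (read_once_u32(running))
			return true;
	}

	return false;
}
```

A test would call the wait right after submitting the spinner and only then move on to the step that depends on the GPU actually being busy.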
|
|
Add busy and busy-avg balancers which make balancing decisions by looking
at engine busyness via the i915 PMU.
They are thus able to make decisions based on the actual instantaneous load of
the system, and not use metrics that lag behind by a batch or two. In
doing so, each client should be able to greedily maximise their own
usage of the system, leading to improved load balancing even in the face
of other uncooperative clients. On the other hand, we are only using the
instantaneous load without coupling in the predictive factor for dispatch
and execution length.
v2:
* Commit text. (Chris Wilson)
* Rename get_stats to get_pmu_stats. (Chris Wilson)
* Fix PMU readout in VCS remap mode.
v3:
* Integrated Petri's meson build recipe.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Petri Latvala <petri.latvala@intel.com>
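
A minimal sketch of the selection step, assuming the busyness of each engine has already been sampled (for example from the i915 PMU) into an array of percentages. The function name and the tie-breaking rule are assumptions for illustration only.

```c
#include <assert.h>

/* Hypothetical helper: return the index of the engine with the lowest
 * sampled busyness. Ties go to the lowest index. */
static int pick_least_busy(const double *busy_pct, int num_engines)
{
	int i, best = 0;

	assert(busy_pct && num_engines > 0);

	for (i = 1; i < num_engines; i++) {
		if (busy_pct[i] < busy_pct[best])
			best = i;
	}

	return best;
}
```

Because every client picks greedily from the same instantaneous readings, nothing here accounts for how long the batch about to be submitted will run, which is the limitation the commit text points out.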
|
|
The MMAP_V2 is replaced by just using MMAP, since the official header
has the updated struct. The gem_create_v2 and gem_get_aperture are
left as is, because they seem to not be reflected in the UABI header!
Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Petri Latvala <petri.latvala@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
VCS2 needs to be redirected to BCS in the remapping mode
otherwise the update will get missed.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Maximum has to be larger than minimum and all values positive.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
An approximation of in-kernel balancing where the balancing
decision is postponed to just before submitting the batch
for execution.
This is implemented by waiting on data dependencies in
userspace before submitting each batch.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
This one assigns VCS engines for each created context in a round-
robin fashion. By default each client starts the assignment from
the first engine.
In global mode the engines are assigned in a round-robin fashion
between all the clients - which is identical to the i915 behaviour.
The important difference is that doing it from gem_wsim means the
assignment always starts from a known point so the run-to-run
behaviour is identical and repeatable, contrary to when i915 round-
robin assignment is used.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
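
The difference between the per-client and global modes comes down to who owns the round-robin counter. A small sketch with hypothetical names: in per-client mode each client has its own state and starts from the first engine, while in global mode all clients share one state.

```c
#include <assert.h>

/* Hypothetical round-robin state: the next engine index to hand out. */
struct rr_state {
	unsigned int next;
};

/* Assign the next engine out of num_engines and advance the counter. */
static unsigned int rr_assign(struct rr_state *s, unsigned int num_engines)
{
	unsigned int engine;

	assert(s && num_engines > 0);

	engine = s->next % num_engines;
	s->next++;

	return engine;
}
```

Since the counter always starts at zero, the sequence of assignments is the same on every run.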
|
|
Also check that the usage makes sense.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Same as in other benchmarks using them, as not every system is updated
with libdrm having those defines.
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
gem_execbuf_wr was duplicated in multiple places.
Move everything to lib/.
Signed-off-by: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
In this mode ('-G' on the command line) all balancing operations
are routed via the first client so the complete balancing state
is shared. In other words, the overall balancing behaviour is
as if there were only one client submitting the aggregate workload.
This can help with the observed metrics and lead to better
balancing decisions in a lot of cases.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Refactoring for upcoming changes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Will make the userspace balancing daemon simulation easier.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Document priority support in the help text.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Slaves just keep on running, far beyond the repeat target of their
master.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Need to close more pipe ends to support a master with more than
one background workload.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Just a micro-optimisation to avoid copying back the struct to userspace
if we aren't looking for an output.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Two new workload commands are added, 'f' and 'q.<idx>' which
enable creation and signalling of non i915 fences.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Introduce an anonymous union so each step type can use its own
name for the metadata.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
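
For readers unfamiliar with the construct, a C11 anonymous union lets several field names share the same storage without an extra member name in between. The step types and field names below are hypothetical and only illustrate the pattern.

```c
#include <assert.h>

/* Hypothetical step types, for illustration only. */
enum step_type { STEP_BATCH, STEP_DELAY, STEP_PERIOD };

struct step {
	enum step_type type;
	/* Anonymous union: each step type uses the name that suits it,
	 * but all names refer to the same storage. */
	union {
		int delay_us;  /* STEP_DELAY */
		int period_us; /* STEP_PERIOD */
	};
};

/* How long this step asks the workload to wait, or 0 for other types. */
static int step_wait_us(const struct step *s)
{
	assert(s);

	switch (s->type) {
	case STEP_DELAY:
		return s->delay_us;
	case STEP_PERIOD:
		return s->period_us;
	default:
		return 0;
	}
}
```

The benefit is purely one of readability: code handling a period step reads `s->period_us` instead of a generic field name.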
|
|
Add sync fence dependency support to workload steps.
Only one sync fence dependency per step is supported at the
moment.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Just compact it a bit by avoiding the min != max check
duplication and change get_duration to change w_step.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Used with the '-a' command line switch, which follows the same
usage as '-w' and '-W', it enables appending workload steps
to the end of all normal workloads.
This for example allows running any workload in the real-time
mode:
gem_wsim -w <some-workload> -a p.16667
This makes the workload run with a 60 Hz period.
At the same time fix the periodic mode execution with dropped
frames, or almost dropped frames.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
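
The 16667 in the example is the 60 Hz frame period in microseconds (1000000 / 60, rounded). The sketch below shows that arithmetic, plus one possible way to handle a dropped frame: skip ahead to the next period boundary that is still in the future. The function names and the skip-ahead policy are assumptions for illustration, not a description of the actual fix.

```c
#include <assert.h>
#include <stdint.h>

/* Period in microseconds for a given rate, rounded to nearest. */
static uint64_t period_us_for_hz(uint64_t hz)
{
	assert(hz > 0);

	return (1000000 + hz / 2) / hz;
}

/* Hypothetical policy: given the time elapsed since the workload started,
 * return the next period boundary strictly after it. Boundaries that were
 * already missed (dropped frames) are skipped rather than replayed. */
static uint64_t next_deadline_us(uint64_t elapsed_us, uint64_t period_us)
{
	assert(period_us > 0);

	return (elapsed_us / period_us + 1) * period_us;
}
```

For instance, with a 16667us period a step finishing at 40000us has missed the boundary at 33334us, so the next deadline becomes 50001us.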
|
|
Tidy last in the array presumed offset setting even though this
code path is not used at the moment.
Also use READ_ONCE on all fields we are trying to read from the
status page.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Do the GPU quiesce as close as possible in time to the start of
the workload in order to minimize the gap in engine utilization
tracking.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
We need to track queue depth not just when we are asked to load balance
but for specific engine requests so that we include those in our
balancing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Simple code tidy to reduce some duplication.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Thinking of qd as an estimator for the speed/latency of execution,
applying an average to produce a more stable estimation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
For reasons I haven't completely fathomed, adding a heartbeat per frame
is upsetting the qd-balancer when run with just a single client, e.g.
./gem_wsim -n 341724 -w wsim/media_mfe2_480p.wsim -c 1 -r 100 -b 1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
The heartbeat should be sent after a period of idleness, its sole
purpose is to update our idea of the external load. As we are sending it
currently in the middle of a busy period, we can reduce it to just
bumping the queue-depth and reusing the most recent clocks, so that we
do not perturb the RT timings greatly.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We only need a heartbeat if we don't have any inflight batches to that
engine (the idea is to try and keep track of external usage). It only
needs to be periodic, but use once per wait point for simplicity.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Skip having to emit a nop batch to allocate a page and retrieve its
address from the kernel by using a persistent object.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If the workload is repeated without a full barrier, we may try to add
the step onto a new request list before we remove it from its last.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Map the whole page since we actually do use it!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Moved it to a table of balancer structs for easier maintenance
and less chance of partial updates.
Also added qdr and rand balancers.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Handling was broken in quiet mode which got exposed by the
conversion to verbosity.
And I want to have the runtime stats for the master workload
printed in the default verbosity as well.
Also tidy help text adding rtavg to id and group cmd line
options not taking any parameters together.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use a latch value to detect read/write tearing of the individual reads.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Oops.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use the seqno as a seqlock so that we hopefully only read a coherent set
of results for a single update.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
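
The seqlock-style read can be sketched as follows: sample the sequence number, read the fields, then sample the sequence number again and retry if it changed in between. The struct layout and names are hypothetical, and the fences are only indicative; a real reader of GPU-written memory needs whatever barriers and mapping type the platform requires.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status page layout, written by another agent. */
struct status_page {
	volatile uint32_t seqno;
	volatile uint32_t submit_us;
	volatile uint32_t complete_us;
};

struct snapshot {
	uint32_t seqno, submit_us, complete_us;
};

/* Try to read a coherent set of fields belonging to a single update.
 * Returns false if the page kept changing for max_tries attempts. */
static bool read_snapshot(const struct status_page *page,
			  struct snapshot *out, int max_tries)
{
	assert(page && out && max_tries > 0);

	while (max_tries--) {
		uint32_t before = page->seqno;

		atomic_thread_fence(memory_order_acquire);
		out->submit_us = page->submit_us;
		out->complete_us = page->complete_us;
		atomic_thread_fence(memory_order_acquire);

		if (page->seqno == before) {
			out->seqno = before;
			return true;
		}
	}

	return false;
}
```

The "hopefully" in the commit text is apt: a single sequence number catches an update that lands between the two samples, but not a writer that is still part-way through an update when both samples are taken.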
|
|
Since we are tracking the request depth outside of the balancer, we can
apply the qd throttle unconditionally.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Slightly more fine-grained than a single on/off switch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
An improved version of the rt balancer that tracks the average latency
on each engine, rather than the last instantaneous execution latency.
This makes it much less sensitive to rapid changes, which is both a
positive and a negative.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
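
One common way to track an average that favours recent samples is an exponentially weighted moving average. The sketch below assumes that form and a weight of 1/4 purely for illustration; the commit text does not say which kind of average is used, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AVG_WEIGHT 4 /* assumed: a new sample contributes 1/4 */

struct engine_latency {
	uint64_t avg_us;
	bool primed; /* false until the first sample arrives */
};

/* Fold a new execution latency sample into the running average. */
static void latency_update(struct engine_latency *e, uint64_t sample_us)
{
	assert(e);

	if (!e->primed) {
		e->avg_us = sample_us;
		e->primed = true;
		return;
	}

	e->avg_us = (e->avg_us * (AVG_WEIGHT - 1) + sample_us) / AVG_WEIGHT;
}

/* Pick the engine with the lowest average latency; ties go to the
 * lowest index. */
static int pick_lowest_latency(const struct engine_latency *e, int n)
{
	int i, best = 0;

	assert(e && n > 0);

	for (i = 1; i < n; i++) {
		if (e[i].avg_us < e[best].avg_us)
			best = i;
	}

	return best;
}
```

A single 1000us spike on an engine averaging 100us moves its estimate to 325us rather than 1000us, which shows both sides of the trade-off: less overreaction to one slow batch, slower reaction to a genuine change in load.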
|
|
Include the next batch in our estimated run time (that is the time to
execute the remaining queued requests plus ours). This is a better
estimate of queue length and also provides a bit of history to guide
judgement after a sync point.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|