Age | Commit message (Collapse) | Author |
|
In few tests we got some MI_* duplicates (MI_MATH for example).
Add common definitions in intel_reg.h and remove local definitions
in the tests.
v2: Definitions MI_LOAD_REGISTER_MEM_GEN8 was removed so from now
on user will need to encode length on it own. :
Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
Add a wrapper for gem_create_ext ioctl (a version of gem_create that
accepts extensions). In preparation for the driver change implementing it,
a local definition of its id and necessary structs have been added,
which are to be erased as soon as those definitions
appear in the i915_drm.h file.
The new ioctl wrapper is added to a separate file.
For consistency the wrapper of the old ioctl, gem_create
is moved from ioctl_wrappers to gem_create.
Signed-off-by: Andrzej Turko <andrzej.turko@linux.intel.com>
Cc: Zbigniew Kempczynski <zbigniew.kempczynski@intel.com>
Cc: Dominik Grzegorzek <dominik.grzegorzek@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Cc: Chris P Wilson <chris.p.wilson@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Petri Latvala <petri.latvala@intel.com>
|
|
In preparation for a variation on the exisiting GEM_CREATE API, split
the ioctl from out of the large ioctl_wrappers.c
Signed-off-by: Andrzej Turko <andrzej.turko@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
-L and -D <device> on the command line.
With no device specified tool tries to find i915 discrete or integrated in
that order.
v2:
* Fix error handling and support render devices.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use MI_MATH and MI_COND_BBE we can construct a loop that runs for a
precise number of clock cycles, as measured by the CTX_TIMESTAMP. We use
the CTX_TIMESTAMP (as opposed to the CS_TIMESTAMP) so that the elapsed
time is measured local to the context, and the length of the batch is
unaffected by preemption. Since the clock ticks at a known frequency, we
can directly translate the batch durations into cycles and so remove the
requirement for nop calibration, and the often excessively large nop
batches.
The downside to this is that we need to use engine local registers, and
before gen11 there is no support in the CS for relative mmio and so this
technique does not support transparent load balancing on a virtual
engine before Icelake.
A further drawback is the loss of granularity for setting
non-preemptible durations as the arbitration is check on every timing
loop, rather than after a fixed period of time. A workload is then
either preemptible or not, with nothing in between. The reader is
welcome to see if it can be improved.
v2: More commentary, more code removal.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Intended use case was that run without arguments prints out the
calibrations which can be simply copied and pasted to the -n argument and
things should just work.
Two problems we need to solve: If the print out loops shows zero
calibrations (engine not present) they are later rejected and also if some
calibration is not given it is only an error if it needs to be used
(engine present).
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
It is useful to know how much memory workload is allocating.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
-f <float> on the command line can be used to scale batch buffer durations
in all parsed workloads. Equally -F <float> for delay steps.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
See README for more details.
v2:
* No need to mess with flags. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Show average/min/max workload iteration and dropped period stats when 'p'
command is used.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Add support for defining buffer object working sets and targetting them as
data dependencies. For more information please see the README file.
v2:
* More robustness in parsing here and there. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Evaluation of userspace load balancing options was how this tool started
but since we have settled on doing it in the kernel.
Tomorrow we will want to update the tool for new engine interfaces and all
this legacy code will just be a distraction.
Rip out everything not related to explicit load balancing implemented via
context engine maps and adjust the workloads to use it.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Recently added assert in a common helper used for calculating batch
duration and preemption period is harmful when preemption is disabled on a
context. Split out into low level and high level helper and use the former
for preemption period queries.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We want to mark workload contexts as non-persistent if possible so that we
do not have to worry about leaving long (or infinite!) batches running
post exit.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
VCS is a special (non-physical) engine id/name which means load-balancing
in legacy workloads. As such when i915 balancing is not enabled it needs
to have a calibration as well.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
When providing engine calibrations on the command line, we need to apply
the one given for RCS to DEFAULT as well.
Also when load balancing is used we need to get calibration from one of
the real engines from the map which should provide a better match.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
../benchmarks/gem_wsim.c: In function ‘calibrate_nop’:
../benchmarks/gem_wsim.c:2999:4: warning: taking the absolute value of unsigned type ‘long unsigned int’ has no effect [-Wabsolute-value]
2999 | labs(size - last_size) > (size * tolerance_pct / 100));
Frankly the warning looks like nonsense to me, unless the compiler
actually optimizes the labs() away. Otherwise there's going to be an
implicit cast to signed long anyway since that's what labs() takes.
But the warning is easy enough to avoid here so let's do it.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
|
|
Extend handling -n parameter by accepting multiple values of per
engine nop calibration. Add raw numbers handling to set default
calibration values. Print copyable and pastable string with
calibrations. Allow to switch between calculating in parallel
or doing it sequentially.
Accepted input values:
-n 123456
All calibrations will be set to 123456.
-n ENG=value,ENG2=value2,value3
i.e.
-n RCS=123456,BCS=345678,999999
RCS engine's value is set to 123456, BCS engine's value is set to
345678, 999999 is copied to rest of engines. All engines must be set;
you can either provide values for each of the engines, or you can set
specific values and provide a default value for the others.
-n value,ENG1=value1,ENG2=value2
First, value is copied to all engines, then value1 overrides ENG1, and
finally value2 overrides ENG2.
New output follows the pattern:
Nop calibrations for 1000us delay is: <eng1>=<v1>,<eng2>=<v2>,...
So you can easily copy-paste it to the next invocation.
Switching between calculation modes:
Run program with -T parameter to calculate calibrations in parallel.
The calculations are performed sequentially by default.
v2: Get rid of trailing whitespaces. Skip DEFAULT and VCS engines
when printing out calibrations. Reject them in the string passed
to -n. Re-align rest of help text. Fix accepting unknown engines.
v3: Consider all cases of arguments
for -n (Tvrtko).
-n 10 (raw number)
-n RCS (engine without calib)
-n AA (neither the engine nor the number)
-n RCS=500 (valid eng=val pair)
-n RCS=AA (calib is not a number)
-n XYZ=10 (engine is not an engine)
-n XYZ=AA (combo)
v4: Print calculated values (Chris). Do not make any assumptions
about the order of the engines (Tvrtko).
v5: Suppress the output with -q (Chris).
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Anna Karas <anna.karas@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[tursulin: fixup mixed plural and singular use in print_engine_calibrations]
|
|
Since with multiple devices, we may have multiple different perf_pmu
each with their own type, we want to find the right one for the job.
The tests are run with a specific fd, from which we can extract the
appropriate bus-id and find the associated perf-type. The performance
monitoring tools are a little more general and not yet ready to probe
all device or bind to one in particular, so we just assume the default
igfx for the time being.
v2: Extract the bus address from out of sysfs
v3: A new name for a new decade!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Robert M. Fosha" <robert.m.fosha@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: "Robert M. Fosha" <robert.m.fosha@intel.com> #v2
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Our list was something between Wayland and Linux Kernel list
implementations, right in the uncanny valley.
On top of that it falsely claimed that it's a straight copy from the
Wayland project.
Let's make our impl more akin to the kernel one to ease the cognitive
dissonance for the developers working on all those projects.
This patch:
* mimics the current kernel list interface
* separates IGT helpers in the source files
* adds brief explanation and code example for igt-doc
* introduces igt_list.c as static inlines are not visible in the docs
v2: mimic the kernel instead of wayland (Chris)
- _head suffix for the sentinel/link struct
- _entry_ in iterator names that go over the elements
v3: I forgot to merge this in time and there was another call site
that had to be converted in gem_spin_batch.c
Cc: Petri Latvala <petri.latvala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
Drop the forcewake before libigt tries to wait on it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Library was limited for reading registers for only
one device at a time in igt tests.
Changes in this patch give as oportunity to test multiple devices in
the same time.
v8: pack and rename structure, remove unnecessary field
v7: remove unnecessary code
v6: Reword patch. Cosmetic changes.
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Daniele Spurio Ceraolo <daniele.ceraolospurio@intel.com>
Cc: Katarzyna Dec <katarzyna.dec@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Petri Latvala <petri.latvala@intel.com>
Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Mrzyglod <daniel.t.mrzyglod@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
e.g. benchmarks/gem_wsim.c:2936:4: warning: absolute value function ‘abs’ given an argument of type ‘long int’ but has parameter of type ‘int’ which may cause truncation of value [-Wabsolute-value]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Simon Ser <simon.ser@intel.com>
|
|
We can use offsetof for the same effect, much tidier with no dummy
locals.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Apparently VLA structs (e.g. struct { int array[count] }) is a gcc
extension that clang refuses to support as handling memory layout is too
difficult for it. So calculate the size by hand!
v2: Use alloca().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
New command line option to allow controling the initial pseudo random
generator seed in order to allow repeatable runs.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Simon Ser <simon.ser@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Back when gem_wsim used forking it was safe to use the common storage
prng, but after converting to threads it no longer is.
Fix by storing and using a new per workload seed for batch buffer
duration randomness.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
On Icelake second vcs engine is vcs2 instead of vcs1 so add some logical
to physical instance remapping based on engine discovery to support it.
v2:
* Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Instead of hardcoding the VCS balancing engines, discover, both with the
new engines query, or with the legacy get_param in the fallback case, so
class based addressing always works.
v2:
* Simplify has_engine_query check. (Andi)
* Fix assert on uninitialized variable. (Andi)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
|
|
This will allow applying the discovered engine configuration from a single
place.
v2:
* Consolidate enum to ci conversion.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
To allow exercising the SSEU configuration in combination with Virtual
Engine, allow RCS to be specified in the engine map and use appropriate
index based addressing when applying SSEU configuration to it.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
A new workload command ('S') is added which allows per context slice
(re-)configuration.
v2:
* Only query device SSEU on first use. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
A new command line switch ('-s') is added which toggles the low slice
count mode for workloads following on the command line.
This enables easy benchmarking of the effect of running the existing media
workloads in parallel against another client. For example:
./gem_wsim -n ... -v -r 600 -W master.wsim -s -w media_nn480.wsim
Adding or removing the '-s' switch before the second workload enables
analyzing the cost of dynamic SSEU switching impacted to the first
(master) workload.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
For simulating frame split workloads it is useful to express a batch which
ends at the same time as the parallel submission on the respective bonded
engine. For this we add support for infinite batch durations and the batch
terminate command ('T'). Syntax looks like this:
1.RCS.*.0.0
T.-1
First step starts an infinite batch, and second command terminates the
infinite batch with the usual relative workload step addressing.
v2: (Chris)
* Relax the recursive batch with 4096 nops between BB_START.
* Check for at least gen8.
* Simplify relocation entry building.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Engine bonds are an i915 uAPI applicable to load balanced contexts with
engine map. They allow expression rules of engine selection between two
contexts when submissions are also tied with submit fences.
Please refer to the README for a more detailed description.
v2:
* Use list of symbolic engine names instead of the mask. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
A new workload command for enabling a load balanced context map (aka
Virtual Engine). Example usage:
B.1
This turns on load balancing for context one, assuming it has already been
configured with an engine map. Only DEFAULT engine specifier can be used
with load balanced engine maps.
v2:
* Lift restriction to only use load balancer when enabled in context map.
(Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Parsing an integer workload descriptor field is a common pattern which we
can extract to a helper macro and by doing so further improve the
readability of the main parsing loop.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We can improve the parsing loop readability a bit more by avoiding some
line breaks caused by explicit NULL checks.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Support new i915 uAPI for configuring contexts with engine maps.
Please refer to the README file for more detailed explanation.
v2:
* Allow defining engine maps by class.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
v2:
* Remove redundant check. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Add support for submit fences in a way similar to how normal input fences
are handled. Eg:
1.RCS.500-1000.0.0
1.VCS1.3000.s-1.0
1.VCS2.3000.s-2.0
Submit fences are signalled when the originating request enters the
submission backend.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
A few more opportunities to compact the code by using the error logging
helper.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
There is a repeated pattern with error handling which can be moved to a
macro to for better readability in the command parsing loop.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We are moving towards bumping the uAPI headers more often instead of using
too much local struct/ioctl/param definitions since the latter are more
challenging for rebase and maintenance.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Support i915 virtual engine from gem_wsim (-b i915) and media-bench.pl
v2:
* Add vm_destroy. (Chris)
* Remove unneeded braces. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use the client id to alternate the static_vcs balancer (-b context)
across clients with the round robin flag (-R) - otherwise all clients
end up on vcs0 and do not match the context balancing employed by
media-driver.
v2: Put it behind the -R flag.
v3: Don't skip -R flag for -b context in scripts/media-bench.pl
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
One significant usecase for intel_reg/etc. is to be able to examine
the hardware state *before* loading the driver. If the tool forces
the driver to load we've totally lost that capability.
This reverts commit 8ae86621d6fff60b6e20c6b0f9b336785c935b0f.
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
It allows us to make things a little bit more generic. Also, we now
require fd rather than doing guesswork when it comes to pci address.
v2: Use readlinkat rather than string concat, move stuff around, provide
a version that does not assert. (Chris)
v3: Print addr on failure, avoid assignment in conditionals. (Chris)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We do not bother explicitly unmapping memory on exit so no need to store
address and size in the workload step struct.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|