In split mode all requests have to be added up since they were previously
re-arranged so there is no overlap.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: John Harrison <John.C.Harrison@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
|
|
Turn off timeline stacking in favour of putting all the boxes belonging to
a single context at the same vertical slot.
Also remove the custom sorting function in favour of correctly assigning
the subgroup ids and order, which greatly speeds up the library operations.
And finally remove the 'Toggle stacking' button which never worked.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
|
|
Now that we scale timestamps to get better timeline granularity, the hacky
hand-rolled microsecond-to-HTML-date conversion no longer cuts
it.
Use perl built-in gmtime to handle things properly.
v2: Do not bother with the epoch since timeline times are relative to
unknown system boot time. (John Harrison)
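A minimal sketch of the idea, written in Python rather than the Perl used by trace.pl: split a relative microsecond timestamp into whole seconds and a microsecond remainder, and let gmtime work out the calendar fields. The function name and the output format are illustrative assumptions, not what the script actually emits.

```python
import time

def us_to_date_string(us):
    """Format a relative timestamp in microseconds as a date-like string."""
    # Whole seconds go through gmtime; the remainder is appended by hand.
    secs, frac = divmod(int(us), 1_000_000)
    t = time.gmtime(secs)
    return time.strftime("%Y-%m-%d %H:%M:%S", t) + ".%06d" % frac

# One day, one hour, one minute, one second and 123 microseconds after "boot".
print(us_to_date_string(90_061_000_123))
```

Because timeline times are relative, the dates simply start at the Unix epoch; only the differences between them matter.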
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
|
|
Commit 87d2affc380da96ba66c258c5337c363fe8651ef ("trace.pl: Add support
for colouring context execution"), due to some dodgy attempts at patch
splitting, broke the legacy colouring mode.
Fix it by passing in the request stage into the helper so all stages can
be correctly coloured in both modes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
|
|
New way of describing engines needs the tool to be adapted to understand it.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
|
|
Timeline id allocation order is no longer tied to engine ids.
Remove the option which assumed that was the case in an attempt to provide
a more readable timeline.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
|
|
Add the command line switch which uses different colours for different
context execution boxes.
v2:
* Use HSL to simplify color generation. (Lionel)
* Colour other boxes in the same colour but different shade so it is
easier to follow the timeline.
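As a rough illustration of why HSL simplifies this (in Python, not the Perl of trace.pl): the hue can be derived from the context id while the lightness varies per request stage, so all boxes of one context share a colour family. The hue multiplier, the stage names and the lightness values below are hypothetical.

```python
def ctx_colour(ctx_id, stage="execute"):
    """Return a CSS hsl() string: one hue per context, one shade per stage."""
    hue = (ctx_id * 67) % 360  # hypothetical way of spreading hues
    # Hypothetical stage names and lightness values.
    lightness = {"queue": 85, "ready": 70, "execute": 50}[stage]
    return "hsl(%d, 80%%, %d%%)" % (hue, lightness)

print(ctx_colour(1), ctx_colour(1, "queue"))
```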
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: John Harrison <John.C.Harrison@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> # v1
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
|
|
It makes sense to fetch the min and max timestamp only after the
last sort of the array.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: John Harrison <John.C.Harrison@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
|
|
Just to clear up some space for incoming code refactoring.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: John Harrison <John.C.Harrison@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
|
|
Delay the auto-generation of end/notify values until the point where
everything is known. As opposed to potentially generating them
multiple times with differing values (in the case of 'incomplete'
entries).
v2: More complete description. [Tvrtko]
Signed-off-by: John Harrison <John.C.Harrison@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
There are various statistics being calculated multiple times in
multiple places while the log file is being read in. Some of these are
then re-calculated when the database is munged to correct various
issues with the logs. This patch consolidates the calculations into a
separate pass after all the reading and munging has been done.
Note that this actually produces a different final output as the
'execute-delay' values were not previously being re-calculated after
all the fixups. Thus they were based on an incorrect calculation.
v2: Reduce scope of some local variables [Tvrtko]
Signed-off-by: John Harrison <John.C.Harrison@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Add an extra level to the database key sort so that the ordering is
deterministic. If the time stamp matches, it now compares the key
itself as well (context/seqno). This makes it much easier to determine
if a change has actually broken anything. Previously back to back runs
with no changes could still produce different output, especially when
adding extra debug output during the calculations.
As the comparison test is now more than a single equation, moved it
out into a separate sort function.
v2: Re-work sort func for readability/performance [Tvrtko]
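The tie-break described above can be sketched as follows (Python for illustration, the tool itself is Perl). The 'start' field name and the sample keys are assumptions; only the "timestamp first, then the key itself" ordering comes from the description.

```python
def sort_keys(db):
    """Order request keys by timestamp, then by the key for determinism."""
    return sorted(db, key=lambda k: (db[k]["start"], k))

# Hypothetical database keyed by "context/seqno"; two entries share a time.
db = {"2/7": {"start": 100}, "1/3": {"start": 100}, "1/1": {"start": 50}}
print(sort_keys(db))
```

With the key as the secondary criterion, entries with equal timestamps always come out in the same order regardless of hash iteration order.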
Signed-off-by: John Harrison <John.C.Harrison@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Cache the key count value rather than querying the hash every time.
Also assert that the database does not magically change size after the
fixups.
v2: Rename variable according to style guide [Tvrtko]
v3: Reverted accidental style change and added a blank line. [Tvrtko]
Signed-off-by: John Harrison <John.C.Harrison@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
i915 renamed the request related tracepoints so catch up with that.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # irc
|
|
This makes life easier for users behind network proxies. A simple
'export https_proxy=<proxy url>' will make this work; today the user
needs to do a more complicated setup in ssh/git with proxy
information.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
This makes scripts/run-tests.sh look into a build dir by default,
looking for the test lists. With this we can run the script after
building i-g-t with meson without having to resort to the environment
variable.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
When piglit runs, it chdir()s to its own directory, so passing a relative
path doesn't work. E.g.:
Fatal Error: IGT directory does not exist. Missing: build/tests
Make sure path is absolute throughout the script.
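The failure mode is easy to reproduce in any language. A small Python sketch (the script itself is shell, and the helper name is hypothetical) of resolving the path once, before any chdir() happens:

```python
import os

def absolute_igt_dir(path):
    """Resolve a possibly relative path against the current directory."""
    return os.path.abspath(path)

# Resolved up front, the path stays valid after a later chdir().
print(absolute_igt_dir("build/tests"))
```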
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
If the script is already running as root, it doesn't need to be
executed through sudo. This also moves the calls to exec piglit to a
common function.
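In outline, the common function decides on the sudo prefix in one place. The sketch below is Python, not the shell of run-tests.sh; the function name, the bare 'piglit' command and the argument list are hypothetical.

```python
import os

def piglit_command(args, euid=None):
    """Build the command line, adding sudo only when not already root."""
    if euid is None:
        euid = os.geteuid()
    prefix = [] if euid == 0 else ["sudo"]
    return prefix + ["piglit"] + list(args)

print(piglit_command(["run", "igt"]))
```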
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
|
|
A couple of small optimizations which altogether bring around 30%
improvement in my testing.
1. Do less string processing on tracepoint names and push more of the
check into the if-ladder.
2. Pull out common db key and ctx processing and cache common values in
local vars.
3. Key value pair parsing is faster with a regexp.
4. Avoid sorting the db hash multiple times if possible.
v2:
* Use faster key-value splitting method. (John Harrison)
v3:
* Fix floating-point to int time conversion.
v4:
* Fix refactoring and logic fails. (John Harrison)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: John Harrison <John.C.Harrison@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
|
|
Last refactoring introduced a copy and paste error, fix it.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: John Harrison <John.C.Harrison@intel.com>
Cc: John Harrison <John.C.Harrison@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
|
|
Breakage from previous refactoring.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: John Harrison <John.C.Harrison@intel.com>
Cc: John Harrison <John.C.Harrison@intel.com>
|
|
Remove some inactive code which was only used to support the old execlists
implementation. It is hidden behind a toggle which is not user accessible
anyway.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: John Harrison <John.C.Harrison@intel.com>
|
|
Instead of hard-coding the order of key-value pairs into regular
expressions, auto-detect them as we go.
At the same time re-factor the code so it is smaller and even
slightly faster (10-15% by a quick measurement).
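A sketch of order-independent parsing, in Python for illustration (trace.pl is Perl). The payload below is a made-up example of a key=value tracepoint line and the regular expression is an assumption about its shape, not the one used by the tool.

```python
import re

# Any "key=value" pair, wherever it appears in the payload.
KV_RE = re.compile(r"(\w+)=([^,\s]+)")

def parse_fields(payload):
    """Return every key=value pair found, regardless of their order."""
    return dict(KV_RE.findall(payload))

line = "dev=0, engine=0:0, ctx=30, seqno=1"  # hypothetical payload
fields = parse_fields(line)
print(fields["ctx"], fields["seqno"])
```

Since nothing depends on the position of a field, a tracepoint that gains or reorders fields does not require a new regular expression.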
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: John Harrison <John.C.Harrison@intel.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
|
|
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
List the available tests with piglit instead of by hand. This solves
naming inconsistencies (piglit throwing caps away) as seen by
cibuglog, and makes the listing code simpler.
The format of the listing changes from
test-binary/subtest-name
to
igt@test-binary@subtest-name
but so far nothing has been able to directly consume run-tests.sh -l
output. The piglit format is directly consumable by piglit --test-list, and thus
by run-tests.sh -T.
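For anyone holding lists in the old format, the mapping between the two can be sketched like this (Python, with a hypothetical helper name). The lower-casing is an assumption based on the remark about piglit throwing caps away.

```python
def to_piglit_name(name):
    """Convert 'test-binary/subtest-name' to 'igt@test-binary@subtest-name'."""
    binary, _, subtest = name.partition("/")
    parts = ["igt", binary] + ([subtest] if subtest else [])
    # Assumption: piglit names are lower case.
    return "@".join(parts).lower()

print(to_piglit_name("test-binary/subtest-name"))
```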
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Acked-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Acked-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Acked-by: Martin Peres <martin.peres@linux.intel.com>
|
|
Recently we added a number of chamelium tests to the fast-feedback testlist.
Chamelium is build-optional: it requires `./configure --enable-chamelium`.
To mitigate the issue of piglit exiting abruptly due to the (possibly)
missing test binaries, this makes it behave more gracefully, considering
those as simply "notrun".
Cc: Petri Latvala <petri.latvala@intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
It is useful to be able to specify wps target relative to single
client performance when evaluating multiple workloads.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Handle multi-day uptime when parsing tracepoint timestamps.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Simulates a single decoder feeding multiple processing and
encoding pipelines.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
So gem_wsim can be driven in it.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
For the time being it just displays the saturation finding steps.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Plus a help text correction and calibration speed-up in
-R and -T modes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Record it within this script since trace.pl added support.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
New option (-w) allows direct pass-through to gem_wsim for cases
when heterogeneous workloads, or even additional parameters to
gem_wsim need to be tested.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Merge and flatten all the engine timelines to produce an
aggregate stat.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
In addition:
* optimize saturation point finding
* fix wps target modes
* always use -R, it is pointless not to
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
When evaluating best balancers it is useful to be able to glance
over the range of results for a particular workload since that
determines the weighted scoring.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
I am failing to come up with a smart formula which would
have a little bit of an exponential component and take into
consideration both total throughput and single client
performance.
Simply adding the two scores together might work better
than any complications.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Makes sense to keep it around if a different type of analysis
needs to be done later.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Split out the flagging logic to make it easier to read and so
that a complete failure to balance is declared a failure
instead of a warning.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Code just didn't expect '<none>' as the selected balancer.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
It wasn't normalized as the results are.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
The high level goal of this script is to programmatically analyze
the simulated media workloads (gem_wsim) by finding an optimal
load balancing strategy, and also detecting any possible
shortcomings of the same.
When run without command line arguments the script will run through
both of its phases.
In the first phase it will run all the known balancers
against all the known workloads, and for each combination
look for a point where aggregated system throughput cannot be
increased by running more parallel workload instances.
At that point each balancer gets a score proportional to the
throughput achieved, which is added to the running total for the
complete phase.
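The search for the saturation point can be pictured as the loop below. This is a Python sketch under stated assumptions, not the script's implementation: the measure_wps callback stands in for a gem_wsim run with a given client count, and the client limit and 1% improvement threshold are made up.

```python
def find_saturation(measure_wps, max_clients=16, tolerance=0.01):
    """Add clients until total throughput stops improving noticeably."""
    best_clients, best_wps = 1, measure_wps(1)
    for clients in range(2, max_clients + 1):
        wps = measure_wps(clients)
        # Stop once an extra client gains no more than the tolerance.
        if wps <= best_wps * (1 + tolerance):
            break
        best_clients, best_wps = clients, wps
    return best_clients, best_wps

# Hypothetical measurements: throughput flattens out after four clients.
fake = {1: 300.0, 2: 590.0, 3: 850.0, 4: 1000.0, 5: 1005.0, 6: 990.0}
print(find_saturation(lambda clients: fake[clients]))
```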
Several different score boards are kept - total throughput, per
client throughput and combined (total + per client). Weighted
scoreboards are also kept where scores are weighted based on the
total variance detected for a single workload. This means scores
for workloads which respond well to being balanced will be worth
more than those for workloads which do not balance well in any of
the configurations.
Based on the first phase a "best" balancing strategy will be
selected based on the combined weighted scoreboard.
Second phase will then proceed to profile all the selected
workloads with this balancer and look at potential problems with
GPU engines not being completely saturated.
If none of the active engines is saturated the workload will be
flagged, as it will if the only saturated engine is one of the
ones which can be balanced, but the other one in the same class
is under-utilized.
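One way to read the flagging rule, as a Python sketch rather than the script's own code: an engine counts as saturated when its idle percentage is within the idleness tolerance. Treating engines 2 and 3 as the balanceable pair is an assumption, as are the function name and the return strings; the idle figures and the 2% tolerance are taken from the sample output in this message.

```python
def flag_workload(idle_pct, balanced=(2, 3), tolerance=2.0):
    """Return 'WARN' for suspect engine utilisation, 'PASS' otherwise."""
    saturated = {e for e, pct in idle_pct.items() if pct <= tolerance}
    if not saturated:
        return "WARN"  # no engine is saturated at all
    pool = set(balanced)  # assumption: engines that can be balanced
    # Only balanced engines are saturated, yet a sibling still has idle time.
    if saturated <= pool and (pool & set(idle_pct)) - saturated:
        return "WARN"
    return "PASS"

print(flag_workload({0: 0.57, 2: 22.59, 3: 23.30}))
print(flag_workload({0: 7.77, 2: 0.66, 3: 28.70}))
```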
Flagged workloads then need to be analyzed which can be achieved
by looking at the html of the engine timelines which are
generated during this phase. (These files are all put in the
current working directory.)
It is quite possible that something flagged by the script as
suspect is completely fine and just a consequence of the
workload in question being fundamentally unbalanced.
It is possible to skip directly to the second phase of the
evaluation by using the -b command line option. This option must
contain a string exactly as understood by gem_wsim's -b option.
For example '-b "-b rtavg -R"'.
Apart from being run with no arguments, the script also supports a
selection of command line switches to enable fine tuning.
For example, also including the complete output from the script
in order to be more illustrative:
-8<---
+ scripts/media-bench.pl -n 642317 -r 2 -B rand,rtavg -W media_load_balance_hd12.wsim,media_load_balance_fhd26u7.wsim
Workloads:
media_load_balance_hd12.wsim
media_load_balance_fhd26u7.wsim
Balancers: rand,rtavg,
Target workload duration is 2s.
Calibration tolerance is 0.01.
Nop calibration is 642317.
Evaluating 'media_load_balance_hd12.wsim'... 2s is 990 workloads. (error=0.00750000000000006)
Finding saturation points for 'media_load_balance_hd12.wsim'...
rand balancer ('-b rand'): 6 clients (1412.576 wps, 235.429333333333 wps/client).
rand balancer ('-b rand -R'): 6 clients (1419.639 wps, 236.6065 wps/client).
rtavg balancer ('-b rtavg'): 5 clients (1430.143 wps, 286.0286 wps/client).
rtavg balancer ('-b rtavg -H'): 5 clients (1339.775 wps, 267.955 wps/client).
rtavg balancer ('-b rtavg -R'): 5 clients (1386.384 wps, 277.2768 wps/client).
rtavg balancer ('-b rtavg -R -H'): 6 clients (1365.943 wps, 227.657166666667 wps/client).
Best balancer is '-b rtavg'.
Evaluating 'media_load_balance_fhd26u7.wsim'... 2s is 52 workloads. (error=0.002)
Finding saturation points for 'media_load_balance_fhd26u7.wsim'...
rand balancer ('-b rand'): 3 clients (46.532 wps, 15.5106666666667 wps/client).
rand balancer ('-b rand -R'): 3 clients (46.242 wps, 15.414 wps/client).
rtavg balancer ('-b rtavg'): 6 clients (61.232 wps, 10.2053333333333 wps/client).
rtavg balancer ('-b rtavg -H'): 4 clients (57.608 wps, 14.402 wps/client).
rtavg balancer ('-b rtavg -R'): 6 clients (61.793 wps, 10.2988333333333 wps/client).
rtavg balancer ('-b rtavg -R -H'): 7 clients (60.697 wps, 8.671 wps/client).
Best balancer is '-b rtavg -R'.
Total wps rank:
===============
1: '-b rtavg' (1)
2: '-b rtavg -R' (0.989191465637926)
3: '-b rtavg -R -H' (0.973103630772601)
4: '-b rtavg -H' (0.938804458876241)
5: '-b rand -R' (0.874465740398305)
6: '-b rand' (0.874342391093453)
Total weighted wps rank:
========================
1: '-b rtavg -R' (1)
2: '-b rtavg' (0.998877134022041)
3: '-b rtavg -R -H' (0.982849160383224)
4: '-b rtavg -H' (0.938950446314292)
5: '-b rand' (0.80507369080098)
6: '-b rand -R' (0.80229656623594)
Per client wps rank:
====================
1: '-b rtavg -H' (1)
2: '-b rand' (0.977356849770376)
3: '-b rand -R' (0.976222085591368)
4: '-b rtavg' (0.888825068013012)
5: '-b rtavg -R' (0.875653417817828)
6: '-b rtavg -R -H' (0.726389466714194)
Per client weighted wps rank:
=============================
1: '-b rand' (1)
2: '-b rand -R' (0.996866139192282)
3: '-b rtavg -H' (0.986348733324348)
4: '-b rtavg' (0.811593544774355)
5: '-b rtavg -R' (0.805704548552663)
6: '-b rtavg -R -H' (0.671567075453688)
Combined wps rank:
==================
1: '-b rtavg' (1)
2: '-b rtavg -R' (0.989191465637926)
3: '-b rtavg -H' (0.972251783752137)
4: '-b rtavg -R -H' (0.949708930404222)
5: '-b rand' (0.914594701126905)
6: '-b rand -R' (0.914312395840401)
Combined weighted wps rank:
===========================
1: '-b rtavg' (1)
2: '-b rtavg -R' (0.995945739226824)
3: '-b rtavg -H' (0.984347862855008)
4: '-b rtavg -R -H' (0.956920992185625)
5: '-b rand' (0.899001713089319)
6: '-b rand -R' (0.896984246540919)
Balancer is '-b rtavg'.
Idleness tolerance is 2%.
Profiling 'media_load_balance_hd12.wsim'...
2s is 992 workloads. (error=0.00150000000000006)
Saturation at 6 clients (1434.207 workloads/s).
Pass [ 0: 0.57%, 2: 22.59%, 3: 23.30%, ]
Profiling 'media_load_balance_fhd26u7.wsim'...
2s is 52 workloads. (error=0.001)
Saturation at 6 clients (61.823 workloads/s).
WARN [ 0: 7.77%, 2: 0.66%, 3: 28.70%, ]
Problematic workloads were:
media_load_balance_fhd26u7.wsim -c 6 -r 52 [ 0: 7.77%, 2: 0.66%, 3: 28.70%, ]
-8<---
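The unweighted "Total wps rank" figures in the output above are consistent with the following reading of the scoring, which is an inference from the printed numbers and not a description of the script's code: each balancer earns, per workload, its throughput divided by the best throughput seen for that workload, and the summed scores are then normalized to the leader. The Python sketch below uses three of the six balancers from the output; the variance weighting is not attempted.

```python
def rank(results):
    """results: {workload: {balancer: wps}} -> [(balancer, score)], best first."""
    totals = {}
    for per_balancer in results.values():
        best = max(per_balancer.values())
        for balancer, wps in per_balancer.items():
            # Score relative to the best balancer for this workload.
            totals[balancer] = totals.get(balancer, 0.0) + wps / best
    top = max(totals.values())
    return sorted(((b, s / top) for b, s in totals.items()),
                  key=lambda item: item[1], reverse=True)

# Total wps values copied from the saturation points listed above.
results = {
    "media_load_balance_hd12.wsim":
        {"-b rand": 1412.576, "-b rtavg": 1430.143, "-b rtavg -R": 1386.384},
    "media_load_balance_fhd26u7.wsim":
        {"-b rand": 46.532, "-b rtavg": 61.232, "-b rtavg -R": 61.793},
}
for balancer, score in rank(results):
    print(balancer, round(score, 6))
```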
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
We are not interested in knowing the amount of perf data
captured, so less noise is better.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
Given a log file created via perf with some interesting trace
events enabled, this tool can generate the timeline graph of
requests getting queued, their dependencies resolved, sent to
the GPU for executing and finally completed.
This can be useful when analyzing certain classes of performance
issues. More help is available in the tool itself.
The tool will also calculate some overall per engine statistics,
like the total time an engine was idle and similar.
v2:
* Address missing git add.
* Make html output optional (--html switch) and by default
just output aggregated per engine stats to stdout.
v3:
* Added --trace option which invokes perf with the correct
options automatically.
* Added --avg-delay-stats which prints averages for things
like waiting on ready, waiting on GPU and context save
duration.
* Fix warnings when no waits on an engine.
* Correct help text.
v4:
* Add --squash-ctx-id to subtract engine id from ctx id
when parsing to make it easier to identify which context
is which with new i915 ctx id allocation scheme.
* Reconstruct request_out events where they are missing.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Harri Syrja <harri.syrja@intel.com>
Cc: Krzysztof E Olinski <krzysztof.e.olinski@intel.com>
|
|
This patch passes igt scripts/run-tests.sh -T option to piglit runner
as --test-list.
Restrictions to the option usage come from the piglit side:
- only last --test-list is processed
- no test regex inclusions or exclusions are processed if --test-list
exists
Signed-off-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|