Workload descriptor format
==========================

ctx.engine.duration_us.dependency.wait,...
<uint>.<str>.<uint>[-<uint>].<int <= 0>[/<int <= 0>][...].<0|1>,...
P|X.<uint>.<int>
d|p|s|t|q|a.<int>,...
f

For duration a range can be given from which a random value will be picked
before every submit. Since this and seqno management require CPU access to
objects, care needs to be taken to ensure the submit queue is deep enough that
these operations do not affect the execution speed, unless that is desired.
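
For illustration, a step like the one below (a made-up sketch, not one of the
examples later in this document) would submit a batch which executes for a
randomly picked duration between 0.5ms and 1.5ms on every iteration:

  1.RCS.500-1500.0.0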

Additional workload steps are also supported:

 'd' - Adds a delay (in microseconds).
 'p' - Adds a delay relative to the start of the previous loop so that each
       loop starts execution with a given period.
 's' - Synchronises the pipeline to a batch relative to the step.
 't' - Throttle every n batches.
 'q' - Throttle to n max queue depth.
 'f' - Create a sync fence.
 'a' - Advance the previously created sync fence.
 'P' - Context priority.
 'X' - Context preemption control.

Engine ids: RCS, BCS, VCS, VCS1, VCS2, VECS
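
As an illustrative sketch of the delay and queue depth steps (the values below
are made up and not taken from the tool's own examples), the following loop
submits a 1ms RCS batch, inserts a 500us delay, submits an independent 2ms VCS1
batch and throttles to a maximum queue depth of two:

  1.RCS.1000.0.0
  d.500
  1.VCS1.2000.0.0
  q.2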

Example (leading spaces must not be present in the actual file):
----------------------------------------------------------------

  1.VCS1.3000.0.1
  1.RCS.500-1000.-1.0
  1.RCS.3700.0.0
  1.RCS.1000.-2.0
  1.VCS2.2300.-2.0
  1.RCS.4700.-1.0
  1.VCS2.600.-1.1
  p.16000

The above workload, described in plain language, works like this:

  1.   A batch is sent to the VCS1 engine which will be executing for 3ms on the
       GPU and userspace will wait until it is finished before proceeding.
  2-4. Now three batches are sent to RCS with durations of 0.5-1ms (random
       duration range), 3.7ms and 1ms respectively. The first batch has a data
       dependency on the preceding VCS1 batch, and the last of the group depends
       on the first from the group.
  5.   Now a 2.3ms batch is sent to VCS2, with a data dependency on the 3.7ms
       RCS batch.
  6.   This is followed by a 4.7ms RCS batch with a data dependency on the 2.3ms
       VCS2 batch.
  7.   Then a 0.6ms VCS2 batch is sent depending on the previous RCS one. In the
       same step the tool is told to wait until the batch completes before
       proceeding.
  8.   Finally the tool is told to wait long enough to ensure the next iteration
       starts 16ms after the previous one has started.

When workload descriptors are provided on the command line, commas must be used
instead of new lines.
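
For example, the first workload above becomes:

  1.VCS1.3000.0.1,1.RCS.500-1000.-1.0,1.RCS.3700.0.0,1.RCS.1000.-2.0,1.VCS2.2300.-2.0,1.RCS.4700.-1.0,1.VCS2.600.-1.1,p.16000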

Multiple dependencies can be given separated by forward slashes.

Example:

  1.VCS1.3000.0.1
  1.RCS.3700.0.0
  1.VCS2.2300.-1/-2.0

In this case the last step has a data dependency on both the first and second
steps.

Sync (fd) fences
----------------

Sync fences are also supported as dependencies.

To use them put an "f<N>" token in the step dependency list. N in this case is
the same relative step offset to the dependee batch, but instead of the data
dependency an output fence will be emitted at the dependee step, and passed in
as a dependency in the current step.

Example:

  1.VCS1.3000.0.0
  1.RCS.500-1000.-1/f-1.0

In this case the second step will have both a data dependency and a sync fence
dependency on the previous step.

Example:

  1.RCS.500-1000.0.0
  1.VCS1.3000.f-1.0
  1.VCS2.3000.f-2.0

VCS1 and VCS2 batches will have a sync fence dependency on the RCS batch.

Example:

  1.RCS.500-1000.0.0
  f
  2.VCS1.3000.f-1.0
  2.VCS2.3000.f-2.0
  1.RCS.500-1000.0.1
  a.-4
  s.-4
  s.-4

VCS1 and VCS2 batches have an input sync fence dependency on the standalone
fence created at the second step. They are submitted ahead of time while still
not runnable. When the second RCS batch completes, the standalone fence is
signaled, which allows the two VCS batches to be executed. Finally we wait until
both VCS batches have completed before starting the (optional) next iteration.

Context priority
----------------

  P.1.-1
  1.RCS.1000.0.0
  P.2.1
  2.BCS.1000.-2.0

Context 1 is marked as low priority (-1) and then a batch buffer is submitted
against it. Context 2 is marked as high priority (1) and then a batch buffer
is submitted against it which depends on the batch from context 1.

The context priority command is executed at workload runtime and is valid until
overridden by another (optional) priority change on the same context. Actual
driver ioctls are executed only if the priority level has changed for the
context.
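
As an illustrative sketch of this behaviour (the values are made up), the second
priority change below is a no-op since context 1 is already at priority 1, so no
additional ioctl would be issued:

  P.1.1
  1.RCS.1000.0.0
  P.1.1
  1.RCS.1000.0.0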

Context preemption control
--------------------------

  X.1.0
  1.RCS.1000.0.0
  X.1.500
  1.RCS.1000.0.0

Context 1 is marked as having non-preemptable batches and a batch is submitted
against it. The same context is then marked to have batches which can be
preempted every 500us and another batch is submitted.

As with context priority, context preemption commands are valid until optionally
overridden by another preemption control change on the same context.