Workload descriptor format
==========================

ctx.engine.duration_us.dependency.wait,...
<uint>.<str>.<uint>[-<uint>].<int <= 0>[/<int <= 0>][...].<0|1>,...
P|X.<uint>.<int>,...
d|p|s|t|q|a.<int>,...
f

For duration a range can be given from which a random value will be picked
before every submit. Since this and seqno management require CPU access to
objects, care needs to be taken to ensure the submit queue is deep enough
that these operations do not affect the execution speed unless that is
desired.

Additional workload steps are also supported:

 'd' - Adds a delay (in microseconds).
 'p' - Adds a delay relative to the start of the previous loop so that each
       loop starts execution with a given period.
 's' - Synchronises the pipeline to a batch relative to the step.
 't' - Throttle every n batches.
 'q' - Throttle to n max queue depth.
 'f' - Create a sync fence.
 'a' - Advance the previously created sync fence.
 'P' - Context priority.
 'X' - Context preemption control.

Engine ids: RCS, BCS, VCS, VCS1, VCS2, VECS

Example (leading spaces must not be present in the actual file):
----------------------------------------------------------------

  1.VCS1.3000.0.1
  1.RCS.500-1000.-1.0
  1.RCS.3700.0.0
  1.RCS.1000.-2.0
  1.VCS2.2300.-2.0
  1.RCS.4700.-1.0
  1.VCS2.600.-1.1
  p.16000

The above workload described in human language works like this:

  1. A batch is sent to the VCS1 engine which will be executing for 3ms on
     the GPU, and userspace will wait until it is finished before
     proceeding.
  2-4. Now three batches are sent to RCS with durations of 0.5-1ms (random
     duration range), 3.7ms and 1ms respectively. The first batch has a
     data dependency on the preceding VCS1 batch, and the last of the group
     depends on the first from the group.
  5. Now a 2.3ms batch is sent to VCS2, with a data dependency on the 3.7ms
     RCS batch.
  6. This is followed by a 4.7ms RCS batch with a data dependency on the
     2.3ms VCS2 batch.
  7. Then a 0.6ms VCS2 batch is sent depending on the previous RCS one.
In the same step the tool is told to wait until the batch completes before
proceeding.
  8. Finally the tool is told to wait long enough to ensure the next
     iteration starts 16ms after the previous one has started.

When workload descriptors are provided on the command line, commas must be
used instead of new lines.

Multiple dependencies can be given separated by forward slashes.

Example:

  1.VCS1.3000.0.1
  1.RCS.3700.0.0
  1.VCS2.2300.-1/-2.0

In this case the last step has a data dependency on both the first and the
second step.

Sync (fd) fences
----------------

Sync fences are also supported as dependencies. To use them, put an "f"
token in the step dependency list. N is in this case the same relative
step offset to the dependee batch, but instead of a data dependency an
output fence will be emitted at the dependee step, and passed in as a
dependency in the current step.

Example:

  1.VCS1.3000.0.0
  1.RCS.500-1000.-1/f-1.0

In this case the second step will have both a data dependency and a sync
fence dependency on the previous step.

Example:

  1.RCS.500-1000.0.0
  1.VCS1.3000.f-1.0
  1.VCS2.3000.f-2.0

The VCS1 and VCS2 batches will have a sync fence dependency on the RCS
batch.

Example:

  1.RCS.500-1000.0.0
  f
  2.VCS1.3000.f-1.0
  2.VCS2.3000.f-2.0
  1.RCS.500-1000.0.1
  a.-4
  s.-4
  s.-4

The VCS1 and VCS2 batches have an input sync fence dependency on the
standalone fence created at the second step. They are submitted ahead of
time while still not runnable. When the second RCS batch completes, the
standalone fence is signalled, which allows the two VCS batches to be
executed. Finally we wait until both VCS batches have completed before
starting the (optional) next iteration.

Context priority
----------------

  P.1.-1
  1.RCS.1000.0.0
  P.2.1
  2.BCS.1000.-2.0

Context 1 is marked as low priority (-1) and then a batch buffer is
submitted against it. Context 2 is marked as high priority (1) and then a
batch buffer is submitted against it which depends on the batch from
context 1.
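The bookkeeping the priority example implies could be sketched as below. This is an illustrative model only, not the tool's actual code, and the class and member names are invented; it shows a per-context priority table updated by 'P' steps, with the driver call issued only when the level actually changes.

```python
# Illustrative sketch (not gem_wsim's real implementation): 'P.<ctx>.<prio>'
# steps update per-context priority state; a driver ioctl would be issued
# only when the priority for that context actually changes.

class PriorityTracker:
    def __init__(self):
        self.prio = {}      # context id -> current priority (default 0)
        self.ioctls = 0     # counts where the real tool would call the driver

    def set_priority(self, ctx, prio):
        """Handle a 'P.<ctx>.<prio>' workload step."""
        if self.prio.get(ctx, 0) != prio:
            self.ioctls += 1  # the real tool would issue the ioctl here
            self.prio[ctx] = prio

t = PriorityTracker()
t.set_priority(1, -1)   # P.1.-1
t.set_priority(2, 1)    # P.2.1
t.set_priority(2, 1)    # repeated with no change -> no driver call
```

After the three calls above, only two "ioctls" have been issued, since the repeated `P.2.1` is a no-op.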
The context priority command is executed at workload runtime and is valid
until overridden by another (optional) priority change on the same context.
Actual driver ioctls are executed only if the priority level has changed
for the context.

Context preemption control
--------------------------

  X.1.0
  1.RCS.1000.0.0
  X.1.500
  1.RCS.1000.0.0

Context 1 is marked as non-preemptable and a batch is sent against it. The
same context is then marked to have batches which can be preempted every
500us, and another batch is submitted.

Same as with context priority, context preemption commands are valid until
optionally overridden by another preemption control change on the same
context.
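To make the grammar at the top of this document concrete, here is a rough parser sketch for individual workload steps. It is an assumption-laden illustration, not the tool's actual parser: the function name, the returned dictionary shape, and its keys are all invented for this example, and only the step forms documented above are handled.

```python
# Illustrative parser sketch for workload steps (not gem_wsim's real code).
# Handles batch steps (ctx.engine.duration.dependency.wait), the standalone
# fence step 'f', simple commands d/p/s/t/q/a, and P/X context commands.

ENGINES = {"RCS", "BCS", "VCS", "VCS1", "VCS2", "VECS"}

def parse_step(step):
    fields = step.split(".")
    if fields[0] == "f" and len(fields) == 1:
        return {"op": "fence"}                       # create a sync fence
    if fields[0] in ("d", "p", "s", "t", "q", "a") and len(fields) == 2:
        return {"op": fields[0], "value": int(fields[1])}
    if fields[0] in ("P", "X") and len(fields) == 3:  # priority / preemption
        return {"op": fields[0], "ctx": int(fields[1]),
                "value": int(fields[2])}
    # Batch step: ctx.engine.duration_us[-max_us].dependency.wait
    ctx, engine, duration, deps, wait = fields
    assert engine in ENGINES
    lo, _, hi = duration.partition("-")               # optional random range
    return {
        "op": "batch",
        "ctx": int(ctx),
        "engine": engine,
        "duration_us": (int(lo), int(hi or lo)),
        # '0' means no dependency; 'f'-prefixed offsets are fence deps
        "deps": [] if deps == "0" else deps.split("/"),
        "wait": wait == "1",
    }

workload = "1.VCS1.3000.0.1,1.RCS.500-1000.-1/f-1.0,p.16000"
steps = [parse_step(s) for s in workload.split(",")]
```

Splitting the command-line form on commas, as in the last two lines, yields one parsed step per descriptor entry.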