Age | Commit message | Author |
|
Adding crc32 calculation on the gpu gives us a new way to verify data
integrity without relying on trust that the cpu mapping is correct.
Patch introduces calculating crc32 on DG2 only. On older gens ALU
(MI_MATH) doesn't support bit-shifting instructions as well as multiply
or divide. Emulating n-bit shifts costs hundreds of instructions with
predicated SRM (works on render engine only). Another limitation is lack
of indexed load / store. On DG2 we can use WPARID and CS_MI_ADDRESS_OFFSET
to achieve indexed operation on memory.
For performance reasons (cpu crc32 calculation, even on WC memory, is
still much faster than on the gpu, and the speed also depends on the memory
region of the object) the calculation will complete in a reasonable amount
of time only for a few MiB.
v2: - use registers relative to engine to allow run on all engines (Chris)
- use predication instead of memory access to get better performance
(Chris)
- add location where crc32 implementation comes from (Petri)
v4: - use common crc32 table from igt_crc
- add docs
v5: - change BIT(n) to informative macros (Zbigniew)
Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Acked-by: Petri Latvala <petri.latvala@intel.com>
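For reference, a minimal CPU-side sketch of a table-driven crc32 shows why the GPU version needs both bit shifts and indexed loads: every input byte costs one shift by 8 and one table lookup. It assumes the standard reflected IEEE 802.3 polynomial (0xEDB88320); the names crc32_init_table() and crc32_cpu() are illustrative and are not the igt_crc API.

```c
#include <stddef.h>
#include <stdint.h>

/* Lookup table, one entry per possible byte value. */
static uint32_t crc32_table[256];

/* Build the table for the reflected polynomial 0xEDB88320 (assumption). */
static void crc32_init_table(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;

		for (int bit = 0; bit < 8; bit++)
			c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
		crc32_table[i] = c;
	}
}

/* Byte-wise crc32: one indexed load and one 8-bit shift per input byte. */
static uint32_t crc32_cpu(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--)
		crc = crc32_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);

	return crc ^ 0xFFFFFFFFu;
}
```

Call crc32_init_table() once before the first crc32_cpu() call. A GPU result can then be compared against the value computed here for the same buffer.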
|
|
In a few tests we have some MI_* duplicates (MI_MATH for example).
Add common definitions in intel_reg.h and remove local definitions
in the tests.
v2: The MI_LOAD_REGISTER_MEM_GEN8 definition was removed, so from now
on the user will need to encode the length on their own.
Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
|
|
From dg2, there are multiple compute engines which conflict with the
parallel-submit restriction; check for the illegal engine class prior to
running the test.
And from dg2, MI_ATOMIC is only valid for use with system memory for
INC, DEC and MOV operations, not the ADD used in the test workload.
Replace the ADD+1 with an INC, so that the same workload runs on all
platforms.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6157
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
|
|
The XY_COLOR_BLT instruction used by igt_draw's blitter implementation
doesn't support tile-4 (plus we've heard informally from the hardware
team that the instruction is deprecated in general). Switch to
XY_FAST_COLOR_BLT to perform our solid fills on DG2. This instruction
will also allow us to extend the igt_draw support to 64bit+ color depths
in the future too if we have tests that start wanting to test that.
Note that we don't currently pass enough information down to this
routine to pick an appropriate value for the smem vs lmem performance
hint bit, but that doesn't impact the output generated.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jeevan B <jeevan.b@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
|
|
Add basic parallel execbuf submission test which more or less just
submits the same BB in a loop, which does an atomic increment to a memory
location. The memory location is checked at the end for the correct
value. Different sections use various IOCTL options (e.g. fences,
location of BBs, etc...).
In addition to the above sections, an additional section ensures the
ordering of parallel submission by submitting a spinning batch to 1
individual engine, submitting a parallel execbuf to all engine instances
within the class, verifying that none of the parallel execbufs make it to
hardware, releasing the spinner, and finally verifying everything has
completed.
The parallel-ordering section assumes default timeslice / preemption
timeout values. If these values are changed the test may fail.
v2:
(Daniele)
- Add assert to ensure parallel & load_balance both not set in ctx lib
- s/count/expected/g in check_bo()
- use existing query library functions
- clean up bb_per_execbuf / count usage
- drop dead loop
- add comment for parallel-ordering
- don't declare loop variables inside loop
v3:
(Daniele)
- Read timeslice from sysfs
v4:
(Daniele)
- Add comment in timeslice sysfs function
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
|
|
Add definitions for the timestamp registers.
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
|
|
Add definitions for the transcoder VRR registers.
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
|
|
Test various things using mmio async flips. These are present since
g4x, except g4x does not seem to have a working flipdone interrupt.
Reviewed-by: Karthik B S <karthik.b.s@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
|
|
Add MI_SET_APPID instruction and param definitions
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Those tests are meant to verify the ordering while messing around with
the timeline points.
v2: Add munmap() (Chris)
v3: Rebase
Drop gem_context_set_all_engines()
Fix subtest require on number of engines
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
To produce surfaces that are compressed using the media compression
format we need to use one of the media engines. The simplest way for
this is to use the vebox engine's tiling convert command, so add support
for this.
v2:
- Rebase on latest igt. (Mika)
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
|
|
We're only using the first instance of the render engine, but the
standard way on GEN12 to read/write engine registers is to depend on the
MMIO remapping to select the proper instance based on which command
streamer the read/write instruction was run from.
Enable the MMIO remapping to align with the standard way to do reg
read/writes.
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
|
|
On GEN12+ the AUX CCS surfaces required by the render and media
compression must be specified by a 3 level page table directory, which
translates the main surface graphics address to the AUX CCS surface
graphics address. For this purpose add support for creating a GEM buffer
to translate the linear surface address range to the linear AUX surface
address range.
The buffers containing the main surface must be pinned down, since the
directory table entry indices depend on the surface address, and they
must be 64kB aligned. The page table can be relocated OTOH, so allow
that and emit the required relocation entries.
v2:
- Make level variables 0-based (l1..l3 -> level=0..2).
- Add missing drm_intel_bo_set_softpin_offset() stub to fix build on
non-Intel archs.
- Fix missing offsets in reloc entries of already bound objects. (Chris)
- Randomize pin offsets, to try to avoid eviction. (Chris)
- Remove redundant MI_NOOPS around MI_LOAD_REGISTER_MEM
- Stop using explicit reloc cache domains, as these don't make sense on
GEN12 anyway. (Chris)
- Fix missing autotools support. (Chris)
- s/igt_aux_pgtable/intel_aux_pgtable/, since the functionality is Intel
specific. (Chris)
v3:
- Make sure all objects with an AUX surface are pinned.
v4:
- s/MI_LOAD_REGISTER_MEM/MI_LOAD_REGISTER_MEM_GEN8/ (Chris)
- Fix using buf->bo->size instead of buf->size when finding a free
range for a pinned obj.
- Fix alignment of the reserved space start for a pinned obj.
- Move gen12_emit_aux_pgtable_state() to its logical spot.
v5:
- Fix reloc emit call, passing a relative instead of absolute target
offset. (Chris)
- Fix off-by-one error when generating a random offset for pinned objs.
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
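The dependence of the table entry indices on the surface address can be sketched as follows. The per-level shift and width values are assumptions chosen to match a 64kB granularity at the lowest level. They are not taken from this commit, and aux_level_index() and aux_addr_is_aligned() are hypothetical helpers rather than functions from intel_aux_pgtable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: which address bits select the entry at each level. */
struct aux_level_desc {
	unsigned int idx_shift;	/* lowest address bit used by this level */
	unsigned int idx_bits;	/* number of index bits at this level */
};

static const struct aux_level_desc aux_levels[3] = {
	{ 16, 8 },	/* level 0: one entry per 64kB of main surface */
	{ 24, 12 },	/* level 1 */
	{ 36, 12 },	/* level 2: top-level directory */
};

/* Main surfaces must start on a 64kB boundary. */
static bool aux_addr_is_aligned(uint64_t addr)
{
	return (addr & 0xffffull) == 0;
}

/* Entry index inside the table of the given level (0..2) for addr. */
static uint32_t aux_level_index(uint64_t addr, int level)
{
	const struct aux_level_desc *d = &aux_levels[level];

	return (uint32_t)((addr >> d->idx_shift) &
			  ((1ull << d->idx_bits) - 1));
}
```

Because each index is derived directly from the address bits, moving a main surface would invalidate its entries, which is why those buffers are pinned while the page table itself can still be relocated.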
|
|
gen9+ introduces a cmdparser for the BLT engine which copies the
incoming BB to a kmd owned buffer for submission (to prevent changes
being made after the bb has been safely scanned). This breaks the
spin functionality because it relies on changing the submitted spin
buffers in order to terminate them.
Instead, for gen9+, we change the semantics by introducing a COND_BB_END
into the infinite loop, to wait until a memory flag (in another bo) is
cleared.
v2: Correct nop length to avoid overwriting bb_end instr when using
a dependency bo (cork)
v3: fix conflicts on igt_dummyload (Mika)
v4: s/bool running/uint32_t running, fix r->delta (Mika)
v5: remove overzealous assert (Mika)
v6: rebase on top of lib changes (Mika)
v7: rework on top of public igt lib changes (Mika)
v8: rebase
v9: simplify by using bb end as conditional (Chris)
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> (v2)
Cc: Joonas Lahtinen <joonas.lahtinen@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
1<<31 (same as 2<<30) is undefined behaviour in C. When compiling with
GCC and UBSan, it gives this error:
../tools/intel_reg_decode.c: In function ‘ivb_debug_port’:
../tools/intel_reg_decode.c:398:3: error: case label does not reduce to an integer constant
case PORT_DBG_DRRS_HW_STATE_HIGH:
^~~~
This happens because 1<<31 isn't representable as a signed int. Instead,
use an unsigned int.
Signed-off-by: Simon Ser <simon.ser@intel.com>
Reviewed-by: Petri Latvala <petri.latvala@intel.com>
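A minimal sketch of the fix pattern, using hypothetical EXAMPLE_* field definitions rather than the real intel_reg.h macros: with an unsigned literal the shift is well defined, so the value is a valid integer constant expression for a case label.

```c
#include <stdint.h>

/* Hypothetical 2-bit field in bits 31:30, for illustration only. */
#define EXAMPLE_STATE_MASK	(3u << 30)
#define EXAMPLE_STATE_LOW	(1u << 30)
#define EXAMPLE_STATE_HIGH	(1u << 31)	/* (1 << 31) would overflow int */

/* Decode the field; every case label is an unsigned constant. */
static const char *example_state_name(uint32_t reg)
{
	switch (reg & EXAMPLE_STATE_MASK) {
	case EXAMPLE_STATE_LOW:
		return "low";
	case EXAMPLE_STATE_HIGH:
		return "high";
	default:
		return "other";
	}
}
```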
|
|
When drawing with cairo to Y/Yf tiled FBs we use a temporary linear
buffer which is mapped to the CPU, but the fast blit needed for this
only expects 32 bpp FBs. Add support for other bpps too.
This is needed for upcoming patches testing non-32bit bpp formats with
Y/Yf tiling.
Thanks to Ville for explaining why we need the temporary buffer. (Looks
like for Y tiling we could do without, but that's a separate topic.)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
|
|
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
|
|
v2: Adjust for BB handling changes. (Tvrtko Ursulin)
Correct XY_FAST_COPY_DST_TILING_Yf. (Tvrtko Ursulin)
v3: New tiling modes are not defined in the kernel any more. (Tvrtko Ursulin)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
This patch adds support for dumping audio registers of Cherryview.
Signed-off-by: Libin Yang <libin.yang@intel.com>
|
|
libva makes extensive use of chained batch buffers. The batch
buffer copy portion of the command parser has the potential to
break chained batches, so add a simple test to make sure that
doesn't happen.
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
|
|
registers
intel_poller can be used to poll various display registers
(IIR,scanline/pixel/flip/frame counter, live address, etc.).
It can be used to determine, e.g., at which scanline or pixel count certain
events occur.
v2: s/intel_poller/intel_display_poller/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
|
|
These comments are not gtk-doc comments, so replacing /** with /*
prevents any gtk-doc warnings.
Signed-off-by: Thomas Wood <thomas.wood@intel.com>
|
|
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
|
|
These opcodes are not specific for an endpoint, but are the same for all
endpoints. So rename them accordingly, using the name the VLV2 sideband
HAS uses. Also move the macros to the .c file, since they aren't used
anywhere else.
Signed-off-by: Imre Deak <imre.deak@intel.com>
|
|
This makes it a bit more like the kernel, so we can go poke at DPIO and
other IOSF regs a bit more easily.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
|
This is the only place where they are used and we've even started using
1 << n constants with gen 7.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
|
|
That's how the registers are named in the kernel defines.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
|
|
I made the same change in the kernel a few months ago. This should help
avoid confusion about which bit does what.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
|
|
Since we now always want a length for this command, and we've created a
non-length variant, remove the #define to prevent further foot shooting.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
I've opted to not use the PIPE_CONTROL w/a for now. I am unclear if it
is actually required (the test does pass).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
|
|
Debugging watermark issues on gen2/3 without them is hard ...
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
This provides a macro that allows us to update all the arbitrary blit
commands we have stuck throughout the code. It assumes we don't actually
use 64b relocs (which is currently true). This also allows us to easily find
all the areas we need to update later when we really use the upper dword.
This block was done mostly with a sed job, and represents the easier
in-test blit implementations.
v2 by Oscar: s/OUT_BATCH/BEGIN_BATCH in BLIT_COPY_BATCH_START
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
|
|
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
|
|
|
|
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
|
|
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
|
|
It was previously printing ironlake_debug_regs and haswell_debug_regs.
Since ironlake_debug_regs contains a lot of registers that don't exist
on Haswell, running intel_reg_dumper on Haswell caused "unclaimed
register" messages. Now I've copied the existing registers from
ironlake_debug_regs to haswell_debug_regs, so we won't print the
registers that don't exist anymore.
Also removed DP_TP_STATUS_A since it doesn't exist.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
|
|
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Alan typo'ed it, I've failed to notice :(
|
|
Solaris <sys/types.h> already has #define NOPID (pid_t)(-1)
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
In Valleyview the DPLL and lane control registers are accessible only
through side band fabric called DPIO. Added two tools to read and write
registers residing in this space.
v2: Moved the core read/write functions to lib/intel_dpio.c based on
Ben's feedback
Signed-off-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Not just a copy of pipe B. Meh.
Also kill a few redundant #defines for pipe B - they match pipe A.
|
|
Also reorder the pipe B regs a bit to be consistent with pipe A.
|
|
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
|
|
For now, only print their content for diffing, but also add the necessary
bits that can be used for more verbose output in the future.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
I was interested in finding why my IVB system is not getting GPU turbo
after suspend/resume. The piece that looks weird to me is that
INTERRUPT_THRESHOLD is sitting at 0, whereas pre-suspend it's
0x12000000.
|
|
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
This allows checking whether rc6 works, and how long we have been in each
state.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
|