This removes the usage of intel_ring_emit in favour of
directly writing to the ring buffer.
intel_ring_emit was preventing the compiler for optimising
fetch and increment of the current ring buffer pointer and
therefore generating very verbose code for every write.
It had no useful purpose since all ringbuffer operations
are started and ended with intel_ring_begin and
intel_ring_advance respectively, with no bail out in the
middle possible, so it is fine to increment the tail in
intel_ring_begin and let the code manage the pointer
itself.
Useless instruction removal amounts to approximately
two and half kilobytes of saved text on my build.
Not sure if this has any measurable performance
implications but executing a ton of useless instructions
on fast paths cannot be good.
v2:
* Change return from intel_ring_begin to error pointer by
popular demand.
* Move tail increment to intel_ring_advance to enable some
error checking.
v3:
* Move tail advance back into intel_ring_begin.
* Rebase and tidy.
v4:
* Complete rebase after a few months since v3.
v5:
* Remove unecessary cast and fix !debug compile. (Chris Wilson)
v6:
* Make intel_ring_offset take request as well.
* Fix recording of request postfix plus a sprinkle of asserts.
(Chris Wilson)
v7:
* Use intel_ring_offset to get the postfix. (Chris Wilson)
* Convert GVT code as well.
v8:
* Rename *out++ to *cs++.
v9:
* Fix GVT out to cs conversion in GVT.
v10:
* Rebase for new intel_ring_begin in selftests.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com
Apply workarounds to Geminilake, and annotate those that are applied
unconditionally when they apply to GLK based on the workaround database.
v2: Fix commit message typos. (David)
v3: Rebase.
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485422218-9102-1-git-send-email-ander.conselvan.de.oliveira@intel.com
Along with GLK it was introduced the .is_lp and IS_GEN9_LP.
So, following the same simplification standard we can
put Skylake and Kabylake under the same bucket for most
of the things.
So let's add the IS_GEN9_BC for "Big Core" (non Atom based
platforms).
The i915_drv.c was let out of this patch on purpose
because that is really a decision per platform, just like
other cases where IS_KABYLAKE is different from IS_SKYLAKE.
v2: fix conflict with IS_LP and 3 new cases for this
big core bucket:
- intel_ddi.c: intel_ddi_get_link_dpll
- intel_fbc.c: find_compression_threshold
- i915_gem_gtt.c: gtt_write_workarounds
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485196357-30599-2-git-send-email-rodrigo.vivi@intel.com
Geminilake is mostly backwards compatible with broxton, so change most
of the IS_BROXTON() checks to IS_GEN9_LP(). Differences between the
platforms will be implemented in follow-up patches.
v2: Don't reuse broxton's path in intel_update_max_cdclk().
Don't set plane count as in broxton.
v3: Rebase
v4: Include the check intel_bios_is_port_hpd_inverted().
Commit message.
v5: Leave i915_dmc_info() out; glk's csr version != bxt's. (Rodrigo)
v6: Rebase.
v7: Convert a few mode IS_BROXTON() occurances in pps, ddi, dsi and pll
code. (Rodrigo)
v8: Squash a couple of DDI patches with more conversions. (Rodrigo)
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1480667037-11215-2-git-send-email-ander.conselvan.de.oliveira@intel.com
Like GEM init, GUC init, MOCS init and context creation.
Enables them to lose dev_priv locals.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.
s/struct intel_ringbuffer/struct intel_ring/
s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/
s/describe_ctx_ringbuf()/describe_ctx_ring()/
s/intel_ring_get_active_head()/intel_engine_get_active_head()/
s/intel_ring_sync_index()/intel_engine_sync_index()/
s/intel_ring_init_seqno()/intel_engine_init_seqno()/
s/ring_stuck()/engine_stuck()/
s/intel_cleanup_engine()/intel_engine_cleanup()/
s/intel_stop_engine()/intel_engine_stop()/
s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/
s/intel_unpin_ringbuffer()/intel_unpin_ring()/
s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/
s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/
s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/
s/intel_ringbuffer_free()/intel_ring_free()/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk
'ring' is an old deprecated term for a GPU engine. Chris Wilson wants to
use the name for what is currently known as an intel_ringbuffer, but it
will be dreadfully confusing if some rings are ringbuffers but other
rings are still engines. So this patch changes the names of a bunch of
parameters called 'ring' to either 'engine' or 'engine_id' according to
what they actually are.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469034967-15840-3-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The purpose for each MOCS entry isn't well defined atm. Defining these
is important to remove any uncertainty about the use of these entries
for example in terms of performance and GPU/CPU coherency.
Suggested by Ville.
v4:
- Rename I915_MOCS_AUTO to I915_MOCS_PTE. (Chris)
CC: Rong R Yang <rong.r.yang@intel.com>
CC: Yakui Zhao <yakui.zhao@intel.com>
CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467383528-16142-1-git-send-email-imre.deak@intel.com
Setting a write-back cache policy in the MOCS entry definition also
implies snooping, which has a considerable overhead. This is
unexpected for a few reasons:
- From user-space's point of view since it didn't want a coherent
surface (it didn't set the buffer as such via the set caching IOCTL).
- There is a separate MOCS entry field for snooping (which we never
set).
- This MOCS table is about caching in (e)LLC and there is no (e)LLC on
BXT. There is a separate table for L3 cache control.
Considering the above the current behavior of snooping looks like an
unintentional side-effect of the WB setting. Changing it to be LLC-UC
gets rid of the snooping without any ill-effects. For a coherent
surface the application would use a separate MOCS entry at index 1 and
call the set caching IOCTL to setup the PTE entries for the
corresponding buffer to be snooped. In the future we could also add a
new MOCS entry for coherent surfaces.
This resulted in 70% improvement in synthetic texturing benchmarks.
Kudos to Valtteri Rantala, Eero Tamminen and Michael T Frederick and
Ville who helped to narrow the source of problem to the kernel and to
the snooping behaviour in particular.
With a follow-up change to adjust the 3rd entry value
igt/gem_mocs_settings is passing after this change.
v2:
- Rebase on v2 of patch 1/2.
v3:
- Set the entry as LLC uncached instead of PTE-passthrough. This way
we also keep snooping disabled, but we also make the cacheability/
coherency setting indepent of the PTE which is managed by the
kernel. (Chris)
CC: Rong R Yang <rong.r.yang@intel.com>
CC: Yakui Zhao <yakui.zhao@intel.com>
CC: Valtteri Rantala <valtteri.rantala@intel.com>
CC: Eero Tamminen <eero.t.tamminen@intel.com>
CC: Michael T Frederick <michael.t.frederick@intel.com>
CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Acked-by: Zhao Yakui <yakui.zhao@intel.com>
Tested-by: Rong R Yang <rong.r.yang@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467380406-11954-3-git-send-email-imre.deak@intel.com
Use named struct initializers for clarity. Also fix the target cache
definition to reflect its role in GEN9 onwards. On GEN8 a TC value of 0
meant ELLC but on GEN9+ it means the TC and LRU controls are taken from
the PTE.
No functional change, igt/gem_mocs_settings still passing after this
change.
v2: (Chris)
- Add back the hexa literals for the entries.
Add note that igt/gem_mocs_settings still passes.
CC: Rong R Yang <rong.r.yang@intel.com>
CC: Yakui Zhao <yakui.zhao@intel.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Acked-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467380406-11954-2-git-send-email-imre.deak@intel.com
It seems pretty clear that bitwise OR was intended here and not logical
OR.
Fixes: 6fc29133ea ('drm/i915/gen9: Add WaDisableSkipCaching')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
text data bss dec hex filename
6309351 3578714 696320 10584385 a18141 vmlinux
6308391 3578714 696320 10583425 a17d81 vmlinux
Almost 1KiB of code reduction.
v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions
text data bss dec hex filename
6304579 3578778 696320 10579677 a16edd vmlinux
6303427 3578778 696320 10578525 a16a5d vmlinux
Now over 1KiB!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
Combine the near identical implementations of intel_logical_ring_begin()
and intel_ring_begin() - the only difference is that the logical wait
has to check for a matching ring (which is assumed by legacy).
In the process some debug messages are culled as there were following a
WARN if we hit an actual error.
v2: Updated commentary
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-12-git-send-email-chris@chris-wilson.co.uk
Allow for the MOCS to be programmed for all engines.
Currently we program the MOCS when the first render batch
goes through. This works on most platforms but fails on
platforms that do not run a render batch early,
i.e. headless servers. The patch now programs all initialised engines
on init and the RCS is programmed again within the initial batch. This
is done for predictable consistency with regards to the hardware
context.
Hardware context loading sets the values of the MOCS for RCS
and L3CC. Programming them from within the batch makes sure that
the render context is valid, no matter what the previous state of
the saved-context was.
v2: posted correct version to the mailing list.
v3: moved programming to within engine->init_hw() (Chris Wilson)
v4: code formatting and white-space changes. (Chris Wilson)
Testcase: igt/gem_mocs_settings
Signed-off-by: Peter Antoine <peter.antoine@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460556205-6644-1-git-send-email-chris@chris-wilson.co.uk
Equivalent to the existing for_each_engine() macro, this will replace
the latter wherever the third argument *is* actually wanted (in most
places, it is not used). The third argument is renamed to emphasise
that it is an engine id (type enum intel_engine_id). All the callers of
the macro that actually need the third argument are updated to use this
version, and the argument (generally 'i') is also updated to be 'id'.
Other callers (where the third argument is unused) are untouched for
now; they will be updated in the next patch.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Some trivial ones, first pass done with Coccinelle:
@@
@@
(
- I915_NUM_RINGS
+ I915_NUM_ENGINES
|
- intel_ring_flag
+ intel_engine_flag
|
- for_each_ring
+ for_each_engine
|
- i915_gem_request_get_ring
+ i915_gem_request_get_engine
|
- intel_ring_idle
+ intel_engine_idle
|
- i915_gem_reset_ring_status
+ i915_gem_reset_engine_status
|
- i915_gem_reset_ring_cleanup
+ i915_gem_reset_engine_cleanup
|
- init_ring_lists
+ init_engine_lists
)
But that didn't fully work so I cleaned it up with:
for f in *.[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done
for f in *.[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done
v2: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
When register type safety happens, we can't just try to emit the
register itself to the ring. Instead we'll need to extract the
offset from it first. Add some convenience functions that will do
that.
v2: Convert MOCS setup too
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446672017-24497-20-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Kabylake is a Intel® Processor containing Intel® HD Graphics
following Skylake.
It is Gen9p5, so it inherits everything from Skylake.
Let's start by adding the platform separated from Skylake
but reusing most of all features, functions etc. Later we
rebase the PCI-ID patch without is_skylake=1
so we don't replace what original Author did there.
Few IS_SKYLAKEs if statements are not being covered by this patch
on purpose:
- Workarounds: Kabylake is derivated from Skylake H0 so no
W/As apply here.
- GuC: A following patch removes Kabylake support with an
explanation: No firmware available yet.
- DMC/CSR: Done in a separated patch since we need to be carefull
and load the version for revision 7 since
Kabylake is Skylake H0.
v2: relative cleaner commit message and added the missed
IS_KABYLAKE to intel_i2c.c as pointed out by Jani.
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
This change adds the programming of the MOCS registers to the gen 9+
platforms. The set of MOCS configuration entries introduced by this
patch is intended to be minimal but sufficient to cover the needs of
current userspace - i.e. a good set of defaults. It is expected to be
extended in the future to provide further default values or to allow
userspace to redefine its private MOCS tables based on its demand for
additional caching configurations. In this setup, userspace should
only utilize the first N entries, higher entries are reserved for
future use.
It creates a fixed register set that is programmed across the different
engines so that all engines have the same table. This is done as the
main RCS context only holds the registers for itself and the shared
L3 values. By trying to keep the registers consistent across the
different engines it should make the programming for the registers
consistent.
v2:
-'static const' for private data structures and style changes.(Matt Turner)
v3:
- Make the tables "slightly" more readable. (Damien Lespiau)
- Updated tables fix performance regression.
v4:
- Code formatting. (Chris Wilson)
- re-privatised mocs code. (Daniel Vetter)
v5:
- Changed the name of a function. (Chris Wilson)
v6:
- re-based
- Added Mesa table entry (skylake & broxton) (Francisco Jerez)
- Tidied up the readability defines (Francisco Jerez)
- NUMBER of entries defines wrong. (Jim Bish)
- Added comments to clear up the meaning of the tables (Jim Bish)
Signed-off-by: Peter Antoine <peter.antoine@intel.com>
v7 (Francisco Jerez):
- Don't write L3-specific MOCS_ESC/SCC values into the e/LLC control
tables. Prefix L3-specific defines consistently with L3_ and
e/LLC-specific defines with LE_ to avoid this kind of confusion in
the future.
- Change L3CC WT define back to RESERVED (matches my hardware
documentation and the original patch, probably a misunderstanding
of my own previous comment).
- Drop Android tables, define new minimal tables more suitable for the
open source stack.
- Add comment that the MOCS tables are part of the kernel ABI.
- Move intel_logical_ring_begin() and _advance() calls one level down
(Chris Wilson).
- Minor formatting and style fixes.
v8 (Francisco Jerez):
- Add table size sanity check to emit_mocs_control/l3cc_table() (Chris
Wilson).
- Add comment about undefined entries being implicitly set to uncached
for forwards compatibility.
v9 (Francisco Jerez):
- Minor style fixes.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>