gecko-dev/gfx/wr/wrench/benchmarks
Glenn Watson 702c53489e Bug 1528157 - Optimize GPU time for clip mask generation. r=kvark
On integrated GPUs, we are typically completely bound by memory
bandwidth and the number of pixels that get written / blended.

On real world pages, it's often the case that we end up with
clip tasks that are long in one dimension but not the other, due
to box-shadow edges, clip mask segments etc. When this occurs,
the logic that tries to compute a small 'used_rect' to clear in
each target fails, since the union of those tasks ends up being a
very large rect that covers (most of) the surface. This can cost a
lot of GPU time on some integrated chipsets.

Instead, it appears to be much faster to issue multiple clears,
one for each clip mask region, which is typically < 10% of the
surface we were clearing previously.
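
As a rough illustration of the arithmetic above, the sketch below
compares the pixels touched by one clear of the union rect against
one clear per clip task. The `Rect` type and both function names are
hypothetical and are not taken from the WebRender source.

```rust
/// Axis-aligned rectangle in device pixels.
/// Hypothetical type for illustration, not WebRender's own rect type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x0: i64,
    pub y0: i64,
    pub x1: i64,
    pub y1: i64,
}

impl Rect {
    /// Number of pixels covered (zero for empty or inverted rects).
    pub fn area(&self) -> i64 {
        (self.x1 - self.x0).max(0) * (self.y1 - self.y0).max(0)
    }

    /// Smallest rect containing both `self` and `other`.
    pub fn union(&self, other: &Rect) -> Rect {
        Rect {
            x0: self.x0.min(other.x0),
            y0: self.y0.min(other.y0),
            x1: self.x1.max(other.x1),
            y1: self.y1.max(other.y1),
        }
    }
}

/// Pixels written when a single clear covers the union of all task rects.
pub fn pixels_cleared_by_union(tasks: &[Rect]) -> i64 {
    let mut iter = tasks.iter();
    match iter.next() {
        None => 0,
        Some(first) => iter.fold(*first, |acc, r| acc.union(r)).area(),
    }
}

/// Pixels written when each task rect gets its own clear.
pub fn pixels_cleared_per_task(tasks: &[Rect]) -> i64 {
    tasks.iter().map(Rect::area).sum()
}
```

For example, a 2048x16 strip along the top of a target plus a
16x2032 strip down its left edge have a union of 2048x2048
(4,194,304 pixels), while the two per-task clears touch only
65,280 pixels, under 2% of the union.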

However, we can also restore an old optimization we used to have
which means we can skip clears altogether in the common case. The
first mask in a clip task will write to all the pixels in the mask,
so we can draw that with blending disabled (also a significant win
on integrated GPUs) and skip the clear in these cases. With this
functionality in place, the multiplicative blend mode is only
enabled for clips after the first in a mask (this is quite a
rare case - most clip tasks end up with a single mask).
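
A CPU-side sketch of why the clear can be skipped, modelling the
mask as a flat array of coverage values. It assumes the clear value
is 1.0 (the identity for multiplicative blending) and that every
mask covers every pixel of the task; both function names are
hypothetical and this is not the renderer's actual code.

```rust
/// Reference path: clear the target to 1.0, then multiplicatively
/// blend every mask into it.
pub fn render_with_clear(masks: &[Vec<f32>], len: usize) -> Vec<f32> {
    let mut target = vec![1.0f32; len];
    for mask in masks {
        for (dst, src) in target.iter_mut().zip(mask) {
            *dst *= *src;
        }
    }
    target
}

/// Optimized path: no clear. The first mask overwrites whatever stale
/// data is in the target (blending disabled); later masks multiply.
/// Assumes `masks` is non-empty, otherwise the stale data survives.
pub fn render_without_clear(masks: &[Vec<f32>], stale: &[f32]) -> Vec<f32> {
    let mut target = stale.to_vec();
    for (i, mask) in masks.iter().enumerate() {
        for (dst, src) in target.iter_mut().zip(mask) {
            if i == 0 {
                *dst = *src; // blend off: plain write
            } else {
                *dst *= *src; // multiplicative blend
            }
        }
    }
    target
}
```

Since 1.0 * x == x, both paths produce the same mask whenever the
first mask writes every pixel, regardless of what the target held
beforehand.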

On low end GPUs driving a 4k screen, I've measured GPU wins of up
to 5 ms/frame on some real world pages with this change.

Differential Revision: https://phabricator.services.mozilla.com/D19893

--HG--
extra : moz-landing-system : lando
2019-02-19 20:52:27 +00:00
aligned-gradient.yaml
benchmarks.list
box-shadow-large.yaml
clip-clear.yaml
large-blur-radius.yaml
large-boxshadow-ellipse-2.yaml
large-boxshadow-ellipse.yaml
large-clip-rect.yaml
many-box-shadows.yaml
many-images.yaml
overlapping-text-shadows.yaml
radial-gradient.yaml
simple-batching.yaml
text-rendering.yaml
transforms-simple.yaml
unaligned-gradient.yaml