gradient-move-stops.html fails because a gradient stop is offset by a small amount, which causes a large number of pixels to differ by 255, beyond what is reasonable to mark with fuzzy thresholds.
Differential Revision: https://phabricator.services.mozilla.com/D111127
Bug 1687977 removed brush_radial_gradient as well as the SWGL fast-path with it. This
adds it back inside cs_radial_gradient.
Differential Revision: https://phabricator.services.mozilla.com/D109195
The same optimization of looking for merged linear gradients can also be
applied to radial gradients by solving the quadratic equation to check
how large a span we can process within a given merged span. This allows
us to save a bunch of table lookups and some other math in the inner loops.
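
As a rough illustration of the idea (not the SWGL code itself), the sketch below computes how many consecutive pixels on a row keep their radial offset inside one interval [t_lo, t_hi). The helper name, its parameters, and the assumption that the offset is simply the distance from the gradient center are all hypothetical.

```rust
/// Number of consecutive pixels, starting at pixel 0, whose radial offset
/// stays inside the half-open interval [t_lo, t_hi).
///
/// Hypothetical helper: pixel i sits at (px + i * dx, py) relative to the
/// gradient center, and its offset is its distance from the center.
/// Assumes dx > 0 and that pixel 0 is already inside the interval.
fn radial_span_len(px: f32, py: f32, dx: f32, t_lo: f32, t_hi: f32, max_len: usize) -> usize {
    // The offset reaches t_hi where (px + i*dx)^2 + py^2 = t_hi^2. Moving
    // right, the exit is the positive root x = sqrt(t_hi^2 - py^2).
    let exit_hi = (t_hi * t_hi - py * py).max(0.0).sqrt();
    let mut len = ((exit_hi - px) / dx).ceil().max(0.0) as usize;

    // Left of the center the offset shrinks first, so it may drop below
    // t_lo before it ever reaches t_hi. That happens at the negative root
    // of (px + i*dx)^2 + py^2 = t_lo^2, if a real root exists.
    let lo_sq = t_lo * t_lo - py * py;
    if px < 0.0 && lo_sq > 0.0 {
        let exit_lo = -lo_sq.sqrt();
        let lo_len = ((exit_lo - px) / dx).floor().max(0.0) as usize + 1;
        len = len.min(lo_len);
    }

    // Never run past the end of the span the caller is processing.
    len.min(max_len)
}
```

Every pixel in the returned run can then share one set of interpolation parameters instead of doing a per-pixel lookup.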
Differential Revision: https://phabricator.services.mozilla.com/D105858
For Draw (non-native) and CA modes, we include the per-tile
valid rect in the clip rect from the surface.
For DC (non-virtual) mode, a per-tile clip rect is set on the
visual for each tile, separate from the overall clip rect that
is set on the surface visual.
For DC (virtual) mode, the Trim API is used to remove pixels
in the virtual tile area that are outside the valid / clipped
region.
Note: Although the valid rect is now applied in the native
compositors, it's currently only based on the overall picture
cache bounding rect. Thus, with this patch there isn't any
noticeable performance improvement. Once this lands and is
working correctly, the follow-up patch to calculate a smaller
valid region per-tile is a small amount of work.
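
For the Draw and CA case, folding the valid rect into the clip amounts to a rect intersection. A minimal sketch, assuming both rects are already in the same device-pixel space (the type and function names are hypothetical, not the WR ones):

```rust
/// Axis-aligned rect in device pixels, with exclusive max edges.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DeviceRect {
    x0: i32,
    y0: i32,
    x1: i32,
    y1: i32,
}

impl DeviceRect {
    /// Overlap of two rects, or None if they do not overlap.
    fn intersection(&self, other: &DeviceRect) -> Option<DeviceRect> {
        let r = DeviceRect {
            x0: self.x0.max(other.x0),
            y0: self.y0.max(other.y0),
            x1: self.x1.min(other.x1),
            y1: self.y1.min(other.y1),
        };
        if r.x0 < r.x1 && r.y0 < r.y1 {
            Some(r)
        } else {
            None
        }
    }
}

/// Clip to use when compositing one tile: the surface clip restricted to
/// the part of the tile that holds valid pixels. None means the tile
/// contributes nothing and can be skipped.
fn tile_clip(surface_clip: DeviceRect, tile_valid: DeviceRect) -> Option<DeviceRect> {
    surface_clip.intersection(&tile_valid)
}
```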
Differential Revision: https://phabricator.services.mozilla.com/D61424
--HG--
extra : moz-landing-system : lando
The gradient code is the only one that does a really weird thing with
LengthPercentage values, by getting the percentage and length separately and
turning the length into a percentage relative to the line length (which is in
device pixels).
This won't work once we have min() / max() / etc. in CSS (as we can't access
the length and percentage components separately, as which one you choose may
depend on the percentage basis). So instead of that, use the regular
ResolveToCssPixels when there are lengths involved.
We also change the surrounding code a bit to work in CSS pixels, so as to avoid
unneeded CSS -> device pixel conversions.
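
To see why the components cannot be split, consider min(50%, 100px): which branch wins depends on the basis. The sketch below uses a hypothetical stand-in type (not Gecko's LengthPercentage) and resolves the whole value against the line length in CSS pixels before turning it into a stop offset.

```rust
/// Hypothetical, simplified stand-in for a CSS <length-percentage>.
#[derive(Debug, Clone)]
enum LengthPercentage {
    /// Absolute length in CSS pixels.
    Length(f32),
    /// Fraction of the percentage basis (0.5 means 50%).
    Percentage(f32),
    /// min(a, b); the winning branch depends on the basis.
    Min(Box<LengthPercentage>, Box<LengthPercentage>),
}

impl LengthPercentage {
    /// Resolve the whole value against a basis, in CSS pixels.
    fn resolve(&self, basis_css_px: f32) -> f32 {
        match self {
            LengthPercentage::Length(px) => *px,
            LengthPercentage::Percentage(p) => p * basis_css_px,
            LengthPercentage::Min(a, b) => {
                a.resolve(basis_css_px).min(b.resolve(basis_css_px))
            }
        }
    }
}

/// Stop position as a fraction of the gradient line. Both the resolved
/// value and the line length are in CSS pixels, so no device-pixel
/// conversion is needed here.
fn stop_offset(position: &LengthPercentage, line_length_css_px: f32) -> f32 {
    if line_length_css_px <= 0.0 {
        return 0.0;
    }
    position.resolve(line_length_css_px) / line_length_css_px
}
```

With a 400px line the 100px branch wins (offset 0.25); with a 100px line the 50% branch wins (offset 0.5).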
Differential Revision: https://phabricator.services.mozilla.com/D60159
--HG--
extra : moz-landing-system : lando
There are a number of issues with the current gradient dithering
implementation that cause many test failures and also fuzzy
rendering when enabling DirectComposition virtual surfaces. In
particular, the dither result is dependent on the offset of the
update rect within a render target.
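
The offset dependence can be pictured with a toy ordered dither. This is only an illustration under assumed details (a 2x2 matrix indexed by render-target position, hypothetical function name), not the actual shader code:

```rust
/// 2x2 ordered-dither matrix; the values are illustrative only.
const DITHER: [[u8; 2]; 2] = [[0, 2], [3, 1]];

/// Dither value for a pixel when the matrix is indexed by the pixel's
/// position in the render target, i.e. its position inside the update
/// rect plus the rect's offset within the target.
fn dither_at(local: (u32, u32), rect_offset_in_target: (u32, u32)) -> u8 {
    let x = local.0 + rect_offset_in_target.0;
    let y = local.1 + rect_offset_in_target.1;
    DITHER[(y % 2) as usize][(x % 2) as usize]
}
```

The same pixel of the same update rect gets a different dither value when the rect is placed one pixel further along in the target, which is the kind of non-determinism described above.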
For now, this patch disables gradient dithering by default. This
gives us:
- A heap of new test PASS results (or reduced fuzziness).
- Fixes a number of non-deterministic fuzziness bugs with DC.
- Improves performance of gradient rendering by a reasonable amount.
We can fix gradient dithering as a follow-up, and re-enable it if/when
we find content that would benefit from it significantly (we may
be able to improve gradients in other ways than dithering too).
Differential Revision: https://phabricator.services.mozilla.com/D60460
--HG--
extra : moz-landing-system : lando
This patch implements the majority of the planned picture caching
improvements. It supports most of the functionality required to
(as a follow-up) support OS compositor integration. It also improves
on the robustness and functionality of the previous picture caching
implementation.
There are some expected temporary performance regressions in
some cases (such as content that is constantly invalidating) and
during initial page render when many render targets must be drawn
to. These performance regressions will be resolved in follow-up
commits by supporting multi-resolution tiles.
The scene is split into a number of slices, determined by the scroll
root of each primitive, which can be found by the primitive's
spatial node indices. If a scene contains too many slices, then
picture caching is disabled on the page, to avoid excessive texture
memory usage, and rendering falls back to rasterizing each frame.
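
A minimal sketch of the slicing rule, assuming a new slice starts whenever the scroll root changes between consecutive primitives (so paint order is preserved) and that the slice limit is a parameter. The types and names are hypothetical:

```rust
/// Hypothetical primitive record; scroll_root stands in for the scroll
/// root found via the primitive's spatial node index.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Primitive {
    id: u32,
    scroll_root: u32,
}

/// Split primitives (in paint order) into slices, starting a new slice
/// each time the scroll root changes. Returns None when more than
/// max_slices would be needed, meaning picture caching is disabled and
/// the caller rasterizes every frame instead.
fn split_into_slices(prims: &[Primitive], max_slices: usize) -> Option<Vec<Vec<Primitive>>> {
    let mut slices: Vec<Vec<Primitive>> = Vec::new();
    for prim in prims {
        let starts_new_slice = slices
            .last()
            .map_or(true, |slice| slice[0].scroll_root != prim.scroll_root);
        if starts_new_slice {
            if slices.len() == max_slices {
                return None;
            }
            slices.push(Vec::new());
        }
        if let Some(slice) = slices.last_mut() {
            slice.push(*prim);
        }
    }
    Some(slices)
}
```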
The specific changes in this patch are:
* Support tile caches for multiple scroll roots, allowing the
entire page (including fixed divs and the main UI bar) to be
cached in most cases, in addition to the main content.
* Remove requirement to read tiles back from the framebuffer.
Instead, they are drawn into the picture cache target tiles,
and blitted to the screen. This is slightly slower than the
existing picture caching when content is constantly changing,
however this cost will disappear / become irrelevant when
the OS compositor integration work is complete.
* Switch picture cache render targets to be nearest sampled (they
are always rendered 1:1) and support depth buffer targets.
* Make use of the external scroll offset support to allow removal
of the primitive correlation hacks in the previous picture
caching implementation. Also allows storing of primitive
dependencies in picture space rather than world space, which
reduces floating point inaccuracies.
* Determine if each tile and picture cache can be considered
opaque. This is used to determine whether subpixel AA text
rendering is available on a slice, and for rendering optimizations
related to disabling blending and/or tile clears.
* Use the clip chain instance results from the recent visibility pass
work to determine clip chain dependencies. This results in fewer
clip item dependencies in tiles, which makes validity checks
faster and reduces redundant invalidations.
* Remove extra overhead during batching related to batch lists,
and region iteration, as they are no longer required.
* Support PrimitiveVisibilityMask during batching. This allows a
single traversal of a picture (surface) root during batching to
efficiently construct multiple alpha batcher objects (typically
one per invalid tile).
* Picture caching is now handled implicitly by WR, depending on
the content of the scene. There is no requirement for client
code to manually select which stacking context should be cached.
* Simplify how clip chain / transform dependencies are tracked by
picture cache tiles.
* Support pushing / popping enclosing clip chain roots without
the need for a stacking context / picture in some cases. This
simplifies the logic to split the scene into multiple slices.
The main remaining work in this area is (a) extend the code to
optionally provide each slice as an input to the OS compositor
rather than drawing the tiles in WR, and (b) support multi-resolution
tiles so that we reduce the draw call, batching and render target
overhead in cases where much of the page content is changing.
Differential Revision: https://phabricator.services.mozilla.com/D34319
--HG--
extra : moz-landing-system : lando
Use the external scroll offsets provided by Gecko to:
(a) Offset primitives and clips by accumulated scroll offset.
(b) Adjust the scroll transforms and hit test results.
This allows primitives and clips to be stored in a true local space,
that is consistent between display lists, even if scrolling has
occurred. This is a step towards planned picture caching improvements.
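
A small sketch of step (a), assuming item positions in the display list already have the scroll baked in, so adding the accumulated external offset back gives a position that does not change as the page scrolls. The types, names and sign convention are assumptions for illustration:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vec2 {
    x: f32,
    y: f32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: f32,
    y: f32,
    w: f32,
    h: f32,
}

/// Sum of the external scroll offsets of all enclosing scroll frames.
fn accumulated_offset(external_offsets: &[Vec2]) -> Vec2 {
    external_offsets
        .iter()
        .fold(Vec2 { x: 0.0, y: 0.0 }, |acc, o| Vec2 {
            x: acc.x + o.x,
            y: acc.y + o.y,
        })
}

/// Move a rect from the display list into a local space that stays the
/// same between display lists, regardless of the current scroll position.
fn to_stable_local(rect: Rect, external_offsets: &[Vec2]) -> Rect {
    let offset = accumulated_offset(external_offsets);
    Rect {
        x: rect.x + offset.x,
        y: rect.y + offset.y,
        ..rect
    }
}
```

An item that sits at y = 500 in the content arrives at y = 400 when scrolled by 100 and at y = 250 when scrolled by 250; both map back to the same local rect.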
Differential Revision: https://phabricator.services.mozilla.com/D27856
--HG--
extra : moz-landing-system : lando
The existing linear gradient shader is quite slow, especially
on very large gradients on integrated GPUs.
The vast majority of gradients in real content are very simple
(typically < 4 stops, no angle, no repeat). For these, we can
run a fast path to persist a small gradient in the texture cache
and draw the gradient via the image shader.
This is _much_ faster than the catch-all gradient shader, and also
results in better batching while drawing the main scene.
In future, we can expand the fast path to handle more cases, such
as angled gradients. For now, it takes a conservative approach,
but still seems to hit the fast path on most real content.
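
The eligibility test could look roughly like the sketch below. It reads "no angle" as axis-aligned and takes the stop limit from the description above; the struct, the constant and the exact conditions are assumptions, not the code in this patch.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExtendMode {
    Clamp,
    Repeat,
}

/// Hypothetical, trimmed-down description of a linear gradient.
struct LinearGradient {
    start: (f32, f32),
    end: (f32, f32),
    stop_count: usize,
    extend_mode: ExtendMode,
}

/// Assumed limit, taken from "typically < 4 stops" in the text above.
const FAST_PATH_STOP_LIMIT: usize = 4;

/// Conservative check: only simple, axis-aligned, non-repeating gradients
/// with few stops are cached as a small texture and drawn by the image
/// shader. Everything else goes through the general gradient shader.
fn can_use_fast_path(g: &LinearGradient) -> bool {
    let axis_aligned = g.start.0 == g.end.0 || g.start.1 == g.end.1;
    g.stop_count < FAST_PATH_STOP_LIMIT
        && axis_aligned
        && g.extend_mode == ExtendMode::Clamp
}
```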
Differential Revision: https://phabricator.services.mozilla.com/D22445
--HG--
extra : moz-landing-system : lando