This patch implements the majority of the planned picture caching
improvements. It supports most of the functionality required to
(as a follow up) support OS compositor integration. It also improves
on the robustness and functionality of the previous picture caching
implementation.
There are some expected temporary performance regressions in
some cases (such as content that is constantly invalidating) and
during initial page render when many render targets must be drawn
to. These performance regressions will be resolved in follow up
commits by supporting multi-resolution tiles.
The scene is split into a number of slices, determined by the scroll
root of each primitive, which can be found by the primitive's
spatial node indices. If a scene contains too many slices, then
picture caching is disabled on the page, to avoid excessive texture
memory usage, and rendering falls back to rasterizing each frame.
The specific changes in this patch are:
* Support tile caches for multiple scroll roots, allowing the
entire page (including fixed divs and the main UI bar) to be
cached in most cases, in addition to the main content.
* Remove requirement to read tiles back from the framebuffer.
Instead, they are drawn into the picture cache target tiles,
and blitted to the screen. This is slightly slower than the
existing picture caching when content is constantly changing,
however this cost will disappear / become irrelevant when
the OS compositor integration work is complete.
* Switch picture cache render targets to be nearest sampled (they
are always rendered 1:1) and support depth buffer targets.
* Make use of the external scroll offset support to allow removal
of the primitive correlation hacks in the previous picture
caching implementation. Also allows storing of primitive
dependencies in picture space rather than world space, which
reduces floating point inaccuracies.
* Determine if each tile and picture cache can be considered
opaque. This is used to determine whether subpixel AA text
rendering is available on a slice, and for rendering optimizations
related to disabling blending and/or tile clears.
* Use the clip chain instance results from the recent visibility pass
work to determine clip chain dependencies. This results in fewer
clip item dependencies in tiles, which is faster to check validity
and reduces redundant invalidations.
* Remove extra overhead during batching related to batch lists,
and region iteration, as they are no longer required.
* Support PrimitiveVisibilityMask during batching. This allows a
single traversal of a picture (surface) root during batching to
efficiently construct multiple alpha batcher objects (typically
one per invalida tile).
* Picture caching is now handled implicitly by WR, depending on
the content of the scene. There is no requirement for client
code to manually select which stacking context should be cached.
* Simplify how clip chain / transform dependencies are tracked by
picture cache tiles.
* Support pushing / popping enclosing clip chain roots without
the need for a stacking context / picture in some cases. This
simplifies the logic to split the scene into multiple slices.
The main remaining work in this area is (a) extend the code to
optionally provide each slice as an input to the OS compositor
rather than drawing the tiles in WR, and (b) support multi-resolution
tiles so that we reduce the draw call, batching and render target
overhead in cases where much of the page content is changing.
Differential Revision: https://phabricator.services.mozilla.com/D34319
--HG--
extra : moz-landing-system : lando
Use the external scroll offsets provided by Gecko to:
(a) Offset primitives and clips by accumulated scroll offset.
(b) Adjust the scroll transforms and hit test results.
This allows primitives and clips to be stored in a true local space,
that is consistent between display lists, even if scrolling has
occurred. This is a step towards planned picture caching improvements.
Differential Revision: https://phabricator.services.mozilla.com/D27856
--HG--
extra : moz-landing-system : lando
The existing linear gradient shader is quite slow, especially
on very large gradients on integrated GPUs.
The vast majority of gradients in real content are very simple
(typically < 4 stops, no angle, no repeat). For these, we can
run a fast path to persist a small gradient in the texture cache
and draw the gradient via the image shader.
This is _much_ faster than the catch-all gradient shader, and also
results in better batching while drawing the main scene.
In future, we can expand the fast path to handle more cases, such
as angled gradients. For now, it takes a conservative approach,
but still seems to hit the fast path on most real content.
Differential Revision: https://phabricator.services.mozilla.com/D22445
--HG--
extra : moz-landing-system : lando
add fuzzy-if statements to allow reftests to run on new windows10 AMI image
Differential Revision: https://phabricator.services.mozilla.com/D12553
--HG--
extra : moz-landing-system : lando
This patch was written entirely by the following script:
#!/bin/bash
if [ ! -d "./.hg" ]
then
echo "Not in a source tree." 1>&2
exit 1
fi
find . -regex '.*\(ref\|crash\)test.*\.list' | while read FILENAME
do
echo "Processing ${FILENAME}."
# The following has four substitutions:
# * The first one replaces the *first* argument to fuzzy() when it doesn't
# have a - in it, by replacing it with an explicit 0-N range.
# * The second one does the same for the *second* argument to fuzzy().
# * The third does the same for the *second* argument to fuzzy-if().
# * The fourth does the same for the *third* argument to fuzzy-if().
#
# Note that this is using perl rather than sed because perl doesn't
# support non-greedy matching, which is needed for the first argument to
# fuzzy-if.
perl -pi -e 's/(fuzzy\()([^ ,()-]*)(,[^ ,()]*\))/${1}0-${2}${3}/g;s/(fuzzy\([^ ,()]*,)([^ ,()-]*)(\))/${1}0-${2}${3}/g;s/(fuzzy-if\([^ ]*?,)([^ ,()-]*)(,[^ ,()]*\))/${1}0-${2}${3}/g;s/(fuzzy-if\([^ ]*?,[^ ,()]*,)([^ ,()-]*)(\))/${1}0-${2}${3}/g' "${FILENAME}"
done
Differential Revision: https://phabricator.services.mozilla.com/D2974
--HG--
extra : moz-landing-system : lando
This covers all the reftests that have lower fuzz (or zero fuzz) and
were producing an UNEXPECTED-PASS result with webrender on windows. In
many cases I just adjusted the lower bound of the existing webrender
fuzz. In other cases existing fails-if conditions had to be tweaked to
exclude webrender.
MozReview-Commit-ID: 49LvS0vuYWR
--HG--
extra : rebase_source : d194e24affb87fe4560a127ff4016f9c38f414fd
In these cases there is no human-visible difference in the WR renderings
of the test and reference images, so we can replace the failing
annotations with fuzzy ones instead.
MozReview-Commit-ID: Ioiem4ZwwI2
--HG--
extra : rebase_source : d2204d9674b4aba6eb78bc0d0007a40121564ac5
This patch:
- adds fails-if annotations for all the reftests that were consistently failing
with layers-free turned on.
- removes fails-if or reduces the range on fuzzy-if annotations for all
the reftests that were producing UNEXPECTED-PASS results with
layers-free turned on.
- adds skip-if, random-if, or fuzzy-if annotations to the reftests that
were intermittently failing due to timeout, obvious incorrectness, or
slight pixel differences, respectively.
MozReview-Commit-ID: A0Aknn6rnjj
--HG--
extra : rebase_source : 420d9cf43f23a5d654fa36eec69138937d13c173
Skip tests that are expected to fail in both Stylo and Gecko modes. They would unexpectedly "pass" in styloVsGecko mode when comparing the two failures, which is not a useful result.
MozReview-Commit-ID: 3mOpjU225Q1
--HG--
extra : rebase_source : 22bb5d4e3c5138ef832995eaf5716824f4707ffe
extra : source : d40fb20c9a49d0797c0eeae613a04912b12a28f7
Skip tests that are expected to fail in both Stylo and Gecko modes. They would unexpectedly "pass" in styloVsGecko mode when comparing the two failures, which is not a useful result.
MozReview-Commit-ID: 3mOpjU225Q1
--HG--
extra : rebase_source : 0c307639c3626af3b6b43e05d3ee73d08b3f47ce