The size of the surface is irrelevant for external surfaces, since it is
handled by the native compositor with each call to AttachExternalImage. By
removing the size from the hash key, this patch ensures that real or
imagined changes to surface sizes for external surfaces no longer trigger
calls to create_compositor_external_surface, which thrashed the resources
of the native compositor.
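
The idea of leaving the size out of the hash key can be sketched as
follows. This is a minimal illustration, not the actual WebRender code:
`ExternalSurfaceKey`, `SurfaceCache` and `image_id` are hypothetical
names, and the counter stands in for calls to
create_compositor_external_surface.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical descriptor for an external compositor surface.
#[derive(Clone, Copy, Debug)]
pub struct ExternalSurfaceKey {
    pub image_id: u64,
    /// Kept for bookkeeping, but deliberately excluded from Hash/Eq.
    pub size: (i32, i32),
}

impl PartialEq for ExternalSurfaceKey {
    fn eq(&self, other: &Self) -> bool {
        // Only the identity of the external image matters.
        self.image_id == other.image_id
    }
}

impl Eq for ExternalSurfaceKey {}

impl Hash for ExternalSurfaceKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Must stay consistent with `eq`: the size is not hashed.
        self.image_id.hash(state);
    }
}

/// Hypothetical cache; `create_calls` counts native surface creations.
pub struct SurfaceCache {
    surfaces: HashMap<ExternalSurfaceKey, u32>,
    pub create_calls: u32,
}

impl SurfaceCache {
    pub fn new() -> Self {
        SurfaceCache { surfaces: HashMap::new(), create_calls: 0 }
    }

    /// Returns the existing surface id, or "creates" a new one.
    pub fn get_or_create(&mut self, key: ExternalSurfaceKey) -> u32 {
        if let Some(&id) = self.surfaces.get(&key) {
            return id;
        }
        self.create_calls += 1;
        let id = self.create_calls;
        self.surfaces.insert(key, id);
        id
    }
}
```

With this key, looking up the same image at two different sizes hits the
same cache entry, so only one surface is created.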
Differential Revision: https://phabricator.services.mozilla.com/D175495
This adds the new infrastructure for rendering masked primitives
and uses it for simple rectangle primitives. Follow-up patches
will port other primitives (and transformed rectangles) to it.
Instead of rendering an alpha mask and then applying that during
picture cache rendering of content, the underlying content is
drawn to an off-screen surface, and the mask is applied on
top of that via multiplicative blending.
This is particularly helpful for applying masks to dynamically
rendered pictures in future, as we can apply the mask over the
already rendered picture without allocating an extra surface.
Since the content and mask are rendered together to a surface,
we can take advantage of this in future by caching the result
in the texture cache, rather than a temporary render target.
This means we don't need to redraw clip masks for this content
each time the surrounding area is invalidated.
Since the clip-mask is rendered into the off-screen surface,
it is cheaper and simpler to composite the content into the
main scene, avoiding an extra texture fetch and some tricky
fragment shader logic to sample the correct part of the mask.
To reduce the number of off-screen pixels that get drawn, the
system supports splitting the content up into a series of
segments. This can either be a 9-patch, for the simple and
common case of a single rounded clip, or a tile grid across
the primitive. The tile grid can make it much faster to apply
large image masks, where there are often large areas that we
can determine are not affected by the mask image.
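
The 9-patch case above can be sketched roughly like this. It is an
illustration under stated assumptions, not the patch's implementation:
`Rect`, `Segment`, `nine_patch` and `masked_area` are hypothetical
names, the clip is assumed to have a single uniform corner radius, and
the radius is assumed to be no larger than half the primitive's width
or height.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Segment {
    pub rect: Rect,
    /// True only for segments that overlap a rounded corner.
    pub needs_mask: bool,
}

/// Split `prim` into a 3x3 grid. Only the four corner cells can be
/// affected by a rounded clip with the given uniform radius.
pub fn nine_patch(prim: Rect, radius: f32) -> Vec<Segment> {
    let xs = [prim.x0, prim.x0 + radius, prim.x1 - radius, prim.x1];
    let ys = [prim.y0, prim.y0 + radius, prim.y1 - radius, prim.y1];
    let mut segments = Vec::with_capacity(9);
    for row in 0..3 {
        for col in 0..3 {
            let rect = Rect {
                x0: xs[col],
                y0: ys[row],
                x1: xs[col + 1],
                y1: ys[row + 1],
            };
            // Corner cells are the ones outside both the middle row
            // and the middle column.
            let needs_mask = row != 1 && col != 1;
            segments.push(Segment { rect, needs_mask });
        }
    }
    segments
}

/// Total area that would need to go through the off-screen mask path.
pub fn masked_area(segments: &[Segment]) -> f32 {
    segments
        .iter()
        .filter(|s| s.needs_mask)
        .map(|s| (s.rect.x1 - s.rect.x0) * (s.rect.y1 - s.rect.y0))
        .sum()
}
```

For a 100x50 primitive with a radius of 10, only the four 10x10 corner
cells need the mask, a small fraction of the primitive's area.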
Differential Revision: https://phabricator.services.mozilla.com/D173095
Since landing bug 1823411 we have been receiving crash reports on a
variety of Mali-T devices when attempting to compile the brush_blend
shader. This appears to be due to changing v_color_mat to mediump,
though the reason why that crashes is currently unknown. This patch
reverts it to highp to avoid the crash.
This is being landed as-is due to being so late in the cycle, in order
to prevent crashes making it to beta. Further work should be to
determine precisely what conditions cause the crash, and add a test to
ensure we do not encounter it again.
Differential Revision: https://phabricator.services.mozilla.com/D174722
This is preparation for bug 1804233. As described in bug 1776885 comment 6, we need to defer WebRender rendering on the render thread if a remote texture is not ready. To do that, the handling of ResultMsg messages needs to be controllable.
The following additions let RenderThread control ResultMsg handling:
- FramePublishId to new_frame_ready()
- Renderer::set_target_frame_publish_id()
new_frame_ready() now receives a frame_publish_id, which can be passed to Renderer::set_target_frame_publish_id(). Renderer::update() then defers processing of a ResultMsg if the frame_publish_id of ResultMsg::PublishDocument exceeds target_frame_publish_id.
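
A minimal sketch of the deferral rule, using simplified stand-ins for
the real types. The queue, the `push` helper, the `Other` variant and
the behaviour when no target has been set (nothing is deferred) are all
assumptions made for illustration.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct FramePublishId(pub u64);

/// Simplified stand-in for WebRender's ResultMsg.
#[derive(Debug, PartialEq)]
pub enum ResultMsg {
    PublishDocument(FramePublishId),
    Other(&'static str),
}

/// Simplified stand-in for the renderer's message handling.
pub struct Renderer {
    pending: VecDeque<ResultMsg>,
    target_frame_publish_id: Option<FramePublishId>,
}

impl Renderer {
    pub fn new() -> Self {
        Renderer { pending: VecDeque::new(), target_frame_publish_id: None }
    }

    /// Hypothetical helper that queues a message from the backend.
    pub fn push(&mut self, msg: ResultMsg) {
        self.pending.push_back(msg);
    }

    pub fn set_target_frame_publish_id(&mut self, id: FramePublishId) {
        self.target_frame_publish_id = Some(id);
    }

    /// Process queued messages in order, stopping at the first
    /// PublishDocument whose id exceeds the target. Everything from
    /// that message onwards stays queued for a later update().
    pub fn update(&mut self) -> Vec<ResultMsg> {
        let mut processed = Vec::new();
        while let Some(msg) = self.pending.front() {
            if let ResultMsg::PublishDocument(id) = msg {
                // Assumption: with no target set, nothing is deferred.
                let deferred = match self.target_frame_publish_id {
                    Some(target) => *id > target,
                    None => false,
                };
                if deferred {
                    break;
                }
            }
            processed.push(self.pending.pop_front().unwrap());
        }
        processed
    }
}
```

The render thread would raise the target once the remote texture for a
given frame is ready, at which point the next update() picks up the
deferred messages.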
Differential Revision: https://phabricator.services.mozilla.com/D173512
At the same time, simplify the inner region support from arbitrary
tile configs to an inner + AA edge section setup. It turns out that
we won't need the extra tiling functionality in follow up patches.
Differential Revision: https://phabricator.services.mozilla.com/D172749
In bug 1823411 prim_shared.glsl's vClipMaskUv was made mediump,
assuming it was safe to do so as it is an unnormalized texture
coordinate. This is, however, causing fuzziness in a test on Adreno
devices, so we are now reverting it to highp.
Differential Revision: https://phabricator.services.mozilla.com/D173468
When upgrading the NDK to r23, the wrench builds for Android fail
because cargo apk starts adding flags to its cargo rustc invocation,
which is not compatible with the wrench crate having both a lib and a
bin target.
As cargo apk only packages the lib in the apk, we can just be explicit
and build the library only.
Differential Revision: https://phabricator.services.mozilla.com/D173244
When a new scene is swapped in on the render backend thread, its
scroll frames have scroll offsets that come from the main thread
and do not reflect async scroll deltas until such deltas are
sampled from APZ.
It's possible for hit-testing to observe the scene in this
temporary state, potentially leading to incorrect hit-test results.
To avoid this, save the async offsets from the previous scene
and apply them to the new scene until we can sample proper offsets
from APZ.
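
The carry-over described above might look roughly like this. It is a
sketch only: `Scene`, `Offset`, `ScrollId`, `swap_scene` and the split
into layout and async offsets are hypothetical simplifications of the
real spatial tree.

```rust
use std::collections::HashMap;

pub type ScrollId = u64;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Offset {
    pub x: f32,
    pub y: f32,
}

/// Hypothetical, heavily simplified scene.
#[derive(Default)]
pub struct Scene {
    /// Scroll offsets as provided by the main thread.
    pub layout_offsets: HashMap<ScrollId, Offset>,
    /// Async deltas, either sampled from APZ or carried over.
    pub async_offsets: HashMap<ScrollId, Offset>,
}

impl Scene {
    /// The offset hit-testing would use for a scroll frame.
    pub fn effective_offset(&self, id: ScrollId) -> Option<Offset> {
        let base = self.layout_offsets.get(&id)?;
        let delta = self
            .async_offsets
            .get(&id)
            .copied()
            .unwrap_or(Offset { x: 0.0, y: 0.0 });
        Some(Offset { x: base.x + delta.x, y: base.y + delta.y })
    }
}

/// Swap in `new`, seeding it with the async deltas of `old` for every
/// scroll frame that still exists. A later APZ sample would overwrite
/// these values.
pub fn swap_scene(old: &Scene, mut new: Scene) -> Scene {
    for (id, delta) in &old.async_offsets {
        if new.layout_offsets.contains_key(id) {
            new.async_offsets.entry(*id).or_insert(*delta);
        }
    }
    new
}
```

A hit-test that runs between the swap and the next APZ sample then sees
the same effective offset as it did on the previous scene.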
Differential Revision: https://phabricator.services.mozilla.com/D173100
No functionality change here, just preparing for having different
variations of shaders that make use of ps_quad as a base.
Differential Revision: https://phabricator.services.mozilla.com/D172197
The previous patch in this series ensured that every varying is now
given an explicit precision. We want to use mediump where possible for
performance reasons, and highp when required for correctness. Going
forward, in order to ensure that developers have considered what
precision is required for new varyings, this patch adds a shader test
to ensure that they are explicitly qualified.
Shader tests have until now used the `glsl` crate. As the glsl crate
does not handle preprocessor directives properly, we ran them on
the pre-optimized shader sources. However, the optimization pass
outputs explicit varying precisions even if the input did not contain
them, so that does not work for this case. Instead, we have switched
to use the `glsl-lang` crate, which does handle preprocessor directives
correctly. This does add some duplicate crate dependencies, however
this only affects wrench, not webrender itself.
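
To illustrate what such a check looks for, here is a deliberately naive
line-based scan. It does not use the `glsl-lang` crate or any real
parser, so it is not how the actual test works; `unqualified_varyings`
is a hypothetical name, and the scan assumes one declaration per line
with no comments or preprocessor directives involved.

```rust
const PRECISIONS: [&str; 3] = ["highp", "mediump", "lowp"];

/// Return the names of varyings declared without an explicit
/// precision qualifier. Naive: assumes one declaration per line.
pub fn unqualified_varyings(source: &str) -> Vec<String> {
    let mut missing = Vec::new();
    for line in source.lines() {
        let tokens: Vec<&str> = line.split_whitespace().collect();
        // Skip lines that do not declare a varying.
        let Some(pos) = tokens.iter().position(|t| *t == "varying") else {
            continue;
        };
        // A precision qualifier, if present, follows `varying`.
        let has_precision = tokens[pos + 1..]
            .iter()
            .any(|t| PRECISIONS.contains(t));
        if !has_precision {
            if let Some(name) = tokens.last() {
                missing.push(name.trim_end_matches(';').to_string());
            }
        }
    }
    missing
}
```

A test built on this would fail whenever the returned list is non-empty,
which forces the author of a new varying to pick a precision.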
Differential Revision: https://phabricator.services.mozilla.com/D173029
Mali profiling tools have shown we are frequently fragment bound, in
particular due to varying interpolation. To help mitigate this, we
should use mediump where possible. Currently most of our varyings are
implicitly highp by default. This patch reduces their precision to
mediump where possible. When varyings must remain highp for
correctness reasons, this is now stated explicitly.
As expected, this does cause a fair bit of reftest fuzziness on
Android devices. This patch also updates reftest expectations to
reflect this.
Differential Revision: https://phabricator.services.mozilla.com/D173028
Previously, we would do a fine-grained visibility check for
prims against the dirty rect stack (after coarse grained
tile visibility), then prepare the primitive, then determine
which command buffer(s) the prim should be added to, based
on which tile(s) the prim affects.
The patch changes this so that the fine-grained visibility
check returns a list of command buffer(s) that the prim
should be added to. This is passed to the prim prepare
step, and then used to directly add prims to the buffers
rather than checking which tiles are affected by the prim.
The motivation for doing this will become apparent in
follow up patches. We want to be able to encode
multiple command buffer commands per-prim, whereas it
was previously only possible to encode primitive
commands. By allowing prim-prepare to write directly
to the command buffers, rather than return a list of
primitive commands, we can write whatever commands
are needed. Future patches will use this to write
segment rect streams, and other information.
A side effect of this is that the `tile_rect` field
in the `PrimitiveVisibility` struct is no longer
required. This reduces the size of `PrimitiveInstance`
from 104 bytes to 88 bytes, which is likely to be
a reasonable performance win on pages that have
high primitive counts.
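
The new flow can be sketched as below. All names here (`Tile`,
`Command`, `visible_targets`, `prepare_prim`) are hypothetical and the
types are simplified; the point is only that the visibility check
produces the target buffers and the prepare step writes commands into
them directly.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CommandBufferIndex(pub usize);

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x0: i32,
    pub y0: i32,
    pub x1: i32,
    pub y1: i32,
}

impl Rect {
    pub fn intersects(&self, other: &Rect) -> bool {
        self.x0 < other.x1 && other.x0 < self.x1
            && self.y0 < other.y1 && other.y0 < self.y1
    }
}

/// More than one kind of command can now be encoded per primitive.
#[derive(Debug, PartialEq)]
pub enum Command {
    DrawPrimitive(u32),
    SegmentRect(Rect),
}

/// Hypothetical tile: a dirty rect plus the buffer it records into.
pub struct Tile {
    pub dirty_rect: Rect,
    pub cmd_buffer: CommandBufferIndex,
}

/// Fine-grained visibility check: returns the command buffers the
/// primitive should be added to, instead of a plain visible flag.
pub fn visible_targets(prim_rect: &Rect, tiles: &[Tile]) -> Vec<CommandBufferIndex> {
    tiles
        .iter()
        .filter(|tile| prim_rect.intersects(&tile.dirty_rect))
        .map(|tile| tile.cmd_buffer)
        .collect()
}

/// Prepare step: writes whatever commands are needed straight into
/// the target buffers, with no second pass over the tiles.
pub fn prepare_prim(
    prim_id: u32,
    prim_rect: Rect,
    targets: &[CommandBufferIndex],
    buffers: &mut [Vec<Command>],
) {
    for target in targets {
        let buffer = &mut buffers[target.0];
        buffer.push(Command::SegmentRect(prim_rect));
        buffer.push(Command::DrawPrimitive(prim_id));
    }
}
```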
Differential Revision: https://phabricator.services.mozilla.com/D172081
In bug 1787520 and bug 1795614 we added a workaround for a driver bug
on Mali G78 and G710 GPUs. It turns out this also affects the G77, now
that some devices have received software updates containing an updated
driver version. This patch applies the workaround to the G77 as well,
as long as the driver version is affected.
Differential Revision: https://phabricator.services.mozilla.com/D171740
The main goal of this is to fix an implementation detail where the
WR code had to read every primitive in the tile even when checking
if a small sub-tile was valid (as the advance amounts of the
primitive dependency array vectors were stored in each primitive).
However, this patch is itself quite a significant optimization: it
improves displaylist_mutate by ~16%.
Instead of maintaining separate arrays for each dependency, use
a single byte array and use peek-poke to store these dependencies.
This simplifies the code for comparing dependencies, and makes the
traversal of sparse index buffers of the primitive array much faster.
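
As an illustration of the single-byte-array approach, the sketch below
encodes each dependency as a tag byte plus a fixed payload and compares
streams as raw bytes. It uses only the standard library rather than the
peek-poke crate, and `Dependency`, `DependencyStream` and the encoding
are hypothetical.

```rust
/// Hypothetical dependency kinds tracked per tile.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Dependency {
    Clip(u32),
    Image(u32),
    Color([u8; 4]),
}

/// All dependencies of a tile, packed into one byte array instead of
/// one vector per dependency kind.
#[derive(Default)]
pub struct DependencyStream {
    bytes: Vec<u8>,
}

impl DependencyStream {
    /// Append a tag byte followed by a 4-byte payload.
    pub fn push(&mut self, dep: Dependency) {
        match dep {
            Dependency::Clip(id) => {
                self.bytes.push(0);
                self.bytes.extend_from_slice(&id.to_le_bytes());
            }
            Dependency::Image(id) => {
                self.bytes.push(1);
                self.bytes.extend_from_slice(&id.to_le_bytes());
            }
            Dependency::Color(rgba) => {
                self.bytes.push(2);
                self.bytes.extend_from_slice(&rgba);
            }
        }
    }

    /// Comparing two frames' dependencies is a single slice compare.
    pub fn matches(&self, other: &DependencyStream) -> bool {
        self.bytes == other.bytes
    }

    pub fn len_bytes(&self) -> usize {
        self.bytes.len()
    }
}
```

Because every kind shares one stream, checking whether a tile changed
needs no per-kind cursor bookkeeping.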
Differential Revision: https://phabricator.services.mozilla.com/D170710
Also update some FOG tests that are now incorrect (label limits have been
lifted).
The `default_features = false` on `env_logger` is to avoid a new, duplicate
dependency on hermit-abi.
Differential Revision: https://phabricator.services.mozilla.com/D170816
To run webrender's wrench tests the app needs to read a file to parse
the command line args from, and write a file with the test
output. These files also need to be written and read manually over adb
by the engineer running the tests (or more likely by the
android_wrench.py script).
Currently we use the external data dir for these files. However, on
recent android versions these are no longer accessible to the app
without jumping through some hoops. This patch makes us instead use
the internal data dir. It also makes the app "debuggable" so that
these files can be written to via adb using run-as.
Additionally, we must ensure that the android_wrench.py script uses
run-as instead of root to push and pull the files, as root does not
have permission to do so on recent android versions.
Differential Revision: https://phabricator.services.mozilla.com/D169829
Simplifies some upcoming work to change how we store these when
updating primitive dependencies during picture cache updates.
Differential Revision: https://phabricator.services.mozilla.com/D170546
This patch exposes an unrelated issue that causes a performance regression.
For now, we'll revert this to get back to a normal baseline. We can then
separately fix the underlying code that regresses perf and re-land this patch.
Revert "Bug 1811978 - Enable the new tiled rendering path, update test expectations r=gfx-reviewers,nical"
This reverts commit 7f3a2568aabf9fa2358fe0f7421042ba85a23442.
Differential Revision: https://phabricator.services.mozilla.com/D170399
This is an unsavoury workaround to let the fuzzing team make progress while we are sorting through our blob layerization and sizing issues.
This adds a hidden pref "gfx.webrender.debug.restrict-blob-size", which, when set to true, clamps the size of blob images to 2048x2048.
This means that bigger blobs will render incorrectly but will be less likely to cause OOMs.
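
The clamping itself amounts to something like the following sketch.
`clamp_blob_size` and `MAX_BLOB_DIM` are hypothetical names, and the
`restrict` flag stands in for the pref value.

```rust
/// Maximum blob dimension when the debug pref is enabled.
pub const MAX_BLOB_DIM: i32 = 2048;

/// Clamp a blob image size to 2048x2048 when `restrict` is set.
/// Larger blobs are simply cut off, so they render incorrectly.
pub fn clamp_blob_size(size: (i32, i32), restrict: bool) -> (i32, i32) {
    if restrict {
        (size.0.min(MAX_BLOB_DIM), size.1.min(MAX_BLOB_DIM))
    } else {
        size
    }
}
```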
Differential Revision: https://phabricator.services.mozilla.com/D169944
A consequence of the previous patch in this series, which allows
partial picture cache tile invalidation on recent Mali GPUs, is that
we will start using a scissored clear to clear the tile if the dirty
region is smaller than the entire tile. We saw in bug 1809738 that a
scissored clear can be less efficient.
If it allows us to re-render less of the tile that is probably a
worthwhile trade off. However, we frequently encounter a case where
the entire valid_rect of a tile is dirty, but that is smaller than the
entire texture. This is because our tiles are 1024x512 pixels, so for
example on a 1080 pixel wide screen, the 2nd column of tiles will only
have a valid_rect that is 56 pixels wide. In such cases, using a
scissored clear does not reduce the amount of rendering required.
This patch therefore makes it so that we use an unscissored clear (on
devices where that is preferable) if the dirty_rect is equal to the
valid_rect.
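
The decision and the 56 pixel example can be sketched as follows. The
names (`ClearMode`, `choose_clear`, `valid_width_in_column`) and the
`prefers_unscissored` flag are hypothetical; only the tile width and
the comparison against valid_rect come from the description above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect {
    pub x0: i32,
    pub y0: i32,
    pub x1: i32,
    pub y1: i32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ClearMode {
    Full,
    Scissored(Rect),
}

pub const TILE_WIDTH: i32 = 1024;

/// Pick how to clear a tile before redrawing its dirty region.
pub fn choose_clear(
    dirty: Rect,
    valid: Rect,
    tile: Rect,
    prefers_unscissored: bool,
) -> ClearMode {
    if dirty == tile || (prefers_unscissored && dirty == valid) {
        // Everything that can be drawn is dirty anyway, so a
        // scissor would not reduce the amount of rendering.
        ClearMode::Full
    } else {
        ClearMode::Scissored(dirty)
    }
}

/// Width of the valid region in a given tile column.
pub fn valid_width_in_column(screen_width: i32, column: i32) -> i32 {
    (screen_width - column * TILE_WIDTH).clamp(0, TILE_WIDTH)
}
```

For a 1080 pixel wide screen, column 0 is fully valid (1024 pixels) and
column 1 has 1080 - 1024 = 56 valid pixels.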
Depends on D169890
Differential Revision: https://phabricator.services.mozilla.com/D169891
We previously blocked partial picture cache tile invalidation on
Mali-T and Mali-G devices due to several bugs. These issues no longer
appear to reproduce on recent Mali GPUs, e.g. G77, G78 and G710, which
are all "Valhall" architecture. This patch therefore keeps the workaround
for all Midgard and Bifrost GPUs, but removes it for others.
Differential Revision: https://phabricator.services.mozilla.com/D169890
If the glslopt pass fails to optimize a shader, we helpfully print the
shader source so that it is easy to find what the problem is. However,
we print this from within the build_parallel call, meaning if multiple
shaders fail to optimize we print all of their sources and they get
interleaved with each other, making it very difficult to find the
problematic line.
This patch makes us delay printing the source until the same place
where we print the error log, after we have stopped processing in
parallel. This means we will only print the source of a single
shader (the first one which fails to optimize), along with the error
log for that shader, which is much easier to debug.
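
The shape of this change can be sketched with scoped threads. This is
an illustration only: `optimize` is a stand-in that fails on any source
containing "ERROR" rather than a call into glslopt, and
`ShaderFailure`, `optimize_all` and `report` are hypothetical names.

```rust
use std::thread;

#[derive(Debug)]
pub struct ShaderFailure {
    pub name: String,
    pub source: String,
    pub log: String,
}

/// Stand-in for the real optimization pass. It returns the failure
/// details instead of printing them from the worker thread.
fn optimize(name: &str, source: &str) -> Result<String, ShaderFailure> {
    if source.contains("ERROR") {
        Err(ShaderFailure {
            name: name.to_string(),
            source: source.to_string(),
            log: format!("{}: syntax error", name),
        })
    } else {
        Ok(format!("/* optimized */ {}", source))
    }
}

/// Optimize all shaders in parallel, then return either every
/// optimized source or the first failure in input order.
pub fn optimize_all(shaders: &[(String, String)]) -> Result<Vec<String>, ShaderFailure> {
    let results: Vec<Result<String, ShaderFailure>> = thread::scope(|s| {
        let handles: Vec<_> = shaders
            .iter()
            .map(|(name, source)| s.spawn(move || optimize(name, source)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Collecting into a Result stops at the first Err.
    results.into_iter().collect()
}

/// Built once, after the parallel work has finished, so output from
/// several failing shaders can never interleave.
pub fn report(failure: &ShaderFailure) -> String {
    format!(
        "failed to optimize {}:\n{}\n--- log ---\n{}",
        failure.name, failure.source, failure.log
    )
}
```

The caller prints `report(&failure)` a single time on error, alongside
whatever else it logs.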
This also includes a small tidy-up to handle both vertex and fragment
shaders in a loop to remove code duplication, and runs rustfmt.
Differential Revision: https://phabricator.services.mozilla.com/D169618