It's always supplied by Gecko anyway, and being able to rely on this
will make it easier to create stable spatial node IDs that persist
across display lists.
Differential Revision: https://phabricator.services.mozilla.com/D100076
These NVIDIA device resets are specific to Linux and trying to handle
them more gracefully is increasingly difficult. There are many
textures/buffers that we need to clear inside WebRender, but attempting
to add them to the list has proved difficult due to the number of places
we need to add, as well as race conditions with clearing them. Given
this shouldn't happen often, it doesn't seem worth optimizing for and we
should treat it just as an innocent device reset.
Testing this revealed an issue during recovery where unflushed device
resets were not handled as expected. When we checked for errors after
creating a new GL context, we would encounter a GL_CONTEXT_LOST error
which we failed to recover from. This is because we called
GLContext::fGetError instead of the GL method directly; the context lost
state was saved in mContextLost, and any subsequent calls to
GLContext::fGetError would continue to return GL_CONTEXT_LOST.
Differential Revision: https://phabricator.services.mozilla.com/D99905
wr_notifier_wake_up uses RenderThread::WakeUp, which in turn just directly
calls Renderer::Update. As a side-effect, this can queue a composite to the
main framebuffer deep inside the renderer, but without having gone through
the normal pathway of Renderer::UpdateAndRender. UpdateAndRender ensures
that any RenderCompositor is properly prepared for the frame by calling
BeginFrame and other hooks as appropriate. But since we went through just
Update instead, there is never any call to BeginFrame and the SWGL framebuffer
never gets a chance to be properly set up in the RenderCompositor. In cases
where we actually need to composite to the framebuffer, it seems more
appropriate to call UpdateAndRender, which also supports a boolean indicating
whether or not we actually intend to render something. To further simplify,
we just reuse the existing HandleFrameOneDoc handler to avoid needing separate
entry-points into UpdateAndRender.
Differential Revision: https://phabricator.services.mozilla.com/D99733
This patch adds infrastructure for crash reporter annotations to
WebRender. This is used to expose the new annotation,
GraphicsCompileShader, to indicate which shader we are in the process of
compiling. We often see crash reports when compiling shaders, and it
would be useful to know which one it is compiling.
This also adds another annotation, IsWebRenderResourcePathOverridden,
which is useful to know if someone overrode the shader resource path for
testing purposes. We can likely ignore any crash reports that have this
annotation set.
Differential Revision: https://phabricator.services.mozilla.com/D99736
There might be some overlap with memory counted elsewhere and some of
the size calculations could be wrong but it should give us an overall
picture.
Differential Revision: https://phabricator.services.mozilla.com/D99562
If gfx.webrender.allow-partial-present-buffer-age is false then we do
not query the EGL buffer age, and always render the entire
backbuffer. However, we were still calling eglSetDamageRegion if
KHR_partial_update is available. Calling eglSetDamageRegion without
first querying the buffer age is an error, which was causing an
assertion failure in debug builds. To fix this, check the value of the
pref before calling eglSetDamageRegion.
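The ordering constraint can be sketched as follows. `Surface`, `DamageError`, `begin_frame` and the returned age value are hypothetical stand-ins, not Gecko's RenderCompositorEGL API. The only point illustrated is that the damage region is set only when the buffer age has actually been queried this frame.

```rust
/// Hypothetical error: damage region set before the buffer age was queried.
#[derive(Debug, PartialEq, Eq)]
pub enum DamageError {
    AgeNotQueried,
}

/// Hypothetical EGL surface that tracks whether the age was queried this frame.
pub struct Surface {
    age_queried: bool,
}

impl Surface {
    pub fn new() -> Self {
        Surface { age_queried: false }
    }

    /// Stand-in for the EGL buffer age query; the returned age is arbitrary.
    pub fn query_buffer_age(&mut self) -> u32 {
        self.age_queried = true;
        2
    }

    pub fn set_damage_region(&self) -> Result<(), DamageError> {
        if self.age_queried {
            Ok(())
        } else {
            Err(DamageError::AgeNotQueried)
        }
    }
}

/// `allow_buffer_age` mirrors gfx.webrender.allow-partial-present-buffer-age;
/// `has_partial_update` says whether the damage-region extension is available.
pub fn begin_frame(
    surface: &mut Surface,
    allow_buffer_age: bool,
    has_partial_update: bool,
) -> Result<Option<u32>, DamageError> {
    let age = if allow_buffer_age {
        Some(surface.query_buffer_age())
    } else {
        None // the entire backbuffer will be rendered
    };
    // The fix: check the pref as well as extension support.
    if has_partial_update && allow_buffer_age {
        surface.set_damage_region()?;
    }
    Ok(age)
}
```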
Differential Revision: https://phabricator.services.mozilla.com/D99659
This ID allows the compositor to track per-frame information from frame
generation, through APZ sampling, to the NotifyDidRender notification.
Differential Revision: https://phabricator.services.mozilla.com/D97535
This reduces some code duplication, and makes it clear which things we want to
do only once per window, and which things we want to do once per pipeline / CompositorBridge.
This change affects the ordering of messages, but hopefully not in a way that
anybody was relying on. Specifically, the image notifications are now sent before
the NotifyDidPaint for the root CompositorBridge is sent.
Differential Revision: https://phabricator.services.mozilla.com/D97533
Aside from on Windows, we do not appear to handle device resets properly
without the GPU process. This patch adds in the necessary plumbing to
handle the device reset properly. It also ensures that whenever we check
for a device reset reason, we handle all of the reasons (e.g. not just
the NV video memory purge reset reason) to ensure they are not lost, and
handles them all consistently in the same manner.
It also tracks the number of device resets for thresholding purposes
with an in process compositor. While it will only disable WebRender on
Linux at this time, it will put a note in the critical log if the
threshold was exceeded on all platforms. This may prove useful in
evaluating whether or not we should do the same everywhere.
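A rough sketch of the thresholding policy described above. `DeviceResetTracker`, `ResetAction` and the threshold passed to it are hypothetical. The sketch only encodes the stated rule: exceeding the threshold is noted in the critical log on every platform, but WebRender is disabled only on Linux.

```rust
/// What to do after a device reset (hypothetical policy type).
#[derive(Debug, PartialEq, Eq)]
pub enum ResetAction {
    /// Below the threshold: recover and keep going.
    Recover,
    /// Threshold exceeded: a note goes to the critical log on every platform,
    /// but WebRender is only disabled on Linux.
    ThresholdExceeded { disable_webrender: bool },
}

/// Hypothetical counter for resets seen by the in-process compositor.
pub struct DeviceResetTracker {
    count: u32,
    threshold: u32,
}

impl DeviceResetTracker {
    pub fn new(threshold: u32) -> Self {
        DeviceResetTracker { count: 0, threshold }
    }

    pub fn on_reset(&mut self, is_linux: bool) -> ResetAction {
        self.count += 1;
        if self.count <= self.threshold {
            ResetAction::Recover
        } else {
            ResetAction::ThresholdExceeded { disable_webrender: is_linux }
        }
    }
}
```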
Differential Revision: https://phabricator.services.mozilla.com/D98705
The pixel-local-storage functionality was an experiment for faster
drawing of clip masks on low end tiled GPUs. However, it never
reached a point where it was shippable or showed clear performance
wins.
This patch removes the experimental PLS support - we can always
revive it from git history if we ever want to consider it again.
Differential Revision: https://phabricator.services.mozilla.com/D98290
The composite code uses the size of the dest rect to determine scaling, and we don't want this to be affected by the clip.
Differential Revision: https://phabricator.services.mozilla.com/D97707
In the following circumstances, WR was failing to detect that a
composite was required:
- There is a picture cache slice that is smaller than a single tile.
- The position of that picture cache slice is changed.
- No other content invalidations occur.
The clip rect in the composite descriptor must include the
device_valid_rect rather than the tile device_rect. This ensures
that in the case of a picture cache slice that is smaller than a
single tile, the clip rect in the composite descriptor will change
if the position of that slice is changed. Otherwise, WR may conclude
that no composite is needed if the tile itself was not invalidated
due to changing content.
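A small sketch of why the clip must come from the valid rect. It assumes, as the description implies, that the tile's device rect stays the same while a sub-tile slice moves. `Rect` and `composite_clip` are hypothetical; a real implementation would use WebRender's own rect types.

```rust
/// Minimal device-space rect (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x0: i32,
    pub y0: i32,
    pub x1: i32,
    pub y1: i32,
}

impl Rect {
    pub fn intersection(&self, other: &Rect) -> Option<Rect> {
        let r = Rect {
            x0: self.x0.max(other.x0),
            y0: self.y0.max(other.y0),
            x1: self.x1.min(other.x1),
            y1: self.y1.min(other.y1),
        };
        if r.x0 < r.x1 && r.y0 < r.y1 { Some(r) } else { None }
    }
}

/// Clip stored in the composite descriptor for one tile: the tile rect clipped
/// to the slice's valid rect, so moving a sub-tile slice changes the descriptor.
pub fn composite_clip(tile_device_rect: Rect, device_valid_rect: Rect) -> Option<Rect> {
    tile_device_rect.intersection(&device_valid_rect)
}
```

Comparing descriptors built this way detects the move even though the tile rect itself did not change.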
Differential Revision: https://phabricator.services.mozilla.com/D96966
Fix an unintentional bug: I had not originally anticipated that the renderer
thread would wait on the jobs-available condition, so I had avoided signaling
the condition. Later revisions of the patch then made the thread wait on the
condition, so it is always necessary to signal it from whichever thread sends
jobs, to make sure any waiting thread wakes up when there are jobs to process.
Differential Revision: https://phabricator.services.mozilla.com/D97194
There appears to be a substantial overhead for trying to wake cold threads
from a thread pool (especially with rayon), so for now let's leave the existing
thread spawning in place, but reduce the stack size for individual threads.
Since these threads only call into SWGL's composite routines and do little else,
there isn't much harm in having a small stack size.
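Reducing the stack size of a spawned thread can be done with the standard library's `std::thread::Builder::stack_size`. The sketch below is hypothetical: the 64 KiB value and the thread name are illustrative assumptions, not the values used by the patch.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Hypothetical stack size: the actual value chosen by the patch is not stated here.
pub const COMPOSITE_STACK_SIZE: usize = 64 * 1024;

/// Spawns one composite worker with a deliberately small stack. This is only
/// reasonable because the work is shallow: no deep recursion and no large
/// buffers kept on the stack.
pub fn spawn_composite_thread<F, T>(index: usize, work: F) -> io::Result<JoinHandle<T>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name(format!("SwComposite#{}", index)) // hypothetical thread name
        .stack_size(COMPOSITE_STACK_SIZE)
        .spawn(work)
}
```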
Differential Revision: https://phabricator.services.mozilla.com/D96748
We seem to be spending a significant amount of time inside crossbeam channels
sending jobs across. Sending multiple bands for a given job tends to relock the
channel multiple times when ideally we only want to queue the job once, or lock
the channel once (neither of which we can conveniently do with crossbeam). To
alleviate this, I've implemented a more custom solution with a mutexed VecDeque
and some Condvars.
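A simplified sketch of this kind of queue, built on `std::sync::Mutex`, `VecDeque` and `Condvar`. The `Band` type and the method names are hypothetical. All bands of a job are pushed under a single lock acquisition. `notify_all` is used so that whichever thread is waiting (a worker or, per the signaling fix above, the render thread) is woken.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

/// One band (horizontal slice) of a composite job; hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Band {
    pub job_id: u32,
    pub band_index: u32,
}

pub struct JobQueue {
    bands: Mutex<VecDeque<Band>>,
    available: Condvar,
}

impl JobQueue {
    pub fn new() -> Self {
        JobQueue {
            bands: Mutex::new(VecDeque::new()),
            available: Condvar::new(),
        }
    }

    /// Queues all bands of one job under a single lock acquisition, then wakes
    /// every waiter, since either a worker or the render thread may be blocked.
    pub fn push_job(&self, job_id: u32, num_bands: u32) {
        {
            let mut bands = self.bands.lock().unwrap();
            for band_index in 0..num_bands {
                bands.push_back(Band { job_id, band_index });
            }
        } // lock released before notifying
        self.available.notify_all();
    }

    /// Blocks until a band is available and removes it from the queue.
    pub fn pop_blocking(&self) -> Band {
        let mut bands = self.bands.lock().unwrap();
        loop {
            if let Some(band) = bands.pop_front() {
                return band;
            }
            bands = self.available.wait(bands).unwrap();
        }
    }
}
```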
Differential Revision: https://phabricator.services.mozilla.com/D96520
Move the calculation of the dirty rects array earlier in frame
drawing, and supply that to the `start_compositing` method of
the compositor trait.
For now, it's assumed that the native compositor wants a single
dirty rect, and doesn't use buffer-age functionality. These
params will be configurable as part of the compositor capabilities
struct in follow up patches.
Differential Revision: https://phabricator.services.mozilla.com/D95828
Implementing the Draw compositor via the native compositor interface
is simpler if the buffer age is passed into the top level render method.
Differential Revision: https://phabricator.services.mozilla.com/D95824
This removes some of the complexity in the renderer associated with
drawing multiple document layers in a single render. It retains
the rest of the document API, which will be used to implement the
functionality in bug #1654938.
Differential Revision: https://phabricator.services.mozilla.com/D95478
Allow-list all Python code in tree for use with the black linter, and re-format all code in-tree accordingly.
To produce this patch I did all of the following:
1. Make changes to tools/lint/black.yml to remove include: stanza and update list of source extensions.
2. Run ./mach lint --linter black --fix
3. Make some ad-hoc manual updates to python/mozbuild/mozbuild/test/configure/test_configure.py -- it has some hard-coded line numbers that the reformat breaks.
4. Make some ad-hoc manual updates to `testing/marionette/client/setup.py`, `testing/marionette/harness/setup.py`, and `testing/firefox-ui/harness/setup.py`, which have hard-coded regexes that break after the reformat.
5. Add a set of exclusions to black.yml. These will be deleted in a follow-up bug (1672023).
# ignore-this-changeset
Differential Revision: https://phabricator.services.mozilla.com/D94045
Allow-list all Python code in tree for use with the black linter, and re-format all code in-tree accordingly.
To produce this patch I did all of the following:
1. Make changes to tools/lint/black.yml to remove include: stanza and update list of source extensions.
2. Run ./mach lint --linter black --fix
3. Make some ad-hoc manual updates to python/mozbuild/mozbuild/test/configure/test_configure.py -- it has some hard-coded line numbers that the reformat breaks.
4. Add a set of exclusions to black.yml. These will be deleted in a follow-up bug (1672023).
# ignore-this-changeset
Differential Revision: https://phabricator.services.mozilla.com/D94045
The code always assumed that the size of the image was the Y plane dimensions, which, while often the case, isn't always correct.
We remove the assertions that the display offset was always (0,0) and properly carry the actual data over IPC.
Remoting the Theora decoder and enabling fast video copy exposed several other related issues in the various D3D11 image types.
Various WPTs use Theora YUV444 images (because we do not support YUV444 H264 ones). Those images are made of 32-pixel planes with a display size set to 20 pixels. Prior to P1D, the backend image was a ShareYCbCrPlanar image, which correctly handled the size settings.
Like the image serializer, the various D3D11 images always assumed that the Y plane size was the image size.
This, however, exposed existing issues where the display offset is completely ignored for some image types. See bug 1669054.
All those problems explain why sometimes we displayed more pixels than we should have.
Depends on D91914
Differential Revision: https://phabricator.services.mozilla.com/D92233
In a (large-ish) nutshell:
- Consolidate all counters under a single type.
- Counters are all arranged in an array and referred to via index.
- All counters can be displayed as average+max (float/int), graph, and change indicator.
- Specify what to show and in what form via a pref.
- All counters and visualizations support not having values every frame.
- GPU time queries visualization is easier to read relative to the frame budget:
  - If the maximum value is under 16ms, the right side of the graph is fixed at 16ms.
  - If the maximum value is above 16ms, draw a vertical bar at 16ms.
- Added a few new profile counters:
  - Total frame CPU time (from API send to the end of GPU command submission).
  - Visibility, Prepare, Batching and Glyph resolve times.
The main change is how profile counters are represented. Instead of having different types for different visualizations, every counter is represented the same way, tracking average/max values over half a second and optionally recording a graph over a number of frames. Counters are stored in a vector and referred to via index (see constants at the top of profiler.rs).
The main motivation for this storage is to make it easy to add counters without having to think too much about where to store them and how to pass them to the renderer.
The profiler's UI is defined by a string with a single syntax:
- Comma-separated list of tokens (leading and trailing spaces ignored), which can be:
  - A counter name:
    - If prefixed with a '#' character, the counter is shown as a graph.
    - If prefixed with a '*' character, the counter is shown as a change indicator.
    - By default (counter name without prefix), the counter is shown as average and max over half a second.
  - A preset name:
    - A preset is a builtin UI string in the same syntax that can be nested in the main UI string.
    - Presets are defined towards the top of profiler.rs and can also refer to other presets.
  - An empty token adds a bit of vertical space.
  - A '|' token begins a new column.
  - A '_' token begins a new row.
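A small parser for the UI-string syntax described above might look like the following. The `Item` enum, `parse_ui` and the contents of `PRESETS` are hypothetical and not taken from profiler.rs. Unlike a production version, this sketch does not guard against presets that refer to each other in a cycle.

```rust
/// One element of the profiler layout (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Item {
    Average(String),
    Graph(String),
    ChangeIndicator(String),
    Space,
    NewColumn,
    NewRow,
}

/// Hypothetical presets; each value uses the same syntax and may nest presets.
const PRESETS: &[(&str, &str)] = &[
    ("Timings", "#Frame CPU,#Frame GPU"),
    ("Overview", "Timings,|,Primitives"),
];

/// Appends the items described by `ui` to `out`, expanding presets recursively.
pub fn parse_ui(ui: &str, out: &mut Vec<Item>) {
    for token in ui.split(',') {
        let token = token.trim();
        if token.is_empty() {
            out.push(Item::Space);
        } else if token == "|" {
            out.push(Item::NewColumn);
        } else if token == "_" {
            out.push(Item::NewRow);
        } else if let Some(name) = token.strip_prefix('#') {
            out.push(Item::Graph(name.to_string()));
        } else if let Some(name) = token.strip_prefix('*') {
            out.push(Item::ChangeIndicator(name.to_string()));
        } else if let Some((_, preset)) = PRESETS.iter().find(|(n, _)| *n == token) {
            parse_ui(preset, out);
        } else {
            // Plain counter name: average and max over half a second.
            out.push(Item::Average(token.to_string()));
        }
    }
}
```

For example, `"Overview, _ ,*GPU mem,"` expands the nested presets into two graphs, a new column and an average counter, followed by a new row, a change indicator and, from the trailing empty token, a spacer.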
Differential Revision: https://phabricator.services.mozilla.com/D93603
This patch removes the public API and high level logic for
disabling picture caching for debugging and pinch-zoom in
some cases.
Follow up patches will remove and simplify the internal parts
of WR that remain to handle the disabled picture caching
code path.
Differential Revision: https://phabricator.services.mozilla.com/D93446
NativeLayerCA only understands how to extract the IOSurface from a RenderMacIOSurfaceTextureHost.
Rather than trying to support both types, this just merges them, as they are both just an IOSurface pointer and some associated helper functions.
Differential Revision: https://phabricator.services.mozilla.com/D93181
This interface is never used directly, and the only consumers of the virtual functions are by the derived classes themselves.
Differential Revision: https://phabricator.services.mozilla.com/D93180
When using the native RenderCompositor+SWGL on MacOS, we don't support passing buffer textures directly to the compositor.
Differential Revision: https://phabricator.services.mozilla.com/D93179
Currently we query the backbuffer age in
RenderCompositor::BeginFrame(). Querying the age on Android requires
the driver to dequeue a new backbuffer. By doing this right at the
start of the frame, we may cause the driver to block until a buffer is
ready.
We don't actually need the buffer age until part way through
rendering, in Renderer::composite_simple(), by which point there is a
better chance the buffer is available. So move the query to there instead.
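A minimal sketch of deferring the query. It assumes hypothetical `Driver` and `FrameState` types, not RenderCompositor's real interface. `begin_frame` no longer touches the driver, and the age is fetched the first time the composite step asks for it. Caching the value for later calls in the same frame is an extra assumption of this sketch.

```rust
/// Hypothetical driver hook; querying the age may dequeue (and block on) a backbuffer.
pub struct Driver {
    pub queries: u32,
}

impl Driver {
    pub fn query_buffer_age(&mut self) -> u32 {
        self.queries += 1;
        3 // arbitrary age for illustration
    }
}

/// Per-frame state (hypothetical).
pub struct FrameState {
    buffer_age: Option<u32>,
}

impl FrameState {
    /// Starting a frame no longer queries the driver.
    pub fn begin_frame() -> Self {
        FrameState { buffer_age: None }
    }

    /// Called from the composite step: the first call queries the driver,
    /// later calls in the same frame reuse the value.
    pub fn buffer_age(&mut self, driver: &mut Driver) -> u32 {
        *self.buffer_age.get_or_insert_with(|| driver.query_buffer_age())
    }
}
```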
Differential Revision: https://phabricator.services.mozilla.com/D92950
It is possible that when the ImageBridge receives a new image to composite, that image hasn't yet been registered via the VideoBridge.
This can happen if the decoding occurs in a different process than the content process. Even though the VideoBridge registration message was sent earlier by the RDD process, the ImageBridge message sent by the content process may reach the compositor first.
So we only attempt to use the TextureHost if it is valid and the underlying image has been properly registered. Otherwise we will continue to use the previous image.
Some methods are modified to lazily perform their action only once the image has been registered from the PVideoBridge.
Differential Revision: https://phabricator.services.mozilla.com/D92234
The DisplayListBuilder::PushRoundedRect function is used for <li> bullets and a
few other decorations. It draws a rounded rectangle as an ordinary rectangle
with a rounded rect clip.
However, you can get the same effect by drawing a box border with rounded
corners around a box with zero width and height. WebRender can cache this as a
bitmap and draw it as an image. Clips are not cached in this way, and require extra
attention from WebRender to process.
Differential Revision: https://phabricator.services.mozilla.com/D92984
Previously we used a single texture array for a given tile size,
and resized the texture array as more tiles were needed.
However, this results in expensive driver stalls and GPU copy times
when resizing the array.
Instead, use a fixed slice count for each texture array, and support
multiple textures with the same tile size.
This may result in slightly more draw calls during compositing of
picture cache tiles due to batch breaks, but will remain a small
number due to the limited number of picture cache tiles that are
allocated at any one time.
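A sketch of the allocation strategy for one tile size. `TileTexturePool`, `TileLocation` and the slice count of 4 are hypothetical. Arrays are never resized; when every slice is in use, a new fixed-size array is created instead, which is where the possible extra batch breaks come from.

```rust
/// Hypothetical fixed number of slices per texture array.
pub const SLICES_PER_ARRAY: usize = 4;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TileLocation {
    pub texture: usize,
    pub slice: usize,
}

/// All texture arrays for a single tile size.
pub struct TileTexturePool {
    /// free_slices[i] lists the unused slices of texture array i.
    free_slices: Vec<Vec<usize>>,
}

impl TileTexturePool {
    pub fn new() -> Self {
        TileTexturePool { free_slices: Vec::new() }
    }

    pub fn texture_count(&self) -> usize {
        self.free_slices.len()
    }

    pub fn allocate(&mut self) -> TileLocation {
        for (texture, slices) in self.free_slices.iter_mut().enumerate() {
            if let Some(slice) = slices.pop() {
                return TileLocation { texture, slice };
            }
        }
        // Every array is full: add a new fixed-size array rather than resizing.
        let texture = self.free_slices.len();
        let mut slices: Vec<usize> = (0..SLICES_PER_ARRAY).rev().collect();
        let slice = slices.pop().expect("SLICES_PER_ARRAY must be non-zero");
        self.free_slices.push(slices);
        TileLocation { texture, slice }
    }

    pub fn free(&mut self, loc: TileLocation) {
        self.free_slices[loc.texture].push(loc.slice);
    }
}
```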
Differential Revision: https://phabricator.services.mozilla.com/D92715
When ANGLE detects a device reset, GetDeviceOfEGLDisplay() returns nullptr. This is not currently handled as a device reset in RenderCompositorANGLE::ShutdownEGLLibraryIfNecessary(), but it should be.
Differential Revision: https://phabricator.services.mozilla.com/D92543
While we measured somewhat surprisingly high performance improvements on Linux when replacing standard channels with crossbeam ones, there has been a CONTENT_FRAME_TIME regression on Windows around the same time. Since we are not sure of the cause, this patch makes us use standard channels on Windows to see if it fixes the regression. This patch will be reverted if it doesn't turn out to restore the CONTENT_FRAME_TIME numbers.
SWGL needs to continue using crossbeam because it depends on select, which doesn't exist in standard channels.
Differential Revision: https://phabricator.services.mozilla.com/D92383