For YUV 422 video, when we are sampling UV planes at half the resolution of the
Y plane, we can interpolate from 2 samples for the UV planes as an approximation
of the 4 samples, allowing us to better pack the math into SIMD vectors and
substantially reduce the number of multiplications.
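As a rough illustration of the idea (a scalar sketch, not the SWGL code), the
two filters can be compared as below. The names sample_2tap and sample_4tap,
the 7-bit fixed-point fraction, and the choice of a horizontal pair as the two
chroma samples are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: blend two horizontally adjacent samples of one row.
// x_fixed is a sample position with 7 fractional bits (128 == one sample).
inline uint8_t sample_2tap(const std::vector<uint8_t>& row, int32_t x_fixed) {
  assert(!row.empty());
  assert(x_fixed >= 0);
  const int32_t last = static_cast<int32_t>(row.size()) - 1;
  int32_t x0 = x_fixed >> 7;
  if (x0 > last) x0 = last;                      // clamp to the row edge
  const int32_t x1 = x0 < last ? x0 + 1 : last;  // right-hand neighbor
  const int32_t frac = x_fixed & 127;
  // Weighted average of the two taps; the weights sum to 128.
  return static_cast<uint8_t>((row[x0] * (128 - frac) + row[x1] * frac) >> 7);
}

// Hypothetical helper: full bilinear filter over two rows (four taps), as
// would be used for the full-resolution Y plane.
inline uint8_t sample_4tap(const std::vector<uint8_t>& row0,
                           const std::vector<uint8_t>& row1,
                           int32_t x_fixed, int32_t y_frac) {
  assert(y_frac >= 0 && y_frac < 128);
  const int32_t top = sample_2tap(row0, x_fixed);
  const int32_t bottom = sample_2tap(row1, x_fixed);
  return static_cast<uint8_t>((top * (128 - y_frac) + bottom * y_frac) >> 7);
}
```

In this sketch the Y plane would go through sample_4tap while the half-resolution
U and V planes would go through sample_2tap, which halves the taps per chroma
sample at the cost of ignoring the vertical fraction.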
Differential Revision: https://phabricator.services.mozilla.com/D105137
Often images are upscaled from a smaller resolution on a page, especially
when there is any amount of zoom being used, and especially at higher screen
resolutions. In this case, we don't really take advantage of the fact that all
the samples for a SIMD chunk can be loaded from memory in a single load, so
long as we're willing to shuffle them around. We also can take advantage of the
fact that most images are axis-aligned so that they have a constant filter
offset with the next row.
Also, we can easily fall off the fast path for blendTextureNearest if for some
reason there is a significant subpixel offset. In this case, we can still do
something way faster than a normal linear filter that optimizes for the fact
that both the X and Y steps are constant 1:1, but we need to interpolate with
neighboring samples.
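A scalar sketch of why the constant 1:1 step helps, under stated assumptions:
because the subpixel offset is the same for every pixel, the four filter
weights can be computed once per span instead of once per pixel. The function
name blend_row_offset and the 7-bit fractions are hypothetical and are not
taken from SWGL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: resample one output row at 1:1 scale with a constant
// subpixel offset (fx, fy), both in 1/128ths of a pixel. row0 and row1 are
// two vertically adjacent source rows of equal length.
inline std::vector<uint8_t> blend_row_offset(const std::vector<uint8_t>& row0,
                                             const std::vector<uint8_t>& row1,
                                             int fx, int fy) {
  assert(row0.size() == row1.size() && row0.size() >= 2);
  assert(fx >= 0 && fx < 128 && fy >= 0 && fy < 128);
  // The steps are constant, so these weights never change along the span.
  // They sum to 128 * 128 == 1 << 14.
  const int w00 = (128 - fx) * (128 - fy);
  const int w10 = fx * (128 - fy);
  const int w01 = (128 - fx) * fy;
  const int w11 = fx * fy;
  std::vector<uint8_t> out(row0.size() - 1);
  for (std::size_t i = 0; i + 1 < row0.size(); ++i) {
    const int sum = row0[i] * w00 + row0[i + 1] * w10 +
                    row1[i] * w01 + row1[i + 1] * w11;
    out[i] = static_cast<uint8_t>((sum + (1 << 13)) >> 14);  // round to nearest
  }
  return out;
}
```

With fx and fy both zero this degenerates to a plain copy of row0, which is the
case a nearest-sample fast path would already cover.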
Differential Revision: https://phabricator.services.mozilla.com/D105131
Previously we had encountered issues when rendering partial regions of
picture cache tiles on Mali-Gxx devices. These often manifested as
patterns of black squares and rectangles. We worked around this by
ensuring that we always clear and render the entire tile. We have now
had a report of a similar-looking problem on Mali-Txxx devices, so
apply the same workaround there.
Differential Revision: https://phabricator.services.mozilla.com/D105278
There are no code changes, only #include changes.
It was a fairly mechanical process: Search for all "AUTO_PROFILER_LABEL", and in each file, if only labels are used, convert "GeckoProfiler.h" into "ProfilerLabels.h" (or just add that last one where needed).
In some files, there were also some marker calls but no other profiler-related calls; in these cases "GeckoProfiler.h" was replaced with both "ProfilerLabels.h" and "ProfilerMarkers.h", which still helps in reducing the use of the all-encompassing "GeckoProfiler.h".
Differential Revision: https://phabricator.services.mozilla.com/D104588
Per comment 9 this seems caused by an unexpected reframe. I haven't been
able to repro this, but the only kind of thing that should cause it is
the global reflow we do when fonts change.
This patch turns these async font loading features off in APZ tests to
see whether it helps avoid this kind of intermittent failure. If it doesn't,
I guess we should revert this and try harder to repro.
Differential Revision: https://phabricator.services.mozilla.com/D105166
CLOSED TREE
Backed out changeset f6519420f910 (bug 1678487)
Backed out changeset 9beae015d19b (bug 1678487)
Backed out changeset 029cc10d2477 (bug 1678487)
Well, mostly thread-safe, in the sense that on shutdown we might free
them, but that is pre-existing and can't happen for the code-path that I'm
about to touch.
We could probably just avoid freeing these transforms if we wanted...
Differential Revision: https://phabricator.services.mozilla.com/D104946
This addresses some deficiencies in the way solid spans are handled when clip-masking
or AA is used. In that case, the overhead of the extra blend stage is significant and
we can do better just by temporarily disabling those parts of the blend stage
and fast-pathing them instead of just going through commit_solid_span.
Differential Revision: https://phabricator.services.mozilla.com/D104963
This patch removes from RenderTaskKind members that are independent from what the render task is drawing. The uv rect is set automatically either to Some(handle) if the render task is not cached or to None if it is cached. This reflects what was happening implicitly before this patch. The uv rect kind defaults to Rect which is the most common case, but can be set when creating the render task.
This is a first step toward more flexibility when deciding whether a render task is cached or not (there is still some coupling in the batching code between the type of primitive and whether their render tasks are cached).
More importantly, not having to understand what is up with presence or absence of uv handles in render tasks makes adding new ones much easier.
Differential Revision: https://phabricator.services.mozilla.com/D104840
Now that most of the complicated alpha-pass features such as clip-masking and anti-aliasing
are handled in SWGL during the blend stage, most of the fast-paths are identical and only call
swgl_commitTextureLinear in a tight loop. We can do a lot better here by just moving that loop
into SWGL, not only making it faster but removing all the redundant boiler-plate code out of
the shaders.
Differential Revision: https://phabricator.services.mozilla.com/D104536
This cleans up the WR brush shaders to not have to use their own
implementation of init_transform_fs() for anti-aliasing when SWGL
is available. To enable this, most of the details of AAing have
been moved into brush.glsl to simplify the control knobs and
allow easier modifications.
With swgl_antiAlias() used, the drawSpan fast-paths no longer have to
care about whether or not AA is enabled, so we can more easily stay
on these fast-paths without worry.
Differential Revision: https://phabricator.services.mozilla.com/D104493
The main goal of this patch is to move all the complexity of optionally
handling anti-aliasing out of the GLSL drawSpan fast-paths and into SWGL
itself, into specific blend-mode handling of anti-aliasing. Mainly this
adds a swgl_antiAlias() extension to be called from the GLSL vertex shader,
after which no further involvement is necessary from the shader to work.
This also enables SWGL to better track those areas of a span that don't
need any anti-aliasing applied, and so can potentially be faster.
Some massaging of blend_pixels() was necessary to get it to inline properly
with all the extra cases added. This is mainly a consequence of the DO_AA
macro that lives inside BLEND_CASE, which is used to handle the dispatching
of new AA_BLEND_KEY and AA_MASK_BLEND_KEY cases. The parameters for these
AA modes are mostly handled in SWGL via the aa_span() function, which computes
the area of the span where non-opaque AA weights are necessary and where it can
skip over the opaque interior.
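To make the interior-skipping idea concrete, here is a simplified
one-dimensional sketch. It is not the aa_span() implementation: the SpanSplit
struct, the split_span name, and the assumption that the shape is a single
interval [left, right] along the span are all invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical result: pixel counts for the three regions of a span.
struct SpanSplit {
  int lead_aa;  // leading pixels that need per-pixel AA weights
  int opaque;   // interior pixels fully covered, no AA weight needed
  int tail_aa;  // trailing pixels that need per-pixel AA weights
};

// Hypothetical helper: pixel i of the span covers [start + i, start + i + 1).
// The shape covers [left, right]. A pixel is treated as opaque only when it
// lies entirely inside the shape; everything else takes the AA path.
inline SpanSplit split_span(int start, int len, float left, float right) {
  assert(len >= 0 && left <= right);
  const int end = start + len;
  // First pixel boundary at or after the left edge, clamped to the span.
  const int first_full = static_cast<int>(std::ceil(left));
  const int opaque_begin = std::max(start, std::min(end, first_full));
  // Last pixel boundary at or before the right edge, clamped likewise.
  const int last_full = static_cast<int>(std::floor(right));
  const int opaque_end = std::max(opaque_begin, std::min(end, last_full));
  return SpanSplit{opaque_begin - start, opaque_end - opaque_begin,
                   end - opaque_end};
}
```

A caller would run the AA-weighted blend only over the lead_aa and tail_aa
pixels and use the cheaper non-AA blend for the opaque run in between.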
There are also some incidental drive-by cleanups of bvecs and pack_pixels
that were necessary.
Differential Revision: https://phabricator.services.mozilla.com/D104492