Just clamping the base address returned from texelFetchPtr might be causing
some crashes when the texture is actually smaller than the offset area. Instead, switch
out the sampler with a zero buffer to ensure we have something sane to sample without having
to do slow bounds checking on everything.
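A minimal sketch of the sampler swap, not the actual SWGL code. The `Sampler` struct, `kMaxOffsetDim`, `kZeroTexels`, and `sane_sampler()` are all hypothetical names, and the 16x16 bound on the offset area is an assumption made for illustration:

```cpp
#include <cstdint>

// Hypothetical, simplified sampler; SWGL's real sampler carries more state.
struct Sampler {
  const uint32_t* buf;  // texel storage, row-major
  int width;
  int height;
};

// Assumed upper bound on the offset area a fetch may cover.
constexpr int kMaxOffsetDim = 16;

// Zero-initialized texels that are always safe to read.
static const uint32_t kZeroTexels[kMaxOffsetDim * kMaxOffsetDim] = {};

// Return a sampler that can serve an areaW x areaH block of fetches without
// per-texel bounds checks. If the texture is too small for the whole area,
// no clamped base address can make every offset valid, so the zero buffer
// is substituted instead.
inline Sampler sane_sampler(const Sampler& s, int areaW, int areaH) {
  if (s.buf != nullptr && s.width >= areaW && s.height >= areaH) {
    return s;
  }
  return Sampler{kZeroTexels, kMaxOffsetDim, kMaxOffsetDim};
}
```

The check runs once per fetch setup rather than once per texel, which is the point of avoiding bounds checks on everything.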
Differential Revision: https://phabricator.services.mozilla.com/D132508
+ Begin to add video tests to ensure we ratchet towards correctness.
+ Test rec709 x (yuv420p, yuv420p10, gbrp) x (tv, pc) x codecs.
+ Just mark fuzziness for now. Better would be e.g. 16_127_233 'bad
references'.
Differential Revision: https://phabricator.services.mozilla.com/D115298
This adds some swgl_commitTextureLinearR8ToRGBA8 variations so that we can deal
with alpha glyph formats. Following that, a simple span shader is added that
dispatches to this as appropriate.
Differential Revision: https://phabricator.services.mozilla.com/D115551
This implements gl_ClipDistance so that the text shader doesn't have
to manually evaluate whether or not we're inside the untransformed
source texture. This simplification is necessary so that we can work
towards using swgl_commitTexture in the text shader.
Differential Revision: https://phabricator.services.mozilla.com/D115457
This adds a swgl_blendSubpixelText() extension that enables us to move some
of the complexity of plumbing dual-source blending out of the shader for
subpixel text. This will enable further speed-ups later by allowing us to use
swgl_commitTexture.
Differential Revision: https://phabricator.services.mozilla.com/D115456
This renames compatible_type() to can_implicitly_convert_to()
which is a better name. The parameters are renamed to make
it more obvious what's going on. A note is added about
glsl misparsing of float literals and finally, implicit conversion of
float/double to int is removed because that's not supported by glsl.
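As a rough illustration of the rule (a C++ sketch, not the translator's actual code), the check below follows the desktop GLSL widening conversions: int to uint, int/uint to float, and int/uint/float to double. The `ScalarKind` enum and the exact signature are assumptions, and treating identical types as convertible is a choice made here for convenience:

```cpp
// Hypothetical scalar kinds; the real translator has its own type model.
enum class ScalarKind { Bool, Int, UInt, Float, Double };

// True if a value of kind `src` may be used where `dst` is expected
// without an explicit constructor. Nothing ever converts to int or bool,
// so float/double to int is rejected.
inline bool can_implicitly_convert_to(ScalarKind src, ScalarKind dst) {
  if (src == dst) {
    return true;  // same type: trivially acceptable in this sketch
  }
  switch (dst) {
    case ScalarKind::UInt:
      return src == ScalarKind::Int;
    case ScalarKind::Float:
      return src == ScalarKind::Int || src == ScalarKind::UInt;
    case ScalarKind::Double:
      return src == ScalarKind::Int || src == ScalarKind::UInt ||
             src == ScalarKind::Float;
    default:
      return false;
  }
}
```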
Differential Revision: https://phabricator.services.mozilla.com/D115247
This adds a span shader that tries to treat the box shadow as a nine-patch
and intersect with the various sectors of it. This allows committing entire
contiguous spans of texture from the source box shadow rather than doing
slower per-fragment processing.
Differential Revision: https://phabricator.services.mozilla.com/D115113
This expands on an earlier fix from bug 1698009. It turns out we can occasionally find
YUV values which can still produce negative RGB values if only Y is clamped. The final
solution to this is just to clamp the output RGB values rather than input YUV values.
Since this is only used when we fall off the SWGL fast-paths (which properly handle
this clamping already), the performance impact of the extra clamping should be negligible.
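To illustrate why clamping only Y is insufficient, here is a small sketch that converts one sample and clamps the output. It assumes the commonly quoted BT.601 limited-range coefficients; the function and struct names are hypothetical and the actual conversion matrix depends on the color space in use:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct RGB8 {
  uint8_t r, g, b;
};

// Clamp to [0, 255] and round to the nearest integer.
inline uint8_t clamp_to_u8(float v) {
  return static_cast<uint8_t>(std::lround(std::min(std::max(v, 0.0f), 255.0f)));
}

// Convert one YUV sample to RGB (assumed BT.601 limited range) and clamp
// the *output* channels instead of the input Y.
inline RGB8 yuv_to_rgb_clamped(uint8_t y, uint8_t u, uint8_t v) {
  float yf = 1.164f * (static_cast<float>(y) - 16.0f);
  float uf = static_cast<float>(u) - 128.0f;
  float vf = static_cast<float>(v) - 128.0f;
  float r = yf + 1.596f * vf;
  float g = yf - 0.392f * uf - 0.813f * vf;
  float b = yf + 2.017f * uf;
  return RGB8{clamp_to_u8(r), clamp_to_u8(g), clamp_to_u8(b)};
}
```

With Y = 16 (already at its legal minimum), U = 128, and V = 0, the unclamped red channel is 1.596 * (-128), which is negative even though Y is in range. Clamping the output maps it to 0.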
Differential Revision: https://phabricator.services.mozilla.com/D111032
We were just generating nonsensical code for this case. I don't
think this actually gets used anywhere so we never noticed.
Differential Revision: https://phabricator.services.mozilla.com/D110832
Sometimes we sample slightly outside the tile repeat boundaries due to rounding
or anti-aliasing, which may cause potential sampling artifacts along primitive
edges. This adds support for enforcing tile repeat limits as we otherwise do
in the brush_image shader to prevent such artifacts.
Differential Revision: https://phabricator.services.mozilla.com/D110397
On some Adreno 3xx devices we have observed that the driver does not
pack varyings into vectors as efficiently as the spec mandates. This
results in some of our shaders using a greater number of varying
vectors than GL_MAX_VARYING_VECTORS (which is 16 on this device), leading
to shader compilation errors at run time.
Work around this by manually packing our varyings into fewer
vectors. Additionally, add a test to ensure that we never use more
than 16 vectors even if the driver were to perform no additional
packing.
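The kind of check such a test performs could look roughly like the sketch below. It is not the actual test: `VaryingType`, the helper names, and the counting rule (one vector per scalar or vector varying, one per column for matrices) are assumptions about what "no additional packing" means:

```cpp
#include <vector>

// Hypothetical subset of varying types a shader might declare.
enum class VaryingType { Float, Vec2, Vec3, Vec4, Mat2, Mat3, Mat4 };

// Assumed limit, matching the GL_MAX_VARYING_VECTORS value quoted above.
constexpr int kMaxVaryingVectors = 16;

// Vectors consumed by one varying if the driver packs nothing: each scalar
// or vector takes a whole vector, each matrix takes one vector per column.
inline int vectors_without_packing(VaryingType t) {
  switch (t) {
    case VaryingType::Mat2: return 2;
    case VaryingType::Mat3: return 3;
    case VaryingType::Mat4: return 4;
    default: return 1;
  }
}

// Worst-case vector count for a whole shader interface.
inline int worst_case_varying_vectors(const std::vector<VaryingType>& varyings) {
  int total = 0;
  for (VaryingType t : varyings) {
    total += vectors_without_packing(t);
  }
  return total;
}

inline bool fits_without_packing(const std::vector<VaryingType>& varyings) {
  return worst_case_varying_vectors(varyings) <= kMaxVaryingVectors;
}
```

Under this counting rule, 17 separate float varyings fail the check, while the same data manually packed into five vec4 varyings passes.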
Differential Revision: https://phabricator.services.mozilla.com/D107929
Since WebRender doesn't need texture array support anymore, neither does SWGL.
This is a massive simplification which should benefit both performance and
maintainability. This patch pretty much just removes unused functionality and
doesn't change anything that is already used and relied upon.
Differential Revision: https://phabricator.services.mozilla.com/D106718
cs_clip_rectangle is slow because we evaluate distance AA for every fragment
the shader touches. With SWGL, we can do much better since we have control
over spans. We calculate an inner opaque octagon which can just use a cheap
solid fill and an outer AA octagon within which we actually need to do
AA and outside which we can just do another solid clear. This reduces most
of the cost of rounded-rectangles to just some setup work, a few fragments
of distance AA on the ends of a span, and large runs of solid color where
we don't have to do much work.
Differential Revision: https://phabricator.services.mozilla.com/D106658
Some sites use pixelated/crisp image-rendering and/or 1x1 images as color
sources. When we hit these, we fall off the fast-path. Try to handle some
of those cases we are finding in the wild, namely nearest filtering and
repeat filtering.
There is some slight movement in the wrench fuzz because the composite shader
is now accelerated in nearest-filter situations where it previously was not.
Differential Revision: https://phabricator.services.mozilla.com/D105864
The same optimization of looking for merged linear gradients can also be
applied to radial gradients by solving the quadratic equation to check
how large a span we can process within a given merged span. This allows
us to save a bunch of table lookups and some other math in the inner loops.
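The quadratic in question can be sketched as follows. This is an illustrative formulation rather than the SWGL code: it assumes the span position advances linearly as p + t*d and asks at which t the distance from the gradient center (taken as the origin) reaches a radius r that bounds the current merged segment. The function name and parameters are hypothetical:

```cpp
#include <cmath>

// Find the smallest t >= 0 with |p + t*d| == r, where p = (px, py) is the
// position at the start of the span and d = (dx, dy) is the per-pixel step.
// Expanding |p + t*d|^2 = r^2 gives a*t^2 + b*t + c = 0 with
//   a = d.d,  b = 2*(p.d),  c = p.p - r^2.
// Returns false if the span never reaches radius r going forward.
inline bool first_radius_crossing(float px, float py, float dx, float dy,
                                  float r, float* tOut) {
  float a = dx * dx + dy * dy;
  float b = 2.0f * (px * dx + py * dy);
  float c = px * px + py * py - r * r;
  if (a == 0.0f) {
    return false;  // no movement along the span
  }
  float disc = b * b - 4.0f * a * c;
  if (disc < 0.0f) {
    return false;  // the span's line misses the circle entirely
  }
  float s = std::sqrt(disc);
  float t0 = (-b - s) / (2.0f * a);
  float t1 = (-b + s) / (2.0f * a);
  if (t0 >= 0.0f) {
    *tOut = t0;
    return true;
  }
  if (t1 >= 0.0f) {
    *tOut = t1;
    return true;
  }
  return false;  // both crossings lie behind the span start
}
```

The number of pixels that can be processed with a single interpolated gradient is then bounded by the returned t, so the table lookup happens once per merged span instead of once per pixel.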
Differential Revision: https://phabricator.services.mozilla.com/D105858
For linear gradients, we are currently bottlenecked by looking up a gradient
table entry, doing interpolation, and converting to pixel formats for every
sample.
We can accelerate this by instead looking for contiguous segments of gradient
within the range of entries we need to sample and then interpolating these
as a single gradient. This also enables us to convert to relevant pixel formats
only when setting up this gradient, which greatly reduces the per-pixel processing
down to essentially a shift and add.
To enable this sort of crawling of the gradient table, the output gradients have
been modified such that each entry's step value will equal an adjacent entry's
step value if and only if they are from the same gradient. We can ensure this by, in
the very rare case two segments of gradient have the same step, using the equivalent
of nextafter() to imperceptibly alter the value so that the invariant is maintained.
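A small sketch of the invariant and of the crawl it enables, under the assumption that steps are plain floats. The function names and data layout are hypothetical; the real table stores more than a step per entry:

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Given one step value per gradient segment (in table order), nudge any
// segment whose step equals its predecessor's by one representable float,
// so that equal adjacent steps can only ever mean "same segment".
inline void make_adjacent_steps_distinct(std::vector<float>& segmentSteps) {
  const float inf = std::numeric_limits<float>::infinity();
  for (std::size_t i = 1; i < segmentSteps.size(); ++i) {
    if (segmentSteps[i] == segmentSteps[i - 1]) {
      segmentSteps[i] = std::nextafter(segmentSteps[i], inf);
    }
  }
}

// Given the step stored in each table entry, count how many consecutive
// entries starting at `start` belong to the same segment.
// Precondition: start < entrySteps.size().
inline std::size_t merged_run_length(const std::vector<float>& entrySteps,
                                     std::size_t start) {
  std::size_t n = 1;
  while (start + n < entrySteps.size() &&
         entrySteps[start + n] == entrySteps[start]) {
    ++n;
  }
  return n;
}
```

Because the nudge is a single unit in the last place, the visual change is imperceptible, yet a simple equality comparison is enough to find the extent of each contiguous segment.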
Differential Revision: https://phabricator.services.mozilla.com/D105716
Now that most of the complicated alpha-pass features such as clip-masking and anti-aliasing
are handled in SWGL during the blend stage, most of the fast-paths are identical and only call
swgl_commitTextureLinear in a tight loop. We can do a lot better here by just moving that loop
into SWGL, not only making it faster but removing all the redundant boiler-plate code out of
the shaders.
Differential Revision: https://phabricator.services.mozilla.com/D104536
The main goal of this patch is to move all the complexity of optionally
handling anti-aliasing out of the GLSL drawSpan fast-paths and into SWGL
itself, as blend-mode-specific handling of anti-aliasing. Mainly this
adds a swgl_antiAlias() extension to be called from the GLSL vertex shader,
after which no further involvement from the shader is necessary.
This also enables SWGL to better track those areas of a span that don't
need any anti-aliasing applied, and so can potentially be faster.
Some massaging of blend_pixels() was necessary to get it to inline properly
with all the extra cases added. This is mainly a consequence of the DO_AA
macro that lives inside BLEND_CASE, which is used to handle the dispatching
of new AA_BLEND_KEY and AA_MASK_BLEND_KEY cases. The parameters for these
AA modes are mostly handled in SWGL via the aa_span() function, which computes
the area of the span where non-opaque AA weights are necessary and where it can
skip over the opaque interior.
Some incidental drive-by cleanups of bvecs and pack_pixels were also
necessary.
Differential Revision: https://phabricator.services.mozilla.com/D104492
This removes some calls to commit_span from inside the draw_span specializers.
Instead it relies upon the span rasterizer loop to do some of the work, which
will incur a function pointer call in the rare case we actually return out
of a specializer early. This shouldn't be that performance critical and will
remove some inliner bloat.
Also, I refactored commit_output in the rasterizer itself to hopefully cause
fewer template instantiations which should also further reduce inliner bloat.
Differential Revision: https://phabricator.services.mozilla.com/D104150
This patch has a few moving parts. We first have to tell WR that, when it
detects the extension, it is actually allowed to use it. We have to make
the glsl-to-cxx translator eat the blend_supports_all_equations layout
qualifier. We have to enable generation of advanced-blend-equation variants
in the SWGL build setup. Then we report the actual extension inside SWGL.
Finally, we actually add all the necessary blend equation enums, hash them
down to a blend key, and implement all the blend modes therein.
Differential Revision: https://phabricator.services.mozilla.com/D103804