Bug 1678119 - Implement native anti-aliasing in SWGL. r=jrmuizel

The main goal of this patch is to move all the complexity of optionally
handling anti-aliasing out of the GLSL drawSpan fast-paths and into SWGL
itself, where it is handled by AA-specific blend modes. Mainly this
adds a swgl_antiAlias() extension to be called from the GLSL vertex shader,
after which the shader needs no further involvement for AA to work.
This also enables SWGL to better track those areas of a span that don't
need any anti-aliasing applied, so it can potentially be faster.

Some massaging of blend_pixels() was necessary to get it to inline properly
with all the extra cases added. This is mainly a consequence of the DO_AA
macro that lives inside BLEND_CASE, which is used to handle the dispatching
of new AA_BLEND_KEY and AA_MASK_BLEND_KEY cases. The parameters for these
AA modes are mostly handled in SWGL via the aa_span() function, which computes
the area of the span where non-opaque AA weights are necessary and where it can
skip over the opaque interior.
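
The gl.cc diff that contains aa_span() is not shown on this page, so the
following is only a minimal sketch of the idea of splitting a span into
partially covered ends and an opaque interior. It assumes vertical edges at
fractional pixel positions; the names SpanParts and split_span are
hypothetical and are not SWGL's code.

```cpp
#include <cmath>

// Hypothetical result type: pixel counts for the three parts of a span.
struct SpanParts {
  int left_aa;   // partially covered pixels at the start of the span
  int opaque;    // fully covered interior pixels that need no AA weights
  int right_aa;  // partially covered pixels at the end of the span
};

// Simplified model, not SWGL's aa_span(): the span covers [x_left, x_right)
// in pixel units (x_left <= x_right) and both edges are vertical, so at most
// one pixel per side is partially covered.
static SpanParts split_span(float x_left, float x_right) {
  int first = static_cast<int>(std::floor(x_left));        // first touched pixel
  int last = static_cast<int>(std::ceil(x_right));         // one past the last touched pixel
  int opaque_start = static_cast<int>(std::ceil(x_left));  // first fully covered pixel
  int opaque_end = static_cast<int>(std::floor(x_right));  // one past the last fully covered pixel
  if (opaque_end < opaque_start) {
    // The span is too thin to contain any fully covered pixel.
    return {last - first, 0, 0};
  }
  return {opaque_start - first, opaque_end - opaque_start, last - opaque_end};
}
```

For a span from 2.25 to 7.5 this model yields one AA pixel on each side and
four opaque pixels in between; only the two end pixels would need AA weights.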

There are also some incidental drive-by cleanups of bvecs and pack_pixels
that were necessary.

Differential Revision: https://phabricator.services.mozilla.com/D104492
Lee Salzman 2021-02-12 00:19:02 +00:00
Parent 3a51301262
Commit 2044de5687
7 changed files: 721 additions and 378 deletions


@ -2998,7 +2998,13 @@ pub fn ast_to_hir(state: &mut State, tu: &syntax::TranslationUnit) -> Translatio
Type::new(BVec4),
vec![Type::new(BVec2), Type::new(BVec2)],
);
declare_function(
state,
"bvec4",
Some("make_bvec4"),
Type::new(BVec4),
vec![Type::new(Bool), Type::new(Bool), Type::new(Bool), Type::new(Bool)],
);
declare_function(
state,
"int",
@ -3486,7 +3492,7 @@ pub fn ast_to_hir(state: &mut State, tu: &syntax::TranslationUnit) -> Translatio
state,
"greaterThanEqual",
None,
Type::new(BVec2),
Type::new(BVec4),
vec![Type::new(Vec4), Type::new(Vec4)],
);
declare_function(state, "any", None, Type::new(Bool), vec![Type::new(BVec2)]);
@ -3743,6 +3749,20 @@ pub fn ast_to_hir(state: &mut State, tu: &syntax::TranslationUnit) -> Translatio
Type::new(Void),
vec![Type::new(Sampler2D), Type::new(Vec2), Type::new(Vec2), Type::new(Vec2)],
);
declare_function(
state,
"swgl_antiAlias",
None,
Type::new(Void),
vec![Type::new(Int)],
);
declare_function(
state,
"swgl_antiAlias",
None,
Type::new(Void),
vec![Type::new(BVec4)],
);
declare_function_ext(
state,
"swgl_validateGradient",


@ -43,3 +43,42 @@ within the given rectangle, specified relative to the clip mask offset.
Anything falling outside this rectangle will be clipped entirely. If the
rectangle is empty, then the clip mask will be ignored.
```
void swgl_antiAlias(int edgeMask);
```
When called from the vertex shader, this enables anti-aliasing for the
currently drawn primitive while blending is enabled. This setting will only
apply to the current primitive. Anti-aliasing will be applied only to the
edges corresponding to bits supplied in the mask. For simple use-cases,
the edge mask can be set to all 1 bits to enable AA for the entire quad.
The order of the bits in the edge mask must match the winding order in which
the vertices are output in the vertex shader if processed as a quad, so that
the edge for a given bit ends on the corresponding vertex. The easiest way to
understand this ordering is that for a rectangle (x0,y0,x1,y1), the Nth edge
bit corresponds to the edge where the Nth coordinate in the rectangle is
constant.

SWGL tries to use an anti-aliasing method that is reasonably close to WR's
signed-distance field approximation. WR would normally try to discern the
2D local-space coordinates of a given destination pixel relative to the
2D local-space bounding rectangle of a primitive. It then uses the
screen-space derivative to try to determine how many local-space units equate
to a distance of around one screen-space pixel. A distance approximation
of coverage is then used based on the distance in local-space from the
current pixel's center, roughly at half-intensity at pixel center
and ranging to zero or full intensity within a radius of half a pixel
away from the center. To account for AAing going outside the normal geometry
boundaries of the primitive, WR has to extrude the primitive by a local-space
estimate to allow some AA to happen within the extruded region.

SWGL can ultimately do this approximation more simply and get around the
extrusion limitations by just ensuring spans encompass any pixel that is
partially covered when computing span boundaries. Further, since SWGL already
knows the slope of an edge and the coordinate of the span relative to the span
boundaries, finding the partial coverage of a given span becomes easy to do
without requiring any extra interpolants to track against local-space bounds.
Essentially, SWGL just performs anti-aliasing on the actual geometry bounds,
but when the pixels on a span's edge are determined to be partially covered
during span rasterization, it uses the same distance field method as WR on
those span boundary pixels to estimate the coverage based on edge slope.
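
As a rough illustration of the distance-based coverage estimate described
above, the following C++ sketch maps a signed distance (in screen pixels)
between an edge and a pixel center onto a coverage weight. The name
`edge_coverage` and the sign convention (positive distances lie inside the
primitive) are assumptions made for this example, not SWGL's actual code.

```cpp
#include <algorithm>

// Hypothetical helper, not part of SWGL: estimate the coverage of a pixel from
// the signed distance (in pixels) between an edge and the pixel center.
// Positive distances are assumed to lie inside the primitive.
static float edge_coverage(float signed_dist) {
  // Half intensity when the edge passes through the pixel center, reaching
  // zero or full intensity half a pixel away on either side.
  return std::min(std::max(signed_dist + 0.5f, 0.0f), 1.0f);
}
```

Under these assumptions, a pixel whose center sits exactly on the edge gets a
weight of 0.5, and any pixel whose center is at least half a pixel inside the
edge is treated as fully covered.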


@ -136,7 +136,7 @@ fn main() {
cc::Build::new()
.cpp(true)
.file("src/gl.cc")
.flag("-std=c++14")
.flag("-std=c++17")
.flag("-UMOZILLA_CONFIG_H")
.flag("-fno-exceptions")
.flag("-fno-rtti")

File diff suppressed because it is too large.


@ -215,7 +215,13 @@ SI Float sqrt(Float v) {
#endif
}
SI float recip(float x) { return 1.0f / x; }
SI float recip(float x) {
#if USE_SSE2
return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
#else
return 1.0f / x;
#endif
}
// Use a fast vector reciprocal approximation when available. This should only
// be used in cases where it is okay that the approximation is imprecise -
@ -233,7 +239,13 @@ SI Float recip(Float v) {
#endif
}
SI float inversesqrt(float x) { return 1.0f / sqrtf(x); }
SI float inversesqrt(float x) {
#if USE_SSE2
return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
return 1.0f / sqrtf(x);
#endif
}
SI Float inversesqrt(Float v) {
#if USE_SSE2
@ -674,8 +686,8 @@ SI I32 roundfast(Float v, Float scale) {
}
template <typename T>
SI auto round_pixel(T v, float maxval = 1.0f) {
return roundfast(v, (255.0f / maxval));
SI auto round_pixel(T v, float scale = 255.0f) {
return roundfast(v, scale);
}
#define round __glsl_round
@ -1166,6 +1178,23 @@ struct bvec4_scalar {
IMPLICIT constexpr bvec4_scalar(bool a) : x(a), y(a), z(a), w(a) {}
constexpr bvec4_scalar(bool x, bool y, bool z, bool w)
: x(x), y(y), z(z), w(w) {}
bool& select(XYZW c) {
switch (c) {
case X:
return x;
case Y:
return y;
case Z:
return z;
case W:
return w;
}
}
bool sel(XYZW c1) { return select(c1); }
bvec2_scalar sel(XYZW c1, XYZW c2) {
return bvec2_scalar(select(c1), select(c2));
}
};
struct bvec4_scalar1 {
@ -1207,6 +1236,10 @@ bvec4_scalar make_bvec4(bool x, bool y, bool z, bool w) {
return bvec4_scalar{x, y, z, w};
}
bvec4_scalar make_bvec4(bvec2_scalar a, bvec2_scalar b) {
return bvec4_scalar{a.x, a.y, b.x, b.y};
}
template <typename N>
bvec4 make_bvec4(const N& n) {
return bvec4(n);
@ -1990,6 +2023,10 @@ SI bvec2 lessThan(vec2 x, vec2 y) {
return bvec2(lessThan(x.x, y.x), lessThan(x.y, y.y));
}
SI bvec2_scalar lessThan(vec2_scalar x, vec2_scalar y) {
return bvec2_scalar(lessThan(x.x, y.x), lessThan(x.y, y.y));
}
template <typename T>
auto greaterThan(T x, T y) -> decltype(x > y) {
return x > y;
@ -1999,6 +2036,10 @@ bvec2 greaterThan(vec2 x, vec2 y) {
return bvec2(greaterThan(x.x, y.x), greaterThan(x.y, y.y));
}
bvec2_scalar greaterThan(vec2_scalar x, vec2_scalar y) {
return bvec2_scalar(greaterThan(x.x, y.x), greaterThan(x.y, y.y));
}
template <typename T>
auto greaterThanEqual(T x, T y) -> decltype(x >= y) {
return x >= y;


@ -96,22 +96,10 @@ static ALWAYS_INLINE auto swgl_forceScalar(T v) -> decltype(force_scalar(v)) {
swgl_SpanLength -= swgl_StepSize; \
} while (0)
static ALWAYS_INLINE WideRGBA8 pack_pixels_RGBA8(Float alpha) {
I32 i = round_pixel(alpha);
HalfRGBA8 c = packRGBA8(zipLow(i, i), zipHigh(i, i));
return combine(zipLow(c, c), zipHigh(c, c));
}
static ALWAYS_INLINE WideRGBA8 pack_pixels_RGBA8(float alpha) {
I32 i = round_pixel(alpha);
HalfRGBA8 c = packRGBA8(i, i);
return combine(c, c);
}
// Commit a single chunk of a color scaled by an alpha weight
#define swgl_commitColor(format, color, alpha) \
swgl_commitChunk(format, muldiv255(pack_pixels_##format(color), \
pack_pixels_##format(alpha)))
swgl_commitChunk(format, muldiv256(pack_pixels_##format(color), \
pack_pixels_##format(alpha, 256.0f)))
#define swgl_commitColorRGBA8(color, alpha) \
swgl_commitColor(RGBA8, color, alpha)
#define swgl_commitColorR8(color, alpha) swgl_commitColor(R8, color, alpha)
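
The hunk above switches from muldiv255 with weights scaled to 255 over to
muldiv256 with weights scaled to 256. The definition of muldiv256 is not part
of the diff shown here, so the scalar model below assumes it is a multiply
followed by a right shift by 8; `muldiv256_scalar` and `pack_weight_256` are
hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Assumed scalar model of muldiv256; the real SWGL version operates on SIMD
// vectors and its definition is not shown in this diff.
static uint16_t muldiv256_scalar(uint16_t x, uint16_t y) {
  return static_cast<uint16_t>((x * y) >> 8);
}

// Hypothetical scalar counterpart of pack_pixels_##format(alpha, 256.0f):
// map a weight in [0, 1] to an integer in [0, 256], rounding to nearest.
static uint16_t pack_weight_256(float alpha) {
  return static_cast<uint16_t>(alpha * 256.0f + 0.5f);
}
```

With this model a weight of 1.0 packs to 256, so multiplying and shifting
returns the input channel unchanged without any extra correction term.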
@ -134,7 +122,8 @@ static ALWAYS_INLINE bool swgl_isTextureR8(S s) {
// Returns the offset into the texture buffer for the given layer index. If not
// a texture array or 3D texture, this will always access the first layer.
template <typename S>
static ALWAYS_INLINE int swgl_textureLayerOffset(S s, float layer) {
static ALWAYS_INLINE int swgl_textureLayerOffset(UNUSED S s,
UNUSED float layer) {
return 0;
}
@ -171,9 +160,9 @@ static ALWAYS_INLINE T swgl_linearQuantizeStep(S s, T p) {
// Commit a single chunk from a linear texture fetch that is scaled by a color
#define swgl_commitTextureLinearColor(format, s, p, color, ...) \
swgl_commitChunk(format, muldiv255(textureLinearUnpacked##format( \
swgl_commitChunk(format, muldiv256(textureLinearUnpacked##format( \
s, ivec2(p), __VA_ARGS__), \
pack_pixels_##format(color)))
pack_pixels_##format(color, 256.0f)))
#define swgl_commitTextureLinearColorRGBA8(s, p, color, ...) \
swgl_commitTextureLinearColor(RGBA8, s, p, color, __VA_ARGS__)
#define swgl_commitTextureLinearColorR8(s, p, color, ...) \
@ -230,7 +219,7 @@ static ALWAYS_INLINE PackedRGBA8 convertYUV(int colorSpace, U16 y, U16 u,
// Helper functions to sample from planar YUV textures before converting to RGB
template <typename S0>
static inline PackedRGBA8 sampleYUV(S0 sampler0, vec2 uv0, int layer0,
int colorSpace, int rescaleFactor) {
int colorSpace, UNUSED int rescaleFactor) {
ivec2 i0(uv0);
switch (sampler0->format) {
case TextureFormat::RGBA8: {
@ -252,15 +241,15 @@ template <typename S0, typename C>
static inline WideRGBA8 sampleColorYUV(S0 sampler0, vec2 uv0, int layer0,
int colorSpace, int rescaleFactor,
C color) {
return muldiv255(
return muldiv256(
unpack(sampleYUV(sampler0, uv0, layer0, colorSpace, rescaleFactor)),
pack_pixels_RGBA8(color));
pack_pixels_RGBA8(color, 256.0f));
}
template <typename S0, typename S1>
static inline PackedRGBA8 sampleYUV(S0 sampler0, vec2 uv0, int layer0,
S1 sampler1, vec2 uv1, int layer1,
int colorSpace, int rescaleFactor) {
int colorSpace, UNUSED int rescaleFactor) {
ivec2 i0(uv0);
ivec2 i1(uv1);
switch (sampler1->format) {
@ -287,9 +276,9 @@ static inline WideRGBA8 sampleColorYUV(S0 sampler0, vec2 uv0, int layer0,
S1 sampler1, vec2 uv1, int layer1,
int colorSpace, int rescaleFactor,
C color) {
return muldiv255(unpack(sampleYUV(sampler0, uv0, layer0, sampler1, uv1,
return muldiv256(unpack(sampleYUV(sampler0, uv0, layer0, sampler1, uv1,
layer1, colorSpace, rescaleFactor)),
pack_pixels_RGBA8(color));
pack_pixels_RGBA8(color, 256.0f));
}
template <typename S0, typename S1, typename S2>
@ -336,10 +325,10 @@ static inline WideRGBA8 sampleColorYUV(S0 sampler0, vec2 uv0, int layer0,
S2 sampler2, vec2 uv2, int layer2,
int colorSpace, int rescaleFactor,
C color) {
return muldiv255(
return muldiv256(
unpack(sampleYUV(sampler0, uv0, layer0, sampler1, uv1, layer1, sampler2,
uv2, layer2, colorSpace, rescaleFactor)),
pack_pixels_RGBA8(color));
pack_pixels_RGBA8(color, 256.0f));
}
// Commit a single chunk of a YUV surface represented by multiple planar
@ -362,7 +351,7 @@ static ALWAYS_INLINE WideRGBA8 applyColor(WideRGBA8 src, NoColor) {
}
static ALWAYS_INLINE WideRGBA8 applyColor(WideRGBA8 src, WideRGBA8 color) {
return muldiv255(src, color);
return muldiv256(src, color);
}
static ALWAYS_INLINE PackedRGBA8 applyColor(PackedRGBA8 src, NoColor) {
@ -370,7 +359,7 @@ static ALWAYS_INLINE PackedRGBA8 applyColor(PackedRGBA8 src, NoColor) {
}
static ALWAYS_INLINE PackedRGBA8 applyColor(PackedRGBA8 src, WideRGBA8 color) {
return pack(muldiv255(unpack(src), color));
return pack(muldiv256(unpack(src), color));
}
// Samples an axis-aligned span on a single row of a texture using 1:1
@ -469,7 +458,7 @@ static void blendTextureNearestRGBA8(S sampler, const ivec2_scalar& i, int span,
#define swgl_commitTextureNearestColor(format, s, p, uv_rect, color, ...) \
swgl_commitTextureNearest(format, s, p, uv_rect, \
pack_pixels_##format(color), __VA_ARGS__)
pack_pixels_##format(color, 256.0f), __VA_ARGS__)
#define swgl_commitTextureNearestColorRGBA8(s, p, uv_rect, color, ...) \
swgl_commitTextureNearestColor(RGBA8, s, p, uv_rect, color, __VA_ARGS__)
#define swgl_commitTextureNearestColorR8(s, p, uv_rect, color, ...) \
@ -554,19 +543,26 @@ static inline WideRGBA8 sampleGradient(sampler2D sampler, int address,
// Variant that allows specifying a color multiplier of the gradient result.
#define swgl_commitGradientColorRGBA8(sampler, address, entry, color) \
swgl_commitChunk(RGBA8, muldiv255(sampleGradient(sampler, address, entry), \
pack_pixels_RGBA8(color)))
swgl_commitChunk(RGBA8, muldiv256(sampleGradient(sampler, address, entry), \
pack_pixels_RGBA8(color, 256.0f)))
// Extension to set a clip mask image to be sampled during blending. The offset
// specifies the positioning of the clip mask image relative to the viewport
// origin. The bounding box specifies the rectangle relative to the clip mask's
// origin that constrains sampling within the clip mask.
// origin that constrains sampling within the clip mask. Blending must be
// enabled for this to work.
enum SWGLClipFlag {
SWGL_CLIP_FLAG_MASK = 1 << 0,
SWGL_CLIP_FLAG_AA = 1 << 1,
};
static int swgl_ClipFlags = 0;
static sampler2D swgl_ClipMask = nullptr;
static IntPoint swgl_ClipMaskOffset = {0, 0};
static IntRect swgl_ClipMaskBounds = {0, 0, 0, 0};
#define swgl_clipMask(mask, offset, bb_origin, bb_size) \
do { \
if (bb_size != vec2_scalar(0.0f, 0.0f)) { \
swgl_ClipFlags |= SWGL_CLIP_FLAG_MASK; \
swgl_ClipMask = mask; \
swgl_ClipMaskOffset = make_ivec2(offset); \
swgl_ClipMaskBounds = \
@ -574,6 +570,25 @@ static IntRect swgl_ClipMaskBounds = {0, 0, 0, 0};
} \
} while (0)
// Extension to enable anti-aliasing for the given edges of a quad.
// Blending must be enabled for this to work.
static int swgl_AAEdgeMask = 0;
static ALWAYS_INLINE int calcAAEdgeMask(bool on) { return on ? 0xF : 0; }
static ALWAYS_INLINE int calcAAEdgeMask(int mask) { return mask; }
static ALWAYS_INLINE int calcAAEdgeMask(bvec4_scalar mask) {
return (mask.x ? 1 : 0) | (mask.y ? 2 : 0) | (mask.z ? 4 : 0) |
(mask.w ? 8 : 0);
}
#define swgl_antiAlias(edges) \
do { \
swgl_AAEdgeMask = calcAAEdgeMask(edges); \
if (swgl_AAEdgeMask) { \
swgl_ClipFlags |= SWGL_CLIP_FLAG_AA; \
} \
} while (0)
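
To make the edge-bit ordering concrete, here is a small standalone sketch that
names each bit after the rectangle coordinate that stays constant along its
edge, following the (x0,y0,x1,y1) rule from the swgl_antiAlias documentation.
The `EdgeBit` enum, `BoolVec4`, and `edge_mask_from_bools` are hypothetical
names for illustration; SWGL itself only deals with the integer mask.

```cpp
// Hypothetical names for illustration. Bit N selects the edge on which the
// Nth coordinate of the rectangle (x0, y0, x1, y1) is constant.
enum EdgeBit {
  EDGE_X0 = 1 << 0,  // edge where x == x0
  EDGE_Y0 = 1 << 1,  // edge where y == y0
  EDGE_X1 = 1 << 2,  // edge where x == x1
  EDGE_Y1 = 1 << 3,  // edge where y == y1
};

// Stand-in for bvec4_scalar so the example compiles on its own.
struct BoolVec4 {
  bool x, y, z, w;
};

// Mirrors the bvec4_scalar overload of calcAAEdgeMask in the hunk above.
static int edge_mask_from_bools(BoolVec4 m) {
  return (m.x ? EDGE_X0 : 0) | (m.y ? EDGE_Y0 : 0) | (m.z ? EDGE_X1 : 0) |
         (m.w ? EDGE_Y1 : 0);
}
```

Passing all four components as true gives 0xF, the same value the bool
overload of calcAAEdgeMask produces to enable AA on every edge of the quad.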
// Dispatch helper used by the GLSL translator to swgl_drawSpan functions.
// The number of pixels committed is tracked by checking for the difference in
// swgl_SpanLength. Any varying interpolants used will be advanced past the


@ -470,7 +470,7 @@ SI T samplerScale(S sampler, T P) {
}
template <typename T>
SI T samplerScale(sampler2DRect sampler, T P) {
SI T samplerScale(UNUSED sampler2DRect sampler, T P) {
return P;
}