Bug 1678119 - Implement native anti-aliasing in SWGL. r=jrmuizel

The main goal of this patch is to move all the complexity of optionally handling anti-aliasing out of the GLSL drawSpan fast-paths and into SWGL itself, where it is handled by AA-specific blend modes. Mainly this adds a swgl_antiAlias() extension to be called from the GLSL vertex shader, after which no further involvement is necessary from the shader. This also enables SWGL to better track those areas of a span that don't need any anti-aliasing applied, and so can potentially be faster.

Some massaging of blend_pixels() was necessary to get it to inline properly with all the extra cases added. This is mainly a consequence of the DO_AA macro that lives inside BLEND_CASE, which is used to handle the dispatching of the new AA_BLEND_KEY and AA_MASK_BLEND_KEY cases.

The parameters for these AA modes are mostly handled in SWGL via the aa_span() function, which computes the area of the span where non-opaque AA weights are necessary and where it can skip over the opaque interior.

There are also some incidental drive-by cleanups of bvecs and pack_pixels that were necessary.

Differential Revision: https://phabricator.services.mozilla.com/D104492
2021-02-12 00:19:02 +00:00 · 2021-02-12 00:19:02 +00:00 · 2044de5687
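
The split between AA edge pixels and the opaque interior described in the commit message can be sketched as follows. This is a simplified illustration, not the aa_span() from this patch (whose body is not shown here): `SpanParts` and `splitSpan` are hypothetical names, and the sketch only considers the horizontal extent of a span on a single row.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical result type: three half-open pixel ranges on one row.
//   [start, opaqueStart)     leading pixels that need AA weights
//   [opaqueStart, opaqueEnd) fully covered interior, no AA needed
//   [opaqueEnd, end)         trailing pixels that need AA weights
struct SpanParts {
  int start;
  int opaqueStart;
  int opaqueEnd;
  int end;
};

// Splits the covered interval [x0, x1), given in pixel units, into the
// ranges above. Pixel i is assumed to occupy [i, i + 1).
inline SpanParts splitSpan(float x0, float x1) {
  assert(x1 >= x0);
  SpanParts p;
  // Include every pixel that is at least partially covered.
  p.start = static_cast<int>(std::floor(x0));
  p.end = static_cast<int>(std::ceil(x1));
  // The interior holds only pixels that lie entirely inside [x0, x1).
  p.opaqueStart = std::min(static_cast<int>(std::ceil(x0)), p.end);
  p.opaqueEnd = std::max(static_cast<int>(std::floor(x1)), p.opaqueStart);
  return p;
}
```

For a span covering [2.25, 7.5), this yields one leading AA pixel (2), an opaque interior of pixels 3 through 6, and one trailing AA pixel (7). A blend loop could then apply AA weights only to the leading and trailing ranges.
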
--- a/gfx/wr/glsl-to-cxx/src/hir.rs
+++ b/gfx/wr/glsl-to-cxx/src/hir.rs
@ -2998,7 +2998,13 @@ pub fn ast_to_hir(state: &mut State, tu: &syntax::TranslationUnit) -> Translatio
        Type::new(BVec4),
        vec![Type::new(BVec2), Type::new(BVec2)],
    );
-
+    declare_function(
+        state,
+        "bvec4",
+        Some("make_bvec4"),
+        Type::new(BVec4),
+        vec![Type::new(Bool), Type::new(Bool), Type::new(Bool), Type::new(Bool)],
+    );
    declare_function(
        state,
        "int",
@ -3486,7 +3492,7 @@ pub fn ast_to_hir(state: &mut State, tu: &syntax::TranslationUnit) -> Translatio
        state,
        "greaterThanEqual",
        None,
-        Type::new(BVec2),
+        Type::new(BVec4),
        vec![Type::new(Vec4), Type::new(Vec4)],
    );
    declare_function(state, "any", None, Type::new(Bool), vec![Type::new(BVec2)]);
@ -3743,6 +3749,20 @@ pub fn ast_to_hir(state: &mut State, tu: &syntax::TranslationUnit) -> Translatio
        Type::new(Void),
        vec![Type::new(Sampler2D), Type::new(Vec2), Type::new(Vec2), Type::new(Vec2)],
    );
+    declare_function(
+        state,
+        "swgl_antiAlias",
+        None,
+        Type::new(Void),
+        vec![Type::new(Int)],
+    );
+    declare_function(
+        state,
+        "swgl_antiAlias",
+        None,
+        Type::new(Void),
+        vec![Type::new(BVec4)],
+    );
    declare_function_ext(
        state,
        "swgl_validateGradient",
--- a/gfx/wr/swgl/README.md
+++ b/gfx/wr/swgl/README.md
@ -43,3 +43,42 @@ within the given rectangle, specified relative to the clip mask offset.
 Anything falling outside this rectangle will be clipped entirely. If the
 rectangle is empty, then the clip mask will be ignored.

+```
+void swgl_antiAlias(int edgeMask);
+```
+
+When called from the vertex shader, this enables anti-aliasing for the
+currently drawn primitive while blending is enabled. This setting will only
+apply to the current primitive. Anti-aliasing will be applied only to the
+edges corresponding to bits supplied in the mask. For simple use-cases,
+the edge mask can be set to all 1 bits to enable AA for the entire quad.
+
+The order of the bits in the edge mask must match the winding order in which
+the vertices are output in the vertex shader if processed as a quad, so that
+the edge ends on that vertex. The easiest way to understand this ordering
+is that for a rectangle (x0,y0,x1,y1), the Nth edge bit corresponds
+to the edge where the Nth coordinate in the rectangle is constant.
+
+SWGL tries to use an anti-aliasing method that is reasonably close to WR's
+signed-distance field approximation. WR would normally try to discern the
+2D local-space coordinates of a given destination pixel relative to the
+2D local-space bounding rectangle of a primitive. It then uses the screen-
+space derivative to try to determine how many local-space units equate
+to a distance of around one screen-space pixel. A distance approximation
+of coverage is then used based on the distance in local-space from the
+current pixel's center, roughly at half-intensity at pixel center
+and ranging to zero or full intensity within a radius of half a pixel
+away from the center. To account for AAing going outside the normal geometry
+boundaries of the primitive, WR has to extrude the primitive by a local-space
+estimate to allow some AA to happen within the extruded region.
+
+SWGL can ultimately do this approximation more simply and get around the
+extrusion limitations by just ensuring spans encompass any pixel that is
+partially covered when computing span boundaries. Further, since SWGL already
+knows the slope of an edge and the coordinate of the span relative to the span
+boundaries, finding the partial coverage of a given span becomes easy to do
+without requiring any extra interpolants to track against local-space bounds.
+Essentially, SWGL just performs anti-aliasing on the actual geometry bounds,
+but when the pixels on a span's edge are determined to be partially covered
+during span rasterization, it uses the same distance field method as WR on
+those span boundary pixels to estimate the coverage based on edge slope.
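
The coverage estimate described in the README text above can be sketched roughly as follows. This is an illustration under assumptions, not the code in gl.cc: `edgeCoverage` and its parameters are hypothetical names, and the slope is assumed to be expressed as the change in x per unit of y.

```cpp
#include <algorithm>
#include <cmath>

// Estimates the coverage of a pixel near one edge of a span.
//   pixelCenterX: x coordinate of the pixel center
//   edgeX:        x where the edge crosses the row through the pixel center
//   slope:        dx/dy of the edge (0 for a vertical edge)
//   insideSign:   +1 if the primitive lies to the right of the edge,
//                 -1 if it lies to the left
inline float edgeCoverage(float pixelCenterX, float edgeX, float slope,
                          float insideSign) {
  // Perpendicular distance from the pixel center to the edge, in pixels,
  // positive when the center is inside the primitive.
  float dist = insideSign * (pixelCenterX - edgeX) /
               std::sqrt(1.0f + slope * slope);
  // Half intensity when the edge passes through the center, reaching zero
  // or full intensity half a pixel away.
  return std::min(std::max(0.5f + dist, 0.0f), 1.0f);
}
```

With a vertical left edge at x = 2.5, a pixel centered at 2.5 gets a coverage of 0.5, while pixels centered at 1.5 and 3.5 get 0 and 1 respectively.
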
--- a/gfx/wr/swgl/build.rs
+++ b/gfx/wr/swgl/build.rs
@ -136,7 +136,7 @@ fn main() {
    cc::Build::new()
        .cpp(true)
        .file("src/gl.cc")
-        .flag("-std=c++14")
+        .flag("-std=c++17")
        .flag("-UMOZILLA_CONFIG_H")
        .flag("-fno-exceptions")
        .flag("-fno-rtti")
--- a/gfx/wr/swgl/src/gl.cc
+++ b/gfx/wr/swgl/src/gl.cc
--- a/gfx/wr/swgl/src/glsl.h
+++ b/gfx/wr/swgl/src/glsl.h
@ -215,7 +215,13 @@ SI Float sqrt(Float v) {
 #endif
 }

-SI float recip(float x) { return 1.0f / x; }
+SI float recip(float x) {
+#if USE_SSE2
+  return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
+#else
+  return 1.0f / x;
+#endif
+}

 // Use a fast vector reciprocal approximation when available. This should only
 // be used in cases where it is okay that the approximation is imprecise -
@ -233,7 +239,13 @@ SI Float recip(Float v) {
 #endif
 }

-SI float inversesqrt(float x) { return 1.0f / sqrtf(x); }
+SI float inversesqrt(float x) {
+#if USE_SSE2
+  return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
+#else
+  return 1.0f / sqrtf(x);
+#endif
+}

 SI Float inversesqrt(Float v) {
 #if USE_SSE2
@ -674,8 +686,8 @@ SI I32 roundfast(Float v, Float scale) {
 }

 template <typename T>
-SI auto round_pixel(T v, float maxval = 1.0f) {
-  return roundfast(v, (255.0f / maxval));
+SI auto round_pixel(T v, float scale = 255.0f) {
+  return roundfast(v, scale);
 }

 #define round __glsl_round
@ -1166,6 +1178,23 @@ struct bvec4_scalar {
  IMPLICIT constexpr bvec4_scalar(bool a) : x(a), y(a), z(a), w(a) {}
  constexpr bvec4_scalar(bool x, bool y, bool z, bool w)
      : x(x), y(y), z(z), w(w) {}
+
+  bool& select(XYZW c) {
+    switch (c) {
+      case X:
+        return x;
+      case Y:
+        return y;
+      case Z:
+        return z;
+      case W:
+        return w;
+    }
+  }
+  bool sel(XYZW c1) { return select(c1); }
+  bvec2_scalar sel(XYZW c1, XYZW c2) {
+    return bvec2_scalar(select(c1), select(c2));
+  }
 };

 struct bvec4_scalar1 {
@ -1207,6 +1236,10 @@ bvec4_scalar make_bvec4(bool x, bool y, bool z, bool w) {
  return bvec4_scalar{x, y, z, w};
 }

+bvec4_scalar make_bvec4(bvec2_scalar a, bvec2_scalar b) {
+  return bvec4_scalar{a.x, a.y, b.x, b.y};
+}
+
 template <typename N>
 bvec4 make_bvec4(const N& n) {
  return bvec4(n);
@ -1990,6 +2023,10 @@ SI bvec2 lessThan(vec2 x, vec2 y) {
  return bvec2(lessThan(x.x, y.x), lessThan(x.y, y.y));
 }

+SI bvec2_scalar lessThan(vec2_scalar x, vec2_scalar y) {
+  return bvec2_scalar(lessThan(x.x, y.x), lessThan(x.y, y.y));
+}
+
 template <typename T>
 auto greaterThan(T x, T y) -> decltype(x > y) {
  return x > y;
@ -1999,6 +2036,10 @@ bvec2 greaterThan(vec2 x, vec2 y) {
  return bvec2(greaterThan(x.x, y.x), greaterThan(x.y, y.y));
 }

+bvec2_scalar greaterThan(vec2_scalar x, vec2_scalar y) {
+  return bvec2_scalar(greaterThan(x.x, y.x), greaterThan(x.y, y.y));
+}
+
 template <typename T>
 auto greaterThanEqual(T x, T y) -> decltype(x >= y) {
  return x >= y;
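
The round_pixel() change above lets callers choose the fixed-point scale, and the call sites in this patch pair a scale of 256.0f with muldiv256. The definition of muldiv256 is not part of this diff, so the scalar sketch below assumes it is a multiply followed by a right shift of 8; `roundWeight` and `scaleBy256` are hypothetical stand-ins for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for round_pixel(v, scale): converts a weight
// in [0, 1] to fixed point by scaling and rounding to nearest.
inline int roundWeight(float v, float scale = 255.0f) {
  assert(v >= 0.0f && v <= 1.0f);
  return static_cast<int>(v * scale + 0.5f);
}

// Assumed behavior of a muldiv256-style helper: multiply an 8-bit color
// channel by a fixed-point weight, then divide by 256 with a shift.
inline uint8_t scaleBy256(uint8_t c, int weight) {
  return static_cast<uint8_t>((c * weight) >> 8);
}
```

Under these assumptions, a weight of 1.0 packed at scale 256 leaves a channel value of 255 unchanged, whereas the same weight packed at scale 255 would produce 254 after the shift. That may be why the weights are packed with 256.0f wherever muldiv256 is used.
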
--- a/gfx/wr/swgl/src/swgl_ext.h
+++ b/gfx/wr/swgl/src/swgl_ext.h
@ -96,22 +96,10 @@ static ALWAYS_INLINE auto swgl_forceScalar(T v) -> decltype(force_scalar(v)) {
    swgl_SpanLength -= swgl_StepSize;     \
  } while (0)

-static ALWAYS_INLINE WideRGBA8 pack_pixels_RGBA8(Float alpha) {
-  I32 i = round_pixel(alpha);
-  HalfRGBA8 c = packRGBA8(zipLow(i, i), zipHigh(i, i));
-  return combine(zipLow(c, c), zipHigh(c, c));
-}
-
-static ALWAYS_INLINE WideRGBA8 pack_pixels_RGBA8(float alpha) {
-  I32 i = round_pixel(alpha);
-  HalfRGBA8 c = packRGBA8(i, i);
-  return combine(c, c);
-}
-
 // Commit a single chunk of a color scaled by an alpha weight
 #define swgl_commitColor(format, color, alpha)                    \
-  swgl_commitChunk(format, muldiv255(pack_pixels_##format(color), \
-                                     pack_pixels_##format(alpha)))
+  swgl_commitChunk(format, muldiv256(pack_pixels_##format(color), \
+                                     pack_pixels_##format(alpha, 256.0f)))
 #define swgl_commitColorRGBA8(color, alpha) \
  swgl_commitColor(RGBA8, color, alpha)
 #define swgl_commitColorR8(color, alpha) swgl_commitColor(R8, color, alpha)
@ -134,7 +122,8 @@ static ALWAYS_INLINE bool swgl_isTextureR8(S s) {
 // Returns the offset into the texture buffer for the given layer index. If not
 // a texture array or 3D texture, this will always access the first layer.
 template <typename S>
-static ALWAYS_INLINE int swgl_textureLayerOffset(S s, float layer) {
+static ALWAYS_INLINE int swgl_textureLayerOffset(UNUSED S s,
+                                                 UNUSED float layer) {
  return 0;
 }

@ -171,9 +160,9 @@ static ALWAYS_INLINE T swgl_linearQuantizeStep(S s, T p) {

 // Commit a single chunk from a linear texture fetch that is scaled by a color
 #define swgl_commitTextureLinearColor(format, s, p, color, ...)     \
-  swgl_commitChunk(format, muldiv255(textureLinearUnpacked##format( \
+  swgl_commitChunk(format, muldiv256(textureLinearUnpacked##format( \
                                         s, ivec2(p), __VA_ARGS__), \
-                                     pack_pixels_##format(color)))
+                                     pack_pixels_##format(color, 256.0f)))
 #define swgl_commitTextureLinearColorRGBA8(s, p, color, ...) \
  swgl_commitTextureLinearColor(RGBA8, s, p, color, __VA_ARGS__)
 #define swgl_commitTextureLinearColorR8(s, p, color, ...) \
@ -230,7 +219,7 @@ static ALWAYS_INLINE PackedRGBA8 convertYUV(int colorSpace, U16 y, U16 u,
 // Helper functions to sample from planar YUV textures before converting to RGB
 template <typename S0>
 static inline PackedRGBA8 sampleYUV(S0 sampler0, vec2 uv0, int layer0,
-                                    int colorSpace, int rescaleFactor) {
+                                    int colorSpace, UNUSED int rescaleFactor) {
  ivec2 i0(uv0);
  switch (sampler0->format) {
    case TextureFormat::RGBA8: {
@ -252,15 +241,15 @@ template <typename S0, typename C>
 static inline WideRGBA8 sampleColorYUV(S0 sampler0, vec2 uv0, int layer0,
                                       int colorSpace, int rescaleFactor,
                                       C color) {
-  return muldiv255(
+  return muldiv256(
      unpack(sampleYUV(sampler0, uv0, layer0, colorSpace, rescaleFactor)),
-      pack_pixels_RGBA8(color));
+      pack_pixels_RGBA8(color, 256.0f));
 }

 template <typename S0, typename S1>
 static inline PackedRGBA8 sampleYUV(S0 sampler0, vec2 uv0, int layer0,
                                    S1 sampler1, vec2 uv1, int layer1,
-                                    int colorSpace, int rescaleFactor) {
+                                    int colorSpace, UNUSED int rescaleFactor) {
  ivec2 i0(uv0);
  ivec2 i1(uv1);
  switch (sampler1->format) {
@ -287,9 +276,9 @@ static inline WideRGBA8 sampleColorYUV(S0 sampler0, vec2 uv0, int layer0,
                                       S1 sampler1, vec2 uv1, int layer1,
                                       int colorSpace, int rescaleFactor,
                                       C color) {
-  return muldiv255(unpack(sampleYUV(sampler0, uv0, layer0, sampler1, uv1,
+  return muldiv256(unpack(sampleYUV(sampler0, uv0, layer0, sampler1, uv1,
                                    layer1, colorSpace, rescaleFactor)),
-                   pack_pixels_RGBA8(color));
+                   pack_pixels_RGBA8(color, 256.0f));
 }

 template <typename S0, typename S1, typename S2>
@ -336,10 +325,10 @@ static inline WideRGBA8 sampleColorYUV(S0 sampler0, vec2 uv0, int layer0,
                                       S2 sampler2, vec2 uv2, int layer2,
                                       int colorSpace, int rescaleFactor,
                                       C color) {
-  return muldiv255(
+  return muldiv256(
      unpack(sampleYUV(sampler0, uv0, layer0, sampler1, uv1, layer1, sampler2,
                       uv2, layer2, colorSpace, rescaleFactor)),
-      pack_pixels_RGBA8(color));
+      pack_pixels_RGBA8(color, 256.0f));
 }

 // Commit a single chunk of a YUV surface represented by multiple planar
@ -362,7 +351,7 @@ static ALWAYS_INLINE WideRGBA8 applyColor(WideRGBA8 src, NoColor) {
 }

 static ALWAYS_INLINE WideRGBA8 applyColor(WideRGBA8 src, WideRGBA8 color) {
-  return muldiv255(src, color);
+  return muldiv256(src, color);
 }

 static ALWAYS_INLINE PackedRGBA8 applyColor(PackedRGBA8 src, NoColor) {
@ -370,7 +359,7 @@ static ALWAYS_INLINE PackedRGBA8 applyColor(PackedRGBA8 src, NoColor) {
 }

 static ALWAYS_INLINE PackedRGBA8 applyColor(PackedRGBA8 src, WideRGBA8 color) {
-  return pack(muldiv255(unpack(src), color));
+  return pack(muldiv256(unpack(src), color));
 }

 // Samples an axis-aligned span of on a single row of a texture using 1:1
@ -469,7 +458,7 @@ static void blendTextureNearestRGBA8(S sampler, const ivec2_scalar& i, int span,

 #define swgl_commitTextureNearestColor(format, s, p, uv_rect, color, ...) \
  swgl_commitTextureNearest(format, s, p, uv_rect,                        \
-                            pack_pixels_##format(color), __VA_ARGS__)
+                            pack_pixels_##format(color, 256.0f), __VA_ARGS__)
 #define swgl_commitTextureNearestColorRGBA8(s, p, uv_rect, color, ...) \
  swgl_commitTextureNearestColor(RGBA8, s, p, uv_rect, color, __VA_ARGS__)
 #define swgl_commitTextureNearestColorR8(s, p, uv_rect, color, ...) \
@ -554,19 +543,26 @@ static inline WideRGBA8 sampleGradient(sampler2D sampler, int address,

 // Variant that allows specifying a color multiplier of the gradient result.
 #define swgl_commitGradientColorRGBA8(sampler, address, entry, color)        \
-  swgl_commitChunk(RGBA8, muldiv255(sampleGradient(sampler, address, entry), \
-                                    pack_pixels_RGBA8(color)))
+  swgl_commitChunk(RGBA8, muldiv256(sampleGradient(sampler, address, entry), \
+                                    pack_pixels_RGBA8(color, 256.0f)))

 // Extension to set a clip mask image to be sampled during blending. The offset
 // specifies the positioning of the clip mask image relative to the viewport
 // origin. The bounding box specifies the rectangle relative to the clip mask's
-// origin that constrains sampling within the clip mask.
+// origin that constrains sampling within the clip mask. Blending must be
+// enabled for this to work.
+enum SWGLClipFlag {
+  SWGL_CLIP_FLAG_MASK = 1 << 0,
+  SWGL_CLIP_FLAG_AA = 1 << 1,
+};
+static int swgl_ClipFlags = 0;
 static sampler2D swgl_ClipMask = nullptr;
 static IntPoint swgl_ClipMaskOffset = {0, 0};
 static IntRect swgl_ClipMaskBounds = {0, 0, 0, 0};
 #define swgl_clipMask(mask, offset, bb_origin, bb_size)        \
  do {                                                         \
    if (bb_size != vec2_scalar(0.0f, 0.0f)) {                  \
+      swgl_ClipFlags |= SWGL_CLIP_FLAG_MASK;                   \
      swgl_ClipMask = mask;                                    \
      swgl_ClipMaskOffset = make_ivec2(offset);                \
      swgl_ClipMaskBounds =                                    \
@ -574,6 +570,25 @@ static IntRect swgl_ClipMaskBounds = {0, 0, 0, 0};
    }                                                          \
  } while (0)

+// Extension to enable anti-aliasing for the given edges of a quad.
+// Blending must be enabled for this to work.
+static int swgl_AAEdgeMask = 0;
+
+static ALWAYS_INLINE int calcAAEdgeMask(bool on) { return on ? 0xF : 0; }
+static ALWAYS_INLINE int calcAAEdgeMask(int mask) { return mask; }
+static ALWAYS_INLINE int calcAAEdgeMask(bvec4_scalar mask) {
+  return (mask.x ? 1 : 0) | (mask.y ? 2 : 0) | (mask.z ? 4 : 0) |
+         (mask.w ? 8 : 0);
+}
+
+#define swgl_antiAlias(edges)                \
+  do {                                       \
+    swgl_AAEdgeMask = calcAAEdgeMask(edges); \
+    if (swgl_AAEdgeMask) {                   \
+      swgl_ClipFlags |= SWGL_CLIP_FLAG_AA;   \
+    }                                        \
+  } while (0)
+
 // Dispatch helper used by the GLSL translator to swgl_drawSpan functions.
 // The number of pixels committed is tracked by checking for the difference in
 // swgl_SpanLength. Any varying interpolants used will be advanced past the
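
The bit layout produced by calcAAEdgeMask() for a bvec4 can be related to the README's rectangle ordering (x0,y0,x1,y1) with a small standalone sketch. The enum names and `BoolVec4` type below are hypothetical and exist only for illustration; they are not part of SWGL.

```cpp
// Hypothetical names for the four edge bits: bit N selects the edge on
// which the Nth coordinate of the rectangle (x0, y0, x1, y1) is constant.
enum AAEdge : int {
  AA_EDGE_X0 = 1 << 0,
  AA_EDGE_Y0 = 1 << 1,
  AA_EDGE_X1 = 1 << 2,
  AA_EDGE_Y1 = 1 << 3,
  AA_EDGE_ALL = 0xF,
};

// Hypothetical stand-in for bvec4_scalar.
struct BoolVec4 {
  bool x, y, z, w;
};

// Mirrors the packing in calcAAEdgeMask(bvec4_scalar): x, y, z, w map to
// bits 0 through 3.
constexpr int packEdgeMask(BoolVec4 m) {
  return (m.x ? AA_EDGE_X0 : 0) | (m.y ? AA_EDGE_Y0 : 0) |
         (m.z ? AA_EDGE_X1 : 0) | (m.w ? AA_EDGE_Y1 : 0);
}

static_assert(packEdgeMask({true, true, true, true}) == AA_EDGE_ALL,
              "enabling every edge yields the all-ones mask");
```

For example, enabling AA only on the two edges where x is constant gives a mask of `AA_EDGE_X0 | AA_EDGE_X1`, which is 5.
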
--- a/gfx/wr/swgl/src/texture.h
+++ b/gfx/wr/swgl/src/texture.h
@ -470,7 +470,7 @@ SI T samplerScale(S sampler, T P) {
 }

 template <typename T>
-SI T samplerScale(sampler2DRect sampler, T P) {
+SI T samplerScale(UNUSED sampler2DRect sampler, T P) {
  return P;
 }