Calling ClientWebGLContext::UniformData() many times causes the
command buffer to fill up, so we spend a fair amount of time flushing
the old buffer, allocating a new one, and serializing the values.
The uniforms themselves are very small, but they add up over a large
number of calls. We already have some code to track whether the
uniform values are dirty to avoid some redundancy, but a) this doesn't
cover every uniform, and b) we invalidate them all when switching
programs.
This patch tracks the value of every uniform that gets set
dynamically, and tracks the values separately for each program
used. It then uses these to avoid calling UniformData redundantly.
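The per-program tracking described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `UniformCache`, `ProgramId`, `UniformLocation`, and `Update` are hypothetical names, and values are assumed to be plain float arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical identifiers for illustration; not the names used in the patch.
using ProgramId = std::uint32_t;
using UniformLocation = std::uint32_t;

class UniformCache {
 public:
  // Returns true if the value differs from the one last recorded for this
  // (program, location) pair, meaning the caller still has to send it.
  bool Update(ProgramId aProgram, UniformLocation aLocation,
              const float* aData, std::size_t aCount) {
    assert(aData && aCount > 0);
    std::vector<float>& cached = mValues[aProgram][aLocation];
    // Bitwise comparison is conservative: values that differ only in their
    // bit pattern (such as 0.0f and -0.0f) are treated as changed.
    if (cached.size() == aCount &&
        std::memcmp(cached.data(), aData, aCount * sizeof(float)) == 0) {
      return false;  // Redundant: the UniformData call can be skipped.
    }
    cached.assign(aData, aData + aCount);
    return true;
  }

 private:
  // Values are keyed by program first, so switching programs does not
  // discard what is known about the other programs' uniforms.
  std::unordered_map<ProgramId,
                     std::unordered_map<UniformLocation, std::vector<float>>>
      mValues;
};
```

A caller would only issue the real uniform upload when `Update` returns true.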
Differential Revision: https://phabricator.services.mozilla.com/D190269
For lines and rects, we don't have to worry about AARect generating overlapping
triangles when alpha is used. In these cases we can avoid drawing to a mask first
and avoid a performance cliff.
Differential Revision: https://phabricator.services.mozilla.com/D180525
Since AAStroke can't deal with non-opaque stroked paths, we first generate a normal opaque, anti-aliased
stroked path with AAStroke and render it to a cache texture bound to a render target. We can then later
just use that texture with alpha to support the initial alpha stroke request.
One caveat is that trying to both render to a texture bound to a framebuffer and also upload directly to
it with texSubImage2D can expose bugs in some OpenGL drivers that have different underlying representations
for data textures and for render target textures. To avoid this problem, we segregate the texture cache
pages based on whether they are used as render targets or for direct data uploads.
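As a rough sketch of the segregation idea, assuming a simplified cache in which each page has a fixed number of slots: an allocation only ever lands on a page whose usage matches. `TextureCache`, `PageUsage`, and `CachePage` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical usage tag; a page is only ever used one way.
enum class PageUsage { DataUpload, RenderTarget };

struct CachePage {
  PageUsage usage;
  int freeSlots;
};

class TextureCache {
 public:
  explicit TextureCache(int aSlotsPerPage) : mSlotsPerPage(aSlotsPerPage) {
    assert(aSlotsPerPage > 0);
  }

  // Returns the index of the page that received the allocation. Pages with
  // a different usage are never considered, even if they have free space.
  std::size_t Allocate(PageUsage aUsage) {
    for (std::size_t i = 0; i < mPages.size(); ++i) {
      if (mPages[i].usage == aUsage && mPages[i].freeSlots > 0) {
        --mPages[i].freeSlots;
        return i;
      }
    }
    mPages.push_back({aUsage, mSlotsPerPage - 1});
    return mPages.size() - 1;
  }

  PageUsage UsageOf(std::size_t aPage) const { return mPages.at(aPage).usage; }

 private:
  int mSlotsPerPage;
  std::vector<CachePage> mPages;
};
```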
Ultimately, this avoids the fallback of having to draw the alpha stroke in software with Skia and
then upload it to a texture. For stroked paths with large hollow areas, uploading a Skia surface whose
bounds contain the full stroke can cause a lot of uploading of unnecessary pixel data. This allows us
to only upload the triangle mesh for AAStroke and otherwise keep generation solely on the GPU.
Differential Revision: https://phabricator.services.mozilla.com/D180143
This implements some optimizations targeted at Canvas2D's putImageData:
1) Track whether the canvas is in the initially clear state so that we avoid
reading back from the WebGL framebuffer into the Skia framebuffer when a
fallback does occur or when a data snapshot is needed.
2) For surfaces that are too large to upload to a texture, directly use
glTexSubImage2D to draw data to the WebGL framebuffer, bypassing a separate
texture upload.
3) Disregard the surface size limits for SurfacePatterns containing a
compatible texture handle.
Differential Revision: https://phabricator.services.mozilla.com/D171773
If we choose to accelerate a single line path, we need to take care not to use
the line cap when the path is closed. When the path is closed, we need to use
the line join instead.
Differential Revision: https://phabricator.services.mozilla.com/D170469
Skia upstream removed deprecated clip ops that could be used to replace
the clipping stack and bypass clips. We shouldn't really need to do this
anymore, as we can work around it just using public APIs.
The only SkCanvas operation that allows us to bypass clipping is
writePixels, which still allows us to implement CopySurface/putImageData.
Other instances where we were using the replace op for DrawTargetWebgl
layering support can just be worked around by creating a separate
DrawTargetSkia pointing to the same pixel data, but on which no clipping
or transforms are applied so that we can freely do drawing operations
on it to the base layer pixel data regardless of any user-applied clipping.
Differential Revision: https://phabricator.services.mozilla.com/D168039
This updates the version of wpf-gpu-raster, which adds support for
GPUs/drivers that use truncation instead of rounding when converting
vertices to fixed point.
It also adds the GL vendor to InitContextResult so that we can detect
AMD on macOS and tell wpf-gpu-raster that truncation is going to happen.
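To illustrate why the distinction matters, here is a small sketch of the two conversions. The number of fractional bits (8) is an assumption made for this example only, and truncation is modeled as rounding toward zero with `std::trunc`; real hardware may differ in both respects.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Number of fractional (subpixel) bits; chosen only for illustration.
constexpr int kSubpixelBits = 8;

// Converts a coordinate to fixed point, rounding to the nearest step.
std::int32_t ToFixedRounded(float aCoord) {
  assert(std::isfinite(aCoord));
  return static_cast<std::int32_t>(std::round(aCoord * (1 << kSubpixelBits)));
}

// Converts a coordinate to fixed point, discarding the remainder.
std::int32_t ToFixedTruncated(float aCoord) {
  assert(std::isfinite(aCoord));
  return static_cast<std::int32_t>(std::trunc(aCoord * (1 << kSubpixelBits)));
}
```

A vertex that sits three quarters of the way between two fixed-point steps lands on different steps under the two schemes, so a rasterizer that assumes rounding can be off by one subpixel step on a device that truncates.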
Differential Revision: https://phabricator.services.mozilla.com/D167503
CanvasRenderingContext2D relies upon CreateSimilarDrawTarget to extract
a subrect from a surface to draw. However, DrawTargetWebgl does not return an
accelerated DT for that API as creating an entirely new context can be quite
expensive.
To work around this, this adds a specific ExtractSubrect API for SourceSurface
that can bypass the entire need to create a temporary DrawTarget to copy into.
Differential Revision: https://phabricator.services.mozilla.com/D164118
This pre-allocates a vertex output buffer in DrawTargetWebgl so that we can generate
wpf-gpu-raster and aa-stroke output into it. This way, they don't have to realloc
a Vec on pushes or when converting into a boxed slice. This can net 5-10% on profiles for
the demos noted in the bug.
Depends on D163989
Differential Revision: https://phabricator.services.mozilla.com/D163990
It seems like this is slow for now until we implement a better way than wpf-gpu-raster
for stroking paths. Just hide this behind a pref so we can at least test it without
impacting performance as badly.
Differential Revision: https://phabricator.services.mozilla.com/D163248
For use-cases that repeatedly pop and re-push the same clips over and over, we can needlessly regenerate
the same mask that is still stored, because we only detect that the clip state changed, rather than
that it changed back to exactly the same state it was in previously.
This just remembers the previous state of the clip stack at the time the clip mask was generated
so that we can compare the previous and current state. If they're the same, we can assume there
is no need to regenerate the clip mask again and simply reuse it.
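A minimal sketch of that comparison, assuming a simplified clip entry that is just a rectangle. `ClipEntry`, `ClipMaskCache`, and `NeedsRegeneration` are hypothetical names for this example; the real clip stack also carries transforms and paths.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified clip entry: a rectangle only.
struct ClipEntry {
  float x, y, width, height;
  bool operator==(const ClipEntry& aOther) const {
    return x == aOther.x && y == aOther.y && width == aOther.width &&
           height == aOther.height;
  }
};

class ClipMaskCache {
 public:
  // Returns true if the mask has to be regenerated for the given stack, and
  // remembers the stack so a later identical one can reuse the stored mask.
  bool NeedsRegeneration(const std::vector<ClipEntry>& aStack) {
    if (mHasMask && aStack == mMaskStack) {
      return false;  // Same state as when the mask was generated: reuse it.
    }
    mMaskStack = aStack;
    mHasMask = true;
    return true;
  }

 private:
  std::vector<ClipEntry> mMaskStack;  // Stack at the time the mask was made.
  bool mHasMask = false;
};
```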
Differential Revision: https://phabricator.services.mozilla.com/D162699
WebGL doesn't reliably implement line smoothing, so we can't rely on it, making it
useless for canvas lines. Instead, just fall back to emulating it manually with paths.
Differential Revision: https://phabricator.services.mozilla.com/D162540
Some paths may contain so many types that their vertex representation far exceeds their
software-rasterized representation in memory size. As a sanity check, we should just set
a hard limit on the maximum allowed complexity of a path that we attempt to supply to
wpf-gpu-raster. Beyond that, we will instead just rasterize in software and upload
to a texture, which can be more performant.
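The decision could look roughly like the sketch below. The threshold value and the names `kMaxPathComplexity`, `PathStrategy`, and `ChoosePathStrategy` are made up for illustration, and complexity is assumed here to be a simple count of path operations.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical limit; the actual value would need to be tuned.
constexpr std::size_t kMaxPathComplexity = 4096;

enum class PathStrategy { GpuTessellate, SoftwareRasterize };

// Picks how to draw a path, given an assumed complexity measure (here, the
// number of path operations).
PathStrategy ChoosePathStrategy(std::size_t aNumPathOps) {
  return aNumPathOps > kMaxPathComplexity ? PathStrategy::SoftwareRasterize
                                          : PathStrategy::GpuTessellate;
}
```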
Differential Revision: https://phabricator.services.mozilla.com/D162481
By default, BorrowSnapshot is pessimistic and forces DrawTargetWebgl to return a data snapshot on
the assumption that the snapshot might be used off thread. However, if we actually know the DrawTarget
we're going to be drawing the snapshot to, then we can check if they're both DrawTargetWebgls with
the same internal SharedContext. In that case, we can use a SourceSurfaceWebgl snapshot which can
pass through a GPU texture to the target. This requires us to plumb the DrawTarget down through
SurfaceFromElement all the way to DrawTargetWebgl to make this decision.
Differential Revision: https://phabricator.services.mozilla.com/D162176
This adds a path vertex buffer where triangle list output from WGR is stored.
Each PathCacheEntry can potentially reference a range of vertices in this buffer
corresponding to triangles for that entry. When this buffer is full, it gets
orphaned and clears corresponding cache entries, so that it can start anew.
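The bookkeeping can be sketched as follows, assuming C++17 and counting vertices rather than storing them. `PathVertexBuffer`, `VertexRange`, and `Store` are hypothetical names, and `PathCacheEntry` is reduced here to the one field the example needs.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct VertexRange {
  std::size_t offset;
  std::size_t length;
};

// Simplified stand-in for the real cache entry.
struct PathCacheEntry {
  std::optional<VertexRange> range;
};

class PathVertexBuffer {
 public:
  explicit PathVertexBuffer(std::size_t aCapacity) : mCapacity(aCapacity) {
    assert(aCapacity > 0);
  }

  // Reserves aCount vertices for aEntry. Returns false if the mesh can never
  // fit. Entries must outlive the buffer, since raw pointers are kept.
  bool Store(PathCacheEntry& aEntry, std::size_t aCount) {
    if (aCount > mCapacity) {
      return false;
    }
    if (mUsed + aCount > mCapacity) {
      Orphan();
    }
    aEntry.range = VertexRange{mUsed, aCount};
    mUsed += aCount;
    mEntries.push_back(&aEntry);
    return true;
  }

 private:
  // Drops the old contents and clears every entry that pointed into them.
  void Orphan() {
    for (PathCacheEntry* entry : mEntries) {
      entry->range.reset();
    }
    mEntries.clear();
    mUsed = 0;
  }

  std::size_t mCapacity;
  std::size_t mUsed = 0;
  std::vector<PathCacheEntry*> mEntries;
};
```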
Differential Revision: https://phabricator.services.mozilla.com/D161479
For canvas users that rapidly create and destroy canvases, we may end up creating
a new SharedContext (and hence ClientWebGLContext) if there are no more canvases
left between destruction and creation. To work around this, just keep alive the
SharedContext for the main thread (other threads are unfortunately a bit tricky
to support) so that canvas creation remains fast in this instance.
Depends on D158903
Differential Revision: https://phabricator.services.mozilla.com/D158904
Previously we were reusing the framebuffer's Skia DT to render the clip mask.
This was the path of least resistance since SkCanvas does not allow exporting
clip information, and there is no way to reset the bitmap storage inside an
SkCanvas temporarily.
However, this can cause a feedback cycle of unnecessary WaitForShmem operations,
since we need to wait before we can generate the clip mask into the Skia target,
and then anything else after it needs to wait for the clip mask to finish uploading
before the Skia DT can be used again.
To alleviate this, we just allocate a new DrawTargetSkia to render the clip mask
into. We carefully clip the size of the DT so that in the common case we avoid
having to upload a surface the size of the entire framebuffer. Further, since
this is a completely different DT, we can now use an A8 format (1/4 the memory
overhead) instead of a BGRA8 format for the clip mask, which gives a further
memory usage gain.
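The memory saving is simple arithmetic, sketched below. `MaskFormat`, `BytesPerPixel`, and `MaskBytes` are hypothetical helpers, and the sizes mentioned afterwards are illustrative rather than measured.

```cpp
#include <cassert>
#include <cstddef>

enum class MaskFormat { A8, BGRA8 };

// A8 stores one byte per pixel; BGRA8 stores four.
std::size_t BytesPerPixel(MaskFormat aFormat) {
  return aFormat == MaskFormat::A8 ? 1 : 4;
}

// Size in bytes of a tightly packed mask surface.
std::size_t MaskBytes(int aWidth, int aHeight, MaskFormat aFormat) {
  assert(aWidth >= 0 && aHeight >= 0);
  return static_cast<std::size_t>(aWidth) * static_cast<std::size_t>(aHeight) *
         BytesPerPixel(aFormat);
}
```

For a 1920x1080 framebuffer, a full-size BGRA8 mask takes 8,294,400 bytes, while an A8 mask of the same size takes 2,073,600 bytes. Clipping the mask to a 256x256 region brings the A8 cost down to 65,536 bytes.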
A further complication is that we need to log the current clip stack state so
that we can replay it onto the new DrawTargetSkia. This avoids having to add
a mechanism to SkCanvas to export clip information.
Differential Revision: https://phabricator.services.mozilla.com/D157050
Sometimes the clip state is thrashed when we need to temporarily override
clipping to disable it. However, in this case, the clip mask itself remains
unchanged. The current invalidation scheme doesn't discern between generation
of the clip mask itself and setting the clip state for the shader, leading to
unnecessary regeneration of the clip mask.
This code just tries to discern when this is happening so we can refresh the
clip state without having to regenerate the clip mask unless truly necessary.
Differential Revision: https://phabricator.services.mozilla.com/D157048
Sometimes we hit requests to stroke a path with a rounded line in it that can't
be accelerated inside StrokeLine. This causes it to push a layer which can be
expensive. Go through DrawPath instead in this case which will still try to
accelerate the drawing with a cached texture that does not use a layer.
Differential Revision: https://phabricator.services.mozilla.com/D156791
DrawTargetWebgl currently only supports aligned rectangular clips that can be approximated
with a scissor. However, many use-cases require complex clips like rounded rectangles or
non-aligned regions. We can support these cases more generally by using a mask texture that
modulates the shader color. The mask texture is generated by doing a solid fill in the Skia
target over a clear background, which is safe because the Skia target is not in use while
the WebGL target is being rendered to. This adds one unconditional texture lookup to the
shaders which shouldn't have a big performance impact. When no clip mask is needed, we just
default to using a 1x1 solid texture.
Depends on D156224
Differential Revision: https://phabricator.services.mozilla.com/D156225
Currently we only support filled glyphs in DrawTargetWebgl. PDF.js can often render PDFs
that have stroked glyphs, so support for stroked glyphs is useful to prevent fallbacks.
This just adds support for plumbing StrokeOptions through to GlyphCache.
Differential Revision: https://phabricator.services.mozilla.com/D156224
mWebglValid gets initialized to false, but it will never get reset to true until the next
frame, causing us to render the first frame into Skia rather than accelerate it. Therefore, we
should just initialize it to valid. Since it is cleared to zero initially, this is safe.
Differential Revision: https://phabricator.services.mozilla.com/D151896
BorrowSnapshot can be called by OffscreenCanvas in various places that may send
a SourceSurfaceWebgl to the main thread. If it did not originate from the main
thread, then this can cause multiple threads to use it. In general we want to
avoid this. For now, override BorrowSnapshot and make it always force a Skia
snapshot that can be safely shared between threads instead of SourceSurfaceWebgl.
Differential Revision: https://phabricator.services.mozilla.com/D152417
When rendering large and/or fullscreen Canvas2Ds, excessive time can be spent
in calls to TexImage/ReadPixels copying into and out of Shmems to the separate
buffer for DrawTargetSkia. To alleviate this, we can make the DrawTargetSkia
directly wrap the Shmem, so that calls to TexImage/ReadPixels then directly
read or write to this without any separate copy. We modify RawTexImage to use
the IPDL SendTexImage path so that Shmems can be sent via SurfaceDescriptor.
Since SendTexImage is nominally async (which is beneficial), we rely on a
call to GetError later to verify that the Shmem processing is complete before
we further modify the DrawTargetSkia. We also add a ReadPixelsIntoShmem IPDL
call to allow sending the Shmem in the other direction directly.
Differential Revision: https://phabricator.services.mozilla.com/D151286
With async present we can now rely on being able to do readbacks from WebGL
in the GPU process, rather than needing CopySnapshotTo to accelerate this in
the content process. Just remove CopySnapshotTo since it doesn't help anymore.
Differential Revision: https://phabricator.services.mozilla.com/D150721
If a DrawTargetWebgl's snapshot is mapped, then subsequently drawn, this can cause an assert to
trigger as we may not have a handle available when we go to unlink the cached snapshot that later
gets added to the texture cache. This assert is otherwise harmless, so we just fix the assert.
Separately, this scenario has some inefficiencies that this patch also tries to address.
When we go to draw the snapshot, DrawTargetWillChange gets invoked on the snapshot, and we can
ensure in this case the handle is copied efficiently here rather than later uploaded from mapped
data.
Differential Revision: https://phabricator.services.mozilla.com/D148102