Before this CL, SkRecord only adjusted the bounds of draw ops for SaveLayers' paints.
That worked fine, but as a final step we intersect the bounds of draw ops with the
bounds of the current clip, essentially undoing all that work.
I think the right fix here is to also adjust the bounds of the clip ops.
BUG=skia:2957, 415468
R=robertphillips@google.com
Review URL: https://codereview.chromium.org/595953002
Refactored text blob backend for improved performance: instead of using
separate buffers for runs/positions/glyphs, everything is now packed in
a consolidated slab (including the SkTextBlob object itself!).
Benefits:
* number of allocations per blob construction reduced from ~4 to 1
(also minimizes internal fragmentation)
* run record size reduced by 8 bytes
This takes the blob construction overhead down to negligible levels
(for the current Blink uncached textblob implementation).
Unfortunately, the code is much more finicky (run merging in
particular) -- hence the assert spree.
Multi-run blobs are vulnerable to realloc storms but this is not a
problem at the moment because Blink is using one-run blobs 99% of the
time. Will be addressed in the future.
R=mtklein@google.com, reed@google.com, robertphillips@google.com
Committed: https://skia.googlesource.com/skia/+/13645ea0ea87038ebd71be3bd6d53b313069a9e4
Author: fmalita@chromium.org
Review URL: https://codereview.chromium.org/581173003
Reason for revert:
Broke the new blobshader gm.
Original issue's description:
> Souped-up SkTextBlob.
>
> Refactored text blob backend for improved performance: instead of using
> separate buffers for runs/positions/glyphs, everything is now packed in
> a consolidated slab (including the SkTextBlob object itself!).
>
> Benefits:
>
> * number of allocations per blob construction reduced from ~4 to 1
> (also minimizes internal fragmentation)
> * run record size reduced by 8 bytes
>
> This takes the blob construction overhead down to negligible levels
> (for the current Blink uncached textblob implementation).
>
> Unfortunately, the code is much more finicky (run merging in
> particular) -- hence the assert spree.
>
> Multi-run blobs are vulnerable to realloc storms but this is not a
> problem at the moment because Blink is using one-run blobs 99% of the
> time. Will be addressed in the future.
>
>
> R=reed@google.com,mtklein@google.com,robertphillips@google.com
>
> Committed: https://skia.googlesource.com/skia/+/13645ea0ea87038ebd71be3bd6d53b313069a9e4R=mtklein@google.com, reed@google.com, robertphillips@google.comTBR=mtklein@google.com, reed@google.com, robertphillips@google.com
NOTREECHECKS=true
NOTRY=true
Author: fmalita@chromium.org
Review URL: https://codereview.chromium.org/588853002
Refactored text blob backend for improved performance: instead of using
separate buffers for runs/positions/glyphs, everything is now packed in
a consolidated slab (including the SkTextBlob object itself!).
Benefits:
* number of allocations per blob construction reduced from ~4 to 1
(also minimizes internal fragmentation)
* run record size reduced by 8 bytes
This takes the blob construction overhead down to negligible levels
(for the current Blink uncached textblob implementation).
Unfortunately, the code is much more finicky (run merging in
particular) -- hence the assert spree.
Multi-run blobs are vulnerable to realloc storms but this is not a
problem at the moment because Blink is using one-run blobs 99% of the
time. Will be addressed in the future.
R=mtklein@google.com, reed@google.com, robertphillips@google.com
Author: fmalita@chromium.org
Review URL: https://codereview.chromium.org/581173003
This is intended to disconnect the lifetimes of the optimization data, cached layers and replacement objects.
Note that the optimization data already makes a copy of the paint in the SkPicture. Additionally the replacement object will probably go away at some point.
R=bsalomon@google.com
Author: robertphillips@google.com
Review URL: https://codereview.chromium.org/579843002
This is a bit like a limited SkData, geared to store really tiny byte strings.
This is not hooked up anywhere beyond the new unit test. I did experimentally
plumb it into SkRecord for drawPosTextH: just over 40% of drawPosTextH calls in
our repo can fit into ShortData.
BUG=skia:
R=reed@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/573323002
blink skips all pending commands during picture recording if it is drawing an opaque full-frame
geometry or image. This may improve performance for some edge cases. To recognize an opaque
full-frame drawing should be cheap enough. Otherwise, the overhead will offset the improvement.
Unfortunately, data from perf for content_shell on Nexus7 shows that SkDeferredCanvas::isFullFrame
is far from cheap. Table below shows that how much isFullFrame() costs in the whole render process.
benchmark percentage
my local benchmark(draw 1000 sprites) 4.1%
speedReading 2.8%
FishIETank(1000 fishes) 1.5%
GUIMark3 Bitmap 2.0%
By contrast, real recording (SkGPipeCanvas::drawBitmapRectToRect) and real rasterization
(GrDrawTarget::drawRect) cost ~4% and ~6% in the whole render process respectively. Apparently,
SkDeferredCanvas::isFullFrame() is nontrivial.
getDeviceSize() is the main contributor to this hotspot. The change simply save the canvasSize and
reuse it among drawings if it is not a fresh frame. This change cut off ~65% (or improved ~2 times)
of isFullFrame().
telemetry smoothness canvas_tough_test didn't show obvious improvement or regression.
BUG=411166
Committed: https://skia.googlesource.com/skia/+/8e45c3777d886ba3fe239bb549d06b0693692152R=junov@chromium.org, tomhudson@google.com, reed@google.com
Author: yunchao.he@intel.com
Review URL: https://codereview.chromium.org/545813002
Reason for revert:
This is leaking memory:
http://108.170.220.120:10117/builders/Test-Ubuntu13.10-GCE-NoGPU-x86_64-Debug-ASAN/builds/2516/steps/RunDM/logs/stdio
Original issue's description:
> Picture Recording: fix the performance bottleneck in SkDeferredCanvas::isFullFrame
>
> blink skips all pending commands during picture recording if it is drawing an opaque full-frame
> geometry or image. This may improve performance for some edge cases. To recognize an opaque
> full-frame drawing should be cheap enough. Otherwise, the overhead will offset the improvement.
> Unfortunately, data from perf for content_shell on Nexus7 shows that SkDeferredCanvas::isFullFrame
> is far from cheap. Table below shows that how much isFullFrame() costs in the whole render process.
>
> benchmark percentage
> my local benchmark(draw 1000 sprites) 4.1%
> speedReading 2.8%
> FishIETank(1000 fishes) 1.5%
> GUIMark3 Bitmap 2.0%
>
> By contrast, real recording (SkGPipeCanvas::drawBitmapRectToRect) and real rasterization
> (GrDrawTarget::drawRect) cost ~4% and ~6% in the whole render process respectively. Apparently,
> SkDeferredCanvas::isFullFrame() is nontrivial.
>
> getDeviceSize() is the main contributor to this hotspot. The change simply save the canvasSize and
> reuse it among drawings if it is not a fresh frame. This change cut off ~65% (or improved ~2 times)
> of isFullFrame().
>
> telemetry smoothness canvas_tough_test didn't show obvious improvement or regression.
>
> BUG=411166
>
> Committed: https://skia.googlesource.com/skia/+/8e45c3777d886ba3fe239bb549d06b0693692152R=junov@chromium.org, tomhudson@google.com, reed@google.com, yunchao.he@intel.comTBR=junov@chromium.org, reed@google.com, tomhudson@google.com, yunchao.he@intel.com
NOTREECHECKS=true
NOTRY=true
BUG=411166
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/571053002
blink skips all pending commands during picture recording if it is drawing an opaque full-frame
geometry or image. This may improve performance for some edge cases. To recognize an opaque
full-frame drawing should be cheap enough. Otherwise, the overhead will offset the improvement.
Unfortunately, data from perf for content_shell on Nexus7 shows that SkDeferredCanvas::isFullFrame
is far from cheap. Table below shows that how much isFullFrame() costs in the whole render process.
benchmark percentage
my local benchmark(draw 1000 sprites) 4.1%
speedReading 2.8%
FishIETank(1000 fishes) 1.5%
GUIMark3 Bitmap 2.0%
By contrast, real recording (SkGPipeCanvas::drawBitmapRectToRect) and real rasterization
(GrDrawTarget::drawRect) cost ~4% and ~6% in the whole render process respectively. Apparently,
SkDeferredCanvas::isFullFrame() is nontrivial.
getDeviceSize() is the main contributor to this hotspot. The change simply save the canvasSize and
reuse it among drawings if it is not a fresh frame. This change cut off ~65% (or improved ~2 times)
of isFullFrame().
telemetry smoothness canvas_tough_test didn't show obvious improvement or regression.
BUG=411166
R=junov@chromium.org, tomhudson@google.com, reed@google.com
Author: yunchao.he@intel.com
Review URL: https://codereview.chromium.org/545813002
This adds SkResourceCache::Remove() which will remove a resource from
its cache. The resource is required to be unlocked at the time Remove()
is called.
Then SkBitmapCache::Find() makes use of this to Remove() bitmaps from
the cache whose pixels have been evicted. This allows the bitmap to be
re-added to the cache with pixels again.
After this change, background a tab (and discarding all the bitmaps'
contents) no longer disables image caching for those discarded images
once the tab is visible again.
BUG=skia:2926
NOTRY=true
R=reed@android.com, tomhudson@google.com, reed@google.com
Author: danakj@chromium.org
Review URL: https://codereview.chromium.org/561953002
There are two ways negative sigma values may occur: in
the original filter parameters, or after multiplication
by a negative scaling CTM. The former case is
invalid according to the spec, so we continue to check
for it at validation time. In the latter case, we should
interpret it as a horizontal flip in the kernel pixel
access, and simply take the absolute value (since the
filter kernel is symmetric).
Also refactor all this logic into a single place for the
CPU, GPU and onFilterBounds() paths.
BUG=https://code.google.com/p/chromium/issues/detail?id=409602R=sugoi@google.com, reed@google.com, sugoi@chromium.org
Author: senorblanco@chromium.org
Review URL: https://codereview.chromium.org/555603002
This has the nice property of being able to double-check hashes after the fact.
mtklein@mtklein ~/skia (hash-png)> md5sum bad/8888/3x3bitmaprect.png
deede70ab2f34067d461fb4a93332d4c bad/8888/3x3bitmaprect.png
mtklein@mtklein ~/skia (hash-png)> grep 3x3bitmaprect_8888 bad/dm.json
"3x3bitmaprect_8888" : "deede70ab2f34067d461fb4a93332d4c",
I have checked that no two premultiplied colors map to the same unpremultiplied
color (math nerds: unpremultiplication is injective), so a change in
premultiplied SkBitmap will always imply a change in the encoded
unpremultiplied .png. This means, it's safe to hash .pngs; we won't miss
subtle changes.
BUG=skia:
R=jcgregorio@google.com, stephana@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/549203003
SkTaskGroup is like SkThreadPool except the threads stay in
one global pool. Each SkTaskGroup itself is tiny (4 bytes)
and its wait() method applies only to tasks add()ed to that
instance, not the whole thread pool.
This means we don't need to bring up new thread pools when
tests themselves want to use multithreading (e.g. pathops,
quilt). We just create a new SkTaskGroup and wait for that
to complete. This should be more efficient, and allow us
to expand where we use threads to really latency sensitive
places. E.g. we can probably now use these in nanobench
for CPU .skp rendering.
Now that all threads are sharing the same pool, I think we
can remove most of the custom mechanism pathops tests use
to control threading. They'll just ride on the global pool
with all other tests now.
This (temporarily?) removes the GPU multithreading feature
from DM, which we don't use.
On my desktop, DM runs a little faster (57s -> 55s) in
Debug, and a lot faster in Release (36s -> 24s). The bots
show speedups of similar proportions, cutting more than a
minute off the N4/Release and Win7/Debug runtimes.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/9c7207b5dc71dc5a96a2eb107d401133333d5b6fR=caryclark@google.com, bsalomon@google.com, bungeman@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/531653002
Reason for revert:
Leaks, leaks, leaks.
Original issue's description:
> SkThreadPool ~~> SkTaskGroup
>
> SkTaskGroup is like SkThreadPool except the threads stay in
> one global pool. Each SkTaskGroup itself is tiny (4 bytes)
> and its wait() method applies only to tasks add()ed to that
> instance, not the whole thread pool.
>
> This means we don't need to bring up new thread pools when
> tests themselves want to use multithreading (e.g. pathops,
> quilt). We just create a new SkTaskGroup and wait for that
> to complete. This should be more efficient, and allow us
> to expand where we use threads to really latency sensitive
> places. E.g. we can probably now use these in nanobench
> for CPU .skp rendering.
>
> Now that all threads are sharing the same pool, I think we
> can remove most of the custom mechanism pathops tests use
> to control threading. They'll just ride on the global pool
> with all other tests now.
>
> This (temporarily?) removes the GPU multithreading feature
> from DM, which we don't use.
>
> On my desktop, DM runs a little faster (57s -> 55s) in
> Debug, and a lot faster in Release (36s -> 24s). The bots
> show speedups of similar proportions, cutting more than a
> minute off the N4/Release and Win7/Debug runtimes.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/9c7207b5dc71dc5a96a2eb107d401133333d5b6fR=caryclark@google.com, bsalomon@google.com, bungeman@google.com, reed@google.com, mtklein@chromium.orgTBR=bsalomon@google.com, bungeman@google.com, caryclark@google.com, mtklein@chromium.org, reed@google.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/533393002
SkTaskGroup is like SkThreadPool except the threads stay in
one global pool. Each SkTaskGroup itself is tiny (4 bytes)
and its wait() method applies only to tasks add()ed to that
instance, not the whole thread pool.
This means we don't need to bring up new thread pools when
tests themselves want to use multithreading (e.g. pathops,
quilt). We just create a new SkTaskGroup and wait for that
to complete. This should be more efficient, and allow us
to expand where we use threads to really latency sensitive
places. E.g. we can probably now use these in nanobench
for CPU .skp rendering.
Now that all threads are sharing the same pool, I think we
can remove most of the custom mechanism pathops tests use
to control threading. They'll just ride on the global pool
with all other tests now.
This (temporarily?) removes the GPU multithreading feature
from DM, which we don't use.
On my desktop, DM runs a little faster (57s -> 55s) in
Debug, and a lot faster in Release (36s -> 24s). The bots
show speedups of similar proportions, cutting more than a
minute off the N4/Release and Win7/Debug runtimes.
BUG=skia:
R=caryclark@google.com, bsalomon@google.com, bungeman@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/531653002
Scale all images to the nearest rounded integer, and if there's still
any scaling factor left over, pass it on to the subsequent bilerp code.
Should avoid artifacts when tiling scaled images.
Original CL received an LGTM from reed; new version disabled tiling
in the downsamplebitmap GM; I verified that this fixes the issue
we were seeing there on non-neon androids.
BUG=skia:2888
R=reed@android.com
TBR=reed
Author: humper@google.com
Review URL: https://codereview.chromium.org/514383003