pull the latest from libwebp.
Original source:
http://git.chromium.org/webm/libwebp.git
100644 blob 264210ba2807e4da47eb5d18c04cf869d89b9784 src/utils/thread.c
commit 46fd44c1042c9903b2f1ab87e9f200a13c7e702d
Author: James Zern <jzern@google.com>
Date: Tue Jul 8 19:53:28 2014 -0700
thread: remove harmless race on status_ in End()
if a thread was still doing work when End() was called there'd be a race
on worker->status_. in these cases, however, the specific value is
meaningless as it would be >= OK and the thread would have been shut
down properly, but we'll check 'impl_' instead to avoid any potential
TSan/DRD reports.
Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1
Change-Id: Ib0ef25737b3c6d017fa74822e21ed58508230b91
Currently, vp9_diamond_search_sadx4() is only called when sse3 is
enabled, which is improper since sse2 optimization of sdx4df
functions are available. Changed to always use
vp9_diamond_search_sadx4().
Change-Id: I4b95d6b7a3c6c645783c373f0ba8d645ece24717
- fix indent, spelling
- drop some whitespace in some comments
- add an assert in vp9_setup_mask, it shouldn't be called on decode
error
Change-Id: Ic312a815e977a6f9cb81ceb7b039eeada76c5aa0
There are sse2 optimization of sdx4df functions. Instead of calling
vp9_refining_search_sadx4 only when sse3 is enabled, call it always.
Change-Id: I24f93818f7d4209d1425039e0eb099ff9ff08fe9
Use FAST_HEX in speed 5 and 6, which covers more points than
FAST_DIAMOND and improves motion search quality.
At speed 6, RTC set borg tests showed slight quality gain (psnr
gain: 0.143%, ssim gain: 0.226%). No noticeable encoding speed
change.
Change-Id: Ifa62875d9a52ee382ec494f271382bb77d8c67bf
This commit combined the full pel and sub pel motion search into a
single function to avoid code duplication. The commit does not change
encoder outputs.
Change-Id: Ibe18342c4f64073bef20f9cf6c6ca0a20d01bf0d
This commit enables a new quantization process for 32x32 2D-DCT
transform coefficient blocks. It improves the compression
performance of speed 5 by 1.4%. The overall compression gains of
speed 5 due to the new quantization scheme is 4.7%. It also includes
the SSSE3 implementation of the 32x32 quantization process.
Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b
This patch fixes bug 633:
https://code.google.com/p/webm/issues/detail?id=633
The first decoded frame does not have to be a keyframe,
it could be an inter-frame that is coded intra-only.
This patch fixes the handling of intra-only frames.
A test vector has also been added that encodes 3
intra-only frames at the start of the clip. The
test vector was generated using the code in the
following patch:
https://gerrit.chromium.org/gerrit/#/c/70680/
Change-Id: Ib40b1dbf91aae2bc047e23c626eaef09d1860147
In the previous version, only certain buffers in the macroblockd were saved and
the restored. In this version, all of the buffers are saved and restored. The
code was then rolled into a loop for readability.
Also contains a tiny fix for when the -DOUTPUT_YUV_DENOISED flag is used.
Change-Id: Id925ef8b3fa122ae88acfa1d9a1e4df45df83518
Prepare for frame parallel decoding, the reference count buffers
need to be protected by mutex. Move vp9_thread.* to common
folder so that those buffers could use cross-platform mutex
from vp9_thread.*.
Change-Id: I541277cf15eefed6641555944f67f4a0bcdc8154
* Replace max_step_search_steps with constant MAX_MVSEARCH_STEPS
* Fold (reduce_first_step_size + speed > 5) into reduce_first_step_size
replacing uses of reduce_first_step_size that don't add the speed
check with zero.
Change-Id: Iae46395dbf3eaca138bf4d18b838a9e364b5a198
The y4m extension used is the same as the one used in ffmpeg/x264.
The patch is adapted from the highbitdepth branch.
Also adds unit tests for y4m header parsing and md5 check
of the raw frame data, as well as y4m writing.
[build fix for Mac/VS by not using tuples with strings]
Change-Id: I40897ee37d289e4b6cea6fedc67047d692b8cb46
vp9_rdopt is for making rd optimal mode decisions. vp9_rd is for all
other rd related routines. Anything used outside of making an rd optimal
decision belongs in rd.
Change-Id: I772a3073f7588bdf139f551fb9810b6864d8e64b
Moved the threshold adjustment before reference flag checking,
which could set the threshold to INT_MAX for disabled reference
frame, and cause overflow if the adjustment is done after that.
Change-Id: I85e94f8726d5e3ae93f65965aa978721dddc9957
Renamed updating_running_avg() to filter(). Extended function with the rest of
the filter procedure. Made all of the empirically-determined constants used in
VP8 into functions so they can be tweaked more easily.
Change-Id: I41730c8c92370c76885950a43742347477ca4e7e
As in VP8.
Currently, this parameter is set with the VP8E_SET_NOISE_SENSITIVITY flag.
The flag was not renamed so that we don't break the interface for webrtc. This
should probably be changed at some point in the future.
Change-Id: Ic73fcb0dde9d1d019e9d042050b617333ac65472
Add test code to turn multi-arf on and off depending
on group length and zero motion.
Changes to active max group length for mult-arf.
Fund second arf only from normal frame bits.
Change-Id: I920287fac1c886428c15a39f731a25d07c2b796c
This commit added a speed feature to control the step_param used in
full pixel motion search. The intention is to reduced the search
steps for high speed real time coding.
Change-Id: I21d2f0105c2b647783a6688615da7fcf2b6d670b
Adapt the use of segmentation in AQ mode 2 based on
the ambient kf/arf/gf Q.
Disable segmentation where the rate per SB is very
low and overheads are likely to outweigh the benefits.
This patch reduces the -ve average metrics impact
of AQ mode 2 while allowing stronger 3 segment AQ
in some cases. Average improvement ~0.5-1.0%.
Change-Id: I5892dfcc7507c5cc6444531cc7fe17554cf8d0c7
The y4m extension used is the same as the one used in ffmpeg/x264.
The patch is adapted from the highbitdepth branch.
Also adds unit tests for y4m header parsing and md5 check
of the raw frame data, as well as y4m writing.
Change-Id: Ie2794daf6dbafd2f128464f9b9da520fc54c0dd6
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.
The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.
Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
Add a conditional compile flag for this feature. Also add a
switch to enable the encoder to use these statistics in the
second pass. Currently, the switch is turned off.
Change-Id: Ia1c858c35ec90e36f19f5cffe156b97ddaa04922
The current threshold is knid of low, and in many cases NEWMV
mode is checked but not picked as the best mode. This patch
added a speed feature to increase NEWMV threshold, so that
less partition mode checking goes to check NEWMV. This feature
is enabled for speed 6 and 7.
Rtc set borg tests showed:
1. Speed 6, overall psnr: -0.088%, ssim: -1.339%;
Average speedup on rtc set is 11.1%.
2. Speed 7, overall psnr: -0.505%, ssim: -2.320%
Average speedup on rtc set is 12.9%.
Change-Id: I953b849eeb6e0d5a1f13eacba30c14204472c5be
pull the latest from WebP, which adds a worker interface abstraction
allowing an application to override init/reset/sync/launch/execute/end
this has the side effect of removing a harmless, but annoying, TSan
warning.
Original source:
http://git.chromium.org/webm/libwebp.git
100644 blob 08ad4e1fecba302bf1247645e84a7d2779956bc3 src/utils/thread.c
100644 blob 7bd451b124ae3b81596abfbcc823e3cb129d3a38 src/utils/thread.h
Local modifications:
- s/WebP/VP9/g
- camelcase functions -> lower with _'s
- associate '*' with the variable, not the type
Change-Id: I875ac5a74ed873cbcb19a3a100b5e0ca6fcd9aed
The caller should reset the state instead of letting worker
to reset.
This reverts commit 34b2ce15f9.
Change-Id: Idb546ea6386cffc44e98dee772900d21ab79710f
Encoder still uses SWITCHABLE as default via DEFAULT_INTERP_FILTER,
but does not override the default if it is not SWITCHABLE.
Change-Id: I3c0f6653bd228381a623a026c66599b0a87d01d5
When the frame is intra coded only, the encoder takes the RD
coding flow. Hence the function set_mode_info is not practically
in use. This commit removes it and the associated conditional
branches.
Change-Id: I1e42659ceb55b771ba712d1cdecacb446aa6460d
This fixes the hang in VP9/InvalidFileTest.ReturnCode/3
due to worker->had_error has not been reset after getting
error.
Change-Id: Ia3608225094758a2bd88f6ae4dd9dfd93bbaad27
For real time speed 7, once encode breakout is on(i.e. encoding
setting --static-thresh=1), a proper encode breakout threshold
is set to speed up the encoder.
Set --static-thresh=1, RTC set borg test showed a slight overall
psnr loss of 0.162%, but ssim gain of 0.287%. The average speedup
on RTC set is 6%, and for some clips, the speedup can be 10+%.
Change-Id: Id522d9ce779ff7c699936d13d0c47083de4afb85
Before encoding a frame, calculate and store each 16x16 block's
variance of source difference between last and current frame.
Find partitioning threshold T for the frame from its variance
histogram, and then use T to make partition decisions.
Comparing with fixed 16x16 partitioning, rtc set test showed an
overall psnr gain of 3.242%, and ssim gain of 3.751%. The best
psnr gain is 8.653%.
The overall encoding speed didn't change much. It got faster for
some clips(for example, 12% speedup for vidyo1), and a little
slower for others.
Also, a minor modification was made in datarate unit test.
Change-Id: Ie290743aa3814e83607b93831b667a2a49d0932c
the buffer is only used in encoding and only when
CONFIG_INTERNAL_STATS or CONFIG_VP9_POSTPROC is enabled.
a future change should decouple this from the frame buffer allocation
and make it conditional based on runtime flags when the above config
options are enabled.
reduces decode heap usage by at least 12%
Change-Id: Id0b97620d4936afefa538d3aadf32106743d9caf
This reverts commit b336356198.
This causes a hang in:
VP9/InvalidFileTest.ReturnCode/3
the change to test/user_priv_test.cc remains with a minor update
Change-Id: I4a8a272ca37ea329b0f413f0b1cd827a238bd9fd
This patch checks that a decoder never tries to reference frame that's
outside the range of 2x to 1/16th the size of this frame. Any attempt
to do so causes a failure.
Change-Id: I5c98fa7bb95ac4f29146f29dd92b62fe96164e4c
Bug introduced in I930dced169c9d53f8044d2754a04332138347409. If
svc.number_temporal_layers == 1 and svc.number_spatial_layers == 1, the system
attempt to do spatial SVC. It no longer does that.
Change-Id: Ie6b130a72b1eea40c547c9a64447e40695f811c5
For the primary arf in a group, if multiple arfs
are enabled and we were using arfs in the previous
group, then allow the second arf from the previous
group to be used as an additional reference.
Change-Id: Iaf41706a52f54ef21548026851cd77100d6aebda
This commit enables an adaptive transform size selection method
for speed -6. It uses largest transform size when the sse is more
than 4 times of variance, i.e., most energy is compacted in the
DC coefficient. Otherwise, use the default TX_8X8. It improves
the compression efficiency for rtc set of speed -6 by 0.8%, no
speed change observed.
Change-Id: Ie6ed1e728ff7bf88ebe940a60811361cdd19969c
This patch allows the encoder to skip the partition search for the
frame if it is an inter frame and only zero motion vectors have
been detected in the first pass. The partition size is directly
assigned according to the difference variance.
Borg tests show overall little performance changes in term of PSNR
(derf -0.027%, yt 0.152%, hd 0.078%, stdhd 0%). The worst case of
PSNR loss is -0.514% from yt. The best PSNR gain is 4.293% from yt.
The second pass encoding speedup for slideshow clips is 15%-40%.
Change-Id: I881f347d286553ee5594a9ea09ba1a61ac684045
This commit enables a fast reference motion vector search scheme.
It checks the nearest top and left neighboring blocks to decide the
most probable predicted motion vector. If it finds the two have
the same motion vectors, it then skip finding exterior range for
the second most probable motion vector, and correspondingly skips
the check for NEARMV.
The runtime of speed -5 goes down
pedestrian at 1080p 29377 ms -> 27783 ms
vidyo at 720p 11830 ms -> 10990 ms
i.e., 6%-8% speed-up.
For rtc set, the compression performance
goes down by about -1.3% for both speed -5 and -6.
Change-Id: I2a7794fa99734f739f8b30519ad4dfd511ab91a5
Bug introduced during multiple iterations on: I3831*
gf_group->arf_update_idx[] cannot currently be used
to select the arf buffer index if buffer flipping on overlays
is enabled (still currently the case when multi arf OFF).
Change-Id: I4ce9ea08f1dd03ac3ad8b3e27375a91ee1d964dc
This commit fixes the potential issue in the non-RD mode decision
flow that only checks part of the block to estimate the cost. It
was due to the use of fixed transform size, in replacing the
largest transform block size. This commit enables per transform
block cost estimation of the intra prediction mode in the non-RD
mode decision.
Change-Id: I14ff92065e193e3e731c2bbf7ec89db676f1e132
same reasoning as:
9f3a0db vp9_rtcd: correct avx2 references
these are all intrinsics, so don't depend on x86inc.asm
Change-Id: I915beaef318a28f64bfa5469e5efe90e4af5b827
This patch reverts the previous revert from Jim and also add a
variable user_priv in the FrameWorker to save the user_priv
passed from the application. In the decoder_get_frame function,
the user_priv will be binded with the img. This change is needed
or it will fail the unit test added here:
https://gerrit.chromium.org/gerrit/#/c/70610/
This reverts commit 9be46e4565.
Change-Id: I376d9a12ee196faffdf3c792b59e6137c56132c1
Cosmetic patch only in response to comments on
previous patches suggesting a couple of name changes
for consistency and clarity.
Change-Id: Ida3a359b0d5755345660d304a7697a3a3686b2a3
the max is 6. there are assumptions throughout the decode regarding
this; fixes a crash with a fuzzed bitstream
$ zzuf -s 5861 -r 0.01:0.05 -b 6- \
< vp90-2-00-quantizer-00.webm.ivf \
| dd of=invalid-vp90-2-00-quantizer-00.webm.ivf.s5861_r01-05_b6-.ivf \
bs=1 count=81883
Change-Id: I6af41bb34252e88bc156a4c27c80d505d45f5642
This commit replaces a few use cases of cpi->common with preset
variable cm, to avoid unnecessary pointer fetch in the non-RD
coding mode.
Change-Id: I4038f1c1a47373b8fd7bc5d69af61346103702f6
In real-time speed 6, no partition search is done. The inter
prediction results got from picking mode can be reused in the
following encoding process. A speed feature reuse_inter_pred_sby
is added to only enable the resue in speed 6.
This patch doesn't change encoding result. RTC set tests showed
that the encoding speed gain is 2% - 5%.
Change-Id: I3884780f64ef95dd8be10562926542528713b92c
There is a normative scaling range of (x1/2, x16)
for VP9. This patch fixes the maximum downscaling
tests that are applied in the convolve function.
The code used a maximum downscaling limit of x1/5
for historic reasons related to the scalable
coding work. Since the downsampling in this
application is non-normative it will revert to
using a separate non-normative scaler.
Change-Id: Ide80ed712cee82fe5cb3c55076ac428295a6019f
Add indirection to the section of buffer indices.
This is to help simplify things in the future if we
have other codec features that switch indices.
Limit the max GF interval for static sections to fit
the gf_group structures.
Change-Id: I38310daaf23fd906004c0e8ee3e99e15570f84cb