This change has been imported from VP9 and
alters the nature and use of exhaustive motion search.
Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.
Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.
For example:
stage 1: Range +/- 64 interval 4
stage 2: Range +/- 32 interval 2
stage 3: Range +/- 15 interval 1
This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).
For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.
Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 db with only a small impact on encode
speed. On most natural video clips, however, where the step search
is performing well, the quality gain and speed impact are small.
Change-Id: Iac24152ae239f42a246f39ee5f00fe62d193cb98
Fix copied over from VP9 master to VP10 master.
Do not reset the alt ref active flag when overlaying the middle
arf(s) of a multi arf group.
Change-Id: I1b7392107e7c675640d5ee1624012f39cc374c58
The range_check is not used because the bit range
in fdct# is not correct. Since we are going to merge in a new version
of fdct# from nextgenv2, we won't fix the incorrect bit range now.
Change-Id: I54f27a6507f27bf475af302b4dbedc71c5385118
The old workaround "p = 0 ? 0 : p -1" is misleading.
?: happens before =
assigning back to p truncates to one byte.
Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.
https://bugs.chromium.org/p/webm/issues/detail?id=1089
Change-Id: I587c44dd61c1f3767543c0126376f881889935af
This reverts commit 7f56cb2978.
It causes uninitialized reads in the first pass setting up later cost tables.
Change-Id: I2df498df3f5c03eff359f79edf045aed0c618dc9
Remove delta index 254 from probability remapping and subexp coding.
Saves 1-bit when the delta index is 129.
Change-Id: I88aba565fc766b1769165be458d2efd3ce45817e
The old workaround "p = 0 ? 0 : p -1" is misleading.
?: happens before =
assigning back to p truncates to one byte.
Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.
https://code.google.com/p/webm/issues/detail?id=1089
Change-Id: Ia6dcc8922e1acbac0eeca23a4d564a355c489572
The custom LCG is based on the POSIX recommend constants for a 16-bit
rand(). This implementation uses less computation than typical standard
library procedures which have been extended for 32-bit support, is
guaranteed to be reentrant, and identical everywhere.
Change-Id: I3140bbd566f44ab820d131c584a5d4ec6134c5a0
Ref: http://pubs.opengroup.org/onlinepubs/9699919799/functions/rand.html
Add the row and column index to the argument list of unit functions
called by foreach_transformed_block wrapper. This avoids the
repeated internal parsing according to the block index.
Change-Id: Ie7508acdac0b498487564639bc5cc6378a8a0df7
This causes the output of find_ref_mvs() to always be unique or zero.
A nice side-effect of this is that it also causes the output of
find_ref_mvs_sub8x8() to be unique-or-zero, and it will not ignore
available candidate MVs under certain conditions.
See issue 1012.
Change-Id: If4792789cb7885dbc9db420001d95f9b91b63bfa
Added optimization of the 8 bit assembly quantizer routines. This makes
these functions up to 100% faster, depending on encoding parameters.
This patch maskes the encoder faster in both the high bitdepth and 8bit
configurations. In the high bitdepth configuration, it effects profile 0
only.
Based on my profiling using 1080p input the net gain is between 1-3% for
the 8 bit config, and around 2.5-4.5% for the high bitdepth config,
depending on target bitrate. The difference between the 8 bit and high
bitdepth configurations for the same encoder run is reduced by 1% in all
cases I have profiled.
Change-Id: I86714a6b7364da20cd468cd784247009663a5140
VP8E_UPD_ENTROPY, VP8E_UPD_REFERENCE and VP8E_USE_REFERENCE have been
deprecated since the initial public release
Change-Id: Ied16b441eec13434d85f1ab115d49ccaf5f2f7b0
Some more testing of this patch would probably be useful, but I
think the basics of it should work fine now.
See issue 1035.
Change-Id: I4a36d58f671c5391cb09d564581784a00ed26245
This experiment allows using full above/right edges for all transform
sizes whenever available (for d45/d63), and adds bottom/left edges for
d207.
See issue 1043.
Change-Id: I5cf7f345e783e8539bb6b6d2c9972fb1d6d0a78b
In VP9, the ref MV had to point to a block that itself fully resided
within the visible image, i.e. all borders of the image had to be
within the visible borders of the coded frame. This is somewhat
illogical, and had obscure side effects, e.g. clamping of fairly
reasonable motion vectors such as 0,0 were clipped to negative values
if the block was overhanging on frame edges (such as the last rows
on 1080p content), which makes no sense whatsoever.
Instead, relax clamping constraints such that the ref MVs are allowed
to point to blocks exactly outside the visible edges in both Y as well
as UV planes, including the 8tap filter edges (that's why the offset is
8 pixels + block size).
See issue 1037.
Change-Id: I2683eb2a18b24955e4dcce36c2940aa2ba3a1061
This has various benefits:
- simplify implementations because we don't have to switch between
multiple probability tables depending on frametype
- allows fw subexp and bw adaptivity for partitions/uvmode in keyframes
See issue 1040 point 5.
Change-Id: Ia566aa2863252d130cee9deedcf123bb2a0d3765
Locate them (code-wise) in frame_context, and have them be updated
as any other probability using the subexp forward and adaptive bw
updates.
See issue 1040 point 1.
TODOs:
- real-world default probabilities
- why is counts sometimes NULL in the decoder? Does that mean bw
adaptivity updates only work on some frames? (I haven't looked
very closely yet, maybe this is a red herring.)
Change-Id: I23b57b4e5e7574b75f16eb64823b29c22fbab42e
Account for rounding in distortion calculation in k-means;
carry out rounding before duplicates removal of base colors;
replace numbers with macros;
use prefix increment.
Slight coding gain (<0.1%) on screen_content testset.
Change-Id: Ie8bd241266da6b82c7b2874befc3a0c72b4fcd8c
This change (in a new config experiment: universal_hp) removes the
bitstream parsing dependency of the HP MV bit on the ref MV to be
coded. It also cleans up clearing of the HP bit in near/nearestMV,
since HP is always on if it's set in the frame header.
This admittedly doesn't clean up the crap that could be cleaned up,
but that's mostly because I think this needs some careful review;
not so much for coding style, but more from hardware people and from
the codec team on what we/you want. It would also be nice to get some
actual numbers on the real quality impact of this change. If, for
example, hardware people come up and tell us they don't actually care
anymore, we should probably just this code as-is and do nothing (i.e.
discard this patch).
See issue 1036.
Change-Id: Ic9b106f34422aa0f79de0c28125b72d566bd511a
This actually has no effect whatsoever, since the input MVs themselves
are clamped by clamp_mv_ref() already, which is significantly more
restrictive in its bounds.
Change-Id: I4a3a7b2b121ee422c56428c2a12d930c3813c06e
We only write EOSB tokens if we write tokens (i.e. not for skip blocks),
and we write EOSB tokens per-plane instead of per block.
Change-Id: I8d7ee99f8ec50eb7ae809f9f9282c1c91dbf6537
Add palette mode for keyframe luma channel. Palette mode is enabled
when using "--tune-content=screen" in encoding config parameters.
on screen_content testset: +6.89%
on derlr : +0.00%
Design doc (WIP):
https://goo.gl/lD4yJw
Change-Id: Ib368b216bfd3ea21c6c27436934ad87afdaa6f88
We have historically added new bits to cat6 whenever we added a new
transform size (or bitdepth, for that matter). However, we have
always coded these new bits regardless of the actual transform size,
which means that for smaller transforms, we code bits that cannot
possibly be set. The coding (quality) impact of this is negligible,
but the bigger issue is that this allows creating bitstreams with
coefficient values that are nonsensible and can cause int overflows,
which then de facto become part of the bitstream spec. By not coding
these bits, we remove this possibility.
See issue 1065.
Change-Id: Ib3186eca2df6a7a15ddc60c8b55af182aadd964d
This is identical to what the tile size does for the last tile. See
issue 1042 (which covers generalizing the superframe/tile concepts).
Change-Id: I1f187d2e3b984e424e3b6d79201b8723069e1a50
See issue 1051. 6 bits is fairly arbitrary but at least allows writing
delta Q values that are fairly normal in other codecs. I can extend to
8 if people want full range, although I personally don't have any need
for that.
Change-Id: I0a5a7c3d9b8eb3de4418430ab0e925d4a08cd7a0
This is more a proof of concept than anything else. The problem here
isn't so much how to code it, but rather where to place the resulting
code. All intrapred DSP code lives in vpx_dsp, so do we want the vp10
specific intra pred functions to live there, or in vp10/?
See issue 1015.
Change-Id: I675f7badcc8e18fd99a9553910ecf3ddf81f0a05
In the decoder, map this to the output variable vpx_image_t.r_w/h.
This is intended as an improved version of VP9D_GET_DISPLAY_SIZE,
which doesn't work with parallel frame decoding. In the encoder,
map this to a codec control func (VP9E_SET_RENDER_SIZE) that takes
a w/h pair argument in a int[2] (identical to VP9D_GET_DISPLAY_SIZE).
Also add render_size to the encoder_param_get_to_decoder unit test.
See issue 1030.
Change-Id: I12124c13602d832bf4c44090db08c1009c94c7e8
The name "display_*" (or "d_*") is used for non-compatible information
(that is, the cropped frame dimensions in pixels, as opposed to the
intended screen rendering surface size). Therefore, continuing to use
display_* would be confusing to end users. Instead, rename the field
to render_*, so that struct vpx_image can include it.
Change-Id: Iab8d2eae96492b71c4ea60c4bce8121cb2a1fe2d