CLPF performance has degraded by about 0.5% over the past six months,
which is not surprising since the codec is a moving target.
About half of that degradation comes from the improved 7-bit filter
coefficients. CLPF therefore needs to be retuned for the current
codec.
This patch makes two (normative) changes to the CLPF kernel:
* The clipping function was changed from clamp(x, -s, s) to
    sign(x) * max(0, abs(x) - max(0, abs(x) - s +
                                  (abs(x) >> (bitdepth - 3 - log2(s)))))
  This adds a rampdown to 0 at -32 and 32 for 8 bit (-128 and 128
  for 10 bit, etc.), so large differences are ignored.
* 8 taps instead of 6 taps:
                    1
      4             3
    13 31   ->    13 31
      4             3
                    1
AWCY results:   low delay   high delay
PSNR:            -0.40%      -0.47%
PSNR HVS:         0.00%      -0.11%
SSIM:            -0.31%      -0.39%
CIEDE 2000:      -0.22%      -0.31%
APSNR:           -0.40%      -0.48%
MS SSIM:          0.01%      -0.12%
About 3/4 of the gains come from the new clipping function.
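The new clipping function described above can be sketched in C as
follows. This is only an illustration of the formula as written in this
message, not the code in the patch: the names clpf_clip(), ilog2() and
imax() are hypothetical, and s is assumed to be a positive power of two
small enough that the shift amount stays non-negative.

```c
#include <assert.h>
#include <stdlib.h>

/* Integer log2 for a positive power of two (hypothetical helper). */
static int ilog2(int s) {
  int n = 0;
  assert(s > 0);
  while (s > 1) {
    s >>= 1;
    ++n;
  }
  return n;
}

static int imax(int a, int b) { return a > b ? a : b; }

/* sign(x) * max(0, abs(x) - max(0, abs(x) - s +
 *                               (abs(x) >> (bitdepth - 3 - log2(s))))) */
static int clpf_clip(int x, int s, int bitdepth) {
  const int ax = abs(x);
  const int shift = bitdepth - 3 - ilog2(s);
  int mag;
  assert(shift >= 0);
  mag = imax(0, ax - imax(0, ax - s + (ax >> shift)));
  return x < 0 ? -mag : mag;
}
```

For 8 bit and s = 4, clpf_clip(4, 4, 8) is 4, clpf_clip(16, 4, 8) is 2
and clpf_clip(32, 4, 8) is 0, whereas the old clamp(x, -4, 4) returns 4
for all three inputs.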
Change-Id: Idad9dc4004e71a9c7ec81ba62ebd12fb76fb044a
This was part of the old ANS zero token handling. It has been replaced
by the new ec_multisymbol zero token handling.
Change-Id: I9c1fcb42ac0d214178cf4fbf8755ad68dcbbc11f
When the functions were added in
https://aomedia-review.googlesource.com/6545 they were not restricted to
x86_64 builds.
Fixes "undefined reference to
`aom_highbd_convolve8_add_src_sse2'" for --target=x86-linux-gcc
Also remove SSE2 specializations from
`aom_highbd_convolve8_add_src[_horiz/_vert]`, since those functions
don't actually have SSE2 versions (this was left in by accident
in the original patch).
Change-Id: I9f7d0c11b58b6f5a0e6a1fdaed0f92175bdeab34
Assume that a clang reporting __clang_major__==0 has the latest version
of _mm256_broadcastsi128_si256. This fixes builds with custom clang
toolchains.
cherry-picked from libvpx:
33aef48f2 vpx_subpixel_8t_intrin_avx2: tolerate unversioned clang
BUG=b/30970831
Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6
Separate the aom_read_cdf() functionality from aom_read_symbol(), which
can optionally adapt the CDF when configured with --enable-ec_adapt.
Change-Id: I5446d6402835dfcf68d3462a2bd8835704fe6603
Separate the aom_write_cdf() functionality from aom_write_symbol(), which
can optionally adapt the CDF when configured with --enable-ec_adapt.
Change-Id: Ibc58690eddb647d69f08d72f0f0712779aab11d1
Visual Studio, when compiling for 32-bit targets, does not support
vector types in structs as arguments. This makes the v256 type of the
intrinsics hard to support, so optimizations for this target are
disabled.
Change-Id: I675394cf1aed0cb18a48f21216470867031b30ce
Add aom_write_symbol_unscaled() and aom_read_symbol_unscaled() calls
for encoding and decoding symbols with non-dyadic CDFs, e.g. CDFs that
don't add up to 32768.
This currently only works with the DAALA_EC backend, but does support
AOM bit accounting.
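To make "non-dyadic" concrete, the sketch below checks whether a CDF
total is a power of two. It assumes a cumulative layout in which the
last entry holds the total; cdf_is_dyadic() is a hypothetical helper
and not part of the AOM API.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when the CDF total is a power of two (dyadic), else 0.
 * Assumes cdf[] is cumulative, so cdf[nsymbs - 1] is the total. */
static int cdf_is_dyadic(const uint16_t *cdf, int nsymbs) {
  unsigned total;
  assert(cdf != 0 && nsymbs > 0);
  total = cdf[nsymbs - 1];
  return total != 0 && (total & (total - 1)) == 0;
}
```

A CDF such as {8192, 16384, 32768} is dyadic, while {10, 25, 30} totals
30 and is the kind of CDF the _unscaled() calls are meant for.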
Change-Id: Icb37500f1b051dd2e8893ff0920302ece1d6ccfd
The convolve filters generated by loop_wiener_filter_tile
are not compatible with some existing convolve implementations
(they can have coefficients >128, sums of (certain subsets of)
coefficients >128, etc.)
So we implement a new variant, which takes a filter with 128
subtracted from its central element and which adds an extra copy
of the source just before clipping to a pixel (reinstating the
128 we subtracted). This should be easy to adapt from the existing
convolve functions, and this patch includes SSE2 highbd and
SSSE3 lowbd implementations.
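The add_src idea described above can be sketched for a single output
pixel as follows. This is a simplified illustration, not the code in
this patch: the function names, the 8-tap length, the 7-bit coefficient
precision and the position of the central tap (TAPS / 2 - 1) are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 8        /* assumed filter length */
#define FILTER_BITS 7 /* coefficients sum to 1 << 7 = 128 */

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Rounding shift; assumes >> on a negative int is arithmetic. */
static int round_shift(int v) {
  return (v + (1 << (FILTER_BITS - 1))) >> FILTER_BITS;
}

/* Conventional form: 'full' holds the unmodified coefficients. */
static uint8_t convolve_px(const uint8_t *src, const int16_t *full) {
  int k, sum = 0;
  assert(src != 0 && full != 0);
  for (k = 0; k < TAPS; ++k) sum += src[k] * full[k];
  return clip_pixel(round_shift(sum));
}

/* add_src form: 'reduced' is 'full' with 128 subtracted from the
 * central tap (index TAPS / 2 - 1 is assumed here). The source pixel
 * under that tap is added back just before clipping. */
static uint8_t convolve_add_src_px(const uint8_t *src,
                                   const int16_t *reduced) {
  int k, sum = 0;
  assert(src != 0 && reduced != 0);
  for (k = 0; k < TAPS; ++k) sum += src[k] * reduced[k];
  return clip_pixel(round_shift(sum) + src[TAPS / 2 - 1]);
}
```

Because 128 * src is an exact multiple of 1 << FILTER_BITS, both forms
produce the same pixel. With a made-up filter such as
{-2, 5, -12, 146, -12, 5, -2, 0}, the add_src form only has to multiply
by a central coefficient of 18 instead of 146.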
Change-Id: I0abf4c2915f0665c49d88fe450dbc77b783f69e1
Provide primitive modules for cb4x4 mode use. This resolves compiler
warnings when both high bit-depth and cb4x4 mode are turned on.
Change-Id: If6ecac50578b3e665b602419a0701c3e047ce623
This commit fixes the 2x2 d45 intra prediction. It avoids the use
of out-of-boundary position as reference. This resolves an enc/dec
mismatch issue in cb4x4 mode.
Change-Id: I93d01536a0c004190cc9fe3c724bf41364f6fdde
Includes:
Some cleanups/refactoring
Better buffer management.
Some preps for future chrominance restoration.
Change-Id: Ia264b8989b5f4a53c0764ed3e8258ddc212723fc
The aom_write_bit() was not calling buf_uabs_write_bit() while the
aom_read_bit() function was calling uabs_read_bit().
Change-Id: If98975341472988e8d809aa80a647d7a2531e21e
Calling aom_write_bit() and aom_read_bit() with --enable-daala_ec
would call aom_write() and aom_read() with probability 128 which
would ultimately call od_ec_enc_bits() and od_ec_dec_bits().
This refactors that code and makes the call explicit.
objective-1-fast:
master@2016-12-14T18:38:33Z -> daala_ec_bits@2016-12-14T18:36:22Z
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Change-Id: Ib69e98734fadcdc8b89936b7b6fbd0574afc7e34
The RD_DEBUG experiment computes stats in the _record() functions which
then proxy calls through to the actual bit writer.
The aom_write_bit_record() should proxy calls through to aom_write_bit()
instead of aom_write() with probability 128.
Change-Id: I7617fad0f2c25dc05cf111c660a90068c3f4c513
- For all blocks with width >= 16.
- Add test_count to make the unit tests harder to pass.
- Speed testing on 1080p, 100 frames, 5 Mbps, CPU: i7-6700
  User-level time reduction:
    baseline: 3.68%
    baseline + ext-partition: 36.12%
Change-Id: I78c5d9ca216f0fd91f1a360dca2190b11fd54a08
1) Not every transform's internal signal is designed to fit in 16 bits.
2) If overflow happens in this function, it indicates that we need to
adjust the txfm's scaling. We shouldn't mute the overflow signal.
3) Saturation might be handy once all of our transform designs are
stable, but I don't think we are at that point yet.
4) This will fix C/Trans16x16DCT.AccuracyCheck/1 failure in highbd mode.
Change-Id: I5ef5d130c22adb4b8c3b608ffcb0f2c99dc7523f
The final ANS state gets further compacted because aliasing the super
frame marker is not an issue.
Change-Id: I26208accb117a6748abb6f1c32c28fadbc48de09
This should have no effect on the bitstream format (note that there is
no related encoder change). It is like moving code from the top of the
loop to the bottom of the loop.
This change allows us to:
* Make sure we consume the final renormalization byte after the last
symbol in an ANS partition.
* Move back toward a single renormalization operation for some ANS modes
since we know the bounds of the state mutation algorithm that got us out
of the valid state range.
Change-Id: Ia80246fd0ed805aa61b913a362546b3f08e4d79c
Let the aom_convolve8_### SIMD implementations support any block width.
Turn on SIMD optimization when the interpolation filter types in the
two directions are different.
This reduces encoding time by 30% when dual_filter and ext_interp are
both on.
Change-Id: I539dbb2737f01835034b7269656a15b2058fa3cc