The convolve filters generated by loop_wiener_filter_tile
are not compatible with some existing convolve implementations
(they can have coefficients >128, sums of (certain subsets of)
coefficients >128, etc.)
So we implement a new variant, which takes a filter with 128
subtracted from its central element and which adds an extra copy
of the source just before clipping to a pixel (reinstating the
128 we subtracted). This should be easy to adapt from the existing
convolve functions, and this patch includes SSE2 highbd and
SSSE3 lowbd implementations.
Change-Id: I0abf4c2915f0665c49d88fe450dbc77b783f69e1
Provide primitive modules for cb4x4 mode use. This resolves compiler
warnings when both high bit-depth and cb4x4 mode are turned on.
Change-Id: If6ecac50578b3e665b602419a0701c3e047ce623
This commit fixes the 2x2 d45 intra prediction. It avoids the use
of out-of-boundary position as reference. This resolves an enc/dec
mismatch issue in cb4x4 mode.
Change-Id: I93d01536a0c004190cc9fe3c724bf41364f6fdde
Includes:
Some cleanups/refactoring
Better buffer management.
Some preps for future chrominance restoration.
Change-Id: Ia264b8989b5f4a53c0764ed3e8258ddc212723fc
The aom_write_bit() was not calling buf_uabs_write_bit() while the
aom_read_bit() function was calling uabs_read_bit().
Change-Id: If98975341472988e8d809aa80a647d7a2531e21e
Calling aom_write_bit() and aom_read_bit() with --enable-daala_ec
would call aom_write() and aom_read() with probability 128 which
would ultimately call od_ec_enc_bits() and od_ec_dec_bits().
This refactors that code and makes the call explicit.
objective-1-fast:
master@2016-12-14T18:38:33Z -> daala_ec_bits@2016-12-14T18:36:22Z
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Change-Id: Ib69e98734fadcdc8b89936b7b6fbd0574afc7e34
The RD_DEBUG experiment computes stats in the _record() functions which
then proxy calls through to the actual bit writer.
The aom_write_bit_record() should proxy calls through to aom_write_bit()
instead of aom_write() with probability 128.
Change-Id: I7617fad0f2c25dc05cf111c660a90068c3f4c513
- For all blocks with width >= 16.
- Add test_count to make the unit tests harder to pass.
- Speed testing on 1080p, 100 frames, 5 Mbps, CPU, i7-6700
User level time reduction:
baseline: 3.68%
baseline + ext-partition: 36.12%
Change-Id: I78c5d9ca216f0fd91f1a360dca2190b11fd54a08
1) Not every transform's internal signal is designed to fit in 16 bits.
2) If overflow happens in this function, it indicates that we need to
adjust the txfm's scaling. We shouldn't mute the overflow signal.
3) Saturation might be handy when all of our transform design are stable,
but I don't think we are at the stable point yet.
4) This will fix C/Trans16x16DCT.AccuracyCheck/1 failure in highbd mode.
Change-Id: I5ef5d130c22adb4b8c3b608ffcb0f2c99dc7523f
The final ANS state gets further compacted because aliasing the super
frame marker is not an issue.
Change-Id: I26208accb117a6748abb6f1c32c28fadbc48de09
This should have no effect on the bitstream format (see also no related
encoder change). This is like moving code from the top of the loop to
the bottom of the loop.
This change allows us to:
* Make sure we consume the final renormalization byte after the last
symbol in an ANS partition.
* Move back toward a single renormalization operation for some ANS modes
since we know the bounds of the state mutation algorithm that got us out
of the valid state range.
Change-Id: Ia80246fd0ed805aa61b913a362546b3f08e4d79c
Let aom_convolve8_### SIMD implementation support any block width.
Turn on SIMD optimization when interpolation filter types on two
directions are different.
This will reduce 30% of encoding time when dual_filter and ext_interp
both on.
Change-Id: I539dbb2737f01835034b7269656a15b2058fa3cc
The new prefixes are
0: 15 bits of state are added to the base state.
10: 22 bits of state are added to the base state.
110: Reserved for super frame marker
111: 29 bits of state are added to the base state.
The likelihood of any final state is proportional to 1 / state. Given a
state range of [2**15, 2 **23) this should save on average 0.4 bits
per serialized final state.
BDRATE
subset1: -.000%
lowres: -.010%
Change-Id: I8e66e4a6667f5692c541083e6d6edc35ff411181
This is added as part of ALT_INTRA experiment.
This uses interpolation between top row and estimated bottom row; as
well as left column and estimated right column to generate the
predicted block.The interpolation is done using a predefined weight
array.
Based on experiments, the currently chosen weight array was created
to represent a quadratic curve, but can be tuned further if needed.
Improvement from baseline on Derf set:
ALL Keyframes: 1.279%
Improvement from existing ALT_INTRA:
ALL Keyframes: 1.146%
Change-Id: I12637fa1b91bd836f1c59b27d6caee2004acbdd4
This commit revert the previous fix of the ubsan warning for unsigned
int overflow, and use a better fix by moving the offs-- inside the
while loop to avoid "0-1" situation.
Change-Id: Id4a3e03859ebcdf264df0808412b30841028f87c
In https://aomedia-review.googlesource.com/#/c/5864/ , some
code to stop the decoder at a preselected point was left enabled.
This code should only be uncommented when debugging, so comment
it out by default.
Change-Id: Ie168e8a1588ba92971e3ff1a056f597a7dfca136
av1_txfm.h: left shift of a negative number
av1/encoder/quantize.c: unsigned int overflow
aom_dsp/entenc.c: unsigned int overflow
Change-Id: I6143e68f7d6e2621f97900808c8ef7ee0ad0c814
In several places bits are overwritten in the bitstream. These
functions avoid zeroing bytes during writing so that this can
happen correctly when the number of bits is not 8*N.
Re-addresses the attempted fix in
133c57c331
which broke threaded encoding tests, which relied on re-using
byte buffers.
Change-Id: I682c5e3a7869eac7ad475584db8bf170d47a56c9
This reverts commit 133c57c331,
which appears to cause test failures in
AV1/AVxEncoderThreadTest.EncoderResultTest/*
Change-Id: I200b7a135ed65dc2c3f23b23b8c3dbf0872715fa
This commit reinstates portion of a reverted commit to fix warnings
and errors with MSVC2013 build.
Change-Id: Ibb5fd665db6d8c897a657e5994547a1f82e3f188
The rans experiment is dead. The ans experiment with the ec_multisymbol
experiment also turned on takes its place.
Change-Id: Ie9f30ec7cf73aae6b2ea580a7b1f208485a8a7a7
PVQ replaces the scalar quantizer and coefficient coding with a new
design originally developed in Daala. It currently depends on the
Daala entropy coder although it could be adapted to work with another
entropy coder if needed:
./configure --enable-experimental --enable-daala_ec --enable-pvq
The version of PVQ in this commit is adapted from the following
revision of Daala:
fb51c1ade6
More information about PVQ:
- https://people.xiph.org/~jm/daala/pvq_demo/
- https://jmvalin.ca/papers/spie_pvq.pdf
The following files are copied as-is from Daala with minimal
adaptations, therefore we disable clang-format on those files
to make it easier to synchronize the AV1 and Daala codebases in the future:
av1/common/generic_code.c
av1/common/generic_code.h
av1/common/laplace_tables.c
av1/common/partition.c
av1/common/partition.h
av1/common/pvq.c
av1/common/pvq.h
av1/common/state.c
av1/common/state.h
av1/common/zigzag.h
av1/common/zigzag16.c
av1/common/zigzag32.c
av1/common/zigzag4.c
av1/common/zigzag64.c
av1/common/zigzag8.c
av1/decoder/decint.h
av1/decoder/generic_decoder.c
av1/decoder/laplace_decoder.c
av1/decoder/pvq_decoder.c
av1/decoder/pvq_decoder.h
av1/encoder/daala_compat_enc.c
av1/encoder/encint.h
av1/encoder/generic_encoder.c
av1/encoder/laplace_encoder.c
av1/encoder/pvq_encoder.c
av1/encoder/pvq_encoder.h
Known issues:
- Lossless mode is not supported, '--lossless=1' will give the same result as
'--end-usage=q --cq-level=1'.
- High bit depth is not supported by PVQ.
Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
This is just partial implementation
Compare token cost of pack_mb_tokens/pack_txb_tokens with token cost
from rate-distortion loop. If there is any difference, dump out mode
info.
Change-Id: I46b373ee2522c5047f799f36baf7cec5fbc06f06