- Add unit tests to verify the bit-exact result.
- User level time reduction (EXT_TX):
encoder: 3.63%
decoder: 2.36%
- Also add tx_type=V_DCT...H_FLIPADST SSE2 for 16x16 inv txfm.
Change-Id: Idc6d9e8254aa536e5f18a87fa0d37c6bd551c083
The EC_ADAPT experiment cannot work unless EC_MULTISYMBOL is also
enabled.
This patch replaces all individual checks with a single centralized
check in both bitreader.h and bitwriter.h.
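A centralized check of this kind could look like the sketch below. The
CONFIG_EC_ADAPT and CONFIG_EC_MULTISYMBOL names follow the experiment
names above; the default values, the #error wording, and the
ec_adapt_enabled() helper are assumptions made only to keep the fragment
self-contained, not the contents of the actual headers.

```c
/* Sketch only: in the real tree these values would come from the
 * generated configuration header, not from defaults like these. */
#ifndef CONFIG_EC_MULTISYMBOL
#define CONFIG_EC_MULTISYMBOL 1
#endif
#ifndef CONFIG_EC_ADAPT
#define CONFIG_EC_ADAPT 1
#endif

/* One check in a shared header replaces the per-call-site checks. */
#if CONFIG_EC_ADAPT && !CONFIG_EC_MULTISYMBOL
#error "CONFIG_EC_ADAPT requires CONFIG_EC_MULTISYMBOL"
#endif

/* Hypothetical helper so callers can query the setting at run time. */
static int ec_adapt_enabled(void) { return CONFIG_EC_ADAPT; }
```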
Change-Id: I418852d95c5012cc074ed65cd24997e08bc2aadd
The new ec_multisymbol experiment supersedes the rans experiment and is
used for multisymbol features that can be backed by either daala_ec or
rans.
This experiment is automatically enabled by ec_adapt and will try to
enable daala_ec or ans (in that order).
Change-Id: Ie75b4002b7a9d7f5f7b4d130c1aacb3dbe97e54f
This experiment performs symbol-by-symbol statistics
adaptation for non-binary symbols. It requires either DAALA_EC,
or both ANS and RANS, to be enabled. The adaptation is currently
based on a simple recursive filter taken from Daala, with an
adaptation rate that depends on alphabet size. It applies
wherever non-binary symbols are encoded using cumulative
distribution functions (CDFs) rather than trees.
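As a rough illustration of a recursive-filter update on a CDF, the
sketch below moves each entry a fraction 2^-rate of the way toward the
CDF of the symbol just coded. It assumes a Q15 CDF with no leading zero
whose last entry is 32768. The function names and the rate rule are
hypothetical and are not taken from the Daala or AV1 sources.

```c
#include <stdint.h>

#define CDF_TOTAL_Q15 32768

/* Hypothetical rate rule: larger alphabets adapt more slowly. */
static int adapt_rate_for(int nsymbs) { return nsymbs > 8 ? 5 : 4; }

/* Move each CDF entry toward the step function of the coded symbol
 * `val`: entries below `val` decay toward 0, the rest grow toward
 * 32768. A real coder must also keep every symbol's width nonzero;
 * this sketch does not enforce that. */
static void adapt_cdf_q15(uint16_t *cdf, int nsymbs, int val, int rate) {
  int i;
  for (i = 0; i < nsymbs; i++) {
    if (i < val) {
      cdf[i] = (uint16_t)(cdf[i] - (cdf[i] >> rate));
    } else {
      cdf[i] = (uint16_t)(cdf[i] + ((CDF_TOTAL_Q15 - cdf[i]) >> rate));
    }
  }
}
```

Coding the same symbol repeatedly widens its interval and narrows the
others, which is the behavior the forward updates in the compressed
header previously had to signal explicitly.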
Where symbols are adapted, forward updates in the compressed
header are removed.
In the case of RANS, coefficient token values are adapted,
with the exception of the zero token, which remains a
binary symbol. In the case of DAALA_EC, other values
such as inter and intra modes are also adapted, since CDFs
are provided in those cases.
The experiment is configured with:
  ./configure --enable-experimental --enable-daala-ec --enable-ec-adapt
or
  ./configure --enable-experimental --enable-ans --enable-rans \
    --enable-ec-adapt
EC_ADAPT is not currently compatible with tiles.
BD-rate results on Objective-1-fast show a small loss:
PSNR YCbCr: 0.51% 0.49% 0.48%
PSNRHVS: 0.50%
SSIM: 0.50%
MSSSIM: 0.51%
CIEDE2000: 0.50%
Change-Id: I3888718e42616f3fd87144de7f125228446ac984
- Change FDCT32x32_2D_AVX2 output parameter to tran_low_t.
- Add unit tests for CONFIG_AOM_HIGHBITDEPTH=1.
- Update TODO notes.
BUG=webm:1323
Change-Id: If4766c919a24231fce886de74658b6dd7a011246
Cherry-pick Daala b5020bee:
Remove redundant test in od_ec_decode_bool_q15().
In a test that decodes 100M random binary symbols, this change
produced a speedup of 8.81% with gcc-4.9.3 and 3.71% with clang-3.7.1,
both compiled with -O2.
Change-Id: If6d0077a56121a575ae53bcd4d1d9b7d800a317d
* changes:
Palette: Use inverse_color_order to find color index faster.
Rewrite some loops to avoid -Wunsafe-loop-optimizations warnings.
Remove some useless casts
Add compiler warning flag -Wextra and fix related warnings.
Declare some array sizes to be constants (known at compile time).
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.
Cherry-picked from aomedia/master: 4790a69
Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
* changes:
Define SIMD_INLINE using AOM_FORCE_INLINE
AOM_FORCE_INLINE: fix always_inline attribute
Free memory allocated by daala_ec encoder.
Move clpf_sse4_1.c to clpf_sse4.c in agreement with convention
sync avg_test.cc with aom/master
Free the two memory buffers allocated by the daala_ec encoder when
calling od_ec_enc_clear() from aom_daala_stop_encode().
Change-Id: If20e86374ea29e51ee59111012905e56039dd4cc
The (new) ans experiment replaces the bool coder with uABS bools. The
'rans' experiment adds multisymbol coding.
This matches the setup in aom/master.
Change-Id: Ida8372ccabf1e1e9afc45fe66362cda35a491222
* changes:
Fix warnings reported by -Wshadow: Part4: main directory
Fix warnings reported by -Wshadow: Part3: test/ directory
Fix warnings reported by -Wshadow: Part2b: more from av1 directory
Fix warnings reported by -Wshadow: Part2: av1 directory
Fix warnings reported by -Wshadow: Part1b: scan_order struct and variable
Fix warnings reported by -Wshadow: Part1: aom_dsp directory
Move STAT_TYPE enum to source file.
Code cleanup: mainly rd_pick_partition and methods called from there.
The bit accounting functions aom_reader_tell() and aom_reader_tell_frac()
return the number of bits and 1/8th bits respectively.
This patch changes the return type from ptrdiff_t, which is signed, to
uint32_t, which is unsigned.
The size_t type is not used since we only care about the number of bits
or 1/8 bits per entropy coder context, and we don't expect to code more
than 512 megabits per tile.
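The 512 megabit figure follows from the counter width: a uint32_t that
counts 1/8 bits wraps after 2^32 units, which is 2^29 whole bits. The
helper name below is hypothetical and exists only to show the
arithmetic.

```c
#include <stdint.h>

/* Number of whole bits covered by a 32-bit counter of 1/8-bit units. */
static uint64_t max_bits_for_eighth_bit_counter(void) {
  const uint64_t eighth_bit_units = (uint64_t)UINT32_MAX + 1; /* 2^32 */
  return eighth_bit_units >> 3;                               /* 2^29 */
}
```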
Change-Id: I84a119d1f52829dcbdb66a92656eacca06e42b11
This patch adds bit account infrastructure to the bit reader API.
When configured with --enable-accounting, every bit reader API
function records the number of bits necessary to decode a symbol.
Accounting symbol entries are collected in a global accounting data
structure that can be used to understand exactly where bits are
spent (http://aomanalyzer.org). The data structure is cleared and
reused each frame to reduce memory usage. When configured without
--enable-accounting, bit accounting does not incur any runtime
overhead.
All aom_read_xxx functions now have an additional string parameter
that specifies the symbol name. By default, the ACCT_STR macro is
used (which expands to __func__). For more precise accounting,
these should be replaced with more descriptive names.
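The effect of the extra string parameter can be sketched as follows.
Only ACCT_STR and its expansion to __func__ come from this patch; the
acct_entry and acct_table types and the toy_read_bit(), acct_record(),
acct_reset(), and read_skip_flag() functions are hypothetical stand-ins,
not the real accounting API.

```c
#include <stddef.h>

#define ACCT_STR __func__
#define ACCT_MAX_ENTRIES 64

/* Hypothetical accounting record: which symbol, and how many bits. */
typedef struct {
  const char *name;
  unsigned bits;
} acct_entry;

typedef struct {
  acct_entry entries[ACCT_MAX_ENTRIES];
  size_t count;
} acct_table;

/* Cleared and reused each frame to keep memory usage bounded. */
static void acct_reset(acct_table *t) { t->count = 0; }

static void acct_record(acct_table *t, const char *name, unsigned bits) {
  if (t->count < ACCT_MAX_ENTRIES) {
    t->entries[t->count].name = name;
    t->entries[t->count].bits = bits;
    t->count++;
  }
}

/* Stand-in for a bit reader call: always "reads" a 0 and charges 1 bit. */
static int toy_read_bit(acct_table *t, const char *acct_str) {
  acct_record(t, acct_str, 1);
  return 0;
}

/* With the default ACCT_STR, the entry is named after the caller. */
static int read_skip_flag(acct_table *t) { return toy_read_bit(t, ACCT_STR); }
```

Passing a literal such as "skip" instead of ACCT_STR gives the more
descriptive per-symbol naming mentioned above.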
Change-Id: Ia2e1343cb842c9391b12b77272587dfbe307a56d
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.
Cherry-picked from aomedia/master: 09eea2193
Change-Id: I61030e773137ae107d3bd43556c0d5bb26f9dbf8
This commit ports the following from aom/master:
4c46278 Add aom_reader_tell() support.
b9c9935 Remove an erroneous declaration.
56c9c3b Fix ANS build.
Change-Id: I59bd910f58c218c649a1de2a7b5fae0397e13cb1
Cherry-pick Daala 211c2a41: Clean up EC tell() and tell_frac() functions.
Add a const qualifier to the od_ec_enc and od_ec_dec parameters of
the od_ec_enc_tell(), od_ec_enc_tell_frac(), od_ec_dec_tell(), and
od_ec_dec_tell_frac() functions.
Add an OD_WARN_UNUSED_RESULT to od_ec_enc_tell_frac().
Change-Id: Ia50e2fd75e98d8a03d993449d658b695cf56e6fb
The formatting of OD_UNIFORM_CDFS_Q15[] in entcode.c is helpful
for understanding what is contained in the array (e.g., the uniform
probability distributions of small sizes 2 through 16).
This patch reverts the change made in f4b2926d and adds linter hints to
ignore the formatting.
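A sketch of the idea, using the clang-format off/on comments as the
linter hints: one row per alphabet size keeps the table readable. The
array name, the rows (sizes 2 through 4 only), and the helper are
illustrative. The values are computed by rounding 32768*(i+1)/n and are
not copied from entcode.c.

```c
#include <stdint.h>

/* clang-format off */
static const uint16_t UNIFORM_CDFS_Q15_SKETCH[] = {
  16384, 32768,
  10923, 21845, 32768,
   8192, 16384, 24576, 32768
};
/* clang-format on */

/* Entry i of a uniform Q15 CDF over nsymbs symbols, rounded to nearest. */
static uint16_t uniform_cdf_q15(int nsymbs, int i) {
  const unsigned n = (unsigned)nsymbs;
  return (uint16_t)((32768u * (unsigned)(i + 1) + n / 2) / n);
}
```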
Change-Id: I2ad9fe6673b86e6067cb97b40f0f0e69a119cdf5
* changes:
Add missing CONFIG_DAALA_EC declaration.
Add API for writing trees using a CDF.
Add macro to build a simple cdf table.
Use Daala entropy coder to code trees.
Silence clang-format code review warning.
Use Daala entropy coder to code bits.
Clear existing format issue in the codebase
Add Daala entropy coder.
Move the av1_indices_from_tree() function from av1/encoder/treewriter.c
to aom_dsp/prob.c so that it can be used by both the encoder and
the decoder.
Change-Id: Ie43c599f425c3503b1ff93f0c77b5033a05b1bb4
Without first including ./aom_config.h in aom_dsp/prob.c, the memmove
function is implicitly declared, which causes a compiler warning.
Change-Id: I339d0389f10324a1085aba7d6492b2159a14da92
Added aom_write_tree_cdf() and aom_read_tree_cdf() function calls to
bitwriter.h and bitreader.h respectively.
These calls take a multisymbol CDF and an index and directly encode the
symbol using the enabled entropy coder.
Currently only the daala entropy encoder supports this (enabled with
--enable-daala_ec) and a compile error is thrown otherwise.
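For illustration, the CDF and index together identify the probability
interval that the entropy coder narrows to. The sketch below assumes a
Q15 CDF with no leading zero whose last entry is 32768. The cdf_range
type and symbol_range_q15() are hypothetical names, not part of
bitwriter.h or bitreader.h.

```c
#include <stdint.h>

/* Hypothetical pair: cumulative frequency below and through a symbol. */
typedef struct {
  uint16_t fl; /* cumulative probability of all symbols before this one */
  uint16_t fh; /* cumulative probability up to and including this one */
} cdf_range;

/* Look up the Q15 interval [fl, fh) that symbol `symb` occupies. */
static cdf_range symbol_range_q15(const uint16_t *cdf, int symb) {
  cdf_range r;
  r.fl = symb > 0 ? cdf[symb - 1] : 0;
  r.fh = cdf[symb];
  return r;
}
```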
Change-Id: I2fa1e87af4352c94384e0cfdbfd170ac99cf3705
Add the av1_tree_to_cdf() macro which takes an aom_tree_index tree and
associated aom_prob probabilities and constructs a Daala uint16_t CDF.
The av1_tree_to_cdf_1D() and av1_tree_to_cdf_2D() apply av1_tree_to_cdf()
across 1D and 2D arrays respectively.
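The conversion can be sketched as a walk over the tree that splits the
total Q15 probability mass at each node and then accumulates the leaf
masses in symbol order. This is an illustration under stated
assumptions, not the macro itself: tree_index_t and prob8_t are
hypothetical stand-ins for aom_tree_index and aom_prob, the tree is
assumed to store leaves as non-positive values (symbol = -value) with
probs[node >> 1] giving the chance out of 256 of taking branch 0, and
rounding or zero-probability handling in the real macro may differ.

```c
#include <stdint.h>

typedef int8_t tree_index_t; /* hypothetical stand-in for aom_tree_index */
typedef uint8_t prob8_t;     /* hypothetical stand-in for aom_prob */

/* Split the Q15 mass reaching `node` between its two children. */
static void leaf_mass_q15(const tree_index_t *tree, const prob8_t *probs,
                          int node, uint32_t mass, uint16_t *pdf) {
  uint32_t branch_mass[2];
  int b;
  branch_mass[0] = (mass * probs[node >> 1]) >> 8; /* branch 0 */
  branch_mass[1] = mass - branch_mass[0];          /* branch 1 */
  for (b = 0; b < 2; b++) {
    const tree_index_t next = tree[node + b];
    if (next <= 0) {
      pdf[-next] = (uint16_t)branch_mass[b]; /* leaf: symbol is -next */
    } else {
      leaf_mass_q15(tree, probs, next, branch_mass[b], pdf);
    }
  }
}

/* Build a Q15 CDF (no leading zero, last entry 32768) for up to 16 symbols. */
static void tree_to_cdf_q15_sketch(const tree_index_t *tree,
                                   const prob8_t *probs, int nsymbs,
                                   uint16_t *cdf) {
  uint16_t pdf[16] = { 0 };
  uint32_t acc = 0;
  int i;
  leaf_mass_q15(tree, probs, 0, 32768u, pdf);
  for (i = 0; i < nsymbs; i++) {
    acc += pdf[i];
    cdf[i] = (uint16_t)acc;
  }
}
```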
Change-Id: If79fa5ae034263f279d7d0842493570885272fb2
When building with --enable-daala_ec, calls to aom_write_tree() and
aom_read_tree() will convert an aom_tree_index structure with associated
aom_prob probabilities into a CDF on the fly for use with
od_ec_encode_cdf_q15().
The number of symbols in the CDF is capped at 16, and trees that contain
more than 16 leaf nodes are handled by splitting the most likely, i.e.,
highest probability, symbols first and coding multiple symbols if
necessary.
ntt-short-1:
            MEDIUM (%)  HIGH (%)
  PSNR      0.000227    0.000213
  PSNRHVS   0.000215    0.000205
  SSIM      0.000229    0.000209
  FASTSSIM  0.000229    0.000214
subset1:
            RATE (%)    DSNR (dB)
  PSNR      -0.00026    0.00002
  PSNRHVS   -0.00026    0.00002
  SSIM      -0.00026    0.00001
  FASTSSIM  -0.00026    0.00001
Change-Id: Icb1a8cb854fd81fdd88fbe4bc6761c7eb4757dfe
When building with --enable-daala_ec, calls to aom_write() and aom_read()
use the daala entropy coder to write and read bits.
When the probability is exactly 0.5 (128), raw bits are used.
ntt-short-1:
            MEDIUM (%)  HIGH (%)
  PSNR      -0.027556   -0.020114
  PSNRHVS   -0.027401   -0.020169
  SSIM      -0.027587   -0.020151
  FASTSSIM  -0.027592   -0.020102
subset1:
            RATE (%)    DSNR (dB)
  PSNR      0.03296     -0.00210
  PSNRHVS   0.03537     -0.00281
  SSIM      0.03299     -0.00161
  FASTSSIM  0.03458     -0.00111
Change-Id: I48ad8eb40fc895d62d6e241ea8abc02820d573f7
The subtrahend is small enough to fit into uint32_t.
Change-Id: Ic4d7128aaa665eaf6b25d562610ba8942c46137f
(cherry picked from commit c0241664aac3a1805db9bd8e09e071ac326531e0)
To get ready for pulling AV1 to nextgenv2
Replace the experimental flag by MOTION_VAR. Rename major variables.
Change-Id: If6cf4f37b9319c46d8f90df551cc7295d66ca205
* changes:
Deringing cleanup: remove DERING_REFINEMENT (always on now)
Don't run the deringing filter on skipped blocks within a superblock
Don't dering skipped superblocks
On x86 use _mm_set_epi32 when _mm_cvtsi64_si128 isn't available
* changes:
Remove custom rans types
Remove add_token_no_extra.
Remove unused aom_rans_build_cdf_from_pdf
Add the tool used to generate the constrained tokenset.
Remove the starting zero from ANS CDFs.
Import the aom_read/write_symbol abstractions from aom/master
(cherry picked from aom/master commit 11206c60d9)
Includes renames in a bunch of places not handled by the original
due to differing tree states.
Change-Id: Ic74d9d8850b8c80a51e55e425bbf472a67e2653f
This brings it in line with the Daala CDFs and will make it easier to
share code.
Change-Id: Idfd2d2b33c3b9b2c4e72ce72fb3d8039013448b9
(cherry picked from aom/master commit af98507ca9)
- av1_fht32x32 AVX2 function level time reduction ~89% compared to C.
- av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2(),
but function replacement must go with the corresponding inverse txfm.
- No obvious user level time reduction due to 32x32 TX_TYPE selection.
- Zero high 128b YMM to avoid AVX-SSE transition penalties
(fix 16x16 case).
- Added 32x32 AVX2 unit tests to verify bit-exact results.
- AVX2 optimization summary:
On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
C to AVX2: function level time reduction, ~86-89%.
SSE2 to AVX2: function level time reduction, ~51%.
Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
Rename av1_write_tree() to aom_write_tree() and move it into bitwriter.h
to match aom_read_tree() in bitreader.h.
Manually cherry-picked from aom/master:
33a143fa7a
Change-Id: I6c686cdd3e0f179d7e95c5bc6984558b62d46d67