This decoder control requires AV1 to be compiled with --enable-accounting.
Note that bit accounting data is only available after a frame has been
decoded.
Change-Id: I8a15213d9f2587638e0edb62932738e985160e03
Move computing the class0_fp_cdf and fp_cdf tables per coded mv
symbol to computing it only when the probabilities are updated.
Change-Id: Ib4957c8ab21e6189bcc3817a07b7681dfb343223
Move computing the class_cdf table per coded mv class symbol to
computing it only when the probabilities are updated.
Change-Id: I6c4a9075817e8ba2e251f0e82436995f08f2ec5c
Move computing the joint_cdf table per coded mv joint symbol to
computing it only when the probabilities are updated.
Change-Id: If5d195f70e6fad7b60f69606c8386ad5e69657d2
Move computing the inter_mode_cdf tables per coded inter mode symbol to
computing them only when the probabilities are updated.
Change-Id: I7a7b059ee75723cb6f278ed82a20cf34c27915d8
Move computing the uv_mode_cdf tables per coded intra mode symbol to
computing them only when the probabilities are updated.
Change-Id: I627b59d30726c913f5d7ba7753cb0446a12655bb
Move computing the y_mode_cdf tables per coded intra mode symbol to
computing them only when the probabilities are updated.
Change-Id: I8c43d09b8ef5febe2a3ec64bd51d28bd78ea73ed
Move computing the kf_y_mode_cdf tables per coded intra mode symbol to
computing them only when the probabilities are updated.
Change-Id: I5999447050c2f7d5dbccde80bee05ecd1c5440ab
This CL will facilitate adapt_scan experiment.
In adapt_scan experiment, dynamic scan order will be stored in
AV1_COMMON
Change-Id: I4763ea931b5e1af54d4f173971befeb01a4db335
* changes:
Palette: Use inverse_color_order to find color index faster.
Rewrite some loops to avoid -Wunsafe-loop-optimizations warnings.
Remove some useless casts
Add compiler warning flag -Wextra and fix related warnings.
Declare some array sizes to be constants (known at compile time).
Handle the sub8x8 chroma component at the unit of 2x2/4x2/2x4 level
and use the motion vector inherited from the luma component. This
improves the coding performance:
lowres 0.4%
midres 0.25%
hdres 0.15%
Change-Id: I34dff4218cfa3e5d55e7ed0341f36f4719389f7e
* changes:
Define SIMD_INLINE using AOM_FORCE_INLINE
AOM_FORCE_INLINE: fix always_inline attribute
Free memory allocated by daala_ec encoder.
Move clpf_sse4_1.c to clpf_sse4.c in agreement with convention
sync avg_test.cc with aom/master
av1_init_scan_order
initialize data structures related to adaptive scan order
av1_update_scan_prob
update nonzero probabilities from nonzero counts
av1_augment_prob
embed r + c and coeff_idx info with nonzero probabilities.
When sorting the nonzero probabilities, if there is a tie,
the coefficient with smaller r + c will be scanned first
av1_update_sort_order
apply quick sort on nonzero probabilities to obtain a sort order
av1_update_scan_order
apply topological sort on the nonzero probabilities sorting order to
guarantee each to-be-scanned coefficient's upper and left coefficient
will be scanned before the to-be-scanned coefficient.
av1_update_neighbors
For each coeff_idx in scan[], update its above and left neighbors in
neighbors[] accordingly.
Change-Id: I64c4938057daf8e30e48609a00ecc08d2e3062f4
Plus a small code clean up. The experiment of EXT_REFS, compared against
the baseline, using Overall PSNR, now obtains a gain on lowres as:
Avg: -5.818; BDRate: -5.653
Compared against the previous EXT_REFS results on lowres, a tiny gain is
obtained as:
Avg: -0.047, BDRate: -0.063
(1) 780952 Add encoder first pass support to bi-prediction in EXT_REFS
(2) f91498 Add pred prob handling for new references in EXT_REFS
(3) e91472 Add decoder support for bi-direct prediction in EXT_REFS
(4) 0dbac9 Add encoder support to new references in EXT_REFS
(5) ad70cc Remove hard-coded number for EXT_REFS
(6) 9c1e2f Add the use of new reference frames at encoder in EXT_REFS
(7) 6d4fde Add the experiment flag of EXT_REFS
Change-Id: I26f7ca45b9ede7579fdb9d0d6a1a91f4334599bd
- Use range check function to avoid DCT_DCT overflow.
We need to re-develop the column txfm side scaling/rounding. Now,
we prefer to maintain the current BDRate level.
- Encoder user level time reduction <1% owing to av1_fht32x32_avx2.
- Add MemCheck unit test and fdct32() unit test.
Change-Id: I1e67030f67bc637859798ebe2f6698afffb8531c
The (new) ans experiment replaces the bool coder with uABS bools. The
'rans' experiment adds multisymbol coding.
This matches the setup in aom/master.
Change-Id: Ida8372ccabf1e1e9afc45fe66362cda35a491222
* changes:
Fix warnings reported by -Wshadow: Part4: main directory
Fix warnings reported by -Wshadow: Part3: test/ directory
Fix warnings reported by -Wshadow: Part2b: more from av1 directory
Fix warnings reported by -Wshadow: Part2: av1 directory
Fix warnings reported by -Wshadow: Part1b: scan_order struct and variable
Fix warnings reported by -Wshadow: Part1: aom_dsp directory
Move STAT_TYPE enum to source file.
Code cleanup: mainly rd_pick_partition and methods called from there.
This patch adds bit account infrastructure to the bit reader API.
When configured with --enable-accounting, every bit reader API
function records the number of bits necessary to decoding a symbol.
Accounting symbol entries are collected in global accounting data
structure, that can be used to understand exactly where bits are
spent (http://aomanalyzer.org). The data structure is cleared and
reused each frame to reduce memory usage. When configured without
--enable-accounting, bit accounting does not incur any runtime
overhead.
All aom_read_xxx functions now have an additional string parameter
that specifies the symbol name. By default, the ACCT_STR macro is
used (which expands to __func__). For more precise accounting,
these should be replaced with more descriptive names.
Change-Id: Ia2e1343cb842c9391b12b77272587dfbe307a56d
Currently the RD loop traverses 4X8 blocks in inverted N order while
the bitstream stores blocks smaller than 8x8 in Z order. This causes a
discrepancy where the RD loop reads uninitialized data while
performing intra prediction. As a temporary fix simply disable the
use of the extended right edge for 4X8 blocks, until the bitstream can
be changed to match the logical structure of the blocks.
Change-Id: I44a9e4fc1a15cd551a7b38c3c1227bc5dac77e9a
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.
Cherry-picked from aomedia/master: 863b0499
Change-Id: Ida5de713156dc0126a27f90fdd36d29a398a3c88
- Change struct name to all caps SCAN_ORDER to be locally consistent.
- Rename struct pointers to 'scan_order' instead of hard to read short
names 'so' and 'sc'.
Cherry-picked from aomedia/master: 30abc082
Change-Id: Ib9f0eefe28fa97d23d642b77d7dc8e5f8613177d
- Const correctness
- Refactoring
- Make variables local when possible etc
- Remove -Wcast-qual to allow explicitly casting away const.
Cherry-picked from aomedia/master: c27fcccc
And then a number of more const correctness changes to make sure other
experiments build OK.
Change-Id: I77c18d99d21218fbdc9b186d7ed3792dc401a0a0
Move computing the segmentation_probs.tree_cdf table per symbol to
computing it only when the probabilities are updated.
Change-Id: I3826418094bbaca4ded87de5ff04d4b27c85e35a
This commit ports the following from aom/master:
4c46278 Add aom_reader_tell() support.
b9c9935 Remove an erroneous declaration.
56c9c3b Fix ANS build.
Change-Id: I59bd910f58c218c649a1de2a7b5fae0397e13cb1
Move computing the partition_cdf tables per symbol to
computing them only when the probabilities are updated.
Change-Id: I442f9230ba00be7f5d0558d7c38d7324ad009ee8
Move computing the inter_ext_tx_cdf tables per symbol to
computing them only when the probabilities are updated.
Change-Id: I5e1e62f8eae8f6b2edbbd378beeb786649502c10
Move computing the intra_ext_tx_cdf tables per symbol to
computing them only when the probabilities are updated.
Change-Id: I26d5e419e103093e98a7d896c196176305b50fc9
Move from computing the switchable_interp_cdf per symbol to
computing once per frame when the probabilities are adapted.
Change-Id: I6571126239f0327e22bb09ee8bad94114291683e
When building with --enable-daala_ec, calls to aom_write() and aom_read()
use the daala entropy coder to write and read bits.
When the probability is exactly 0.5 (128), then raw bits are used.
ntt-short-1:
MEDIUM (%) HIGH (%)
PSNR -0.027556 -0.020114
PSNRHVS -0.027401 -0.020169
SSIM -0.027587 -0.020151
FASTSSIM -0.027592 -0.020102
subset1:
RATE (%) DSNR (dB)
PSNR 0.03296 -0.00210
PSNRHVS 0.03537 -0.00281
SSIM 0.03299 -0.00161
FASTSSIM 0.03458 -0.00111
Change-Id: I48ad8eb40fc895d62d6e241ea8abc02820d573f7
This flag was already added to aomedia/master, so bringing it back to
webm/nextgenv2, as part of an effort to get the two codebases in sync.
Change-Id: I2b933a6a160e4210d1411a9e7978149eb8553205
This reverts commit 9b25f30674 to
reinstate the reverted commit with fixes that solved the build issues
when --enalbe-clpf is used in configure.
Change-Id: I15447cae7fa9b3deb27976345dc3db230a4a7a60
These signals were in the uncompressed frame header (as a temporary
hack), which caused two problems:
* We don't want that header to be duplicated in the slice header
* It was necessary to signal the number of bits to transmit up front
However, the filter size can be 128x128 which is greater than the SB
size, and a decoder wouldn't be able to know whether to read a bit or
not until the final SB of that 128x128 block has been decoded
(depending on whether the 128x128 is all skip or not). Therefore the
signalling was changed for 128x128 blocks so that every top left SB of
a 128x128 filter block contains a signal regardless of whether the
block is all skip or not. Also, all the MB's of 128x128 block are
filtered even if they are skip MB's. This gives the signal a purpose
even when the 128x128 block is all skip, and it also gives a slight
coding gain as it leaves a way to filter skip blocks, which was
previously forbidden.
Low latency:
PSNR YCbCr: -0.19% -0.14% -0.06%
PSNRHVS: -0.15%
SSIM: -0.13%
MSSSIM: -0.15%
CIEDE2000: -0.19%
High latency:
PSNR YCbCr: -0.03% -0.01% -0.09%
PSNRHVS: 0.04%
SSIM: 0.00%
MSSSIM: 0.02%
CIEDE2000: -0.02%
Change-Id: I69ba7144d07d388b4f0968f6a53558f480979171
To get ready for pulling AV1 to nextgenv2
Replace the experimental flag by MOTION_VAR. Rename major variables.
Change-Id: If6cf4f37b9319c46d8f90df551cc7295d66ca205
* changes:
Deringing cleanup: remove DERING_REFINEMENT (always on now)
Don't run the deringing filter on skipped blocks within a superblock
Don't dering skipped superblocks
On x86 use _mm_set_epi32 when _mm_cvtsi64_si128 isn't available
* changes:
Remove custom rans types
Remove add_token_no_extra.
Remove unused aom_rans_build_cdf_from_pdf
Add the tool used to generate the constrained tokenset.
Remove the starting zero from ANS CDFs.
Import the aom_read/write_symbol abstractions from aom/master
(cherry picked from aom/master commit 11206c60d9)
Includes renames in a bunch of places not handled by the original
due to differing tree states.
Change-Id: Ic74d9d8850b8c80a51e55e425bbf472a67e2653f
This brings it in line with the Daala CDFs and will make it easier to
share code.
Change-Id: Idfd2d2b33c3b9b2c4e72ce72fb3d8039013448b9
(cherry picked from aom/master commit af98507ca9)
Fix a bug in the C implementation of the ihalfright32
transform, in the case that its input and output buffers are the same.
This occurs when it is called by av1_iht32x16_512_add_c.
Change-Id: I61c652e2662178520c0639a2879ae128a9c7ec3f
- av1_fht32x32 AVX2 function level time reduction ~89% compared to C.
- av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2()
But function replacement must go with the corresponding inverse txfm.
- No obvious user level time reduction due to 32x32 TX_TYPE selection.
- Zero high 128b YMM to avoid AVX-SSE transition penalties
(fix 16x16 case).
- Added 32x32 AVX2 unit tests to verify bitexact.
- AVX2 optimization summary:
On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
C to AVX2: function level time reduction, ~86-89%.
SSE2 to AVX2: function level time reduction, ~51%.
Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
This can be re-added after aligning AOM's ANS with nextgenv2's ANS.
This partially reverts commit 3829cd2f2f.
Change-Id: I78afc587f1abfe33ffcd53b3262910cfae135534
* Move clipping tests from inside to outside loops
* Let sizex and sizey to clpf_block() be the clipped block size rather
than both just bs
* Make fallback tests to C more accurate
Change-Id: Icdc57540ce21b41a95403fdcc37988a4ebf546c7
When CLPF was extended to chroma, the chroma RDO accidentally
discarded the optimal block size found in the luma RDO.
PSNR YCbCr: -0.25% 0.05% 0.06%
PSNRHVS: -0.19%
SSIM: -0.36%
MSSSIM: -0.23%
Conflicts:
av1/common/clpf.c
Change-Id: Ie49cd30f9276a311ada88cb2f13d14757617f030
av1_clpf_frame() was always called with the same src and dst,
so we only need one argument and the code supporting different
src and dst was removed.
Change-Id: I70919f50e5cfb19c22eb4dff9ee7c0fa2697fad3
Instead of having CLPF write to an entire new frame and
copy the result back into the original frame, make the
filter able to work in-place by keeping a buffer of size
frame_width*filter_block_size and delay the write-back
by one filter_block_size row.
This reduces the cycles spent in the filter to ~75%.
Change-Id: I78ca74380c45492daa8935d08d766851edb5fbc1
- Unit tests are added for AVX2 SIMD.
- Encoder speed improvement:
AV1 baseline and EXT_TX, three 1080p sequences at bitrate:
800 Kbps, 2 Mbps, 6 Mbps, on i7-6700 CPU, average
user level time reduction: 3.86%.
Change-Id: Ibbd7837ee3a831c6b1e4e471bf6c8d3fa3a19ff4
This allows the hardware decoder to start decoding ref_mv_idx
syntax prior to the sorting stage and hide the latency of entropy
decoding. The compression performance change is about 0.01% level.
Change-Id: I86b34f31f6c99a36ae2780416175cc0bd90ff492
This is a similar change to following aom CL
https://aomedia-review.googlesource.com/#/c/1961/
Move SIMD related functions from filter.c/h to following files
av1_convolve_ssse3.c
av1_highbd_convolve_filters_sse4.c
Change following c files to header files.
av1_highbd_convolve_filters_sse4.c
av1_convolve_filters_ssse3.c
Change-Id: I41a3cc6b0789e632451aeda82f5eb97a4d78e370
Refactor to streamline the number of profiles needed, in
preparation for the next steps.
NO change in performance.
Change-Id: I753b89299897857f3c250c316b4cdc4fedcb90e8
When the block has width/height above or equal to 64, use 16x16
block search step for reference motion vector search in the non-
immediate rows and columns.
Change-Id: If11ce97a9328b879f30ef87115086aa0cd985a2f
Disable rect_tx because we only support 4x4 Walsh-Hadamard transform
in lossless mode.
Fixes failure in ./test_libaom --gtest_filter=*Large*ScreencastQ0/1
Configuration: --enable-experimental --enable-var-tx --enable-rect-tx
--enable-ref-mv --enable-ext_intra --enable-ext_tx --enable-debug
--disable-optimizations
Change-Id: Ib6b3494c7dcf7182f1cab9b138388d054851a23d
Also adds hooks to choose different profiles for UV and intra.
Results
lowres: -0.15%
midres: -0.24%
Change-Id: I4af8bc3e9b82b6f8a061dce9f52c89afa6239ae1
Includes a major refactoring/enhancement to support
tile-adaptive switchable restoration. The framework can be
readily extended to add more restoration schemes in the
future. Also includes various cleanups and fixes.
Specifically the framework allows restoration to be conducted
on tiles such that each tile can be either left unrestored, or
use bilateral or wiener filtering.
There is a modest improvemnt in coding efficiency (0.1 - 0.2%).
Further enhancements will be added subsequently to improve coding
efficiency and complexity.
Change-Id: I5ebedb04785ce1ef6f324abe209e925c2d6cbe8a
this matches style guidelines and stabilizes successive runs of
clang-format across the tree. remaining types should be address in
successive commits.
Change-Id: I6ad3f69cf0a22cb9a9b895b272195f891f71170f
Some minor adjustments to tile size and bilateral filters.
About 0.1% improvement for midres and hdres, very small change for
lowres.
Change-Id: Ia94f68a926867dfd67da1a8795fd8de0ddd8e2d6
This allows for a clean subtraction of 1 along the transform
matrix diagonal and also makes the order of the parameter list
a little more intuitive.
Change-Id: I6a5d754af41b8d1292f241f9b21473160517d24f
Bitstream syntax:
For a rectangular inter block, 'rect_tx' flag is sent to indicate if
the biggest rect tx is used. If no, continue to decode regular
recursive tx partition.
Change-Id: I127e35cc619b65acb5e9a0717f399cdcdb73fbf0
Uses an array to map block sizes, y tx sizes, and subsampling
factors to various transform sizes for UV.
Results improve by 0.1-0.2%
Change-Id: Icb58fd96bc7c01a72cbf1332fe2be4d55a0feedc
- Localize static lookup tables in the sole functions that use them.
- Remove dead high bit-depth IDST functions.
- Apply clang-format
Change-Id: Ibbd7db4259f9ea64d695b2f13f5c118aac8f1cf9
This patch completes the global motion experiment
implementation. It modifies the format of the motion
parameters to use the mv union to facilitate faster
copying and checks for parameters equal to 0 that occur
frequently in rdopt. The rd decisions for the global motion experiment
have also been added to rdopt.
Change-Id: Idfb9f0c6d23e538221763881099c5a2a3891f5a9
03394bd Remove dead code from av1_dering_search.
337b23a Changing the weights of the first CRF filter in deringing
Change-Id: I1216c146dc3f72f24ceec3d3c65c4dd6cd73623e
When the experiment is ON, we use Paeth predictor instead of TM
predictor.
For derf set, this gives about 0.09% improvement overall, and 0.55%
improvement if all frames are forced to be intra-only.
Also, if the EXT_INTRA experiment is also on, the improvement overall
is 0.056%, and improvement if all frames are forced to be intra-only is
0.465%.
Change-Id: Id74e107ede70a8d2107fa14fcb3f44b23a437274
Cherry-Picked the following commits:
0defd8f Changed "WebM" to "AOMedia" & "webm" to "aomedia"
54e6676 Replace "VPx" by "AVx"
5082a36 Change "Vpx" to "Avx"
7df44f1 Replace "Vp9" w/ "Av1"
967f722 Remove kVp9CodecId
828f30c Change "Vp8" to "AOM"
030b5ff AUTHORS regenerated
2524cae Add ref-mv experimental flag
016762b Change copyright notice to AOMedia form
81e5526 Replace vp9 w/ av1
9b94565 Add missing files
fa8ca9f Change "vp9" to "av1"
ec838b7 Convert "vp8" to "aom"
80edfa0 Change "VP9" to "AV1"
d1a11fb Change "vp8" to "aom"
7b58251 Point to WebM test data
dd1a5c8 Replace "VP8" with "AOM"
ff00fc0 Change "VPX" to "AOM"
01dee0b Change "vp10" to "av1" in source code
cebe6f0 Convert "vpx" to "aom"
17b0567 rename vp10*.mk to av1_*.mk
fe5f8a8 rename files vp10_* to av1_*
Change-Id: I6fc3d18eb11fc171e46140c836ad5339cf6c9419