VS compiling for 32 bit targets does not support vector types in
structs as arguments, which makes the v256 type of the intrinsics hard
to support, so optimizations for this target are disabled.
Change-Id: I675394cf1aed0cb18a48f21216470867031b30ce
The convolve filters generated by loop_wiener_filter_tile
are not compatible with some existing convolve implementations
(they can have coefficients >128, sums of (certain subsets of)
coefficients >128, etc.)
So we implement a new variant, which takes a filter with 128
subtracted from its central element and which adds an extra copy
of the source just before clipping to a pixel (reinstating the
128 we subtracted). This should be easy to adapt from the existing
convolve functions, and this patch includes SSE2 highbd and
SSSE3 lowbd implementations.
Change-Id: I0abf4c2915f0665c49d88fe450dbc77b783f69e1
Provide primitive modules for cb4x4 mode use. This resolves compiler
warnings when both high bit-depth and cb4x4 mode are turned on.
Change-Id: If6ecac50578b3e665b602419a0701c3e047ce623
- For all blocks with width >= 16.
- Add test_count to make the unit tests harder to pass.
- Speed testing on 1080p, 100 frames, 5 Mbps, CPU, i7-6700
User level time reduction:
baseline: 3.68%
baseline + ext-partition: 36.12%
Change-Id: I78c5d9ca216f0fd91f1a360dca2190b11fd54a08
This is added as part of ALT_INTRA experiment.
This uses interpolation between top row and estimated bottom row; as
well as left column and estimated right column to generate the
predicted block.The interpolation is done using a predefined weight
array.
Based on experiments, the currently chosen weight array was created
to represent a quadratic curve, but can be tuned further if needed.
Improvement from baseline on Derf set:
ALL Keyframes: 1.279%
Improvement from existing ALT_INTRA:
ALL Keyframes: 1.146%
Change-Id: I12637fa1b91bd836f1c59b27d6caee2004acbdd4
PVQ replaces the scalar quantizer and coefficient coding with a new
design originally developed in Daala. It currently depends on the
Daala entropy coder although it could be adapted to work with another
entropy coder if needed:
./configure --enable-experimental --enable-daala_ec --enable-pvq
The version of PVQ in this commit is adapted from the following
revision of Daala:
fb51c1ade6
More information about PVQ:
- https://people.xiph.org/~jm/daala/pvq_demo/
- https://jmvalin.ca/papers/spie_pvq.pdf
The following files are copied as-is from Daala with minimal
adaptations, therefore we disable clang-format on those files
to make it easier to synchronize the AV1 and Daala codebases in the future:
av1/common/generic_code.c
av1/common/generic_code.h
av1/common/laplace_tables.c
av1/common/partition.c
av1/common/partition.h
av1/common/pvq.c
av1/common/pvq.h
av1/common/state.c
av1/common/state.h
av1/common/zigzag.h
av1/common/zigzag16.c
av1/common/zigzag32.c
av1/common/zigzag4.c
av1/common/zigzag64.c
av1/common/zigzag8.c
av1/decoder/decint.h
av1/decoder/generic_decoder.c
av1/decoder/laplace_decoder.c
av1/decoder/pvq_decoder.c
av1/decoder/pvq_decoder.h
av1/encoder/daala_compat_enc.c
av1/encoder/encint.h
av1/encoder/generic_encoder.c
av1/encoder/laplace_encoder.c
av1/encoder/pvq_encoder.c
av1/encoder/pvq_encoder.h
Known issues:
- Lossless mode is not supported, '--lossless=1' will give the same result as
'--end-usage=q --cq-level=1'.
- High bit depth is not supported by PVQ.
Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
To get ready for pulling AV1 to nextgenv2
Replace the experimental flag by MOTION_VAR. Rename major variables.
Change-Id: If6cf4f37b9319c46d8f90df551cc7295d66ca205
- av1_fht32x32 AVX2 function level time reduction ~89% compared to C.
- av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2()
But function replacement must go with the corresponding inverse txfm.
- No obvious user level time reduction due to 32x32 TX_TYPE selection.
- Zero high 128b YMM to avoid AVX-SSE transition penalties
(fix 16x16 case).
- Added 32x32 AVX2 unit tests to verify bitexact.
- AVX2 optimization summary:
On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
C to AVX2: function level time reduction, ~86-89%.
SSE2 to AVX2: function level time reduction, ~51%.
Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
Instead of having CLPF write to an entire new frame and
copy the result back into the original frame, make the
filter able to work in-place by keeping a buffer of size
frame_width*filter_block_size and delay the write-back
by one filter_block_size row.
This reduces the cycles spent in the filter to ~75%.
Change-Id: I78ca74380c45492daa8935d08d766851edb5fbc1
- Unit tests are added for AVX2 SIMD.
- Encoder speed improvement:
AV1 baseline and EXT_TX, three 1080p sequences at bitrate:
800 Kbps, 2 Mbps, 6 Mbps, on i7-6700 CPU, average
user level time reduction: 3.86%.
Change-Id: Ibbd7837ee3a831c6b1e4e471bf6c8d3fa3a19ff4
When the experiment is ON, we use Paeth predictor instead of TM
predictor.
For derf set, this gives about 0.09% improvement overall, and 0.55%
improvement if all frames are forced to be intra-only.
Also, if the EXT_INTRA experiment is also on, the improvement overall
is 0.056%, and improvement if all frames are forced to be intra-only is
0.465%.
Change-Id: Id74e107ede70a8d2107fa14fcb3f44b23a437274
Cherry-Picked the following commits:
0defd8f Changed "WebM" to "AOMedia" & "webm" to "aomedia"
54e6676 Replace "VPx" by "AVx"
5082a36 Change "Vpx" to "Avx"
7df44f1 Replace "Vp9" w/ "Av1"
967f722 Remove kVp9CodecId
828f30c Change "Vp8" to "AOM"
030b5ff AUTHORS regenerated
2524cae Add ref-mv experimental flag
016762b Change copyright notice to AOMedia form
81e5526 Replace vp9 w/ av1
9b94565 Add missing files
fa8ca9f Change "vp9" to "av1"
ec838b7 Convert "vp8" to "aom"
80edfa0 Change "VP9" to "AV1"
d1a11fb Change "vp8" to "aom"
7b58251 Point to WebM test data
dd1a5c8 Replace "VP8" with "AOM"
ff00fc0 Change "VPX" to "AOM"
01dee0b Change "vp10" to "av1" in source code
cebe6f0 Convert "vpx" to "aom"
17b0567 rename vp10*.mk to av1_*.mk
fe5f8a8 rename files vp10_* to av1_*
Change-Id: I6fc3d18eb11fc171e46140c836ad5339cf6c9419