Commit graph

194 commits

Author SHA1 Message Date
David Barker be6cc07d82 Add new convolve variant for loop-restoration
The convolve filters generated by loop_wiener_filter_tile
are not compatible with some existing convolve implementations
(they can have coefficients >128, sums of (certain subsets of)
coefficients >128, etc.)

So we implement a new variant, which takes a filter with 128
subtracted from its central element and which adds an extra copy
of the source just before clipping to a pixel (reinstating the
128 we subtracted). This should be easy to adapt from the existing
convolve functions, and this patch includes SSE2 highbd and
SSSE3 lowbd implementations.

Change-Id: I0abf4c2915f0665c49d88fe450dbc77b783f69e1
2017-01-03 17:15:29 +00:00
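
The offset-filter idea described in be6cc07d82 can be sketched as follows. This is a minimal illustration and not the code from the patch: the function name convolve_add_src_horiz, the odd tap count with the central tap at taps / 2, and the 8-bit pixel type are all assumptions made for the example. It relies on the filter being scaled so that unity gain is 128 (1 << 7), which is what makes "subtract 128 from the central tap, add the source back" an identity.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS 7 /* unity gain is 1 << 7 = 128 */

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical 1-D horizontal convolve. `filter` is the original filter with
 * 128 already subtracted from its central tap. `src` must have taps / 2
 * readable pixels on either side of the w output positions. */
static void convolve_add_src_horiz(const uint8_t *src, uint8_t *dst, int w,
                                   const int16_t *filter, int taps) {
  assert(taps > 0 && (taps & 1)); /* assumption: odd length, centre at taps / 2 */
  const int half = taps / 2;
  for (int x = 0; x < w; ++x) {
    int sum = 0;
    for (int k = 0; k < taps; ++k) sum += src[x + k - half] * filter[k];
    /* Round and scale down. Right-shifting a negative int is
     * implementation-defined in C; an arithmetic shift is assumed here. */
    const int rounded = (sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS;
    /* Reinstate the subtracted 128 by adding one copy of the source pixel
     * just before clipping. */
    dst[x] = clip_pixel(rounded + src[x]);
  }
}
```

With this layout an identity filter {0, 128, 0} is passed in as {0, 0, 0}, so every tap stays within the range that ordinary convolve kernels expect.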
Jingning Han cc5bdf4920 Add 2x2 block level variance functions for high bd
Change-Id: I38259c4074f77a8941baefbe7585fff2eded6b12
2016-12-20 17:28:13 +00:00
Jingning Han 324b4c6d6a Add 2x2 intra predictor for high bit-depth
Provide primitive modules for cb4x4 mode use. This resolves compiler
warnings when both high bit-depth and cb4x4 mode are turned on.

Change-Id: If6ecac50578b3e665b602419a0701c3e047ce623
2016-12-20 17:28:13 +00:00
Alex Converse 2cdf0d85a2 Specify ANS window size at initialization
Change-Id: Ia1757d580dd230d9e743b1f8c3e87df164008684
2016-12-17 03:56:10 +00:00
Jingning Han 8a7786d247 Fix 2x2 d45 intra prediction
This commit fixes the 2x2 d45 intra prediction. It avoids the use
of out-of-boundary position as reference. This resolves an enc/dec
mismatch issue in cb4x4 mode.

Change-Id: I93d01536a0c004190cc9fe3c724bf41364f6fdde
2016-12-16 16:20:08 +00:00
Jingning Han e2ffaf884d Add 2x4 and 4x2 variance functions
Change-Id: Ic2fbc66e9212da32930c6a8ba1a749e3a37c5b9a
2016-12-15 20:19:19 +00:00
Debargha Mukherjee 874d36d9c9 Misc cleanups and enhancements on loop restoration
Includes:
Some cleanups/refactoring
Better buffer management.
Some preps for future chrominance restoration.

Change-Id: Ia264b8989b5f4a53c0764ed3e8258ddc212723fc
2016-12-15 19:11:46 +00:00
Angie Chiang 9e963dc0ed Shorter-tap interp first in highbitdepth mode
BDRate varies within +-0.04%

Change-Id: I76f440c479d411c09ef39a19b46eb8dbc5330efb
2016-12-15 05:49:59 +00:00
Nathan E. Egge 67b9921bbf Fix aom_write_bit() to match aom_read_bit().
The aom_write_bit() was not calling buf_uabs_write_bit() while the
 aom_read_bit() function was calling uabs_read_bit().

Change-Id: If98975341472988e8d809aa80a647d7a2531e21e
2016-12-15 02:05:58 +00:00
Nathan E. Egge 08c99eb30f Explicitly call daala read/write bit functions.
Calling aom_write_bit() and aom_read_bit() with --enable-daala_ec
 would call aom_write() and aom_read() with probability 128 which
 would ultimately call od_ec_enc_bits() and od_ec_dec_bits().
This refactors that code and makes the call explicit.

objective-1-fast:
master@2016-12-14T18:38:33Z -> daala_ec_bits@2016-12-14T18:36:22Z

    PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
  0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000

Change-Id: Ib69e98734fadcdc8b89936b7b6fbd0574afc7e34
2016-12-15 02:05:58 +00:00
Nathan E. Egge 90b305a9b9 Compute token_stats in aom_write_bit_record() function.
The RD_DEBUG experiment computes stats in the _record() functions which
 then proxy calls through to the actual bit writer.
The aom_write_bit_record() should proxy calls through to aom_write_bit()
 instead of aom_write() with probability 128.

Change-Id: I7617fad0f2c25dc05cf111c660a90068c3f4c513
2016-12-15 00:45:26 +00:00
David Barker 025b25459d Change Wiener filter in loop-restoration
The Wiener filter now uses the same convolution code as the
inter predictors.

Change-Id: Ia3bfbc778171eb25c6a0141426d1f69d92c17992
2016-12-14 18:58:21 +00:00
Alex Converse 5b5140b06e Unfork some ANS setup code
Change-Id: I85e1b3cc4174029b6d1bfa4109b37793537071c2
2016-12-14 17:56:22 +00:00
Steinar Midtskogen ea42c4e969 Remove aom_simd.c and replace simd_check with macro
Change-Id: If2bb7ab2b16ba44e2d6e43eeb8713aa6c05d9d7c
2016-12-13 08:25:12 +00:00
Alex Converse b0be6411db ans: Use a fixed N-symbol window
Accept a small compression loss in exchange for a fixed-size encoder-side
buffering requirement.

subset1:
rans_base@2016-12-02T22:55:56.809Z -> rans_nsym@2016-12-02T22:58:19.859Z

    PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
  0.0304 |  0.0303 |  0.0305 |   0.0317 | 0.0312 |  0.0309 |     0.0301

Change-Id: I09dd143e4f1638b97dc9bba7023efa837a7d48c7
2016-12-12 21:28:43 +00:00
Yi Luo e98325848d High bit depth motion search SAD optimization on avx2
- For all blocks with width >= 16.
- Add test_count to make the unit tests harder to pass.
- Speed testing on 1080p, 100 frames, 5 Mbps, CPU, i7-6700
  User level time reduction:
   baseline:                  3.68%
   baseline + ext-partition: 36.12%

Change-Id: I78c5d9ca216f0fd91f1a360dca2190b11fd54a08
2016-12-09 21:14:48 +00:00
Angie Chiang 48c06da2d0 Remove saturate_int16 from fdct_round_shift
1) Not every transform's internal signal is designed to fit in 16 bits.
2) If overflow happens in this function, it indicates that we need to
adjust the txfm's scaling. We shouldn't mute the overflow signal.
3) Saturation might be handy when all of our transform designs are stable,
but I don't think we are at the stable point yet.
4) This will fix C/Trans16x16DCT.AccuracyCheck/1 failure in highbd mode.

Change-Id: I5ef5d130c22adb4b8c3b608ffcb0f2c99dc7523f
2016-12-09 18:13:32 +00:00
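
The difference between a plain rounding shift and one that saturates to 16 bits, as discussed in 48c06da2d0, can be illustrated with a small sketch. The names round_shift and round_shift_saturated and the explicit bits parameter are assumptions for this example; they are not the signatures used in the codebase.

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest right shift with no clamping, so an out-of-range
 * intermediate stays visible to the caller and to tests. Right-shifting a
 * negative value is implementation-defined in C; an arithmetic shift is
 * assumed. */
static int64_t round_shift(int64_t x, int bits) {
  assert(bits > 0 && bits < 62);
  return (x + ((int64_t)1 << (bits - 1))) >> bits;
}

/* Same shift followed by a clamp to the int16_t range. The clamp hides the
 * fact that the intermediate did not fit in 16 bits. */
static int64_t round_shift_saturated(int64_t x, int bits) {
  const int64_t v = round_shift(x, bits);
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return v;
}
```

For an intermediate that scales down to 40000, the unsaturated version returns 40000 and an accuracy test can flag the scaling problem, while the saturated version silently returns 32767.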
Tristan Matthews 3fb5c4c0bc intrapred_sse2: Fix nasm build
Fixes Issue 96: https://bugs.chromium.org/p/aomedia/issues/detail?id=96&q=&desc=3

Change-Id: I47381ef3930368901c7c2ca6d7f9064216de8ad0
2016-12-07 18:45:30 +00:00
Jingning Han 9e7c49fc8a Add 2x2 variance function
Change-Id: I73bcb8ab5727e2d07e34ca35e9e014f3c6f63d56
2016-12-07 05:47:55 +00:00
Alex Converse 1ecdf2bf33 ans: Move buf_ans_flush to the .c file
It is called relatively rarely and doesn't need to be inlined.

Change-Id: I4ee7f95548f008f2ee29da807aaca54b9a25aecd
2016-12-03 02:35:06 +00:00
Alex Converse b0bbd60685 ans: Allow compressed buffer reversal
The final ANS state gets further compacted because aliasing the super
frame marker is not an issue.

Change-Id: I26208accb117a6748abb6f1c32c28fadbc48de09
2016-12-03 02:35:06 +00:00
Alex Converse 2a1b3af329 ans: Give buf_ans ownership of the AnsCoder
Change-Id: I509bbba0d84c1d378044e2c612dd48cd8f99848d
2016-12-02 02:07:27 +00:00
Jingning Han 7833d2bfbf Enable 2x2 intra prediction
Bring 2x2 intra prediction online for chroma components.

Change-Id: Ia56af9101b2a977691bca4156a6dcf89e644b4a7
2016-12-02 01:46:59 +00:00
Alex Converse 52a4b11d51 ans: Refill state at the end of the decoding process.
This should have no effect on the bitstream format (see also no related
encoder change). This is like moving code from the top of the loop to
the bottom of the loop.

This change allows us to:
* Make sure we consume the final renormalization byte after the last
symbol in an ANS partition.
* Move back toward a single renormalization operation for some ANS modes
since we know the bounds of the state mutation algorithm that got us out
of the valid state range.

Change-Id: Ia80246fd0ed805aa61b913a362546b3f08e4d79c
2016-12-02 00:35:07 +00:00
Angie Chiang 7a483cffc8 Turn on SIMD optimization for dual_filter
Let aom_convolve8_### SIMD implementation support any block width.
Turn on SIMD optimization when interpolation filter types on two
directions are different.

This will reduce encoding time by 30% when dual_filter and ext_interp
are both on.

Change-Id: I539dbb2737f01835034b7269656a15b2058fa3cc
2016-12-01 21:58:03 +00:00
Alex Converse 5943d41e36 ans: Factor out refill_state
Change-Id: I648f4eb2954b2d138c2128bbf3f638eea31ec28f
2016-12-01 17:59:43 +00:00
Yaowu Xu bde4ac8260 change to use AOMedia copyright notice
Change-Id: I82580120a154ecd7c41f4cd9bc0f8c669fca7774
2016-11-29 00:01:36 +00:00
Alex Converse fa9c9d1cc0 Adjust how the final ANS state is written.
The new prefixes are
0: 15 bits of state are added to the base state.
10: 22 bits of state are added to the base state.
110: Reserved for super frame marker
111: 29 bits of state are added to the base state.

The likelihood of any final state is proportional to 1 / state. Given a
state range of [2**15, 2 **23) this should save on average 0.4 bits
per serialized final state.

BDRATE
subset1: -.000%
lowres: -.010%

Change-Id: I8e66e4a6667f5692c541083e6d6edc35ff411181
2016-11-28 23:25:46 +00:00
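
The prefix table in fa9c9d1cc0 can be read as a three-way size class for the offset of the final state above the base state. The sketch below only picks the class; it does not reproduce the actual byte layout, and the type FinalStateCode and the function pick_final_state_code are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  unsigned prefix;  /* prefix bits, most significant bit first */
  int prefix_len;   /* number of prefix bits */
  int payload_bits; /* bits of (state - base) that follow the prefix */
} FinalStateCode;

/* Choose the shortest class that can hold state - base.
 * Prefix 110 is never returned: it is reserved for the super frame marker. */
static FinalStateCode pick_final_state_code(uint32_t state, uint32_t base) {
  assert(state >= base);
  const uint32_t delta = state - base;
  if (delta < (1u << 15)) return (FinalStateCode){0x0, 1, 15}; /* 0   */
  if (delta < (1u << 22)) return (FinalStateCode){0x2, 2, 22}; /* 10  */
  assert(delta < (1u << 29));
  return (FinalStateCode){0x7, 3, 29};                         /* 111 */
}
```

Prefix plus payload comes to 16, 24, and 32 bits for the three classes, so each serialized final state occupies a whole number of bytes.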
Yi Luo 9e218747c4 SAD avg and 4D avx2 optimization for ext-partition
- User level time reduction <1% on i7-6700 cpu

Change-Id: I8f15bde07dddd938df0b065e20ae94109e7b3b5b
2016-11-28 22:42:08 +00:00
Urvang Joshi 6be4a54b89 Add a new intra prediction mode "smooth".
This is added as part of ALT_INTRA experiment.

This uses interpolation between top row and estimated bottom row; as
well as left column and estimated right column to generate the
predicted block. The interpolation is done using a predefined weight
array.

Based on experiments, the currently chosen weight array was created
to represent a quadratic curve, but can be tuned further if needed.

Improvement from baseline on Derf set:
ALL Keyframes: 1.279%

Improvement from existing ALT_INTRA:
ALL Keyframes: 1.146%

Change-Id: I12637fa1b91bd836f1c59b27d6caee2004acbdd4
2016-11-28 12:12:26 -08:00
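
The interpolation described in 6be4a54b89 can be sketched roughly as below. The commit message does not say how the bottom row and right column are estimated or what the weight values are, so this example makes two loud assumptions: the bottom row is estimated by the last pixel of the left column and the right column by the last pixel of the top row, and the weights come from a made-up quadratic formula rather than the predefined array the commit refers to. The names smooth_weight and smooth_predict are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WEIGHT_SCALE 256 /* hypothetical fixed-point scale for the weights */

/* Hypothetical quadratic weight: WEIGHT_SCALE at index 0, falling off with
 * the square of the remaining distance. Not the array from the commit. */
static int smooth_weight(int i, int n) {
  return WEIGHT_SCALE * (n - i) * (n - i) / (n * n);
}

/* Predict an n x n block from the row above (top) and the column to the
 * left (left), averaging a vertical and a horizontal interpolation. */
static void smooth_predict(uint8_t *dst, int stride, int n,
                           const uint8_t *top, const uint8_t *left) {
  assert(n > 0);
  const int bottom_est = left[n - 1]; /* assumption: stands in for bottom row */
  const int right_est = top[n - 1];   /* assumption: stands in for right column */
  for (int r = 0; r < n; ++r) {
    const int wr = smooth_weight(r, n);
    for (int c = 0; c < n; ++c) {
      const int wc = smooth_weight(c, n);
      const int vert = wr * top[c] + (WEIGHT_SCALE - wr) * bottom_est;
      const int horz = wc * left[r] + (WEIGHT_SCALE - wc) * right_est;
      /* Average the two interpolations with rounding. */
      dst[r * stride + c] =
          (uint8_t)((vert + horz + WEIGHT_SCALE) / (2 * WEIGHT_SCALE));
    }
  }
}
```

Because each pair of weights sums to WEIGHT_SCALE, a flat neighbourhood yields a flat prediction, and every output stays within the range of the reference pixels.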
Steinar Midtskogen 29c61068d7 Add v256_intrinsics_v128.h to aom_dsp.mk
Change-Id: I3f0af5cf71d17f4d331d846d92728396399f187b
2016-11-23 10:06:33 +01:00
Debargha Mukherjee 84c56af017 Support 64x64 intra prediction
Change-Id: I2536b5b55f28c2ee59445c3b70d3e073e69945cd
2016-11-21 20:06:46 +00:00
Yi Luo 63bd6dc96b Fix rectangle transform computation overflow
- Add 16-bit saturation in fdct_round_shift().
- Add extreme value tests and round trip error tests.
- Fix inv 4x8 txfm calculation accuracy.
- Fix 4x8, 8x4, 8x16, 16x8, 16x32, 32x16 extreme value tests.
- BDRate: lowres: -0.034
          midres: -0.036
          hdres:  -0.013
BUG=webm:1340

Change-Id: I48365c1e50a03a7b1aa69b8856b732b483299fb5
2016-11-21 17:18:27 +00:00
Yaowu Xu 88cbc5827b Use an alternative fix to ubsan warning.
This commit reverts the previous fix of the ubsan warning for unsigned
int overflow, and uses a better fix by moving the offs-- inside the
while loop to avoid the "0-1" situation.

Change-Id: Id4a3e03859ebcdf264df0808412b30841028f87c
2016-11-17 21:11:23 +00:00
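
The pattern behind the fix in 88cbc5827b can be illustrated with a generic loop; the function sum_backward below is a hypothetical example and not the code that was changed. With an unsigned counter, a condition such as `while (offs-- > 0)` performs one last decrement when offs is already 0. The wraparound is well defined in C, but the unsigned-integer-overflow sanitizer check reports it. Decrementing inside the body avoids that.

```c
#include <assert.h>
#include <stddef.h>

/* Walk a buffer from its last byte to its first using an unsigned offset. */
static unsigned sum_backward(const unsigned char *buf, unsigned offs) {
  assert(buf != NULL || offs == 0);
  unsigned total = 0;
  while (offs > 0) {
    offs--; /* only reached when offs is nonzero, so 0 - 1 never happens */
    total += buf[offs];
  }
  return total;
}
```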
David Barker 4c12cc5fc5 Comment out code accidentally left in the bitstream_debug patch
In https://aomedia-review.googlesource.com/#/c/5864/ , some
code to stop the decoder at a preselected point was left enabled.
This code should only be uncommented when debugging, so comment
it out by default.

Change-Id: Ie168e8a1588ba92971e3ff1a056f597a7dfca136
2016-11-17 16:26:14 +00:00
David Barker fa2865b502 Implement bitstream debug for daala_ec
Change-Id: I809eb52e8a632189c49b8ea0a2b5de760cc2a34c
2016-11-16 17:17:24 +00:00
Yaowu Xu 4d34154b66 Fix IOC warnings
av1_txfm.h: left shift of a negative number
av1/encoder/quantize.c: unsigned int overflow
aom_dsp/entenc.c: unsigned int overflow

Change-Id: I6143e68f7d6e2621f97900808c8ef7ee0ad0c814
2016-11-16 17:08:02 +00:00
Thomas Davies faa7fcfe1c Add overwrite functions which do not zero bytes.
In several places bits are overwritten in the bitstream. These
functions avoid zeroing bytes during writing so that this can
happen correctly when the number of bits is not 8*N.

Re-addresses the attempted fix in
133c57c331

which broke threaded encoding tests, which relied on re-using
byte buffers.

Change-Id: I682c5e3a7869eac7ad475584db8bf170d47a56c9
2016-11-16 03:57:10 +00:00
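
The behaviour described in faa7fcfe1c, overwriting bits without zeroing the rest of a byte, can be sketched as below. This is an assumption-laden illustration: the name overwrite_literal, the MSB-first bit order, and the flat byte-buffer interface are chosen for the example and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overwrite `bits` bits starting at `bit_offset` (MSB-first within each
 * byte) with the low `bits` bits of `value`. Bits outside the span are left
 * untouched, so the span does not have to be a multiple of 8 bits. */
static void overwrite_literal(uint8_t *buf, size_t bit_offset, uint32_t value,
                              int bits) {
  assert(bits >= 0 && bits <= 32);
  for (int i = bits - 1; i >= 0; --i, ++bit_offset) {
    const size_t byte = bit_offset >> 3;
    const uint8_t mask = (uint8_t)(0x80u >> (bit_offset & 7));
    if ((value >> i) & 1u) {
      buf[byte] |= mask;           /* set just this bit */
    } else {
      buf[byte] &= (uint8_t)~mask; /* clear just this bit */
    }
  }
}
```

Writing a 6-bit value at bit offset 4 touches the low four bits of the first byte and the top two bits of the second, and leaves the other ten bits as they were.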
Sarah Parker caaf9f9270 Revert "Support overwriting an arbitrary number of literal bits."
This reverts commit 133c57c331,
which appears to cause test failures in
AV1/AVxEncoderThreadTest.EncoderResultTest/*

Change-Id: I200b7a135ed65dc2c3f23b23b8c3dbf0872715fa
2016-11-12 01:02:11 +00:00
Yaowu Xu f42bba2522 Reinstate "fix msvc build warnings and errors"
This commit reinstates a portion of a reverted commit to fix warnings
and errors with the MSVC2013 build.

Change-Id: Ibb5fd665db6d8c897a657e5994547a1f82e3f188
2016-11-12 00:36:55 +00:00
Yaowu Xu fdb4216d6b Revert "fix msvc build warnings and errors"
This reverts commit 32dbdff1b3.

Change-Id: I94ef281223f7abceb156714e8192d5ea5fdc2581
2016-11-10 22:32:29 +00:00
Yaowu Xu 32dbdff1b3 fix msvc build warnings and errors
This commit fixes the msvc2013 build for configuration:
configure --target=x86_64-win64-vs12 --enable-experimental
 --enable-clpf --enable-dering --enable-motion-var --enable-ans

BUG=aomedia:80

Change-Id: I08b61e38e761ea4ed3175529fba4a50c57be44ac
2016-11-10 21:51:43 +00:00
Yi Luo 1f49624c7f SAD avx2 optimization for ext-partition
- User level improves 1.33% on i7-6700

Change-Id: I279fc7ec99f4c3500017ed079709227f96e9702e
2016-11-10 19:56:00 +00:00
Debargha Mukherjee 0e11912ae1 Support 64x64 quantizer functions
Also includes some refactoring and cleanups.

Change-Id: I2c2528c434a1e9e9b898251fa69489d884463929
2016-11-09 21:59:14 +00:00
Yaowu Xu febe9b06bb Fix msvc compiler warnings
BUG=aomedia:80

Change-Id: Ie4bccf053d2c24dcb64519650bcbcef4baffcdae
2016-11-09 19:38:21 +00:00
Thomas Davies 133c57c331 Support overwriting an arbitrary number of literal bits.
Overwriting was only guaranteed to work if a whole number
of bytes were being overwritten.

Change-Id: I5e72cb337ec6ff691e93288de9f751b583654a17
2016-11-09 11:11:57 +00:00
Alex Converse 1e4e29f776 Fix rans ec_multisymbol merge issues.
The rans experiment is dead. The ans experiment with the ec_multisymbol
experiment also turned on takes its place.

Change-Id: Ie9f30ec7cf73aae6b2ea580a7b1f208485a8a7a7
2016-11-09 01:25:29 +00:00
Angie Chiang d02001ddc2 Add txb_coeff_cost_map into TOKEN_STATS
This is to facilitate the debugging process in the var_tx experiment.

Change-Id: Ibd5ea7f6054c598b8e686abb4e8158ef28c67aab
2016-11-08 22:32:02 +00:00
Yushin Cho 77bba8d30a New experiment: Perceptual Vector Quantization from Daala
PVQ replaces the scalar quantizer and coefficient coding with a new
design originally developed in Daala. It currently depends on the
Daala entropy coder although it could be adapted to work with another
entropy coder if needed:
./configure --enable-experimental --enable-daala_ec --enable-pvq

The version of PVQ in this commit is adapted from the following
revision of Daala:
fb51c1ade6

More information about PVQ:
- https://people.xiph.org/~jm/daala/pvq_demo/
- https://jmvalin.ca/papers/spie_pvq.pdf

The following files are copied as-is from Daala with minimal
adaptations, therefore we disable clang-format on those files
to make it easier to synchronize the AV1 and Daala codebases in the future:
 av1/common/generic_code.c
 av1/common/generic_code.h
 av1/common/laplace_tables.c
 av1/common/partition.c
 av1/common/partition.h
 av1/common/pvq.c
 av1/common/pvq.h
 av1/common/state.c
 av1/common/state.h
 av1/common/zigzag.h
 av1/common/zigzag16.c
 av1/common/zigzag32.c
 av1/common/zigzag4.c
 av1/common/zigzag64.c
 av1/common/zigzag8.c
 av1/decoder/decint.h
 av1/decoder/generic_decoder.c
 av1/decoder/laplace_decoder.c
 av1/decoder/pvq_decoder.c
 av1/decoder/pvq_decoder.h
 av1/encoder/daala_compat_enc.c
 av1/encoder/encint.h
 av1/encoder/generic_encoder.c
 av1/encoder/laplace_encoder.c
 av1/encoder/pvq_encoder.c
 av1/encoder/pvq_encoder.h

Known issues:
- Lossless mode is not supported, '--lossless=1' will give the same result as
'--end-usage=q --cq-level=1'.
- High bit depth is not supported by PVQ.

Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
2016-11-06 22:18:01 -08:00
Angie Chiang d402282f69 Add token cost comparison in write_modes_b()
This is just a partial implementation.
Compare token cost of pack_mb_tokens/pack_txb_tokens with token cost
from rate-distortion loop. If there is any difference, dump out mode
info.

Change-Id: I46b373ee2522c5047f799f36baf7cec5fbc06f06
2016-11-04 11:09:24 -07:00