Граф коммитов

32 Коммитов

Автор SHA1 Сообщение Дата
Steinar Midtskogen 83307f33f2 Fix typos in comments
Change-Id: Id70b49e2a77c6837da75c684d622ddfe60f3d97e
2017-01-07 10:26:28 +01:00
Steinar Midtskogen d954f2d77d Disable unsupported SIMD optimisations for CLPF for 32 bit VS targets
VS compiling for 32 bit targets does not support vector types in
structs as arguments, which makes the v256 type of the intrinsics hard
to support, so optimizations for this target are disabled.

Change-Id: I675394cf1aed0cb18a48f21216470867031b30ce
2017-01-07 08:59:56 +00:00
David Barker be6cc07d82 Add new convolve variant for loop-restoration
The convolve filters generated by loop_wiener_filter_tile
are not compatible with some existing convolve implementations
(they can have coefficients >128, sums of (certain subsets of)
coefficients >128, etc.)

So we implement a new variant, which takes a filter with 128
subtracted from its central element and which adds an extra copy
of the source just before clipping to a pixel (reinstating the
128 we subtracted). This should be easy to adapt from the existing
convolve functions, and this patch includes SSE2 highbd and
SSSE3 lowbd implementations.

Change-Id: I0abf4c2915f0665c49d88fe450dbc77b783f69e1
2017-01-03 17:15:29 +00:00
Jingning Han cc5bdf4920 Add 2x2 block level variance functions for high bd
Change-Id: I38259c4074f77a8941baefbe7585fff2eded6b12
2016-12-20 17:28:13 +00:00
Jingning Han 324b4c6d6a Add 2x2 intra predictor for high bit-depth
Provide primitive modules for cb4x4 mode use. This resolves compiler
warnings when both high bit-depth and cb4x4 mode are turned on.

Change-Id: If6ecac50578b3e665b602419a0701c3e047ce623
2016-12-20 17:28:13 +00:00
Jingning Han e2ffaf884d Add 2x4 and 4x2 variance functions
Change-Id: Ic2fbc66e9212da32930c6a8ba1a749e3a37c5b9a
2016-12-15 20:19:19 +00:00
Yi Luo e98325848d High bit depth motion search SAD optimization on avx2
- For all blocks with width >= 16.
- Add test_count to make the unit tests harder to pass.
- Speed testing on 1080p, 100 frames, 5 Mbps, CPU, i7-6700
  User level time reduction:
   baseline:                  3.68%
   baseline + ext-partition: 36.12%

Change-Id: I78c5d9ca216f0fd91f1a360dca2190b11fd54a08
2016-12-09 21:14:48 +00:00
Jingning Han 9e7c49fc8a Add 2x2 variance function
Change-Id: I73bcb8ab5727e2d07e34ca35e9e014f3c6f63d56
2016-12-07 05:47:55 +00:00
Jingning Han 7833d2bfbf Enable 2x2 intra prediction
Bring 2x2 intra prediction online for chroma components.

Change-Id: Ia56af9101b2a977691bca4156a6dcf89e644b4a7
2016-12-02 01:46:59 +00:00
Yi Luo 9e218747c4 SAD avg and 4D avx2 optimization for ext-partition
- User level time reduction <1% on i7-6700 cpu

Change-Id: I8f15bde07dddd938df0b065e20ae94109e7b3b5b
2016-11-28 22:42:08 +00:00
Urvang Joshi 6be4a54b89 Add a new intra prediction mode "smooth".
This is added as part of ALT_INTRA experiment.

This uses interpolation between top row and estimated bottom row; as
well as left column and estimated right column to generate the
predicted block.The interpolation is done using a predefined weight
array.

Based on experiments, the currently chosen weight array was created
to represent a quadratic curve, but can be tuned further if needed.

Improvement from baseline on Derf set:
ALL Keyframes: 1.279%

Improvement from existing ALT_INTRA:
ALL Keyframes: 1.146%

Change-Id: I12637fa1b91bd836f1c59b27d6caee2004acbdd4
2016-11-28 12:12:26 -08:00
Debargha Mukherjee 84c56af017 Support 64x64 intra prediction
Change-Id: I2536b5b55f28c2ee59445c3b70d3e073e69945cd
2016-11-21 20:06:46 +00:00
Yi Luo 1f49624c7f SAD avx2 optimization for ext-partition
- User level improves 1.33% on i7-6700

Change-Id: I279fc7ec99f4c3500017ed079709227f96e9702e
2016-11-10 19:56:00 +00:00
Debargha Mukherjee 0e11912ae1 Support 64x64 quantizer functions
Also includes some refactoring and cleanups.

Change-Id: I2c2528c434a1e9e9b898251fa69489d884463929
2016-11-09 21:59:14 +00:00
Yushin Cho 77bba8d30a New experiment: Perceptual Vector Quantization from Daala
PVQ replaces the scalar quantizer and coefficient coding with a new
design originally developed in Daala. It currently depends on the
Daala entropy coder although it could be adapted to work with another
entropy coder if needed:
./configure --enable-experimental --enable-daala_ec --enable-pvq

The version of PVQ in this commit is adapted from the following
revision of Daala:
fb51c1ade6

More information about PVQ:
- https://people.xiph.org/~jm/daala/pvq_demo/
- https://jmvalin.ca/papers/spie_pvq.pdf

The following files are copied as-is from Daala with minimal
adaptations, therefore we disable clang-format on those files
to make it easier to synchronize the AV1 and Daala codebases in the future:
 av1/common/generic_code.c
 av1/common/generic_code.h
 av1/common/laplace_tables.c
 av1/common/partition.c
 av1/common/partition.h
 av1/common/pvq.c
 av1/common/pvq.h
 av1/common/state.c
 av1/common/state.h
 av1/common/zigzag.h
 av1/common/zigzag16.c
 av1/common/zigzag32.c
 av1/common/zigzag4.c
 av1/common/zigzag64.c
 av1/common/zigzag8.c
 av1/decoder/decint.h
 av1/decoder/generic_decoder.c
 av1/decoder/laplace_decoder.c
 av1/decoder/pvq_decoder.c
 av1/decoder/pvq_decoder.h
 av1/encoder/daala_compat_enc.c
 av1/encoder/encint.h
 av1/encoder/generic_encoder.c
 av1/encoder/laplace_encoder.c
 av1/encoder/pvq_encoder.c
 av1/encoder/pvq_encoder.h

Known issues:
- Lossless mode is not supported, '--lossless=1' will give the same result as
'--end-usage=q --cq-level=1'.
- High bit depth is not supported by PVQ.

Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
2016-11-06 22:18:01 -08:00
David Barker 0602edfbc5 Fix aom_fdct8x8_ssse3 in high bit depth mode
Change-Id: I63e492163ef10e12a842837368c209b8ffc4eee0
2016-10-28 10:13:43 +01:00
Yi Luo 133c13d637 Fix incorrect merge of forward txfm function declarations
- Restore the fwd txfm HBD function declarations exposure.

Change-Id: I1e33df6297fd37e242f4b73c8ab97063b9feb7c6
2016-10-26 10:30:53 -07:00
Jingning Han 03b3514058 Add 2x2 directional intra predictors
Change-Id: Iaa25269a15231dadeaba0f4836c864fc10e858df
2016-10-19 21:58:09 +00:00
Yi Luo b9fbf38bff Merge "Delete some redundant function declarations in aom_dsp_rtcd_defs.pl" into nextgenv2 2016-10-14 17:50:37 +00:00
Yue Chen a48764d05f Merge "Renamings for OBMC experiment" into nextgenv2 2016-10-14 01:33:00 +00:00
Yi Luo 761ae880d7 Delete some redundant function declarations in aom_dsp_rtcd_defs.pl
Change-Id: I4df57a7faba5800c048b2dc469ec31545406f55c
2016-10-13 17:53:45 -07:00
Yue Chen cb60b185c7 Renamings for OBMC experiment
To get ready for pulling AV1 to nextgenv2
Replace the experimental flag by MOTION_VAR. Rename major variables.

Change-Id: If6cf4f37b9319c46d8f90df551cc7295d66ca205
2016-10-13 15:51:22 -07:00
Jingning Han e3954d8312 Sync 2x2 intra predictors
Add 2x2 DC, V, H, TM intra predictors.

Change-Id: I2a614adde553f821c45bc5a9bf09800a9f0aaa26
2016-10-12 21:04:01 -07:00
Yi Luo fed8e1c06d Hybrid forward transform 32x32 AVX2 optimization
- av1_fht32x32 AVX2 function level time reduction ~89% compared to C.

- av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2()
  But function replacement must go with the corresponding inverse txfm.

- No obvious user level time reduction due to 32x32 TX_TYPE selection.

- Zero high 128b YMM to avoid AVX-SSE transition penalties
  (fix 16x16 case).

- Added 32x32 AVX2 unit tests to verify bitexact.

- AVX2 optimization summary:
  On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
  C to AVX2: function level time reduction, ~86-89%.
  SSE2 to AVX2: function level time reduction, ~51%.

Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
2016-10-12 14:19:53 -07:00
Yaowu Xu f36d0b46d1 minor updates
1. vp8->aom
2. removed no-effect statements and spaces

Change-Id: I367d05ff9bf1b9f3c71c517c45d8049d9d4236ec
2016-10-12 10:50:08 -07:00
Steinar Midtskogen ecf9a0c821 Extend CLPF to chroma.
Objective quality impact (low latency):

PSNR YCbCr:      0.13%     -1.37%     -1.79%
   PSNRHVS:      0.03%
      SSIM:      0.24%
    MSSSIM:      0.10%
 CIEDE2000:     -0.83%

Change-Id: I8ddf0def569286775f0f9d4d4005932766a7fc27
2016-10-10 15:23:38 -07:00
Steinar Midtskogen 3dbd55a6c4 Added high bit-depth support in CLPF.
Change-Id: Ic5eadb323227a820ad876c32d4dc296e05db6ece
2016-10-10 11:27:04 -07:00
Steinar Midtskogen e8224c7ad5 Reduce memory footprint for CLPF decoding.
Instead of having CLPF write to an entire new frame and
copy the result back into the original frame, make the
filter able to work in-place by keeping a buffer of size
frame_width*filter_block_size and delay the write-back
by one filter_block_size row.

This reduces the cycles spent in the filter to ~75%.

Change-Id: I78ca74380c45492daa8935d08d766851edb5fbc1
2016-10-10 11:26:33 -07:00
Steinar Midtskogen be668e92c3 Added generic SIMD support for CLPF.
Change-Id: Ie03f9a5b0a4c708a586532198d755a1e7509f149
2016-10-10 11:19:37 -07:00
Yi Luo e8e8cd8f1b Hybrid forward transforms 16x16 AVX2 optimization
- Unit tests are added for AVX2 SIMD.
- Encoder speed improvement:
  AV1 baseline and EXT_TX, three 1080p sequences at bitrate:
  800 Kbps, 2 Mbps, 6 Mbps, on i7-6700 CPU, average
  user level time reduction: 3.86%.

Change-Id: Ibbd7837ee3a831c6b1e4e471bf6c8d3fa3a19ff4
2016-10-06 15:33:15 -07:00
Urvang Joshi 340593e530 Add ALT_INTRA experiment.
When the experiment is ON, we use Paeth predictor instead of TM
predictor.

For derf set, this gives about 0.09% improvement overall, and 0.55%
improvement if all frames are forced to be intra-only.

Also, if the EXT_INTRA experiment is also on, the improvement overall
is 0.056%, and improvement if all frames are forced to be intra-only is
0.465%.

Change-Id: Id74e107ede70a8d2107fa14fcb3f44b23a437274
2016-09-01 12:03:20 -07:00
Yaowu Xu f883b42cab Port renaming changes from AOMedia
Cherry-Picked the following commits:
0defd8f Changed "WebM" to "AOMedia" & "webm" to "aomedia"
54e6676 Replace "VPx" by "AVx"
5082a36 Change "Vpx" to "Avx"
7df44f1 Replace "Vp9" w/ "Av1"
967f722 Remove kVp9CodecId
828f30c Change "Vp8" to "AOM"
030b5ff AUTHORS regenerated
2524cae Add ref-mv experimental flag
016762b Change copyright notice to AOMedia form
81e5526 Replace vp9 w/ av1
9b94565 Add missing files
fa8ca9f Change "vp9" to "av1"
ec838b7  Convert "vp8" to "aom"
80edfa0 Change "VP9" to "AV1"
d1a11fb Change "vp8" to "aom"
7b58251 Point to WebM test data
dd1a5c8 Replace "VP8" with "AOM"
ff00fc0 Change "VPX" to "AOM"
01dee0b Change "vp10" to "av1" in source code
cebe6f0 Convert "vpx" to "aom"
17b0567 rename vp10*.mk to av1_*.mk
fe5f8a8 rename files vp10_* to av1_*

Change-Id: I6fc3d18eb11fc171e46140c836ad5339cf6c9419
2016-08-31 18:19:03 -07:00