Commit Graph

42 Commits

Author SHA1 Message Date
Ryan Lei 392d0ff726 implement combined parallel_deblocking experiment
The parallel_deblocking experiment is proposed jointly by Intel
and Microsoft. The following changes are implemented in this
experiment:

- deblocking filter order is changed to filter all vertical edges
  of the whole frame followed by filtering all horizontal edges
  of the whole frame

- filter length decision is made based on the transform block size
  on both sides of the edge. The block with the smaller transform size
  determines the final filter length.

- transform blocks on both sides of the edge are checked; filtering of
  that edge can be skipped only when both blocks are skipped and they
  belong to the same prediction block.

- 15-tap filter and extended flat area detection are removed.

- special rule for handling 4x4 transform block on the super block
  boundary in VP9 is removed.

Change-Id: I1aa82c6b5335d47c2f73eec8fc8bee2c08a1cf74
2017-03-01 19:59:33 +00:00
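
The edge rules listed in the parallel_deblocking commit above can be illustrated with a small C sketch. It is not the libaom implementation: the `EdgeSide` struct, its fields, `governing_tx_size`, `deblock_frame`, and the callback type are all hypothetical names introduced here, and the sketch only restates the three rules from the commit message (two-pass edge order, smaller transform size decides, skip only when both sides are skipped and share a prediction block).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the transform block on one side of an edge. */
typedef struct {
  int tx_size;       /* transform size in pixels across the edge: 4, 8, 16, 32 */
  bool skipped;      /* true when the block carries no coded residual */
  int pred_block_id; /* hypothetical id of the enclosing prediction block */
} EdgeSide;

/* Returns 0 when the edge may be left unfiltered; otherwise returns the
 * transform size that governs the filter length (the smaller of the two). */
int governing_tx_size(const EdgeSide *p, const EdgeSide *q) {
  assert(p != NULL && q != NULL);
  if (p->skipped && q->skipped && p->pred_block_id == q->pred_block_id) {
    return 0;
  }
  return p->tx_size < q->tx_size ? p->tx_size : q->tx_size;
}

/* Hypothetical per-edge callback type. */
typedef void (*EdgeFilterFn)(int row, int col, void *ctx);

/* Frame-level order: every vertical edge first, then every horizontal edge. */
void deblock_frame(int rows, int cols, EdgeFilterFn filter_vertical_edge,
                   EdgeFilterFn filter_horizontal_edge, void *ctx) {
  int r, c;
  for (r = 0; r < rows; ++r)
    for (c = 0; c < cols; ++c) filter_vertical_edge(r, c, ctx);
  for (r = 0; r < rows; ++r)
    for (c = 0; c < cols; ++c) filter_horizontal_edge(r, c, ctx);
}
```

Because the vertical pass finishes before the horizontal pass starts, no vertical edge depends on the output of a horizontal filter, which is presumably what makes the order attractive for parallel execution.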
Tom Finegan 0d3aeda300 Remove unused assembly sources and associated tests.
Change-Id: Ic8386743b1852ca1074528d04e2adc1d191b091b
2017-02-02 17:48:17 +00:00
James Zern 9303b941d8 aom_subpixel_8t_intrin_avx2: tolerate unversioned clang
assume __clang_major__==0 has the latest version of
_mm256_broadcastsi128_si256. fixes builds with custom clang toolchains.

cherry-picked from libvpx:
33aef48f2 vpx_subpixel_8t_intrin_avx2: tolerate unversioned clang

BUG=b/30970831

Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6
2017-01-20 17:56:40 -08:00
David Barker be6cc07d82 Add new convolve variant for loop-restoration
The convolve filters generated by loop_wiener_filter_tile
are not compatible with some existing convolve implementations
(they can have coefficients >128, sums of (certain subsets of)
coefficients >128, etc.)

So we implement a new variant, which takes a filter with 128
subtracted from its central element and which adds an extra copy
of the source just before clipping to a pixel (reinstating the
128 we subtracted). This should be easy to adapt from the existing
convolve functions, and this patch includes SSE2 highbd and
SSSE3 lowbd implementations.

Change-Id: I0abf4c2915f0665c49d88fe450dbc77b783f69e1
2017-01-03 17:15:29 +00:00
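
The "subtract 128 from the central tap, add the source back before clipping" idea from the commit above can be sketched in scalar C. This is an illustration under stated assumptions, not the libaom code: the 8-tap length, the central index 3, the 7-bit filter precision, and the function names `convolve_plain` and `convolve_add_src` are all chosen here for the example.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS 7 /* assumption: taps are scaled so that they sum to 128 */
#define TAPS 8        /* assumption: 8-tap kernel */
#define CENTER 3      /* assumption: index of the central tap */

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Rounding shift; assumes arithmetic right shift for negative values. */
static int round_shift(int v) {
  return (v + (1 << (FILTER_BITS - 1))) >> FILTER_BITS;
}

/* Conventional form. src points at the first sample covered by the kernel. */
uint8_t convolve_plain(const uint8_t *src, const int16_t *filter) {
  int k, sum = 0;
  assert(src != 0 && filter != 0);
  for (k = 0; k < TAPS; ++k) sum += filter[k] * src[k];
  return clip_pixel(round_shift(sum));
}

/* Variant: the central tap is stored with 128 subtracted, and one extra copy
 * of the source sample is added just before clipping to restore it. */
uint8_t convolve_add_src(const uint8_t *src, const int16_t *filter_m128) {
  int k, sum = 0;
  assert(src != 0 && filter_m128 != 0);
  for (k = 0; k < TAPS; ++k) sum += filter_m128[k] * src[k];
  return clip_pixel(round_shift(sum) + src[CENTER]);
}
```

Since 128 times the source sample is a multiple of the rounding step, both forms give the same pixel, while the stored taps in the second form stay small enough for kernels that expect coefficients below 128.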
Angie Chiang 9e963dc0ed Shorter-tap interp first in highbitdepth mode
BDRate varies within +-0.04%

Change-Id: I76f440c479d411c09ef39a19b46eb8dbc5330efb
2016-12-15 05:49:59 +00:00
David Barker 025b25459d Change Wiener filter in loop-restoration
The Wiener filter now uses the same convolution code as the
inter predictors.

Change-Id: Ia3bfbc778171eb25c6a0141426d1f69d92c17992
2016-12-14 18:58:21 +00:00
Yi Luo e98325848d High bit depth motion search SAD optimization on avx2
- For all blocks with width >= 16.
- Add test_count to make the unit tests harder to pass.
- Speed testing on 1080p, 100 frames, 5 Mbps, CPU, i7-6700
  User level time reduction:
   baseline:                  3.68%
   baseline + ext-partition: 36.12%

Change-Id: I78c5d9ca216f0fd91f1a360dca2190b11fd54a08
2016-12-09 21:14:48 +00:00
Tristan Matthews 3fb5c4c0bc intrapred_sse2: Fix nasm build
Fixes Issue 96: https://bugs.chromium.org/p/aomedia/issues/detail?id=96&q=&desc=3

Change-Id: I47381ef3930368901c7c2ca6d7f9064216de8ad0
2016-12-07 18:45:30 +00:00
Angie Chiang 7a483cffc8 Turn on SIMD optimization for dual_filter
Let aom_convolve8_### SIMD implementation support any block width.
Turn on SIMD optimization when interpolation filter types on two
directions are different.

This reduces encoding time by 30% when dual_filter and ext_interp
are both on.

Change-Id: I539dbb2737f01835034b7269656a15b2058fa3cc
2016-12-01 21:58:03 +00:00
Yaowu Xu bde4ac8260 change to use AOMedia copyright notice
Change-Id: I82580120a154ecd7c41f4cd9bc0f8c669fca7774
2016-11-29 00:01:36 +00:00
Yi Luo 9e218747c4 SAD avg and 4D avx2 optimization for ext-partition
- User level time reduction <1% on i7-6700 cpu

Change-Id: I8f15bde07dddd938df0b065e20ae94109e7b3b5b
2016-11-28 22:42:08 +00:00
Yaowu Xu fdb4216d6b Revert "fix msvc build warnings and errors"
This reverts commit 32dbdff1b3.

Change-Id: I94ef281223f7abceb156714e8192d5ea5fdc2581
2016-11-10 22:32:29 +00:00
Yaowu Xu 32dbdff1b3 fix msvc build warnings and errors
This commit fixes the msvc2013 build for configuration:
configure --target=x86_64-win64-vs12 --enable-experimental
 --enable-clpf --enable-dering --enable-motion-var --enable-ans

BUG=aomedia:80

Change-Id: I08b61e38e761ea4ed3175529fba4a50c57be44ac
2016-11-10 21:51:43 +00:00
Yi Luo 1f49624c7f SAD avx2 optimization for ext-partition
- User level improves 1.33% on i7-6700

Change-Id: I279fc7ec99f4c3500017ed079709227f96e9702e
2016-11-10 19:56:00 +00:00
Yi Luo 7317200002 Hybrid inverse transforms 16x16 AVX2 optimization
- Add unit tests to verify the bit-exact result.
- User level time reduction (EXT_TX):
    encoder: 3.63%
    decoder: 2.36%
- Also add tx_type=V_DCT...H_FLIPADST SSE2 for 16x16 inv txfm.

Change-Id: Idc6d9e8254aa536e5f18a87fa0d37c6bd551c083
2016-11-01 13:38:20 -07:00
David Barker 0602edfbc5 Fix aom_fdct8x8_ssse3 in high bit depth mode
Change-Id: I63e492163ef10e12a842837368c209b8ffc4eee0
2016-10-28 10:13:43 +01:00
Luca Barbato f0f98578df Namespace the idct/iad symbols
Make linking to libvpx and libaom at the same time possible.

Change-Id: I7bab8527a32e446e3d564e6fa5d94ccd056bc63f
2016-10-27 12:36:37 -07:00
Yi Luo 0c552dfd82 Fix aom_fdct32x32_avx2 output as CONFIG_AOM_HIGHBITDEPTH=1
- Change FDCT32x32_2D_AVX2 output parameter to tran_low_t.
- Add unit tests for CONFIG_AOM_HIGHBITDEPTH=1.
- Update TODO notes.
BUG=webm:1323

Change-Id: If4766c919a24231fce886de74658b6dd7a011246
2016-10-25 14:33:21 -07:00
Yi Luo 1a0f27aaa6 Fix avx2 16x16/32x32 fwd txfm coeff output on HBD
Change-Id: Ida036defe5688894a63007a31aa2dd0b3f0b5d59
2016-10-21 14:14:00 -07:00
Urvang Joshi d71a231c49 Add compiler warning flag -Wextra and fix related warnings.
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.

Cherry-picked from aomedia/master: 4790a69

Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
2016-10-20 15:49:16 -07:00
Urvang Joshi fdb60962f4 Fix warnings reported by -Wshadow: Part1: aom_dsp directory
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.

Cherry-picked from aomedia/master: 09eea2193

Change-Id: I61030e773137ae107d3bd43556c0d5bb26f9dbf8
2016-10-18 17:22:12 -07:00
Yi Luo e9fde265f7 Zero high 128b YMM registers to avoid SSE-AVX transition penalties
Documents:
- https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx
- https://software.intel.com/sites/default/files/m/d/4/1/d/8/11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf

Change-Id: I90f85fcb15a7a2c49ee068300be6ffe9c68d371c
2016-10-14 12:22:35 -07:00
James Zern 8c64331aa2 variance_avx2: sync variance functions with c-code
add missing int64 -> uint32 cast; quiets -Wshorten-64-to-32 warnings

Change-Id: I4850b36e18dc8b399108342be4bfe0b684aefb78
(cherry picked from commit 6acd061aad8cf62000cc9117390d0c94581a8591)
2016-10-13 20:15:18 -07:00
Alex Converse 2176b7acc2 Resolve -Wshorten-64-to-32 in variance.
The subtrahend is small enough to fit into uint32_t.

Change-Id: Ic4d7128aaa665eaf6b25d562610ba8942c46137f
(cherry picked from commit c0241664aac3a1805db9bd8e09e071ac326531e0)
2016-10-13 20:12:20 -07:00
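
The two -Wshorten-64-to-32 commits above concern the final subtraction in the variance computation. A minimal C sketch of that pattern follows, assuming 8-bit samples and blocks no larger than 64x64; the function name `block_variance` and its signature are hypothetical, not the libaom API. The squared sum needs 64 bits, but after division by the pixel count it fits in `uint32_t`, so the explicit cast is safe and silences the warning.

```c
#include <assert.h>
#include <stdint.h>

/* Variance of the difference between two 8-bit blocks (hypothetical helper).
 * Writes the sum of squared differences to *sse and returns
 * sse - sum^2 / (w * h). */
uint32_t block_variance(const uint8_t *a, int a_stride, const uint8_t *b,
                        int b_stride, int w, int h, uint32_t *sse) {
  int64_t sum = 0;
  uint32_t sse_acc = 0;
  int r, c;
  /* Assumed range: with 8-bit input and at most 64x64 pixels, sse_acc stays
   * below 2^32 and sum^2 / (w * h) never exceeds sse_acc. */
  assert(w > 0 && h > 0 && w <= 64 && h <= 64 && sse != 0);
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int diff = a[r * a_stride + c] - b[r * b_stride + c];
      sum += diff;
      sse_acc += (uint32_t)(diff * diff);
    }
  }
  *sse = sse_acc;
  /* sum * sum is computed in 64 bits; the quotient fits in 32 bits. */
  return sse_acc - (uint32_t)((sum * sum) / (w * h));
}
```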
David Barker 33231d4801 Add sse2 forward and inverse 16x32 and 32x16 transforms
Change-Id: I1241257430f1e08ead1ce0f31db8272b50783102
2016-10-13 14:01:22 +01:00
David Barker 4d03d6fc6f Add sse2 forward / inverse 4x8 and 8x4 transforms
Change-Id: I89ed93fb20cf975c2b463cff58879521ceaa4163
2016-10-10 09:02:45 -07:00
Yi Luo 3a8217f21b Merge "Hybrid forward transforms 16x16 AVX2 optimization" into nextgenv2
2016-10-07 01:52:11 +00:00
Debargha Mukherjee 609453e7e4 Merge "Added sse2 inverse 8x16 and 16x8 transforms" into nextgenv2
2016-10-07 00:03:34 +00:00
Yi Luo e8e8cd8f1b Hybrid forward transforms 16x16 AVX2 optimization
- Unit tests are added for AVX2 SIMD.
- Encoder speed improvement:
  AV1 baseline and EXT_TX, three 1080p sequences at bitrate:
  800 Kbps, 2 Mbps, 6 Mbps, on i7-6700 CPU, average
  user level time reduction: 3.86%.

Change-Id: Ibbd7837ee3a831c6b1e4e471bf6c8d3fa3a19ff4
2016-10-06 15:33:15 -07:00
Peter de Rivaz 1baecfeb03 Added sse2 inverse 8x16 and 16x8 transforms
Change-Id: I43628407b11e5c8e6af4df69f2acdc67ac827834
2016-10-06 11:23:14 -07:00
Yi Luo a674ba93fe Fix high bitdepth variance overflow on uint32_t
BUG=webm:1305

Change-Id: I4c56631359e298b99e618c07bcbae9f793c5e2ac
2016-10-03 16:37:00 -07:00
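
Why a 32-bit accumulator can overflow in high bit depth mode is easy to check by hand: a 64x64 block of 12-bit samples can produce a sum of squared differences of up to 4095^2 * 4096, which exceeds 2^32. The C sketch below accumulates in 64 bits instead. It illustrates the general remedy only; the function name `highbd_block_sse` and its signature are hypothetical, and the commit message above does not say how the actual fix was made.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences between two blocks of 16-bit samples
 * (hypothetical helper). The accumulator is 64 bits wide so that large
 * blocks of 10-bit or 12-bit samples cannot wrap around. */
uint64_t highbd_block_sse(const uint16_t *a, int a_stride, const uint16_t *b,
                          int b_stride, int w, int h) {
  uint64_t sse = 0;
  int r, c;
  assert(a != 0 && b != 0 && w > 0 && h > 0);
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      /* For samples of up to 12 bits, diff * diff fits in a 32-bit int. */
      const int diff = (int)a[r * a_stride + c] - (int)b[r * b_stride + c];
      sse += (uint64_t)(diff * diff);
    }
  }
  return sse;
}
```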
clang-format 67948d312d apply clang-format
Change-Id: If22018f8911d9d7ee99c2127bdfcc56e42b0e2d7
2016-09-15 16:41:21 -07:00
Yaowu Xu 66c41f9937 Merge "Clarify valid value ranges" into nextgenv2
2016-09-09 22:38:57 +00:00
Yaowu Xu 6feda0602a Clarify valid value ranges
This commit adds asserts to clarify value ranges in sum computations,
also corrects type conversion used in related calculations.

cherry-picked #738d5b19 from aom/master

Change-Id: Ib6d574ec23e5c28ccd994dac26f973eb3920430d
2016-09-09 11:58:53 -07:00
Geza Lore 1a800f6539 Add SSE2 versions of av1_fht8x16 and av1_fht16x8
Encoder speedup ~2% with ext-tx + rect-tx

Change-Id: Id56ddf102a887de31d181bde6d8ef8c4f03da945
2016-09-09 11:29:41 -07:00
Yaowu Xu 628d3c5839 variance_impl_avx2.c: align a table for better readability
Change-Id: I8cd99f9807dbfe6f70147615d2fd6775a7d98c16
2016-09-08 17:36:44 -07:00
Yaowu Xu 0dc4cbb059 sad_avx2.c: add hints for clang-format
Change-Id: I721c52e69395a99b3a0395dc229de1cbb32670e9
2016-09-07 00:29:13 +00:00
Yaowu Xu 037845507d Avoid re-use same temp variables
In highbd_quantize_intrin_sse2.c.

Change-Id: Iaf6360e456f1fb2f8ff06461afbfecfc0103dda3
2016-09-06 14:52:19 +00:00
Yaowu Xu 2ab7ff05f1 Change to use AOM copyright notice
Change-Id: I2b2b70e756b7eb9611b7b33b7d5f19b3b30e0a50
2016-09-02 19:52:03 +00:00
Yaowu Xu 9c01aa1b0c Change to use aom copyright notice
This minimizes code differences between AOM master and nextgenv2

Change-Id: If144865bdf3ef0818e7aac11018b9e786444c550
2016-09-02 08:22:07 -07:00
Yaowu Xu f883b42cab Port renaming changes from AOMedia
Cherry-Picked the following commits:
0defd8f Changed "WebM" to "AOMedia" & "webm" to "aomedia"
54e6676 Replace "VPx" by "AVx"
5082a36 Change "Vpx" to "Avx"
7df44f1 Replace "Vp9" w/ "Av1"
967f722 Remove kVp9CodecId
828f30c Change "Vp8" to "AOM"
030b5ff AUTHORS regenerated
2524cae Add ref-mv experimental flag
016762b Change copyright notice to AOMedia form
81e5526 Replace vp9 w/ av1
9b94565 Add missing files
fa8ca9f Change "vp9" to "av1"
ec838b7 Convert "vp8" to "aom"
80edfa0 Change "VP9" to "AV1"
d1a11fb Change "vp8" to "aom"
7b58251 Point to WebM test data
dd1a5c8 Replace "VP8" with "AOM"
ff00fc0 Change "VPX" to "AOM"
01dee0b Change "vp10" to "av1" in source code
cebe6f0 Convert "vpx" to "aom"
17b0567 rename vp10*.mk to av1_*.mk
fe5f8a8 rename files vp10_* to av1_*

Change-Id: I6fc3d18eb11fc171e46140c836ad5339cf6c9419
2016-08-31 18:19:03 -07:00
Yaowu Xu c27fc14b02 Port folder renaming changes from AOM
Manually cherry-picked commits:
ceef058 libvpx->libaom part2
3d26d91 libvpx -> libaom
cfea7dd vp10/ -> av1/
3a8eff7 Fix a build issue for a test
bf4202e Rename vpx to aom

Change-Id: I1b0eb5a40796e3aaf41c58984b4229a439a597dc
2016-08-31 17:26:24 -07:00