Commit Graph

48 Commits

Author SHA1 Message Date
James Zern 97946622c0 Revert "mips msa vp9 subpel variance optimization"
This reverts commit a42df86c03.

this change causes MSA/VP9SubpelVarianceTest.Ref and
MSA/VP9SubpelVarianceTest.ExtremeRef failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu

Change-Id: I40b71a0b774eaeb31f66f795733f95cf360909f7
2015-07-02 12:06:51 -07:00
James Zern ced982640b Revert "mips msa vp9 avg subpel variance optimization"
This reverts commit 61774ad1c4.

this change causes MSA/VP9SubpelAvgVarianceTest.Ref failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu

Change-Id: I7fb520c12b2a3b212d5e84b7619a380a48e49bb0
2015-07-02 12:06:29 -07:00
Parag Salasakar 61774ad1c4 mips msa vp9 avg subpel variance optimization
average improvement ~3x-5x

Change-Id: Iefbcafc05daab77b38a4e63b551e427867a501a4
2015-07-01 13:46:41 +05:30
Parag Salasakar a42df86c03 mips msa vp9 subpel variance optimization
average improvement ~3x-5x

Change-Id: I4cbba2711467b0e205904769ebbb4a1fcbb1a311
2015-07-01 07:51:34 +05:30
Parag Salasakar 2d730a289a mips msa vpx_dsp variance optimization
average improvement ~2x-4x

Change-Id: Ia3eef3f390148c2eb5cdc580a94cb26369737f82
2015-06-30 12:22:18 +05:30
James Zern e0e4045db8 variance_test: fix build w/--disable-vp8-encoder
s/CONFIG_VP8\b/CONFIG_VP8_ENCODER/

Change-Id: I616aace9cf8f18d7e83f00f7aef3b8a26fc4c17b
2015-06-11 23:15:30 -07:00
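The substitution in e0e4045db8 relies on the `\b` word boundary so that guards already named `CONFIG_VP8_ENCODER` or `CONFIG_VP8_DECODER` are not rewritten a second time. A minimal sketch of that behaviour follows; the preprocessor lines are hypothetical examples, not copied from variance_test.cc, and Python's `re` module stands in for whatever tool was actually used.

```python
import re

# Hypothetical preprocessor lines, chosen only to exercise the pattern.
lines = [
    "#if CONFIG_VP8",
    "#if CONFIG_VP8_ENCODER",
    "#if CONFIG_VP8_DECODER",
    "#if CONFIG_VP8 && HAVE_NEON",
]


def rename_guard(line):
    # "\b" matches only between a word character and a non-word character.
    # "_" is a word character, so "CONFIG_VP8_ENCODER" has no boundary
    # after "CONFIG_VP8" and is left untouched.
    return re.sub(r"CONFIG_VP8\b", "CONFIG_VP8_ENCODER", line)


renamed = [rename_guard(line) for line in lines]

for before, after in zip(lines, renamed):
    print(f"{before!r:32} -> {after!r}")
```

Without the `\b`, the second line would become `CONFIG_VP8_ENCODER_ENCODER`.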
James Zern 47fe535422 disable vp8_sub_pixel_variance8x8_neon
fails unit tests:
[  FAILED  ] NEON/VP8SubpelVarianceTest.ExtremeRef/0, where GetParam() = (3, 3, 0x14e36d, 0)
[  FAILED  ] NEON/VP8SubpelVarianceTest.Ref/0, where GetParam() = (3, 3, 0x14e36d, 0)

the tests were recently enabled in:
eb88b17 Make vp9 subpixel match vp8

the functions likely haven't changed since being converted from assembly

Change-Id: I6141717b111b8f735f436c160d74270af53ef722
2015-06-05 20:18:51 -07:00
Johann eb88b172fe Make vp9 subpixel match vp8
The only difference between the two was that the vp9 function allowed
for every step in the bilinear filter (16 steps) while vp8 only allowed
for half of those. Since all the call sites in vp9 shift the input left
by one (<< 1), it only ever used the same steps as vp8.

This will allow moving the subpel variance to vpx_dsp with the rest of
the variance functions.

Change-Id: I6fa2509350a2dc610c46b3e15bde98a15a084b75
2015-06-03 22:10:51 -07:00
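The argument in eb88b172fe can be checked with a small model: if a 16-step bilinear table is only ever indexed with `offset << 1`, it yields exactly the entries of an 8-step table. The sketch below assumes two-tap filters whose weights sum to 128 and are evenly spaced. The table layout and the names `bilinear_taps` and `FILTER_WEIGHT` are illustrative assumptions, not the libvpx definitions.

```python
FILTER_WEIGHT = 128  # assumed total weight of the two taps


def bilinear_taps(steps):
    """Two-tap bilinear weights for `steps` evenly spaced subpixel positions."""
    taps = []
    for k in range(steps):
        second = k * FILTER_WEIGHT // steps
        taps.append((FILTER_WEIGHT - second, second))
    return taps


taps_8 = bilinear_taps(8)    # vp8-style table: 8 steps
taps_16 = bilinear_taps(16)  # vp9-style table: 16 steps

# vp9 call sites pass (offset << 1), so only the even entries are reachable.
doubled = [taps_16[offset << 1] for offset in range(8)]

print(doubled == taps_8)
```

Under these assumptions the odd entries of the 16-step table are never reached, which is why the two functions could be merged.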
Johann d90536c1a2 Unify reference variance functions
Use uint32_t for all output and make all functions static

Change-Id: I2c9c6f6310732dc53444607d1c1a268ac1ab83ba
2015-06-02 15:14:55 -07:00
Johann fdc549994a Cast variance reference output
The larger internal variables are required for the intermediates
but RoundHighBitDepth brings them down to uint32_t/unsigned int.

Fixes type warnings in visual studio.

Change-Id: I48d35284d6cbde330ccdc1f46b6215a645d5eb00
2015-06-01 10:56:52 -07:00
Johann a927aec5f8 Merge "Use correct parameters for NEON variance tests" 2015-05-28 19:53:50 +00:00
Johann efc2e9844e Use correct parameters for NEON variance tests
Change-Id: Ib2949d0a3e9273e7952bbf91956357c1138093f1
2015-05-28 11:28:06 -07:00
Johann c855ed72a6 Remove conversion warnings from hbd shifts
ROUND_POWER_OF_TWO has some poor side effects when used
with [u]int64_t, such as doing the shifting in 32 bits.

Change-Id: Ic85a19765cd316fb43657cb21c86f35ceb772773
2015-05-27 17:54:22 -07:00
Johann c5a7c89e89 Correct case in Get4x4SSEFunc
Change-Id: Ie8a7508798fa8e65c579a77cedb8305cee4ddc81
2015-05-27 11:38:43 -07:00
Johann c3bdffb0a5 Move variance functions to vpx_dsp
subpel functions will be moved in another patch.

Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce
2015-05-26 12:01:52 -07:00
Johann 1d7ccd5325 Relocate memory operations for common code
With the sad functions, and hopefully the variance functions soon,
moving to the vpx_dsp location, place the defines used in the
reference C code in a common location.

Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
2015-05-13 11:41:15 -07:00
Frank Galligan ec1d8387e1 Add 64x64 sub_pel_variance Neon function
On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase
in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10%
increase in perf for 720p.

Tested on Nexus 7, built with ndk r10d, gcc 4.9.

Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334
2015-01-14 08:36:24 -08:00
Frank Galligan 74d40cd507 Add 64x variance Neon functions
Add optimized Neon functions of:
vp9_variance32x64
vp9_variance64x32
vp9_variance64x64

On Nexus 7 speed -5 and -6 saw about a 4% increase in perf.
Speeds -7 and -8 saw about a 6% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.

Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa
2015-01-13 15:08:13 -08:00
Peter de Rivaz 48032bfcdb Added sse2 acceleration for highbitdepth variance
Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f
(cherry picked from commit d7422b2b1eb9f0011a8c379c2be680d6892b16bc)
(cherry picked from commit 6d741e4d76a7d9ece69ca117d1d9e2f9ee48ef8c)
2014-11-14 15:18:53 -08:00
Scott LaVarnway fe2cc873dc VP8 encoder for ARMv8 by using NEON intrinsics 1
Add vp8_mse16x16_neon.c
- vp8_mse16x16_neon
- vp8_get4x4sse_cs_neon

Change-Id: I108952f60a9ae50613f0ce3903c2c81df19d99d0
Signed-off-by: James Yu <james.yu@linaro.org>
2014-09-15 12:04:09 -07:00
Dmitry Kovalev 1f19ebbab6 Replacing vp9_get_mb_ss_sse2 asm implementation with intrinsics.
Change-Id: Ib4f5dd733eb2939b108070a01e83da5d9990bac0
2014-09-06 00:10:25 -07:00
Dmitry Kovalev 202edb3d23 Actually resetting random generator for all variance test cases.
Calling Reset(int) method instead of overloaded operator()(int).
Adding underscore at the end of class member name.

Change-Id: I01934e7bc056d4b594e5d05d693328febd34ac3c
2014-09-04 12:24:52 -07:00
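The bug fixed in 202edb3d23 is a call that looks like a reseed but is not one. The sketch below illustrates that pattern with a hypothetical `Rng` class built on Python's `random.Random`. It is not the libvpx random generator, and its `reset` and `__call__` methods only mirror the roles of `Reset(int)` and `operator()(int)` named in the commit message.

```python
import random


class Rng:
    """Hypothetical stand-in for the test's generator (not the libvpx class)."""

    def __init__(self, seed):
        self._random = random.Random(seed)

    def reset(self, seed):
        # Restores the generator to the state defined by `seed`.
        self._random.seed(seed)

    def __call__(self, n):
        # Draws one value in [0, n); the argument is a bound, not a seed.
        return self._random.randrange(n)


SEED = 12345  # arbitrary illustrative seed
rnd = Rng(SEED)
first_run = [rnd(256) for _ in range(4)]

rnd(SEED)  # reads like a reseed, but only consumes one more random value
not_reset = [rnd(256) for _ in range(4)]

rnd.reset(SEED)  # the intended call
after_reset = [rnd(256) for _ in range(4)]

print(first_run, not_reset, after_reset)
```

Only the sequence drawn after `reset` reproduces the first run, which is what each test case needs for deterministic input.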
Dmitry Kovalev 12cd6f421d Removing variance MMX code.
Removed functions:
* vp9_mse16x16_mmx
* vp9_get_mb_ss_mmx
* vp9_get4x4var_mmx
* vp9_get8x8var_mmx
* vp9_variance4x4_mmx
* vp9_variance8x8_mmx
* vp9_variance16x16_mmx
* vp9_variance16x8_mmx
* vp9_variance8x16_mmx

They all have SSE2 equivalent.

Change-Id: I3796f2477c4f59b35b4828f46a300c16e62a2615
2014-08-29 10:26:42 -07:00
levytamar82 69a5f5ecf7 Fix bug 807
In the sub_pixel_*variance* functions the dst is aligned to 16 bytes, not
to 32 bytes, so the data is now loaded with unaligned loads.

Change-Id: I2e0b9745543697efc56fefa32857ea10117af135
2014-08-07 18:51:02 -07:00
Scott LaVarnway 98165ec074 Neon version of vp9_sub_pixel_variance8x8(),
vp9_variance8x8(), and vp9_get8x8var().

On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~1.2%.

Change-Id: I8a66ac2a0f550b407caa27816833bdc563395102
2014-08-01 11:35:55 -07:00
Scott LaVarnway d39448e2d4 Neon version of vp9_sub_pixel_variance32x32(),
vp9_variance32x32(), and vp9_get32x32var().

Change-Id: I8137e2540e50984744da59ae3a41e94f8af4a548
2014-07-31 08:00:36 -07:00
Scott LaVarnway 521cf7e879 Neon version of vp9_sub_pixel_variance16x16(),
vp9_variance16x16(), and vp9_get16x16var().

On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~16.7%.

Change-Id: Ib163aa99f56e680194aabe00dacdd7f0899a4ecb
2014-07-30 08:17:32 -07:00
Yunqing Wang 5c93e62e0a Allocate aligned source in variance test
The source buffer is an aligned buffer in VP9. Added the alignment
to make it consistent with libvpx.

Change-Id: I3ebb9d2e8555ed532951da479dd5cbbb8812e02d
2014-07-24 17:11:58 -07:00
James Zern 29e1b1a4b0 tests: add API_REGISTER_STATE_CHECK
used to wrap API functions to ensure full environment consistency as
opposed to the renamed ASM_REGISTER_STATE_CHECK which is used with
assembly functions.
currently checks the FPU tag word in x86/x86_64 gcc builds to ensure
emms has been called.

Change-Id: Ie241772dbf903d33d516a1add4c8c6783f2e1490
2014-07-10 12:40:31 -07:00
James Zern 520cb3f39f vp9_sub_pixel_*variance*: disable avx2 variants
tests failing under Win32/Win64

+ variance_test: add missing avx2 functions (partially disabled)

Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d
2014-06-10 16:11:15 -07:00
James Zern 6e5e75fa21 Revert "Removing redundant variables from variance_test.cc."
This reverts commit 4725ab7e51.

The constants are necessary to avoid breakage in vs9 builds:
 warning C4180: qualifier applied to function type has no meaning; ignored
 error C2436: 'f2_' : member function or nested class in constructor initializer list
 while compiling class template member function 'std::tr1::tuple<T0,T1,T2,T3,T4,T5,T6,T7,T8,T9>::tuple(const int &,const int &,unsigned int (__cdecl &))'
 ..\test\variance_test.cc : see reference to class template instantiation 'std::tr1::tuple<T0,T1,T2,T3,T4,T5,T6,T7,T8,T9>' being compiled

Change-Id: Ia218b74fc473d40f02fee84cb7009adfbe82e5a7
2014-05-08 14:35:40 -07:00
Dmitry Kovalev 4725ab7e51 Removing redundant variables from variance_test.cc.
Change-Id: Icd44bce1c9d292f6e6f4d5157b694f6170b7b289
2014-05-07 14:40:21 -07:00
James Zern d5e07a8451 variance_test: add NEON functions
note not all functions have NEON implementations:
- variance4x4_neon

Change-Id: I03c1ba21f3b02aa2482d7ca8feedc3ef74b5947f
2014-02-26 19:25:02 -08:00
James Zern 002ad40897 test/: remove unnecessary extern "C"s
Change-Id: I826655a708010149de231ca31a2e3ba4f1842c0c
2014-01-23 19:42:59 -08:00
James Zern a0fcbcfa5f fix vp8-only build
Change-Id: Id9ce44f3364dd57b30ea491d956a2a0d6186be05
2013-09-17 18:47:25 -07:00
Yaowu Xu afffa3d9b0 cleanup cpplint warnings
Suggested by James Zern to clear out cpplint warnings for all unit
test code.

Change-Id: I731a3fa4d2a257eb9ef733426ba84286fbd7ea34
2013-09-06 10:13:49 -07:00
Jim Bankoski 5b307886fb variance x86inc guards
also fixed bug in sad calcs

Change-Id: I6571fcbe37556c16ae32be66dc0fd879852aac1d
2013-08-06 14:17:13 -07:00
James Zern e247ab09a6 variance_test: add missing ClearSystemState...
...to recently added SubpelVarianceTest

Change-Id: I8775e39fd5dbfba81ad42b79b47bf6dd6ca8cc0e
2013-06-26 18:32:21 -07:00
Ronald S. Bultje ac6ea2ab91 Allocate memory using appropriate expected alignment in unit tests.
Fixes crashes of test_libvpx on 32-bit Linux.

Change-Id: If94e7628a86b788ca26c004861dee2f162e47ed6
2013-06-21 17:03:57 -07:00
James Zern cc774c8bb0 variance_test: use REGISTER_STATE_CHECK
Change-Id: Id54ad9a781634f075e990d5bade5be8490959975
2013-06-21 14:30:08 -07:00
Ronald S. Bultje 1e6a32f1af SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance().
Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
perfectly interleaved, and can probably be improved further in the
future. I've marked this with a few TODOs/FIXMEs in the code.

Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
2013-06-20 15:59:48 -07:00
Ronald S. Bultje 8fb6c58191 Implement sse2 and ssse3 versions for all sub_pixel_variance sizes.
Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
3min58). Specific changes to timings for each function compared to
original assembly-optimized versions (or just new version timings if
no previous assembly-optimized version was available):

sse2   4x4:    99 ->   82 cycles
sse2   4x8:           128 cycles
sse2   8x4:           121 cycles
sse2   8x8:   149 ->  129 cycles
sse2   8x16:  235 ->  245 cycles (?)
sse2  16x8:   269 ->  203 cycles
sse2  16x16:  441 ->  349 cycles
sse2  16x32:          641 cycles
sse2  32x16:          643 cycles
sse2  32x32: 1733 -> 1154 cycles
sse2  32x64:         2247 cycles
sse2  64x32:         2323 cycles
sse2  64x64: 6984 -> 4442 cycles

ssse3  4x4:           100 cycles (?)
ssse3  4x8:           103 cycles
ssse3  8x4:            71 cycles
ssse3  8x8:           147 cycles
ssse3  8x16:          158 cycles
ssse3 16x8:   188 ->  162 cycles
ssse3 16x16:  316 ->  273 cycles
ssse3 16x32:          535 cycles
ssse3 32x16:          564 cycles
ssse3 32x32:          973 cycles
ssse3 32x64:         1930 cycles
ssse3 64x32:         1922 cycles
ssse3 64x64:         3760 cycles

Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
2013-06-20 09:34:25 -07:00
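The figures in 8fb6c58191 can be cross-checked with a few lines of arithmetic. The sketch below converts the quoted wall-clock times to seconds and computes the relative change for a handful of rows from the cycle table. The helper names are illustrative, and only numbers quoted in the commit message are used.

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


before = to_seconds(4, 10)  # "4min10"
after = to_seconds(3, 58)   # "3min58"

time_saved = (before - after) / before  # fraction of encode time removed
speedup = before / after - 1            # throughput gain

# (old cycles, new cycles) for a few rows quoted in the commit message.
cycles = {
    "sse2 4x4": (99, 82),
    "sse2 8x16": (235, 245),
    "sse2 64x64": (6984, 4442),
    "ssse3 16x16": (316, 273),
}

# Positive values mean the new version needs more cycles than the old one.
change = {name: (new - old) / old for name, (old, new) in cycles.items()}

print(f"time saved: {time_saved:.1%}, speedup: {speedup:.1%}")
for name, delta in change.items():
    print(f"{name:12} {delta:+.1%}")
```

Both ways of expressing the overall gain land close to the quoted "around 5%". The 8x16 sse2 row is the only one in this sample that gets slower, which matches the "(?)" in the table.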
James Zern 5b756748fd tests: clear system state after non-API calls
add ClearSystemState() to reset MMX registers avoiding corrupting
subsequent tests.

Change-Id: I668deb09aa7aa467709776e5819f936910698bc0
2013-06-18 11:32:27 -07:00
Yunqing Wang f4fcfe3075 Optimize variance functions
Added SSE2 version of variance functions for super blocks.

Change-Id: Ibeaae8771ca21c99d41dd74067574a51e97b412d
2013-05-22 10:29:38 -07:00
James Zern 1711cf2dbb add vp8 variance test
Change-Id: I4e94ee2c4e2360d6a11a454c323f2899c1bb6f72
2013-02-22 16:25:14 -08:00
John Koleszar fcccbcbb39 Add vp9_ prefix to all vp9 files
Support for gyp which doesn't support multiple objects in the same
static library having the same basename.

Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
2012-11-27 14:12:30 -08:00
John Koleszar a9c7597adc support building vp8 and vp9 into a single lib
Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d
2012-11-15 10:46:17 -08:00
James Zern 984734436d Fix variance (signed integer) overflow
In the variance calculations the difference is summed and later squared.
When the sum exceeds sqrt(2^31), its square no longer fits in a signed
32-bit integer, so the value is treated as a negative when it is shifted,
which gives incorrect results.

To fix this we force the multiplication to be unsigned.

The alternative fix is to shift sum down by 4 before multiplying.
However that will reduce precision.

For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and
change).

This change is based on:
1698234 Missed some variance casts
fea3556 Fix variance overflow

Change-Id: I2c61856cca9db54b9b81de83b4505ea81a050a0f
2012-11-06 23:06:44 -08:00
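The overflow described in 984734436d can be reproduced with a small model of 32-bit arithmetic. The sketch below assumes the usual form variance = sse - ((sum * sum) >> 8) for a 16x16 block. Python integers do not overflow, so `to_int32` and `to_uint32` are illustrative helpers that emulate two's-complement wrapping; in C, signed overflow is formally undefined, and the wrapped value is only the typical outcome.

```python
import math


def to_int32(x):
    """Wrap x to a signed 32-bit value (models typical two's-complement wrap)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def to_uint32(x):
    """Wrap x to an unsigned 32-bit value."""
    return x & 0xFFFFFFFF


def variance_signed(sse, total, shift):
    # Signed product: goes negative once total exceeds sqrt(2^31).
    return to_uint32(sse - (to_int32(total * total) >> shift))


def variance_unsigned(sse, total, shift):
    # Unsigned product: correct as long as total * total < 2^32.
    return to_uint32(sse - (to_uint32(total * total) >> shift))


# Worst case for 16x16: every source pixel is 255, every reference pixel is 0.
pixels = 16 * 16
total = 255 * pixels      # sum of differences
sse = 255 * 255 * pixels  # sum of squared differences
shift = 8                 # log2(256)

print("limit:", math.isqrt(2**31), "sum:", total)
print("signed:  ", variance_signed(sse, total, shift))
print("unsigned:", variance_unsigned(sse, total, shift))
```

A flat difference block has a true variance of 0. The unsigned path returns that, while the signed path returns a large positive value because the wrapped product is negative before the shift.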