Commit graph

1858 commits

Author SHA1 Message Date
Yaowu Xu 681ba36414 Merge "Merge changes from libvpx/master by cherry-pick" into nextgenv2 2016-07-18 22:43:40 +00:00
Zoe Liu e7869b7168 Correct the experiment names for ext-refs
Change-Id: I83a2b22d12e4573453e2ad866c7ceb430ff062c6
2016-07-18 11:28:31 -07:00
Johann 2967bf355e Merge changes from libvpx/master by cherry-pick
This commit brings in all up-to-date changes from master that are
applicable to nextgenv2. Because the VP10 code was removed from master,
we had to cherry-pick the following commits to get those changes:

Add default flags for arm64/armv8 builds

Allows building simple targets with sane default flags.

For example, using the Android arm64 toolchain from the NDK:
https://developer.android.com/ndk/guides/standalone_toolchain.html
./build/tools/make-standalone-toolchain.sh --arch=arm64 \
  --platform=android-24 --install-dir=/tmp/arm64
CROSS=/tmp/arm64/bin/aarch64-linux-android- \
  ~/libvpx/configure --target=arm64-linux-gcc --disable-multithread

BUG=webm:1143

vpx_lpf_horizontal_4_sse2: Remove dead load.

Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e

Fail early when android target does not include --sdk-path

Change-Id: I07e7e63476a2e32e3aae123abdee8b7bbbdc6a8c

configure: clean up var style and set_all usage

Use quotes whenever possible and {} always for variables.

Replace multiple set_all calls with *able_feature().

Conflicts:
	build/make/configure.sh

vp9-svc: Remove some unneeded code/comment.

datarate_test,DatarateTestLarge: normalize bits type

quiets a msvc warning:
conversion from 'const int64_t' to 'size_t', possible loss of data

mips added p6600 cpu support

Removed -funroll-loops

psnr.c: use int64_t for sum of differences

Since the values can be negative.

*.asm: normalize label format

add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.

BUG=b/29583530

Prevent negative variance

Due to rounding, hbd variance may become negative. This commit adds a
check and clamps negative values to 0.
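
As a rough illustration of the clamp described above, the following C sketch computes a variance from an SSE and a sum that are assumed to have been rounded independently, which is how the difference can dip below zero. The function name `clamped_variance` and its parameters are hypothetical and are not taken from the libvpx sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the libvpx implementation.
 * sse:        sum of squared differences, assumed already rounded
 * sum:        sum of differences, assumed rounded separately
 * log2_count: log2 of the number of pixels in the block */
uint32_t clamped_variance(uint32_t sse, int sum, int log2_count) {
  assert(log2_count >= 0 && log2_count < 31);
  /* Work in a signed 64-bit type so a negative result is representable
   * instead of wrapping around. */
  const int64_t mean_sq = ((int64_t)sum * sum) >> log2_count;
  const int64_t var = (int64_t)sse - mean_sq;
  /* Clamp: a variance can never be negative, so report 0 instead. */
  return var > 0 ? (uint32_t)var : 0;
}
```

With a 16-pixel block (`log2_count` of 4), an SSE of 100 and a sum of 64 would give 100 - 256 before the clamp, so the function returns 0 rather than a wrapped unsigned value.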

configure: remove old visual studio support (<2010)

BUG=b/29583530

Conflicts:
	configure

configure: restore vs_version variable

inadvertently lost in the final patchset of:
078dff7 configure: remove old visual studio support (<2010)

this prevents an empty CONFIG_VS_VERSION and avoids make failure

Require x86inc.asm

Force enable x86inc.asm when building for x86. Previously there were
compatibility issues so a flag was added to simplify disabling this
code.

The known issues have been resolved and x86inc.asm is the preferred
abstraction layer (over x86_abi_support.asm).

BUG=b:29583530

convolve_test: fix byte offsets in hbd build

CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. offsets should occur
after converting the pointer to the desired type.

+ factorized some common expressions

Conflicts:
	test/convolve_test.cc

vpx_dsp: remove x86inc.asm distinction

BUG=b:29583530

Conflicts:
	vpx_dsp/vpx_dsp.mk
	vpx_dsp/vpx_dsp_rtcd_defs.pl
	vpx_dsp/x86/highbd_variance_sse2.c
	vpx_dsp/x86/variance_sse2.c

test: remove x86inc.asm distinction

BUG=b:29583530

Conflicts:
	test/vp9_subtract_test.cc

configure: remove x86inc.asm distinction

BUG=b:29583530

Change-Id: I59a1192142e89a6a36b906f65a491a734e603617

Update vpx subpixel 1d filter ssse3 asm

Speed tests show that the new vertical filters are slower on a Celeron
Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
which vertical filter code is activated. For now, only the code that
does not degrade on Celeron is active. Later there should be 2 sets of
vertical filter ssse3 functions, with a jump table choosing between them
based on CPU type.

improve vpx_filter_block1d* by replacing paddsw+psrlw with pmulhrsw

Make set_reference control API work in VP9

Moved the API patch from NextGenv2. An example was included.
To try it, for example, run the following command:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30

Conflicts:
	examples.mk
	examples/vpx_cx_set_ref.c
	test/cx_set_ref.sh
	vp9/decoder/vp9_decoder.c

deblock filter : moved from vp8 code branch

The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.

vpx_thread.[hc]: update webp source reference

+ drop the blob hash, the updated reference will be updated in the
commit message

BUG=b/29583578

vpx_thread: use native windows cond var if available

BUG=b/29583578

original webp change:

commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 19:49:58 2015 -0800

    thread: use native windows cond var if available

    Vista / Server 2008 and up. no speed difference observed.

100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use InitializeCriticalSectionEx if available

BUG=b/29583578

original webp change:

commit 63fadc9ffacc77d4617526a50c696d21d558a70b
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:38:46 2015 -0800

    thread: use InitializeCriticalSectionEx if available

    Windows Vista / Server 2008 and up

100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use WaitForSingleObjectEx if available

BUG=b/29583578

original webp change:

commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:40:26 2015 -0800

    thread: use WaitForSingleObjectEx if available

    Windows XP and up

100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use CreateThread for windows phone

BUG=b/29583578

original webp change:

commit d2afe974f9d751de144ef09d31255aea13b442c0
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:41:26 2015 -0800

    thread: use CreateThread for windows phone

    _beginthreadex is unavailable for winrt/uwp

    Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d

100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vp9_postproc.c missing extern.

BUG=webm:1256

deblock: missing const on extern const.

postproc - move filling of noise buffer to vpx_dsp.

Fix encoder crashes for odd size input

clean-up vp9_intrapred_test

remove tuple and overkill VP9IntraPredBase class.

postproc: noise style fixes.

gtest-all.cc: quiet an unused variable warning

under windows / mingw builds

vp9_intrapred_test: follow-up cleanup

address few comments from ce050afaf3e288895c3bee4160336e2d2133b6ea

Change-Id: I3eece7efa9335f4210303993ef6c1857ad5c29c8
2016-07-18 10:31:10 -07:00
Yaowu Xu 06c297bd1c Merge "Merge branch 'master' into nextgenv2" into nextgenv2 2016-07-15 04:45:53 +00:00
Yaowu Xu 6fe07a207b Merge branch 'master' into nextgenv2
Change-Id: Ia3c0f2103fd997613d9f16156795028f89f63265
2016-07-14 16:05:48 -07:00
Jingning Han a387b19619 Fix highbd obmc_variance unit test
Fix the compiling errors in highbd obmc_variance unit test.

Change-Id: Id1bdfd50aeaff996e54067d5e9b369a5fd2d87a8
2016-07-14 10:12:03 -07:00
Geza Lore ebc2d34cd9 Add SSE4.1 vpx_obmc_variance* implementations and cosmetics
Speedup for these functions: 4x
Also include some cosmetic changes to SAD functions

Change-Id: I344c32c795492507ae08742f52d035a13f583799
2016-07-12 21:04:46 -07:00
Pascal Massimino 6de0e97d97 Merge "Clean up FunctionEquivalenceTest." into nextgenv2 2016-07-13 03:09:52 +00:00
Geza Lore a3f7ddc347 Clean up FunctionEquivalenceTest.
remove use of tuple in favor of struct.

Change-Id: If3b1aa5c2fc3cfe1446fff7a8fd270f2ca85fedf
2016-07-12 17:01:19 -07:00
Yi Luo fde48c980a Merge "HBD convolution filtering (10/12 taps) SSE4.1 optimization" into nextgenv2 2016-07-12 19:28:48 +00:00
Yi Luo 8cacca73bf HBD convolution filtering (10/12 taps) SSE4.1 optimization
- For experiment EXT_INTERP under high bit depth.
- Add unit test to verify bit-exact.
- Speed performance improvement:
  On Xeon E5-2680, park_joy_1080p_12.y4m, 50 frames, encoding time
  drops from 6682503 ms to 5390270 ms.

Change-Id: Iea4debf5414f3accf1eb5672abeab56a0539ac77
2016-07-12 10:13:30 -07:00
Geza Lore c804e0df05 Cleanup obmc_sad function prototypes.
Name 'wsrc', 'mask' and 'pre' explicitly, rather than
using 'b', 'm' and 'a'.

Change-Id: Iaee6d1ac1211b0b05b47cf98b50570089b12d600
2016-07-12 13:23:33 +01:00
Debargha Mukherjee 6bbadfb303 Merge "Improve vpx_blend_* functions." into nextgenv2 2016-07-11 19:30:04 +00:00
Geza Lore bfa59b4a5f Improve vpx_blend_* functions.
- Made source buffers pointers to const.
- Renamed vpx_blend_mask6b to vpx_blend_a64_mask. This is more
  indicative that the function does alpha blending. The 6, or 6b
  suffix was misleading, as the max mask value (64) does not fit into
  6 bits.
- Added VPX_BLEND_* macros to use when needing to blend scalars.
- Use VPX_BLEND_A256 in combine_interintra to be more explicit about
  the operation being done.
- Added versions of vpx_blend_a64_* which take 1D horizontal/vertical
  masks directly and apply them to all rows/columns
  (vpx_blend_a64_hmask and vpx_blend_a64_vmask). The SSE4.1 optimized
  horizontal version now falls back on the 2D version. This can be
  improved upon if it shows up high enough in a profile.
- All vpx_blend_a64_* functions now support block sizes down to 1x1
  (ie: a single pixel). This is for usage convenience. The SSE4.1
  optimized versions fall back on the C implementation if
  w <= 2 or h <= 2. This can again be improved if it becomes hot code.

Change-Id: I13ab3835146ffafe3e1d74d8e9cf64a5abe4144d
2016-07-11 19:05:17 +01:00
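
The rename in the commit above rests on the mask range being [0, 64], where 64 needs 7 bits. A minimal scalar sketch of such an alpha blend might look like the following. The macro and function names here (`A64_ROUND_BITS`, `A64_MAX_ALPHA`, `blend_a64`) are illustrative assumptions and not necessarily the identifiers used in vpx_dsp; pixel values are assumed non-negative.

```c
#include <assert.h>
#include <stdint.h>

#define A64_ROUND_BITS 6                    /* illustrative name */
#define A64_MAX_ALPHA (1 << A64_ROUND_BITS) /* 64, which needs 7 bits */

/* Hypothetical scalar blend: alpha in [0, 64] weights v0 and the
 * remainder (64 - alpha) weights v1. */
int blend_a64(int alpha, int v0, int v1) {
  assert(alpha >= 0 && alpha <= A64_MAX_ALPHA);
  assert(v0 >= 0 && v1 >= 0);
  const int sum = alpha * v0 + (A64_MAX_ALPHA - alpha) * v1;
  /* Divide by 64, rounding to nearest. */
  return (sum + (1 << (A64_ROUND_BITS - 1))) >> A64_ROUND_BITS;
}
```

An alpha of 64 selects `v0` entirely and an alpha of 0 selects `v1`, which is only possible because the mask is allowed to reach 64 rather than stopping at 63.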
Pascal Massimino e5fb2d4e93 remove ROUNDZ_* macros in favor of just ROUND_* ones
Change-Id: I263088be8d71018deb9cc6a9d2c66307770b824d
2016-07-11 06:27:41 -07:00
Geza Lore 1178f71d99 Merge "Fix unused warning without ext-interp" into nextgenv2 2016-07-11 11:29:17 +00:00
Debargha Mukherjee 72ef6d7704 Refactor and clean up on blend_mask6
Change-Id: Ie9188471e7dc07ab9c95b22f258b1662e895c533
2016-07-08 15:02:57 -07:00
Geza Lore bb5059ff9b Fix unused warning without ext-interp
Change-Id: Ibb63c492eb8278d115262b8fc3cbc761c406b107
2016-07-08 15:48:02 +01:00
Geza Lore e6f8c17ac5 Remove various testing utilities.
test/assertion_helpers.h
test/randomise.{cc,h}
test/snapshot.h

Modify blend_mask6_test.cc not to rely on these.

Change-Id: I88b8933fe0a729a606797e5cd421795a544c612d
2016-07-07 16:22:07 +01:00
Debargha Mukherjee fabc0ed7ad Merge "Reinstate tests for wedge partition selection optimizations." into nextgenv2 2016-07-07 05:55:07 +00:00
Debargha Mukherjee 9303d428a2 Merge "Add tests for vpx_sum_squares_i16." into nextgenv2 2016-07-07 05:54:45 +00:00
Yue Chen c7a92f2cad Merge "Add SSE4.1 vpx_obmc_sad* implementations." into nextgenv2 2016-07-07 01:12:20 +00:00
Geza Lore aacdf98c9a Add SSE4.1 vpx_obmc_sad* implementations.
Speedup for these functions: 4x

Change-Id: I21baa04f53c6ab308ea3edf3ebacc62970e97454
2016-07-06 19:46:13 +00:00
Geza Lore 471362f61f Add tests for vpx_sum_squares_i16.
Change-Id: I529c34d5bfa85719cb6499a9a3c9d907eccccd56
2016-07-06 15:14:59 +01:00
Geza Lore 2791d9db1e Reinstate tests for wedge partition selection optimizations.
This reinstates the tests from commit
efda2831e5 with the appropriate
fixes for 32 bit x86 builds.

Change-Id: Ib331906c5b448ca964895ee9cbfd4266f67d1089
2016-07-06 15:09:46 +01:00
James Zern 6bbb8b79eb tests: remove redundant round() definition
use vpx_ports/msvc.h for compatibility

BUG=b/29583530

Change-Id: I9433d8586cd0b790e7f4d697304298feafe801f1
(cherry picked from commit 0a64929f19cc1ce89f993aa5c9d61a29679eb961)
2016-06-29 17:11:11 -07:00
Sarah Parker cbb7c65794 Merge "Fix compiler warnings in vp10_convolve_optimz_test.cc" into nextgenv2 2016-06-29 02:03:10 +00:00
Sarah Parker 9576374952 Fix compiler warnings in vp10_convolve_optimz_test.cc
Change-Id: I11b717e1652dff440a54f6977527d544b0c5ed29
2016-06-28 17:13:03 -07:00
Alex Converse 0dc56b6a15 ethread_test: Remove vp10 as test parameter.
Change-Id: I043418cde5a2562520ff37cdf81436abc2c9821a
2016-06-28 14:32:15 -07:00
Yi Luo dd2064a0ac Merge "Fix bugs in convolution filter optimization" into nextgenv2 2016-06-27 21:33:45 +00:00
Yi Luo 8404253f81 Fix bugs in convolution filter optimization
- Fix the overwrite bug in horizontal filtering when width = 2.
- Fix 10-tap vertical filtering which no longer reads one row of
  pixel above the block.
- Fix 10-tap filter zero padding.
- Encoder speed slow down ~4.0%, compared to,
  81ad953 Convolution vertical filter SSSE3 optimization

Change-Id: I9bb294a4529300081c29bf284e6bc6eb081cc536
2016-06-27 10:23:38 -07:00
Debargha Mukherjee 9f2167aede Merge "Turn on ActiveMapRefreshTest for Vp10" into nextgenv2 2016-06-25 00:32:21 +00:00
Debargha Mukherjee cf0cdfc55e Turn on ActiveMapRefreshTest for Vp10
Also reduce number of frames coded for VP10.

Change-Id: I7de908861620b6f4f08513516110fd584660d994
2016-06-24 12:55:03 -07:00
Yi Luo 2003cd8011 Merge "Change register loading to fix stack overflow issue" into nextgenv2 2016-06-24 18:47:21 +00:00
Yi Luo 08184e32de Change register loading to fix stack overflow issue
- Use _mm_loadl_epi64 instead of _mm_loadu_si128 for
  uint16_t temp2[4 * 4] buffer.
- Refer to:
  d0de89a remove vpx_highbd_1[02]_sub_pixel_variance4x4_sse4_1
BUG=webm:1242

Change-Id: Ieff555c8dd8070937f27f4ec8535b77e1ed5b8b2
2016-06-24 10:39:49 -07:00
Yi Luo 81ad95363a Convolution vertical filter SSSE3 optimization
- Apply 8-pixel vertical filtering direction parallelism.
- Add unit tests to verify bit exact.
- Encoder speed improves ~29% (enable EXT_INTERP) on Xeon E5-2680.
- Combinational cycle count of vp10_convolve() drops from 26.06%
  to 6.73%.

Change-Id: Ic1ae48f8fb1909991577947a8c00d07832737e57
2016-06-23 12:56:47 -07:00
Yi Luo f26a48bd52 Fix input buffer initialization in convolution filter test
Change-Id: I70c0da96a81463d752e88b134b6fde012bd5823d
2016-06-22 11:46:16 -07:00
James Zern 5d14586392 Merge "remove vpx_highbd_1[02]_sub_pixel_variance4x4_sse4_1" into nextgenv2 2016-06-22 03:13:31 +00:00
Geza Lore 7de2ba3eae Fix false uninitialized warnings (GCC 5+).
Change-Id: Ia00c754ddaf22bb7f1dfcd20106db6293bf4b070
2016-06-21 12:54:17 +01:00
Yi Luo f1a50db2d1 Merge "Convolution horizontal filter SSSE3 optimization" into nextgenv2 2016-06-20 20:06:02 +00:00
Yi Luo 229690a95c Convolution horizontal filter SSSE3 optimization
- Apply signal direction/4-pixel vertical/8-pixel vertical
  parallelism.
- Add unit test to verify the bit exact result.
- Overall encoding time improves ~24% on Xeon E5-2680 CPU.

Change-Id: I104dcbfd43451476fee1f94cd16ca5f965878e59
2016-06-20 11:10:30 -07:00
Debargha Mukherjee dc5431ad4b Merge "Turn on AqSegment tests for VP10" into nextgenv2 2016-06-20 16:47:13 +00:00
James Zern 4d9e876b44 realtime_test: remove decoded frame count check
decoding is done if the decoder is available, with errors handled
accordingly. the encoded frame count should be sufficient for this test.

+ remove HandleDecodeResult() as it's redundant given the base
  implementation

BUG=webm:1233

Change-Id: I513c1c3475c58a746f4df627491bdc392fe21416
2016-06-18 11:05:55 -07:00
James Zern d0de89a12a remove vpx_highbd_1[02]_sub_pixel_variance4x4_sse4_1
these cause ASan errors VP10/EndToEndTestLarge.EndtoEndPSNRTest

BUG=webm:1242

Change-Id: I0334e3b255b14e18f61970c3721ae748dc79727b
2016-06-17 19:46:20 -07:00
Geza Lore 7172e97abe Re-enable ActiveMapTest for VP10
Change-Id: I030fdde966b9911712eca131d095015afd9b0d8a
2016-06-17 20:33:58 +01:00
Tom Finegan 5a9f21db54 Output frames in first pass for VPX_DL_REALTIME.
Since combining VPX_DL_REALTIME with VPX_RC_FIRST_PASS is basically
nonsense, ignore the user's pass setting when this happens and
behave as if the requested encode is a single pass encode.

BUG=webm:1233

Change-Id: I5ee4c4e5838c4ca6d24988890aae490b10826db2
2016-06-17 11:25:55 -07:00
Yaowu Xu 0cb7f545ad Fix ubsan warning: test/datarate_test.cc
BUG=webm:1219

Change-Id: I48470a885cd64a60636a982cd68165c41a702306
2016-06-16 11:25:21 -07:00
Zoe Liu 5201280f70 Disable the unit test of ArfFreq for BIDIR_PRED
The test in arf_freq assumes any no-show frame is an ALTREF_FRAME and
then calculates the minimum run between two consecutive ALTREF_FRAME's
based on this assumption. As BWDREF_FRAME is also a no-show frame and
the minimum run between two consecutive BWDREF_FRAME's may vary
between 1 and any arbitrary positive number as long as it does not
exceed the golden frame group interval, this test does not apply to
the experiment of BIDIR_PRED.

Change-Id: I70efb2c691fdc18601dbb8a7735ac2f27817e75a
2016-06-16 09:45:57 -07:00
Zoe Liu a0d122079d Merge "Fix the superframe unit test for BIDIR_PRED" into nextgenv2 2016-06-16 16:15:07 +00:00
Debargha Mukherjee 567ee69b24 Turn on AqSegment tests for VP10
Also shortens the test and changes some of the parameters.

Change-Id: Ieda4aeffa55550fbb9e4235f735c383ef6baf32c
2016-06-16 07:26:39 -07:00
Debargha Mukherjee f9fc898d56 Merge "Split some slower tests based on cpu-used" into nextgenv2 2016-06-16 11:46:36 +00:00
Debargha Mukherjee 6abddf37f8 Split some slower tests based on cpu-used
Change-Id: Idf84475fe06666d5c73c9d86dfc5c23bef170086
2016-06-15 23:14:51 -07:00
James Zern 94e84bbc07 cosmetics,test.mk: fix a typo
Change-Id: Ib74a494e1cf50a356f51e8185e19ca66fcb896a2
2016-06-15 20:33:04 -07:00
James Zern fba6f748e8 rename vp9_end_to_end_test.cc -> end_to_end_test.cc
this is shared between vp9/10

BUG=webm:1235

Change-Id: I2f44b15268a33453a1c1e0c691d4fc1fc12d0263
2016-06-15 18:30:22 -07:00
James Zern 2710f76692 vp9_end_to_end_test: enable in vp10-only builds
this file is shared between vp9 & vp10; this makes it available in the
presence of --disable-vp9

BUG=webm:1235

Change-Id: Iaf060c3c09afd2c7df69995b0c01589f78d4945e
2016-06-15 18:28:30 -07:00
Zoe Liu 1aa674b588 Fix the superframe unit test for BIDIR_PRED
Change-Id: I2ef8e479893403581711abc020509c6863c2035d
2016-06-15 17:18:26 -07:00
hui su 72d4890caf Add vp9 encoder API VP9E_GET_LEVEL to provide bitstream level
Change-Id: I1ef3df0192491035728fe9d5eb25cc66dc2965de
2016-06-15 12:53:28 -07:00
Sarah Parker 50c5921517 Add EndToEndTestLarge for VP10 non-highbitdepth
The current test case is only run for vp9 and vp10 when HBD
is enabled. This was mistakenly removed in:

d53f9a3 Enable VP10 HBD PSNR checking unit test

Change-Id: I88b8168ad1efd805d759238a037653a2901bf50d
2016-06-15 19:45:24 +00:00
Johann c516dd67bc neon hadamard 16x16
Runs about twice as fast as C

BUG=webm:1027

Change-Id: I6760d99f4e22259439ca35d746194b12a81bfa71
2016-06-14 19:23:38 +00:00
Johann 32ff4906da hadamard 16x16 test
BUG=webm:1027

Change-Id: Ibe58781905b372b9fe29dace39b4bfdd33fd0f83
2016-06-13 19:09:56 -07:00
James Zern 05bd964adc Merge "Revert "Add 1D version of vpx_sum_squares_i16"" into nextgenv2 2016-06-14 00:04:57 +00:00
James Zern a8ba2eb3d3 active_map_refresh_test: fix missing file w/vp10-only
Change-Id: I6413b7622a3c8524ec0409e087cf7c92f79e4f2d
2016-06-11 09:49:02 -07:00
Alex Converse 11ce75968f Merge "Turn on ActiveMapTest speeds [0,5) with all experiments." into nextgenv2 2016-06-10 21:52:57 +00:00
James Zern 5e831c548f Revert "Add 1D version of vpx_sum_squares_i16"
This reverts commit f19700fe52.

This crashes in SSE2/SumSquares2DTest.RandomValues/0 under x86 due to
alignment issues

Change-Id: I135d83ba6a7894c09d7c7a139b7eaf876416b40c
2016-06-09 23:42:15 -07:00
James Zern 667db87a1b Merge "Revert "Optimize wedge partition selection."" into nextgenv2 2016-06-10 03:49:29 +00:00
Angie Chiang 95340fccb3 Revert "Optimize wedge partition selection."
This reverts commit efda2831e5.

This commit causes segmentation fault at SSE2/SumSquares2DTest.RandomValues/0

Change-Id: I171937e4daf6f15323e8206418773deb03bd8c53
2016-06-09 19:17:37 -07:00
Sarah Parker 9d924a0c4a Fix vp9_end_to_end_test for vp10 HBD
This test is failing when no experiments are turned on. PSNR is
31.96 when the threshold is 32.

broken since:
0d6980d Remove swap buffer speed feature

Change-Id: I3c29815b40d5282c37f52f4345b56992f8558b2e
2016-06-09 18:47:47 -07:00
Alex Converse 587b8a11d0 Turn on ActiveMapTest speeds [0,5) with all experiments.
Change-Id: I7da9e6a85648aa69e5e20d825b717d51e3c6809c
2016-06-09 13:51:00 -07:00
Alex Converse d279cadbe0 Port active map / cyclic refresh fixes to VP10.
Bring commits 575e81f and 3d6b8a6 to VP10. These changes predate
the creation of the active map cyclic refresh test.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1224

Change-Id: I3559b6933ffa5649926a4b214e45ed0fae523a25
2016-06-09 16:52:43 +00:00
James Zern 95d2dc8981 fdct8x8_test: fix unsigned overflow
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295

Change-Id: I580813093ee46284fde7954520dfcb1188f79268
2016-06-08 17:33:34 -07:00
James Zern 06c6e4cbf6 fdct4x4_test: fix unsigned overflow
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295

Change-Id: I502fd707823c4faaa7f587c9cc0312f057e04904
2016-06-08 17:29:02 -07:00
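
The two fdct test fixes above share one idea: take the difference in a signed type, then store the squared error as unsigned. A small C sketch of that pattern follows. The helper name `squared_error` is hypothetical, and the actual test code may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the idea in the commit messages. */
uint32_t squared_error(uint8_t src, uint8_t dst) {
  /* Signed difference: in a 32-bit unsigned type, 0 - 1 would wrap to
   * 4294967295 and squaring it would overflow. */
  const int diff = (int)dst - (int)src;
  /* |diff| is at most 255, so the square fits; keep it unsigned. */
  const uint32_t error = (uint32_t)(diff * diff);
  assert(error <= 255u * 255u);
  return error;
}
```

For 8-bit inputs the numeric result is the same either way once truncated to 32 bits, so the change mainly keeps `-fsanitize=integer` quiet and makes the intent explicit.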
Angie Chiang d9410d2d43 Merge "Move #if out of TEST_P in vp10_fwd/inv_txfm2d_test.cc" into nextgenv2 2016-06-07 22:02:28 +00:00
Alex Converse 7e26f01342 Turn ActiveMapTest back on.
If it's creating problems with some experiments, disable it under the
actual conditions where it doesn't work and file a bug.

Change-Id: Iab9f4bfe42ea926d49d371918da25f9a8938a20f
2016-06-07 11:59:15 -07:00
Debargha Mukherjee 13155e7725 Merge "Optimize wedge partition selection." into nextgenv2 2016-06-07 09:50:13 +00:00
Debargha Mukherjee 24a04f9048 Merge "Fix decoder crash with supertx" into nextgenv2 2016-06-07 09:46:48 +00:00
Angie Chiang f67196b2ed Move #if out of TEST_P in vp10_fwd/inv_txfm2d_test.cc
Change-Id: I1d5b2408f27a1e277574c2238f1e49e884596309
2016-06-06 12:45:54 -07:00
Geza Lore efda2831e5 Optimize wedge partition selection.
We can optimize wedge partition selection by pre-computing the
residuals of the 2 underlying predictors, and then blend these
to compute the sse of the compound predictor, without actually
having to compute and subtract the compound predictor.

Similarly we can pre-compute a proxy array which we can use to
cheaply check which mask sign would have lower sse.

Details are in wedge_utils.c.

Mathematically these are equivalence transformations, but due to the
finite precision the encoder output will be perturbed, though on
average this should make 0% difference.

ext-inter gains about ~4.5% speedup.

Change-Id: Ib2657c3209ae161b4090b58b4b6c392641bf2792
2016-06-06 14:43:10 +01:00
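
The residual trick in the commit above can be sketched as follows. If the compound predictor is a blend with weights m and 64 - m, then 64 * src - (m * p0 + (64 - m) * p1) equals m * r0 + (64 - m) * r1, where r0 and r1 are the residuals of the two underlying predictors. This is a hedged sketch, not the code in wedge_utils.c: the function name, the [0, 64] mask range, and the final rounding are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MASK 64 /* assumed mask range [0, 64] */

/* Hypothetical sketch. r0[i] = src[i] - pred0[i] and
 * r1[i] = src[i] - pred1[i] are precomputed once per block; the SSE of
 * the blended predictor is then accumulated from them directly. */
uint64_t sse_from_residuals(const int16_t *r0, const int16_t *r1,
                            const uint8_t *mask, size_t n) {
  uint64_t acc = 0;
  for (size_t i = 0; i < n; ++i) {
    assert(mask[i] <= MAX_MASK);
    /* Residual of the blend, still scaled by 64. */
    const int32_t t = mask[i] * r0[i] + (MAX_MASK - mask[i]) * r1[i];
    acc += (uint64_t)((int64_t)t * t);
  }
  /* Undo the 64 * 64 scaling, rounding to nearest. */
  return (acc + (MAX_MASK * MAX_MASK / 2)) / (MAX_MASK * MAX_MASK);
}
```

Because a real encoder rounds the blended predictor to integer pixels before subtracting, a sum computed this way can differ slightly from the directly measured SSE, which matches the commit's note about perturbed encoder output.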
Geza Lore 6c4306c27d Fix decoder crash with supertx
xd->plane[0].n4_h and xd->plane[0].n4_w are not set at that point
when using supertx.

While this fixes the immediate crash described in the referenced
bug report, there are still issues in the ref-mv experiment that
causes these tests to fail, so they are kept disabled.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1230

Change-Id: Ibf8ef02847a903f8d10e6be28e16694db10c75af
2016-06-06 09:58:11 +01:00
James Zern e34e684059 Merge changes If31d36c8,I10b947e7
* changes:
  vpx_dsp,add_noise: remove mmx implementation
  vpx_dsp: remove mmx variance implementations
2016-06-04 00:56:06 +00:00
Linfeng Zhang b90166665f Merge "Slow pshufb removal in 3 intra prediction functions." 2016-06-03 16:35:14 +00:00
Geza Lore f19700fe52 Add 1D version of vpx_sum_squares_i16
Change-Id: I0d7bda2fe6f995a9e88a9f66540b4979b3f7fab1
2016-06-03 09:34:55 +01:00
Geza Lore 5a69ee0e11 Move template specializations into .cc from .h
Change-Id: I6d8775c1fa228fde25016a401e3c22a8e3da42f9
2016-06-03 09:34:55 +01:00
James Zern 462e0ff88b vpx_dsp,add_noise: remove mmx implementation
a sse2 version exists, this is a reasonable modern baseline.

Change-Id: If31d36c8412d25b53f41b4a93cf02f46802c0c33
2016-06-02 23:51:22 -07:00
James Zern eea8ea88ab vpx_dsp: remove mmx variance implementations
there are sse2 equivalents for all remaining variance implementations

Change-Id: I10b947e73fc0067688181f819b59e47966bec3d2
2016-06-02 23:46:16 -07:00
Linfeng Zhang ad0646cb84 Slow pshufb removal in 3 intra prediction functions.
Replaced vpx_d45_predictor_4x4_ssse3(), vpx_d45_predictor_8x8_ssse3()
and vpx_d207_predictor_4x4_ssse3() with
created vpx_d45_predictor_4x4_sse2(), vpx_d45_predictor_8x8_sse2()
and vpx_d207_predictor_4x4_sse2() respectively.
It's mostly neutral or slightly worse than ssse3 in good cases and
better than ssse3 in the bad cases (but still worse than using the mmx
regs).

Change-Id: Ib0237ceb71d2c57b8a93fd3170330cfed9d56bdd
2016-06-02 10:55:58 -07:00
Alex Converse 380c4ee32d Merge "segmentation: Don't use uninitialized probability data." into nextgenv2 2016-06-01 17:50:37 +00:00
Yaowu Xu 6382727dc5 Fix UBSAN/IOC errors
1. test/dct16x16_test.cc
2. test/dct32x32_test.cc
3. test/fdct8x8_test.cc

BUG=webm:1225

Change-Id: I9c9315fbd65ddb3b44f688e01ba265fd22192198
2016-06-01 16:01:18 +00:00
Alex Converse 7a6cb59dbb segmentation: Don't use uninitialized probability data.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1224

Change-Id: I17b76fcf0d8c191850350d5aa50dcc007b8b0cdc
2016-05-31 16:42:29 -07:00
James Zern f6ac6cf5bd Merge "acm_random,Rand9Signed: correct cast" 2016-05-27 18:32:06 +00:00
Linfeng Zhang 2ab7b9a6c9 Merge "Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10." 2016-05-27 17:51:35 +00:00
James Zern 13d48c4267 acm_random,Rand9Signed: correct cast
convert the random value to int16 before subtracting 256 from it; quiets
a ubsan (sanitize=integer) warning

BUG=webm:1225

Change-Id: Ibc2c5a21f30e112bd6c180f7d6a033327c38d0df
2016-05-27 10:33:56 -07:00
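
The cast order described in the commit above can be illustrated in C as follows. The function name `rand9_signed_from` is hypothetical, and `value` stands in for a random draw assumed to lie in [0, 512); the real helper lives in a C++ test utility and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the random-number helper. */
int16_t rand9_signed_from(uint32_t value) {
  assert(value < 512);
  /* Narrow to a signed type first; subtracting 256 from the unsigned
   * value would wrap around for draws below 256. */
  const int16_t narrowed = (int16_t)value;
  return (int16_t)(narrowed - 256);
}
```

The result covers the signed 9-bit range, from -256 for a draw of 0 up to 255 for a draw of 511.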
Linfeng Zhang af7fb17c09 Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10.
Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.

Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.

Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.

Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
2016-05-27 09:51:16 -07:00
Linfeng Zhang 0ba9b299e9 Merge "Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2" 2016-05-27 15:47:28 +00:00
James Zern 5d237f0986 vp10_inv_txfm2d_test: fix memory leak
input_, ref_input_ and output_ were being allocated with new[] followed
by vpx_memalign, remove the former

Change-Id: Ia16d0f9b9317042a24445095ad3c284f4e7bb481
2016-05-26 20:04:59 -07:00
Linfeng Zhang 4b5e462d08 Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2
Followed the code style of other lpf functions.
These 2 functions put 2 rows of data in a single xmm register,
so they have similar but not identical filter operations,
and cannot share the same macros.

Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
2016-05-26 14:55:18 -07:00
Scott LaVarnway 9d24fe60f1 Merge "Code clean of sub_pixel_variance4xh -- 2" 2016-05-26 13:20:24 +00:00
Yi Luo 469d002f4e Merge "Integrate HBD inverse HT flip types sse4.1 optimization" into nextgenv2 2016-05-25 21:35:14 +00:00
Marco 75d551783d vp9: Add datarate test for 1 pass VBR mode.
Existing tests are only for CBR mode.

Change-Id: Ie3b2cd46236457748e2650901d1a347a730f38af
2016-05-25 14:20:30 -07:00
Yi Luo bfe4c0ae07 Integrate HBD inverse HT flip types sse4.1 optimization
- tx_size: 4x4, 8x8, 16x16.
- tx_type: FLIPADST_DCT, DCT_FLIPADST, FLIPADST_FLIPADST,
  ADST_FLIPADST, FLIPADST_ADST.
- Encoder speed improvement:
  park_joy_1080p_12: ~11%, crowd_run_1080p_12: ~7%.
- Add unit test cases for bit-exact against C.

Change-Id: Ia69d069031fa76c4625e845bfbfe7e6f6ed6e841
2016-05-25 12:32:10 -07:00
James Zern 008f27e70a Merge "add vp10 ActiveMap/ActiveMapRefreshTest" into nextgenv2 2016-05-25 19:05:02 +00:00
Yi Luo 28cdee448d HBD inverse HT 8x8 and 16x16 sse4.1 optimization
- Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoding speed improves ~27% on crowd_run_1080p_12.
- Merge 4x4, 8x8, 16x16 unit tests in one test file.

Change-Id: I058ef5254d068a9523a826480c78ebbdd231824c
2016-05-24 12:55:30 -07:00
Scott LaVarnway a4f3751be5 Code clean of sub_pixel_variance4xh -- 2
Replace MMX with SSE2.

Change-Id: Id8482d2589131f9427e7f36bc64413f058caf31f
2016-05-24 04:44:05 -07:00
Debargha Mukherjee fb65f9b54b Merge "Add optimized vpx_blend_mask6" into nextgenv2 2016-05-23 23:43:52 +00:00
Geza Lore a661bc87c4 Add optimized vpx_blend_mask6
This is to replace vp10/common/reconinter.c:build_masked_compound.
Functionality is equivalent, but the interface is slightly more
generic.

Total encoder speedup with ext-inter: ~7.5%

Change-Id: Iee18b83ae324ffc9c7f7dc16d4b2b06adb4d4305
2016-05-23 16:28:58 +01:00
Jingning Han 8c9f6c5531 Merge "Clear redundant condition check from vp10_ext_tile_test.cc" into nextgenv2 2016-05-20 22:10:41 +00:00
James Zern e4bdbd3c0b Merge "Revert "Code clean of sub_pixel_variance4xh"" 2016-05-20 19:11:06 +00:00
Yaowu Xu 0924bcd824 Fix build when vp8 is disabled
Change-Id: Ie1765f086b10d0f7c4d72961d238dfe0d6056dc2
2016-05-20 11:33:07 -07:00
James Zern 3fb55d24e8 Revert "Code clean of sub_pixel_variance4xh"
This reverts commit 2468163e07.

causes valgrind errors for overread of buffer in SubpelVarianceTest

Change-Id: I448e52c76f815ac199305b71f7d169f2bc167679
2016-05-19 23:37:27 -07:00
James Zern 84e3639454 Revert "Extend the external fb interface to allocate individual planes."
This reverts commit 6dd7f2b50a.

conversion warnings, crashes in 32-bit builds

Change-Id: I529ead34cd93c862dd07c9a29d8542dda2fc20ea
2016-05-19 23:33:51 -07:00
Jingning Han 7488ae014b Merge "Remove unused private variables from vp10_inv_txfm2d_test.cc" into nextgenv2 2016-05-20 01:23:25 +00:00
Daniele Castagna 04fdbdc5ca Merge "Extend the external fb interface to allocate individual planes." 2016-05-19 18:01:59 +00:00
Jingning Han e816401a81 Clear redundant condition check from vp10_ext_tile_test.cc
Change-Id: I74e9df9e314e49b931c23a81d14f5a9e143b0b7d
2016-05-19 09:31:18 -07:00
Jingning Han 7d5ccccd47 Remove unused private variables from vp10_inv_txfm2d_test.cc
Change-Id: Ie933d754aca649bdf17cd679b9a31239bf413b63
2016-05-19 09:21:13 -07:00
Yi Luo 346d2449f0 Fix to conform Google's coding convention
- Confirm input coeff buffer is 16-byte aligned.
- sizeof() prefer variable name instead of type.
- Fix function name (Capital first letter then Pascal case).
- Long base class name uses a newline (with colon and 4 space indent).
- Remove an unnecessary reference function variable.
- Method declaration precedes variable declaration in class definition.

Change-Id: I317f7e679926b5219f58c5f7d14512e94985e7fe
2016-05-18 18:15:53 -07:00
James Zern 146ccd304f Merge "Code clean of sub_pixel_variance4xh" 2016-05-18 23:18:35 +00:00
Daniele Castagna 6dd7f2b50a Extend the external fb interface to allocate individual planes.
Change-Id: I73e1b9ea6f4c76ae539e2b3292ee4c751d9c7de4
2016-05-18 16:20:18 -04:00
Johann Koenig 36b610d8c1 Merge "neon hadamard 8x8" 2016-05-18 20:11:16 +00:00
Angie Chiang 6f28581b26 Turn on flip in inverse txfm2d
Fix build failure
Reduce txfm test time

Change-Id: Ieaf6b27f3a272d06286f817f01230413fa8adcf6
2016-05-18 11:26:57 -07:00
Scott LaVarnway 2468163e07 Code clean of sub_pixel_variance4xh
Replace MMX with SSE2.

Change-Id: Ia8fcba755952804e347d7d7736f57d1f90c988a0
2016-05-18 04:24:41 -07:00
Yi Luo 1d307368a9 Integrate HBD row/column flip fwd txfm SSE4.1 optimization
- Integrate 5 flip transform types for each 4x4, 8x8, and 16x16
  block, for experiment, EXT_TX.
- Encoder speed improves about 12%-15%.
- Update the unit tests for bit-exact result against C.

Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb
2016-05-18 03:48:01 +00:00
Yi Luo ceabb00704 Merge "HBD inverse HT 4x4 SSE4.1 optimization" into nextgenv2 2016-05-16 21:15:08 +00:00
Johann 9b54e812f7 neon hadamard 8x8
Runs about 30% faster than the C version

BUG=webm:1021

Change-Id: I6809d6d84c3077ab619c53298296950e976bdaba
2016-05-16 11:58:02 -07:00
hui su cafbf63d30 Add level test for VP9
Change-Id: I99f50bdd5af3f64a029c2f5f6f5fb1ff45bad67e
2016-05-16 09:54:23 -07:00
Angie Chiang fdaad9f673 Refactor and add flip unit test to vp10_inv_txfm2d_test.cc
Change-Id: I6aa75c66429a0178852cf8df88f16eaa8e36b629
2016-05-13 12:30:51 -07:00
Angie Chiang 6a75253311 add unit test for highbd flip transform
Change-Id: I368d365ee0f58373bc399b615febd790addb2c36
2016-05-13 12:20:06 -07:00
Angie Chiang 716f1bd46c Refactor vp10_fwd_txfm2d_test.cc
Change-Id: Ibaf7b00bfe247df3e665ea3a0241667cb130e16c
2016-05-13 12:13:31 -07:00
Yi Luo a3a69b400c HBD inverse HT 4x4 SSE4.1 optimization
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoder overall instruction count drops 2.91%.
- Decoder overall instruction count drops 1.01%.
- Add unit test to test bit-exact result against C.

Change-Id: I908c9e0e5106c58f67dd72d28760e6c9ce54278e
2016-05-13 12:08:43 -07:00
Tom Finegan 9a56a5ea18 convolve_test: Fix high bit depth IOC runtime errors.
Add a cast.

BUG=webm:1225

Change-Id: I34ea18ee816569485c1f1046a81fd2a0ce527ac8
2016-05-13 09:42:58 -07:00
Jingning Han 5b573d650a Fix vp10_inv_txfm2d.round_trip test failure
Avoid accessing transform type that is not 2D-DCT if the transform
size is 64x64. This fixes an assert failure in this unit test.

Change-Id: I0dee865ea0925f5743b8a25c2f90eb6522b4d272
2016-05-12 16:09:02 -07:00
Yunqing Wang e7ebe26dd5 Merge "Add decoder APIs and unit tests in tile-coding experiment" into nextgenv2 2016-05-12 19:05:58 +00:00
Tom Finegan 9d7eaf0046 Merge "twopass_encoder: Add frame limit argument." 2016-05-12 16:26:29 +00:00
Tom Finegan 10c7ea4be8 Merge "simple_encoder: Add a frame limit argument." 2016-05-12 14:55:08 +00:00
Yunqing Wang 8e5e338727 Add decoder APIs and unit tests in tile-coding experiment
In the tile-coding experiment,
1. In tile decoder, added 2 set control APIs:
   VP10_SET_DECODE_TILE_ROW and VP10_SET_DECODE_TILE_COL. It allowed
   users to set the range of decoding at frame level.
2. Added a unit test while tile-coding experiment is on. It tested
   both tile encoder and decoder to make sure the encoded frame
   can be decoded as a whole frame or as independent tiles.

Change-Id: I73fd0632b685047cb9376008127cde72efa3fb2b
2016-05-11 16:47:26 -07:00
James Zern 18112f6724 add vp10 ActiveMap/ActiveMapRefreshTest
currently disabled as they result in ASan errors

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1224

Change-Id: I9c80910adc5dc2cd6eccb3030d33043df53e7ec5
2016-05-11 16:33:29 -07:00
Linfeng Zhang 2f55beb355 Merge "remove mmx variance functions" 2016-05-11 22:21:23 +00:00
Tom Finegan 7d6edc3ddd simple_encoder: Add a frame limit argument.
- Add frame limit argument.
- Make all arguments required.
- Enable the VP9 simple encoder test.

Change-Id: I11d228b358ff90c60ea92e02760cb476434ea571
2016-05-11 14:52:34 -07:00
Tom Finegan 9d47341a4c twopass_encoder: Add frame limit argument.
- Remove twopass_encoder test TODO re frame limit.
- Enable VP9 twopass_encoder test.

Change-Id: I0649f15aabef79a63891e997fd20b212af5672e6
2016-05-11 14:50:03 -07:00
Linfeng Zhang d0ffae825d remove mmx variance functions
there are sse2 equivalents, and sse2 is a reasonable modern baseline
Removed mmx variance functions:
vpx_get_mb_ss_mmx()
vpx_get8x8var_mmx()
vpx_get4x4var_mmx()
vpx_variance4x4_mmx()
vpx_variance8x8_mmx()
vpx_mse16x16_mmx()
vpx_variance16x16_mmx()
vpx_variance16x8_mmx()
vpx_variance8x16_mmx()

Change-Id: Iffaf85344c6676a3dd337c0645a2dd5deb2f86a1
2016-05-11 12:39:42 -07:00
Linfeng Zhang d0e687bf8c remove mmx sad functions
there are sse2 equivalents, and sse2 is a reasonable modern baseline

Change-Id: Ibbe536a5ad1c2cccef6bdcc75c13b3dde35a56ba
2016-05-11 10:50:04 -07:00
Angie Chiang 1954fa390f Add flip option for vp10_fwd_txfm2d_#x#_c
Will add unit test to test/vp10_fwd_txfm2d_test.cc later

Change-Id: I626900c67fca4eee2ad0ae1828188527a04a5362
2016-05-10 18:14:57 -07:00
Angie Chiang b5331459c2 Remove vp10_fwd_txfm2d_sse4_test.cc
Functions vp10_fwd_txfm2d_#x#_sse4_1 tested in this file
will be tested in vp10_fhts#x#_test.cc
Remove this to avoid duplication

Change-Id: Iaf21ab85b9a164fcf2a4574b3e13217e43b6255e
2016-05-10 17:06:40 -07:00
Debargha Mukherjee 03009b2e9e Merge "Use multiple tiles in V10 tile independence tests." into nextgenv2 2016-05-10 18:01:08 +00:00
Yi Luo 73d28a4068 Merge "Change inverse HT function argument from TXFM_2D_CFG* to int" into nextgenv2 2016-05-10 15:38:11 +00:00
Geza Lore d29062c4da Use multiple tiles in V10 tile independence tests.
Change-Id: I6e5c1cbe1bf40d2f7a0d8bd821cac8ce626ce3b8
2016-05-10 13:09:54 +01:00
Geza Lore e0dcab9d0c Print mismatch location for failing tests.
Change-Id: Ied6929bf5ac41ca25ee4df4ef19edada5bf1e8cd
2016-05-10 09:53:29 +01:00
Yi Luo cd8cfb8675 Change inverse HT function argument from TXFM_2D_CFG* to int
This change has no performance impact. It prepares the proper
function interface for better performance optimization.

Change-Id: I12e2f2deaf7f3adc603de0a74852116468c762f6
2016-05-09 18:34:16 -07:00
Yi Luo 6f3e71606f Merge "HBD hybrid transform 16x16 SSE4.1 optimization" into nextgenv2 2016-05-09 23:58:05 +00:00
Jingning Han 6b8acc2868 Merge "Fix dual filter type for high bit-depth" into nextgenv2 2016-05-09 22:06:09 +00:00
Tom Finegan d4fccb8f41 Merge "convolve_test: Fix IOC runtime errors." 2016-05-09 21:24:11 +00:00
Tom Finegan 6042d68851 convolve_test: Fix IOC runtime errors.
Add a cast.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1216

Change-Id: I40627de387bc9cfba37860e7a0a4f2d4524f3431
2016-05-09 16:33:59 -04:00