Граф коммитов

220 Коммитов

Автор SHA1 Сообщение Дата
James Zern 37225744db vpx_lpf_vertical_8: remove unused count param
Change-Id: Ic69406da00afb0f06588e8c0deb2b043952b078c
2016-02-16 14:59:00 -08:00
James Zern 26c6fbdcda vpx_ve_predictor_4x4_c: quiet unused param warning
Change-Id: I62234260e2d2de94d602c6d8095c8f8124334052
2016-02-11 19:22:29 -08:00
James Zern 05437805f7 intrapred/d135: flatten border results before storing
the results along the top and left border are then stored with a moving
window into the vector.
~40-67% faster on ARM, ~40-77+% on x86 depending on the block size.

Change-Id: Iab369aa2946a3ae4eb7290d512868fe5db92dbc8
2016-02-05 12:31:48 -08:00
James Zern cdf1077d5a intrapred: protect functions w/CONFIG check x2
high-bitdepth version
d207e, d63e, d45e are only used with CONFIG_MISC_FIXES

Change-Id: I77292e11f51fd76d4127fd0027f876866bcf8675
2016-02-02 19:38:37 -08:00
Yaowu Xu 6a94d6ad8e Merge "Enable sse2 version of inverse wht for hbd build" 2016-01-31 04:38:39 +00:00
James Zern 8faccb709a Merge changes If13946e4,I61a1814d,I2ca9aa3c,I44d91eaa
* changes:
  intrapred: protect functions w/CONFIG check
  vp9_noise_estimate: protect copy_frame w/CONFIG check
  vp8_cx_iface: delete 3 unused functions
  vp8: mark intra_prediction_down_copy inline
2016-01-30 00:17:16 +00:00
Yaowu Xu 0aef1bc898 Enable sse2 version of inverse wht for hbd build
Change-Id: If8f5efd701a11c8a7ad3078d10ec3cd0fe27667e
2016-01-29 14:47:56 -08:00
Yaowu Xu b229710811 SSSE3 idct8x8 functions for highbitdpeth build
This commit changes SSSE3 optimized idct8x8 functions to work with
highbitdepth build.

With this commit and the previous one that enabled SSSE3 idct32x32
functions, tests showed virtually no difference on decoding speed for
file fdJc1_IBKJA.248.webm for the build with -enable-vp9-highbitdpeth
option and the build without the option.

Change-Id: Ibe0634149ec70e8b921e6b30171664b8690a9c45
2016-01-29 12:36:53 -08:00
Yaowu Xu aac1ef7f80 Enable hbd_build to use SSSE3optimized functions
This commit changes the SSSE3 assembly functions for idct32x32 to
support highbitdepth build.

On test clip fdJc1_IBKJA.248.webm, this cuts the speed difference
between hbd and lbd build from between 3-4% to 1-2%.

Change-Id: Ic3390e0113bc1ca5bba8ec80d1795ad31b484fca
2016-01-29 01:30:43 +00:00
James Zern fea27ccca0 intrapred: protect functions w/CONFIG check
d207e, d63e, d45e are only used with CONFIG_MISC_FIXES

Change-Id: If13946e483c4d0ccaa3e1d60dc14216c06d5a219
2016-01-26 20:13:57 -08:00
James Zern 3a2ad10de2 Merge "Code clean of sad4xNx4D_sse" 2016-01-25 20:57:15 +00:00
Alex Converse ed3df445d9 Revert "Merge "Change highbd variance rounding to prevent negative variance.""
This reverts commit ea48370a50, reversing
changes made to 15939cb2d7.

The commit was insufficiently tested and causes failures.

Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
2016-01-13 11:19:06 -08:00
Alex Converse ea48370a50 Merge "Change highbd variance rounding to prevent negative variance." 2016-01-13 00:25:54 +00:00
Jian Zhou 26a6ce4c6d Code clean of highbd_tm_predictor_32x32
Remove the ARCH_X86_64 constraint. No performance hit on both
big core and small core.

Change-Id: I39860b62b7a0ae4acaafdca7d68f3e5820133a81
2015-12-22 16:51:57 -08:00
Jian Zhou 355bfa2193 Code clean of highbd_tm_predictor_16x16
Remove the ARCH_X86_64 constraint.

Change-Id: I0139f8e998cc5525df55161c2054008d21ac24d4
2015-12-22 16:34:40 -08:00
Jian Zhou a4c265f1b7 Code clean of highbd_dc_predictor_32x32
Remove the ARCH_X86_64 constraint.

Change-Id: I7d2545fc4f24eb352cf3e03082fc4d48d46fbb09
2015-12-22 16:06:54 -08:00
James Zern cedb1db594 Merge "Code clean of highbd_tm_predictor_4x4" 2015-12-22 16:45:01 +00:00
James Zern a097963f80 Merge "Code clean of highbd_dc_predictor_4x4" 2015-12-22 16:30:37 +00:00
Jian Zhou 52e7f4153b Merge "Code clean of highbd_v_predictor_4x4" 2015-12-21 18:07:48 +00:00
Yunqing Wang b597e3e188 Merge "Fix for issue 1114 compile error" 2015-12-19 04:29:39 +00:00
James Zern 8b2ddbc728 sad_sse2: fix sad4xN(_avg) on windows
reduce the register count by 1 to avoid xmm6 and unnecessarily
penalizing the other users of the base macro

Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
2015-12-18 19:19:32 -08:00
Jian Zhou db11307502 Code clean of highbd_tm_predictor_4x4
Replace MMX with SSE2, reduce mem access to left neighbor,
loop unrolled.

Change-Id: I941be915af809025f121ecc6c6443f73c9903e70
2015-12-18 18:43:41 -08:00
Jian Zhou c91dd55eda Code clean of highbd_v_predictor_4x4
MMX replaced with SSE2, same performance.

Change-Id: I2ab8f30a71e5fadbbc172fb385093dec1e11a696
2015-12-18 15:25:27 -08:00
Jian Zhou 8366b414dd Code clean of highbd_dc_predictor_4x4
MMX replaced with SSE2, same performance.

Change-Id: Ic57855254e26757191933c948fac6aa047fadafc
2015-12-18 12:45:23 -08:00
Peter de Rivaz 7361ef732b Fix for issue 1114 compile error
In 32-bit build with --enable-shared, there is a lot of
register pressure and register src_strideq is reused.
The code needs to use the stack based version of src_stride,
but this doesn't compile when used in an lea instruction.

This patch also fixes a related segmentation fault caused by the
implementation using src_strideq even though it has been
reused.

This patch also fixes the HBD subpel variance tests that fail
when compiled without disable-optimizations.
These failures were caused by local variables in the assembler
routines colliding with the caller's stack frame.

Change-Id: Ice9d4dafdcbdc6038ad5ee7c1c09a8f06deca362
2015-12-18 09:43:22 +00:00
Jian Zhou 789dbb3131 Code clean of sad4xNx4D_sse
Replace MMX with SSE2.

Change-Id: I948ca1be6ed9b8e67f16555e226f1203726b7da6
2015-12-17 17:43:46 -08:00
Jian Zhou b158d9a649 Code clean of sad4xN(_avg)_sse
Replace MMX with SSE2, reduce psadbw ops which may help Silvermont.

Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d
2015-12-17 11:10:42 -08:00
James Zern b81f04a0cc Merge "move vp9_avg to vpx_dsp" 2015-12-15 03:41:22 +00:00
James Zern d36659cec7 move vp9_avg to vpx_dsp
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-12-14 14:42:12 -08:00
Jian Zhou 2404e3290e Merge "Code clean of tm_predictor_32x32" 2015-12-14 17:56:01 +00:00
Jian Zhou 6e87880e7f Merge "Speed up tm_predictor_16x16" 2015-12-11 18:55:46 +00:00
Jian Zhou 88120481a4 Code clean of tm_predictor_32x32
Reallocate the xmm register usage so that no ARCH_X86_64 required.
Reduce memory access to the left neighbor by half.
Speed up by single digit on big core machine.

Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
2015-12-11 10:32:08 -08:00
Jian Zhou 62f986265f Merge "SSE2 based h_predictor_32x32" 2015-12-11 18:02:34 +00:00
James Zern ecb8dff768 Merge "dc_left_pred[48]: fix pic builds" 2015-12-11 02:48:11 +00:00
Jian Zhou 5604924945 Merge "Code clean of dc_left/top_predictor_16x16" 2015-12-11 01:53:44 +00:00
James Zern 40ee78bc19 dc_left_pred[48]: fix pic builds
GET_GOT modifies the stack pointer so the offset for left's address will
be wrong if loaded afterword.

Change-Id: Iff9433aec45f5f6fe1a59ed8080c589bad429536
2015-12-10 15:44:31 -08:00
Yunqing Wang 322ea7ff5b Fix the win32 crash when GET_GOT is not defined
This patch continues to fix the win32 crash issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1105

Johann's patch is here:
https://chromium-review.googlesource.com/#/c/316446/2

Change-Id: I7fe191c717e40df8602e229371321efb0d689375
2015-12-10 14:25:01 -08:00
Jian Zhou 4ec5953080 Code clean of dc_left/top_predictor_16x16
Remove some redundant code.

Change-Id: Ida2e8c0ce28770f7a9545ca014fe792b04295260
2015-12-10 11:59:58 -08:00
Jian Zhou c90a8a1a43 SSE2 based h_predictor_32x32
Relocate the function from SSSE3 to SSE2, Unroll loop from 16 to 8,
and reduce mem access to left.
Speed up by single digit in ./test_intra_pred_speed on big core
machines.

Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
2015-12-10 10:09:58 -08:00
Johann Koenig 420b9f5bd3 Merge "fix null pointer crash in Win32 because esp register is broken" 2015-12-09 19:31:12 +00:00
Jian Zhou aa5b517a39 Re-enable SSE2 based intra 4x4 prediction
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Segfault in change 315561 when decoding vp8 is taken care of.

Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
2015-12-07 18:50:37 -08:00
Scott LaVarnway c7e557b82c Merge "VP9: Add ssse3 version of vpx_idct32x32_135_add()" 2015-12-07 21:13:35 +00:00
Sergey Kolomenkin 5fc9688792 fix null pointer crash in Win32 because esp register is broken
https://bugs.chromium.org/p/webm/issues/detail?id=1105

Change-Id: I304ea85ea1f6474e26f074dc39dc0748b90d4d3d
2015-12-07 12:57:06 -08:00
James Zern 79a9add666 Revert "MMX in intra 4x4 prediction replaced with SSE2"
This reverts commit 89a1efa4c4.

This causes a segfault when decoding vp8, in both 32 and 64-bit

Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367
2015-12-05 10:20:39 -08:00
Jian Zhou e86c7c863e Speed up h_predictor_16x16
Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.

Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
2015-12-04 12:12:55 -08:00
Jian Zhou da3f08fac3 Speed up h_predictor_8x8
Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.

Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
2015-12-04 11:36:44 -08:00
Jian Zhou aa2764abdd MMX in intra 8x8 prediction replaced with SSE2
8x8 Intra predictor implemented with MMX is replaced with SSE2.

Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52
2015-12-03 18:11:06 -08:00
Jian Zhou 89a1efa4c4 MMX in intra 4x4 prediction replaced with SSE2
4x4 Intra predictor implemented with MMX is replaced with SSE2.

Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc
2015-12-03 16:40:23 -08:00
Jian Zhou 623e988add Merge "SSE2 speed up of h_predictor_4x4" 2015-12-02 18:49:00 +00:00
Scott LaVarnway f0b0b1fe62 VP9: Add ssse3 version of vpx_idct32x32_135_add()
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
2015-12-02 04:50:46 -08:00