Граф коммитов

411 Коммитов

Автор SHA1 Сообщение Дата
Johann 26faa3ec7a Apply 'const' to data not pointer
Change-Id: Ic6b695442e319f7582a7ee8e52a47ae3e38c7298
2016-04-14 14:47:16 -07:00
Yi Luo 6db95602e4 Merge "Optimized HBD block subtraction for all block sizes" into nextgenv2 2016-04-12 21:22:32 +00:00
Debargha Mukherjee ec1365a0c9 Merge "Extend variance based partitioning to 128x128 superblocks" into nextgenv2 2016-04-12 19:42:35 +00:00
Yi Luo 0f80b1f754 Optimized HBD block subtraction for all block sizes
- Interface function takes a local MxN function to call based on the
  block size.
- Repetition call (w/o cache line miss) shows improvement:
  ~63% - ~340%.
- Overall encoder speed improvement: ~0.9%.

Change-Id: Ieff8f3d192415c61d6d58d8b99bb2a722004823f
2016-04-12 12:04:43 -07:00
Geza Lore 61af8981b0 Extend variance based partitioning to 128x128 superblocks
Change-Id: I41edf266d5540a9b070a5e65bc397dd3da210507
2016-04-12 09:40:11 +01:00
Yi Luo e5f4e8eab9 Some cosmetic improvements since HBD variance 4x4 optimization
Change-Id: I414c1fabd2e3a9b1d9daa8a90f85a0bace8bd3cd
2016-04-08 10:32:13 -07:00
Geza Lore 454989ff32 Make superblock size variable at the frame level.
The uncompressed frame header contains a bit to signal whether the
frame is encoded using 64x64 or 128x128 superblocks. This can vary
between any 2 frames.

vpxenc gained the --sb-size={64,128,dynamic} option, which allows the
configuration of the superblock size used (default is dynamic). 64/128
will force the encoder to always use the specified superblock size.
Dynamic would enable the encoder to choose the sb size for each
frame, but this is not implemented yet (dynamic does the same as 128
for now).

Constraints on tile sizes depend on the superblock size, the following
is a summary of the current bitstream syntax and semantics:

If both --enable-ext-tile is OFF and --enable-ext-partition is OFF:
     The tile coding in this case is the same as VP9. In particular,
     tiles have a minimum width of 256 pixels and a maximum width of
     4096 pixels. The tile width must be multiples of 64 pixels
     (except for the rightmost tile column). There can be a maximum
     of 64 tile columns and 4 tile rows.

If --enable-ext-tile is OFF and --enable-ext-partition is ON:
     Same constraints as above, except that tile width must be
     multiples of 128 pixels (except for the rightmost tile column).

There is no change in the bitstream syntax used for coding the tile
configuration if --enable-ext-tile is OFF.

If --enable-ext-tile is ON and --enable-ext-partition is ON:
     This is the new large scale tile coding configuration. The
     minimum/maximum tile width and height are 64/4096 pixels. Tile
     width and height must be multiples of 64 pixels. The uncompressed
     header contains two 6 bit fields that hold the tile width/heigh
     in units of 64 pixels. The maximum number of tile rows/columns
     is only limited by the maximum frame size of 65536x65536 pixels
     that can be coded in the bitstream. This yields a maximum of
     1024x1024 tile rows and columns (of 64x64 tiles in a 65536x65536
     frame).

If both --enable-ext-tile is ON and --enable-ext-partition is ON:
     Same applies as above, except that in the bitstream the 2 fields
     containing the tile width/height are in units of the superblock
     size, and the superblock size itself is also coded in the bitstream.
     If the uncompressed header signals the use of 64x64 superblocks,
     then the tile width/height fields are 6 bits wide and are in units
     of 64 pixels. If the uncompressed header signals the use of 128x128
     superblocks, then the tile width/height fields are 5 bits wide and
     are in units of 128 pixels.

The above is a summary of the bitstream. The user interface to vpxenc
(and the equivalent encoder API) behaves a follows:

If --enable-ext-tile is OFF:
     No change in the user interface. --tile-columns and --tile-rows
     specify the base 2 logarithm of the desired number of tile columns
     and tile rows. The actual number of tile rows and tile columns,
     and the particular tile width and tile height are computed by the
     codec ensuring all of the above constraints are respected.

If --enable-ext-tile is ON, but --enable-ext-partition is OFF:
     No change in the user interface. --tile-columns and --tile-rows
     specify the WIDTH and HEIGHT of the tiles in unit of 64 pixels.
     The valid values are in the range [1, 64] (which corresponds to
     [64, 4096] pixels in increments of 64.

If both --enable-ext-tile is ON and --enable-ext-partition is ON:
     If --sb-size=64 (default):
         The user interface is the same as in the previous point.
         --tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
         in units of 64 pixels, in the range [1, 64] (which corresponds
         to [64, 4096] pixels in increments of 64).
     If --sb-size=128 or --sb-size=dynamic:
         --tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
         in units of 128 pixels in the range [1, 32] (which corresponds
         to [128, 4096] pixels in increments of 128).

Change-Id: Idc9beee1ad12ff1634e83671985d14c680f9179a
2016-04-07 10:34:25 +01:00
James Zern 5ab46e0ecd Merge changes I7a1c0cba,Ie02b5caf,I2cbd85d7,I644f35b0
* changes:
  vpx_fdct16x16_1_sse2: improve load pattern
  vpx_fdct16x16_1_c/msa: fix accumulator overflow
  vpx_fdctNxN_1_sse2: reduce store size
  dct32x32_test: add PartialTrans32x32Test, Random
2016-04-06 02:51:53 +00:00
James Zern 38bc1d0f4b vpx_fdct16x16_1_sse2: improve load pattern
load the full row rather than doing 2 8-wide columns

Change-Id: I7a1c0cba06b0dc1ae86046410922b1efccb95c95
2016-04-04 16:03:42 -07:00
James Zern eb64ea3e89 vpx_fdct16x16_1_c/msa: fix accumulator overflow
tran_low_t is only signed 16-bits in non-high-bitdepth mode

Change-Id: Ie02b5caf2658e8e71f995c17dd5ce666a4d64918
2016-04-04 16:03:41 -07:00
James Zern 3735def667 vpx_fdctNxN_1_sse2: reduce store size
only output[0] needs to be set, store_output is more involved than a
movdqa in the high bitdepth case

Change-Id: I2cbd85d7cf74688bdf47eb767934fe42e02bff67
2016-04-04 16:02:06 -07:00
Yi Luo 250935cab3 Optimized HBD 4x4 variance calculation
vpx_highbd_8/10/12_variance4x4_sse4_1 improves performance ~7%-11%.

Change-Id: Ida22bb2a2f7a58037cfd73e186d4f6267a960c02
2016-04-04 11:28:59 -07:00
James Zern c21d437052 vpx_fdct32x32_1_msa: fix accumulator overflow
Change-Id: I33a5432eda3416382e1cea06b45082c0c65faa75
2016-04-02 11:04:38 -07:00
James Zern f4cae05cd4 vpx_fdctNxN_1_c: remove unnecessary store
only output[0] needs to be set, the other values will be ignored in this
case.

Change-Id: I8e9692fc0d6d85700ba46f70c2e899a956023910
2016-04-01 12:21:59 -07:00
James Zern 0269df41c1 vpx_fdct32x32_1_c: fix accumulator overflow
tran_low_t is only 16-bits in non-high-bitdepth mode

Change-Id: Ifc06110c95e86e6d790c44250d52a538b2e9713b
2016-03-30 15:20:20 -07:00
Geza Lore 552d5cd715 Extend superblock size fo 128x128 pixels.
If --enable-ext-partition is used at build time, the superblock size
(sometimes also referred to as coding unit (CU) size) is extended to
128x128 pixels.

Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a
2016-03-30 18:23:06 +01:00
Yaowu Xu c810740c36 Merge branch 'masterbase' into nextgenv2
Conflicts:
	vp9/encoder/vp9_encoder.c
	vpx_dsp/x86/convolve.h

Change-Id: I60c3532936bedd796a75dfe78245a95ec21e2e55
2016-03-28 17:44:28 -07:00
Yunqing Wang 5f5552d846 Optimize HBD up-sampled prediction functions
Optimized 2 up-sampled reference prediction functions in high-bit
depth case. This reduced the HBD encoding time by 3%.

Change-Id: I8663ffb5234f5e70168c0fc9ca676309fe8e98f2
2016-03-14 19:04:33 -07:00
Yunqing Wang e6e2d886d3 Add high-precision sub-pixel search as a speed feature
Using the up-sampled reference frames in sub-pixel motion search is
enabled as a speed feature for good-quality mode speed 0 and speed 1.

Change-Id: Ieb454bf8c646ddb99e87bd64c8e74dbd78d84a50
2016-03-11 16:32:11 -08:00
Debargha Mukherjee f34deab243 Adds compound wedge prediction modes
Incorporates wedge compound prediction modes.

Change-Id: Ie73b54b629105b9dcc5f3763be87f35b09ad2ec7
2016-03-10 07:19:54 -08:00
Scott LaVarnway 67c4c8244a VPX: loopfilter_mmx.asm using x86inc 2
This reverts commit 9aa083d164.

Fixes a decoder mismatch with 32bit PIC builds.

Change-Id: I94717df662834810302fe3594b38c53084a4e284
2016-03-08 04:24:47 -08:00
Geza Lore 938b8dfc73 Extend convolution functions to 128x128 for ext-partition.
Change-Id: I7f7e26cd1d58eb38417200550c6fbf4108c9f942
2016-03-07 11:39:27 +00:00
James Zern 9aa083d164 Revert "VPX: loopfilter_mmx.asm using x86inc"
This reverts commit 15ecdc3970.

breaks 32-bit pic builds

Change-Id: I8bb1b9471a293f05ac7423aaba0339d408931b7a
2016-03-04 18:23:45 -08:00
Geza Lore 697bf5beff Add 128 pixel variance and SAD functions
Change-Id: I8fde245b32c9e586683a28aa6925da0b83850b39
2016-03-03 10:24:29 +00:00
Debargha Mukherjee 1d69ceee5c Adds masked variance and sad functions for wedge
Adds masked variance and sad functions needed for wedge
prediction modes to come.

Change-Id: I25b231bbc345e6a494316abb0a7d5cd5586a3a54
2016-03-01 17:28:56 -08:00
Yunqing Wang 342a368fd4 Do sub-pixel motion search in up-sampled reference frames
Up-sampled the reference frames to 8 times in each dimension using
the 8-tap interpolation filter. In sub-pixel motion search, use the
up-sampled reference frames to find the best matching blocks. This
largely improved the motion search precision, and thus, improved
the compression quality. There was no change in decoder side.

Borg test and speed test results:
1. On derflr set,
Overall PSNR gain: 1.306%, and SSIM gain: 1.512%.
Average speed loss on derf set was 6.0%.
2. On stdhd set,
Overall PSNR gain: 0.754%, and SSIM gain: 0.814%.
On hevchd set,
Overall PSNR gain: 0.465%, and SSIM gain: 0.527%.
Speed loss on HD clips was 3.5%.

Change-Id: I300ebaafff57e88914f3dedc8784cb21d316b04f
2016-02-29 12:14:47 -08:00
Scott LaVarnway dd6729f826 VPX: Remove pmin/pmax from subpixel functions.
These instructions are unnecessary if the adds
are done in the correct order.

Change-Id: I4e533b8267c32e610a4b94203ad052dc9fdabd71
2016-02-27 05:47:56 -08:00
Scott LaVarnway 51beb29f52 Merge "VPX: vpx_filter_block1d16_(v8, v8_avg)" 2016-02-27 13:31:18 +00:00
hui su 4aeabf1b0d Fix compiler warnings
Change-Id: Id7240260cec471a3f8d0986b9c8df06efda925f9
2016-02-26 13:52:49 -08:00
Yaowu Xu a570cefcf8 Merge "Extend vpxssim to handle more HBD combinations" into nextgenv2 2016-02-26 15:57:40 +00:00
James Zern 654d2163c9 x86/convolve.h: remove redundant check in FUN_CONV_2D
the filter will be the same in this case

Change-Id: I95159bcb05bbfb71b57da741393e80cc7ffc5cff
2016-02-25 23:31:50 -08:00
James Zern 6d8c8c6201 x86/convolve.h: replace while w/if for w < 16
in non-hbd configurations; any high-bitdepth changes will be done in a
follow-up

Change-Id: Ia74e30971b744c1faab68c92fdeda1a053988c77
2016-02-25 21:44:06 -08:00
Scott LaVarnway 1f736e400f VPX: vpx_filter_block1d16_(v8, v8_avg)
Store result with one 16 byte store instead of
two 8 byte stores.

Change-Id: I43acbc5edfd6d6055a926f9b9605d47127400f09
2016-02-25 06:15:24 -08:00
James Zern b3ceb629ba x86/convolve.h: change filter[] || chains to |
Change-Id: I661f64390f232826857b259e7a67e77f5a3a91ad
2016-02-24 19:47:43 -08:00
hui su 8537826eb4 Fix some compiler warnings.
"taking the absolute value of unsigned type 'unsigned int' has no effect"

Change-Id: Iea1f67c2a3171a98ca89d5dc7192a5508d086c16
2016-02-24 11:17:33 -08:00
Yaowu Xu aa6c754635 Merge remote-tracking branch 'webm/master' into nextgenv2 2016-02-24 10:53:17 -08:00
Scott LaVarnway 06d0e2fe6c BUG FIX: vpx_filter_block1d(8,4)_(v8, v8_avg)
Change-Id: Ic7ea79988ed0864e7ddbfeb312516bcf77eaaac1
2016-02-23 12:23:41 -08:00
Yaowu Xu eeaf8e6b6c Extend vpxssim to handle more HBD combinations
Change-Id: I38426d946b74c9090a265d34b89e2db6693927c2
2016-02-22 16:09:08 -08:00
Yaowu Xu 38cfc45e07 Cleanup psnr.h
Change-Id: Id026e72ee655ee5bd645a89e378da0d462be367d
2016-02-22 15:37:40 -08:00
Yaowu Xu d1c5cd4a30 Add shift stage in FASTSSIM computation
This commits adds a shift stage for FASTSSIM computaton when source
bit depth is different from working bit depth, to make sure metric
results are calculated in bit_depth consistent with source.

Change-Id: I997799634076ef7b00fd051710544681ed536185
2016-02-22 14:58:10 -08:00
Yaowu Xu 195bf52bca Add shift stage for PSNRHVS computation
This commit adds the ability to shift down the working buffer when
source bit_depth is different than working bit_depth. It does so
by shift down to be consistent with source bit_depth.

Change-Id: Idfdbfc614d73fe445d62e35e642cc7d75e9dc4ff
2016-02-22 10:22:42 -08:00
Yaowu Xu 6e695da2d9 Move psnrhvs function declaration to psnr.h
From "ssim.h"

Change-Id: Ie53378794149ef8a844b4eb47ad4f08579de4b60
2016-02-22 08:38:49 -08:00
Scott LaVarnway 15ecdc3970 VPX: loopfilter_mmx.asm using x86inc
Change-Id: Idcf29281d617b275e3ca50f77e6d00c60992a36d
2016-02-18 15:34:58 -08:00
Yaowu Xu acc4addb60 Merge "Add tests for Highbitdepth PSNR metric computations" into nextgenv2 2016-02-18 01:01:00 +00:00
Yaowu Xu 7823fbb45c Merge "Move PSNR related functions into vpx_dsp/psnr.c" into nextgenv2 2016-02-18 01:00:54 +00:00
Yaowu Xu 9fb593d0fc Add tests for Highbitdepth PSNR metric computations
Change-Id: I07324155f73bbdbe25bb7a7ccd587ebf9010ac7a
2016-02-17 21:28:22 +00:00
Yaowu Xu 7538501ad1 Move PSNR related functions into vpx_dsp/psnr.c
This makes all metric computation to locate at some place, also gets
rid of duplicate code between vp9 and vp10.

Change-Id: I24a2707d183a2419cd18a8343010adae185ffcd4
2016-02-17 13:05:34 -08:00
Debargha Mukherjee 35d9eadf08 Merge "Extends ext-tx to support 32x32 masked transforms" into nextgenv2 2016-02-17 18:33:10 +00:00
Debargha Mukherjee 7485498773 Extends ext-tx to support 32x32 masked transforms
Adds new 32x32 masked 1-d transforms that combine 1-D length-16
DCT with length-16 identity transforms.

To be continued in subsequent patches.

Change-Id: I0b4f66492d44c079b3c3b531ba48a97201de1484
2016-02-17 09:31:34 -08:00
Yaowu Xu 6ed7f7a516 Merge branch 'master' into nextgenv2 2016-02-17 07:23:58 -08:00
James Zern 9b44d9d00f split vpx_highbd_lpf_horizontal_16 in two
replace with vpx_highbd_lpf_horizontal_edge_16 and
vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter

Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d
2016-02-16 23:13:58 -08:00
James Zern 1b519fb666 split vpx_lpf_horizontal_16 in two
replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to
avoid passing a count parameter

Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271
2016-02-16 22:57:45 -08:00
James Zern e7a23d703b vpx_highbd_lpf_horizontal_4: remove unused count param
Change-Id: I655a771e1b1a8753be5669ef9348a312ba6cfdbc
2016-02-16 22:57:45 -08:00
James Zern 5171857329 vpx_highbd_lpf_horizontal_8: remove unused count param
Change-Id: Iaca71ea3796115d4c2d43563b4e6f3914e21f1bf
2016-02-16 22:57:44 -08:00
James Zern 3c1019e49d vpx_highbd_lpf_vertical_4: remove unused count param
Change-Id: Ic6da723c5cf3cd8127db1f476c3e46ea134cb774
2016-02-16 22:57:44 -08:00
James Zern 72a9f06ac2 vpx_highbd_lpf_vertical_8: remove unused count param
Change-Id: Id16f7259897654831d31642c2d5e0bbe5e13416c
2016-02-16 22:57:44 -08:00
James Zern b1e97c6a25 vpx_lpf_horizontal_4: remove unused count param
Change-Id: Iec7d8eda343991f7d7d46931dca17af23c821d11
2016-02-16 22:57:27 -08:00
James Zern bd5a5bb561 vpx_lpf_horizontal_8: remove unused count param
Change-Id: I48741e167a7b09b7c9ad3bfc1c4b88ef1029ae46
2016-02-16 22:54:40 -08:00
James Zern 109a47b342 vpx_lpf_vertical_4: remove unused count param
Change-Id: I43a191cb3d42e51e7bca266adfa11c6239a8064c
2016-02-16 14:59:00 -08:00
James Zern 37225744db vpx_lpf_vertical_8: remove unused count param
Change-Id: Ic69406da00afb0f06588e8c0deb2b043952b078c
2016-02-16 14:59:00 -08:00
Geza Lore abd00505d1 Add optimized vpx_sum_squares_2d_i16 for vp10.
Using this we can eliminate large numbers of calls to predict intra,
and is also faster than most of the variance functions it replaces.
This is an equivalence transform so coding performance is unaffected.

Encoder speedup is approx 7% when var_tx, super_tx and ext_tx are all
enabled.

Change-Id: I0d4c83afc4a97a1826f3abd864bd68e41bb504fb
2016-02-15 16:54:52 +00:00
Yaowu Xu 18b6e9a36f Merge branch 'masterbase' into nextgenv2
Conflicts:
	vp10/encoder/rdopt.c

Change-Id: If720e7f9810378d24bf9fd51a95fd29c3bc5d774
2016-02-12 09:19:30 -08:00
Yaowu Xu 1a69cb286f Refactor internal stats code
Also removed the use of postprocessing in computing internal stats.

Change-Id: Ib8fdbdfe7b7ca05cd1a034a373aa7762fa44323c
2016-02-12 07:31:29 -08:00
James Zern 26c6fbdcda vpx_ve_predictor_4x4_c: quiet unused param warning
Change-Id: I62234260e2d2de94d602c6d8095c8f8124334052
2016-02-11 19:22:29 -08:00
Yaowu Xu bb8ca08816 Enable computing PSNRHVS for hbd build
This commit adds computation of PSNRHVS for highbitdepth build, it
also adds tests to make sure the calculation of psnrhvs metric for
10 and 12 bit correct.

Change-Id: Iac8a8073d2b3e3ba5d368829d770793212fa63b6
2016-02-11 13:17:59 -08:00
Yaowu Xu c0874f2441 Enable computing of FastSSIM for HBD build
This commit adds the computation of fastSSIM for highbitdepth build,
it also modifies the hbdmetric test to be more generic and applicable
for fastSSIM.

The 255 used for calculating ssim constants c1 and c2 is not exactly
scaled by 4x and 16x to 1023 and 4095, therefore requries the metric
test to have a thresold more tolerant than 0, currently at 0.03dB.

Change-Id: I631829da7773de400e77fc36004156e5e126c7e0
2016-02-10 17:11:58 -08:00
Yaowu Xu 204e77e059 Remove a flavor of SSIM that is never really used.
Change-Id: I61ea7f63acbcfeecd3f7dba5a5a38b980efc802b
2016-02-08 11:22:08 -08:00
Yaowu Xu efe1b1dbf7 Set a max dB value for PSNR_HVS and FAST_SSIM
Now set at 100.0 instead of infinite

Change-Id: I41bae0c4bd95a26f9819584e7311b7945df1271a
2016-02-08 10:55:25 -08:00
Yaowu Xu 3c28b4a8ff Fix msvc compiler warnings
There were a number of compiler warnings:
1. int16_t to uint8_t in recon_intra.c;
2. double to float conversions in psnrhvs.c
3. intptr_t to int in quantize.c
4. size_t to int32_t in decoder.c

Change-Id: Id95423b17779dcfa6cf39d9a90fe8cb8b910f5df
2016-02-08 10:14:08 -08:00
Yaowu Xu ac898d221f Normalize fdct8x8 in psnrhvs computation
This is to match the scale to the fdct8x8 used in original daala
psnrhvs computation.

Change-Id: Ic30b50747ba9c340bcb679f7439640046c69f90a
2016-02-08 17:13:18 +00:00
James Zern 05437805f7 intrapred/d135: flatten border results before storing
the results along the top and left border are then stored with a moving
window into the vector.
~40-67% faster on ARM, ~40-77+% on x86 depending on the block size.

Change-Id: Iab369aa2946a3ae4eb7290d512868fe5db92dbc8
2016-02-05 12:31:48 -08:00
Yaowu Xu 105da4128d Fix bad merge artifacts
Temporaly disable warning for unused function for vp10, needs clean
out the warnings before re-enable the flag for vp10.

Change-Id: I5636f8cd607423f6ea6963db9c2cbd688e30b495
2016-02-05 09:04:41 -08:00
Yaowu Xu 48b2713553 Merge branch 'master' into nextgenv2 2016-02-05 05:00:06 -08:00
James Zern cdf1077d5a intrapred: protect functions w/CONFIG check x2
high-bitdepth version
d207e, d63e, d45e are only used with CONFIG_MISC_FIXES

Change-Id: I77292e11f51fd76d4127fd0027f876866bcf8675
2016-02-02 19:38:37 -08:00
Yaowu Xu 9568a284ab Fix automerge errors
Change-Id: I24d415bafe617eac894427088d7b2fbe0b7e04d7
2016-02-01 14:03:49 -08:00
Yaowu Xu 8678ecd1ef Merge branch 'master' into nextgenv2 2016-01-31 05:00:05 -08:00
Yaowu Xu 6a94d6ad8e Merge "Enable sse2 version of inverse wht for hbd build" 2016-01-31 04:38:39 +00:00
Yaowu Xu 8dc6f3f5c2 Merge branch 'master' into nextgenv2 2016-01-30 05:00:05 -08:00
James Zern 8faccb709a Merge changes If13946e4,I61a1814d,I2ca9aa3c,I44d91eaa
* changes:
  intrapred: protect functions w/CONFIG check
  vp9_noise_estimate: protect copy_frame w/CONFIG check
  vp8_cx_iface: delete 3 unused functions
  vp8: mark intra_prediction_down_copy inline
2016-01-30 00:17:16 +00:00
Yaowu Xu 0aef1bc898 Enable sse2 version of inverse wht for hbd build
Change-Id: If8f5efd701a11c8a7ad3078d10ec3cd0fe27667e
2016-01-29 14:47:56 -08:00
Yaowu Xu b229710811 SSSE3 idct8x8 functions for highbitdpeth build
This commit changes SSSE3 optimized idct8x8 functions to work with
highbitdepth build.

With this commit and the previous one that enabled SSSE3 idct32x32
functions, tests showed virtually no difference on decoding speed for
file fdJc1_IBKJA.248.webm for the build with -enable-vp9-highbitdpeth
option and the build without the option.

Change-Id: Ibe0634149ec70e8b921e6b30171664b8690a9c45
2016-01-29 12:36:53 -08:00
Yaowu Xu aac1ef7f80 Enable hbd_build to use SSSE3optimized functions
This commit changes the SSSE3 assembly functions for idct32x32 to
support highbitdepth build.

On test clip fdJc1_IBKJA.248.webm, this cuts the speed difference
between hbd and lbd build from between 3-4% to 1-2%.

Change-Id: Ic3390e0113bc1ca5bba8ec80d1795ad31b484fca
2016-01-29 01:30:43 +00:00
James Zern fea27ccca0 intrapred: protect functions w/CONFIG check
d207e, d63e, d45e are only used with CONFIG_MISC_FIXES

Change-Id: If13946e483c4d0ccaa3e1d60dc14216c06d5a219
2016-01-26 20:13:57 -08:00
Yaowu Xu f512a311f2 Merge branch 'master' into nextgenv2 2016-01-26 05:00:05 -08:00
James Zern 3a2ad10de2 Merge "Code clean of sad4xNx4D_sse" 2016-01-25 20:57:15 +00:00
Alex Converse ed3df445d9 Revert "Merge "Change highbd variance rounding to prevent negative variance.""
This reverts commit ea48370a50, reversing
changes made to 15939cb2d7.

The commit was insufficiently tested and causes failures.

Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
2016-01-13 11:19:06 -08:00
Alex Converse ea48370a50 Merge "Change highbd variance rounding to prevent negative variance." 2016-01-13 00:25:54 +00:00
Yaowu Xu 250213ac7e Merge branch 'master' into nextgenv2 2015-12-29 05:00:05 -08:00
Yaowu Xu 14b0443792 Merge branch 'master' into nextgenv2 2015-12-23 05:00:05 -08:00
Jian Zhou 26a6ce4c6d Code clean of highbd_tm_predictor_32x32
Remove the ARCH_X86_64 constraint. No performance hit on both
big core and small core.

Change-Id: I39860b62b7a0ae4acaafdca7d68f3e5820133a81
2015-12-22 16:51:57 -08:00
Jian Zhou 355bfa2193 Code clean of highbd_tm_predictor_16x16
Remove the ARCH_X86_64 constraint.

Change-Id: I0139f8e998cc5525df55161c2054008d21ac24d4
2015-12-22 16:34:40 -08:00
Jian Zhou a4c265f1b7 Code clean of highbd_dc_predictor_32x32
Remove the ARCH_X86_64 constraint.

Change-Id: I7d2545fc4f24eb352cf3e03082fc4d48d46fbb09
2015-12-22 16:06:54 -08:00
James Zern cedb1db594 Merge "Code clean of highbd_tm_predictor_4x4" 2015-12-22 16:45:01 +00:00
James Zern a097963f80 Merge "Code clean of highbd_dc_predictor_4x4" 2015-12-22 16:30:37 +00:00
Yaowu Xu 7c6144bc4a Merge branch 'master' into nextgenv2 2015-12-22 05:00:05 -08:00
Jian Zhou 52e7f4153b Merge "Code clean of highbd_v_predictor_4x4" 2015-12-21 18:07:48 +00:00
Yaowu Xu f73feedb9e Merge branch 'master' into nextgenv2 2015-12-19 05:00:06 -08:00
Yunqing Wang b597e3e188 Merge "Fix for issue 1114 compile error" 2015-12-19 04:29:39 +00:00
James Zern 8b2ddbc728 sad_sse2: fix sad4xN(_avg) on windows
reduce the register count by 1 to avoid xmm6 and unnecessarily
penalizing the other users of the base macro

Change-Id: I59605c9a41a31c1b74f67ec06a40d1a7f92c4699
2015-12-18 19:19:32 -08:00
Jian Zhou db11307502 Code clean of highbd_tm_predictor_4x4
Replace MMX with SSE2, reduce mem access to left neighbor,
loop unrolled.

Change-Id: I941be915af809025f121ecc6c6443f73c9903e70
2015-12-18 18:43:41 -08:00
Jian Zhou c91dd55eda Code clean of highbd_v_predictor_4x4
MMX replaced with SSE2, same performance.

Change-Id: I2ab8f30a71e5fadbbc172fb385093dec1e11a696
2015-12-18 15:25:27 -08:00
Jian Zhou 8366b414dd Code clean of highbd_dc_predictor_4x4
MMX replaced with SSE2, same performance.

Change-Id: Ic57855254e26757191933c948fac6aa047fadafc
2015-12-18 12:45:23 -08:00
Yaowu Xu 7330108009 Merge branch 'master' into nextgenv2 2015-12-18 05:00:05 -08:00
Peter de Rivaz 7361ef732b Fix for issue 1114 compile error
In 32-bit build with --enable-shared, there is a lot of
register pressure and register src_strideq is reused.
The code needs to use the stack based version of src_stride,
but this doesn't compile when used in an lea instruction.

This patch also fixes a related segmentation fault caused by the
implementation using src_strideq even though it has been
reused.

This patch also fixes the HBD subpel variance tests that fail
when compiled without disable-optimizations.
These failures were caused by local variables in the assembler
routines colliding with the caller's stack frame.

Change-Id: Ice9d4dafdcbdc6038ad5ee7c1c09a8f06deca362
2015-12-18 09:43:22 +00:00
Jian Zhou 789dbb3131 Code clean of sad4xNx4D_sse
Replace MMX with SSE2.

Change-Id: I948ca1be6ed9b8e67f16555e226f1203726b7da6
2015-12-17 17:43:46 -08:00
Jian Zhou b158d9a649 Code clean of sad4xN(_avg)_sse
Replace MMX with SSE2, reduce psadbw ops which may help Silvermont.

Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d
2015-12-17 11:10:42 -08:00
Yaowu Xu b37e8b0e00 Merge branch 'master' into nextgenv2 2015-12-15 05:00:05 -08:00
James Zern b81f04a0cc Merge "move vp9_avg to vpx_dsp" 2015-12-15 03:41:22 +00:00
James Zern d36659cec7 move vp9_avg to vpx_dsp
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-12-14 14:42:12 -08:00
Jian Zhou 2404e3290e Merge "Code clean of tm_predictor_32x32" 2015-12-14 17:56:01 +00:00
Jian Zhou 6e87880e7f Merge "Speed up tm_predictor_16x16" 2015-12-11 18:55:46 +00:00
Jian Zhou 88120481a4 Code clean of tm_predictor_32x32
Reallocate the xmm register usage so that no ARCH_X86_64 required.
Reduce memory access to the left neighbor by half.
Speed up by single digit on big core machine.

Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
2015-12-11 10:32:08 -08:00
Jian Zhou 62f986265f Merge "SSE2 based h_predictor_32x32" 2015-12-11 18:02:34 +00:00
Yaowu Xu f07d73b9bf Merge branch 'master' into nextgenv2
Change-Id: Id0b784b115602e2502b42fa972a5ae210435a3be
2015-12-11 08:58:40 -08:00
James Zern ecb8dff768 Merge "dc_left_pred[48]: fix pic builds" 2015-12-11 02:48:11 +00:00
Jian Zhou 5604924945 Merge "Code clean of dc_left/top_predictor_16x16" 2015-12-11 01:53:44 +00:00
James Zern 40ee78bc19 dc_left_pred[48]: fix pic builds
GET_GOT modifies the stack pointer so the offset for left's address will
be wrong if loaded afterword.

Change-Id: Iff9433aec45f5f6fe1a59ed8080c589bad429536
2015-12-10 15:44:31 -08:00
Debargha Mukherjee 104636a39a Some fixes from merging MISC_FIXES config
Change-Id: I3f77e952af3c441a50479bb5d278ea0fd6cf62c6
2015-12-10 15:17:33 -08:00
Yunqing Wang 322ea7ff5b Fix the win32 crash when GET_GOT is not defined
This patch continues to fix the win32 crash issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1105

Johann's patch is here:
https://chromium-review.googlesource.com/#/c/316446/2

Change-Id: I7fe191c717e40df8602e229371321efb0d689375
2015-12-10 14:25:01 -08:00
Jian Zhou 4ec5953080 Code clean of dc_left/top_predictor_16x16
Remove some redundant code.

Change-Id: Ida2e8c0ce28770f7a9545ca014fe792b04295260
2015-12-10 11:59:58 -08:00
Jian Zhou c90a8a1a43 SSE2 based h_predictor_32x32
Relocate the function from SSSE3 to SSE2, Unroll loop from 16 to 8,
and reduce mem access to left.
Speed up by single digit in ./test_intra_pred_speed on big core
machines.

Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
2015-12-10 10:09:58 -08:00
Johann Koenig 420b9f5bd3 Merge "fix null pointer crash in Win32 because esp register is broken" 2015-12-09 19:31:12 +00:00
Yaowu Xu f757782f22 Merge branch 'master' into nextgenv2
Change-Id: I6f8b540854ddc78fc4a2a8045b194a888749a3cb
2015-12-09 08:09:30 -08:00
Jian Zhou aa5b517a39 Re-enable SSE2 based intra 4x4 prediction
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Segfault in change 315561 when decoding vp8 is taken care of.

Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
2015-12-07 18:50:37 -08:00
Scott LaVarnway c7e557b82c Merge "VP9: Add ssse3 version of vpx_idct32x32_135_add()" 2015-12-07 21:13:35 +00:00
Sergey Kolomenkin 5fc9688792 fix null pointer crash in Win32 because esp register is broken
https://bugs.chromium.org/p/webm/issues/detail?id=1105

Change-Id: I304ea85ea1f6474e26f074dc39dc0748b90d4d3d
2015-12-07 12:57:06 -08:00
Yaowu Xu 69f4930041 Merge branch 'master' into nextgenv2
Conflicts:
	vp10/common/blockd.h
	vp10/common/entropymode.h
	vp10/common/reconintra.c
	vp10/decoder/decodemv.c
	vp10/encoder/bitstream.c
	vp10/encoder/encoder.h
	vp10/encoder/rd.c
	vp10/encoder/rdopt.c
	vp10/encoder/tokenize.h

Change-Id: Ic4891839b6f0474026d6d69821e38edec9632df1
2015-12-07 11:37:14 -08:00
James Zern 79a9add666 Revert "MMX in intra 4x4 prediction replaced with SSE2"
This reverts commit 89a1efa4c4.

This causes a segfault when decoding vp8, in both 32 and 64-bit

Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367
2015-12-05 10:20:39 -08:00
Jian Zhou e86c7c863e Speed up h_predictor_16x16
Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.

Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
2015-12-04 12:12:55 -08:00
Jian Zhou da3f08fac3 Speed up h_predictor_8x8
Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.

Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
2015-12-04 11:36:44 -08:00
Jian Zhou aa2764abdd MMX in intra 8x8 prediction replaced with SSE2
8x8 Intra predictor implemented with MMX is replaced with SSE2.

Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52
2015-12-03 18:11:06 -08:00
Jian Zhou 89a1efa4c4 MMX in intra 4x4 prediction replaced with SSE2
4x4 Intra predictor implemented with MMX is replaced with SSE2.

Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc
2015-12-03 16:40:23 -08:00
Yaowu Xu 3e2273fcee Merge branch 'master' into nextgenv2 2015-12-03 05:00:05 -08:00
Jian Zhou 623e988add Merge "SSE2 speed up of h_predictor_4x4" 2015-12-02 18:49:00 +00:00
Scott LaVarnway f0b0b1fe62 VP9: Add ssse3 version of vpx_idct32x32_135_add()
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
2015-12-02 04:50:46 -08:00
Yaowu Xu d1486c3837 Merge branch 'master' into nextgenv2 2015-12-01 05:00:05 -08:00
Jian Zhou c7fae5d893 Speed up tm_predictor_16x16
Reduce mem access to left. Speed up by 10% in ./test_intra_pred_speed
with the same instruction size.

Change-Id: Ia33689d62476972cc82ebb06b50415aeccc95d15
2015-11-30 17:46:40 -08:00
Scott LaVarnway 2669e05949 Merge "VPX: x86 asm version of vpx_idct32x32_1024_add()" 2015-11-30 23:28:27 +00:00
Jian Zhou 9d29d76280 SSE2 speed up of h_predictor_4x4
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.

Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
2015-11-30 10:08:05 -08:00
Scott LaVarnway 0148e20c3c VPX: x86 asm version of vpx_idct32x32_1024_add()
Change-Id: I3ba4ede553e068bf116dce59d1317347988b3542
2015-11-25 10:11:29 -08:00
Yaowu Xu 49f5903dd2 Merge branch 'master' into nextgenv2 2015-11-25 05:00:05 -08:00
Jian Zhou 901d20369a Merge "Speed up tm_predictor_8x8" 2015-11-25 02:34:07 +00:00
Alex Converse 022c848b4d Change highbd variance rounding to prevent negative variance.
Always round sum error and sum square error toward zero in variance
calculations. This prevents variance from becoming negative.
Avoiding rounding variance at all might be better but would be far
more invasive.

Change-Id: Icf24e0e75ff94952fc026ba6a4d26adf8d373f1c
2015-11-24 16:32:01 -08:00
Jian Zhou f4621c5c8d Speed up tm_predictor_8x8
Left neighbor read from memory only once.
Speed up by ~20% in ./test_intra_pred_speed.

Change-Id: Ia1388630df6fed0dce9a6eeded6cb855bbc43505
2015-11-24 16:07:06 -08:00
Alex Converse b84fa548fb Merge "bitreader/writer: Change shift to signed" 2015-11-24 18:33:45 +00:00
Yaowu Xu ea78294030 Merge branch 'master' into nextgenv2 2015-11-24 05:00:05 -08:00
Scott LaVarnway 97e6cc6198 VPX: Removed unnecessary pmulhrsw in IDCT32X32_34
and fixed macro name.

Change-Id: I306b98a2b4ec80b130ae80290b4cd9c7a5363311
2015-11-23 10:24:09 -08:00
Yaowu Xu c1629ca53b Merge branch 'master' into nextgenv2 2015-11-21 05:00:05 -08:00
James Zern 16eba81f69 Revert "Speed up h_predictor_4x4"
This reverts commit d76032ae87.

breaks 32-bit builds

Change-Id: If6266ec2a405b5a21d615112f0f37e8a71193858
2015-11-20 22:25:29 -08:00
James Zern 1b10753ad7 Merge "Speed up h_predictor_4x4" 2015-11-21 01:12:42 +00:00