mozilla/aom - aom

Commit graph

Author	SHA1	Message	Date
Angie Chiang	81b2e50213	Pass conv_params into warp-related functions This aims to integrate convolve_round/compound_round with global_motion Change-Id: I1d91ff2de6b075f807eaaaa0a7a66edb2036e57b	2017-06-20 17:42:12 +00:00
David Barker	facac4f5f0	Tidy up warp filter * Simplify the C version of the warp filter to make the intent of the code clearer * Replace saturate_uint() in the C warp filter with an assertion that the intermediate values are in-range. This is because they should (provably) never go out-of-range. * Add a comment describing the intended hardware architecture * Miscellaneous comment updates Change-Id: I798736f923ece599f22d573d31c5dfccd18b2d0e	2017-05-30 21:37:40 +00:00
Debargha Mukherjee	a77ec1c922	Change warp filter to use one less precision bit Change-Id: Idc7bb686f5751b0457c9f21daac0fa6f4865fd22	2017-05-29 07:09:48 +00:00
David Barker	58616eb0cd	Further speedups to warp filter * Calculate sx4, sy4 by truncation instead of rounding * Move some repeated calculations out of the filter loop This is expected to have a roughly neutral effect on BDRATE. The speedup of each filter (SSE2, lowbd SSSE3, highbd SSSE3) is 7-10%, for a total speedup of 14-18% when considered together with patches `f7a5ee5` and `14b8112`. Change-Id: I692f649202214c7ab53ecf81f81386f1503e2d20	2017-05-16 21:02:12 +00:00
Yaowu Xu	70d9acc1c5	Avoid left shift of negative numbers Silence warnings by converting the shifts to multiplies. Change-Id: Icde8f2df650f740b8e90691ba706a0853be84984	2017-05-14 20:34:04 -07:00
Sean Purser-Haskell	14b8112b42	Extra rounding to let hw use narrower integers. Change-Id: I175d6ff03f31a2e0d2fe7cd1c3852210d6e0ddf5	2017-05-11 17:30:02 +00:00
David Barker	f7a5ee536b	More accurate chroma warping Previously, the projected positions of chroma pixels would effectively undergo double rounding, since we round both when calculating x4 / y4 and when calculating the filter index. Further, the two roundings were different: x4 / y4 used ROUND_POWER_OF_TWO_SIGNED, whereas the filter index uses ROUND_POWER_OF_TWO. It is slightly more accurate (and faster) to replace the first rounding by a shift; this is motivated by the fact that ROUND_POWER_OF_TWO(x >> a, b) == ROUND_POWER_OF_TWO(x, a + b) Change-Id: Ia52b05745168d0aeb05f0af4c75ff33eee791d82	2017-05-11 11:25:07 +00:00
Sarah Parker	7afb8b758a	Compute compound average in warp_plane only for COMPOUND_AVERAGE This fixes a mismatch which occurs when global/warped motion and a masked compound type are used together. Change-Id: I08b2702cdb3b85f8d8817b9286a73951c97cf379	2017-05-05 16:38:21 +00:00
David Barker	d8a423c62d	Add SSSE3 warp filter + const-ify warp filters The SSSE3 filter is very similar to the SSE2 filter, but the horizontal pass is sped up by using the 8x8->16 multiplies added in SSSE3. Also apply const-correctness to all versions of the filter The timings of the existing filters are unchanged, and the lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter. Timings per 8x8 block: lowbd SSE2: 320ns lowbd SSSE3: 273ns highbd SSSE3: 300ns Filter output is unchanged. Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182	2017-05-04 11:15:26 +01:00
David Barker	4644374134	Remove temporary condition from warp code Patch https://aomedia-review.googlesource.com/c/10901/ temporarily disabled the SSE2 warp filter for 4x4 blocks, because of a data race when the filter was used at the right-hand edge of a tile in a multithreaded encode. This patch fixes the data race and re-enables the SSE2 warp filter. Change-Id: I7058c897ddf538cd10001c5be13b1a1bfe8320fd	2017-05-02 17:46:20 +00:00
Yaowu Xu	d74b56c086	Change to use unaligned load Fixes segfaults due to aligned load of unaligned data. Change-Id: If0106f2c2e7df6713c8db14cf360eabbb334cbb5	2017-05-02 01:38:58 +00:00
Debargha Mukherjee	79362e3307	Revert "Limit to 192 filters for warp, clamp index since in some cases index 192" This reverts commit `266db85d4a`. Reason for revert: Reverting to prevent software slowdown. Will be implemented differently in a separate patch. Change-Id: I386a9661c87d69e22761e5c01507f2f1f968433f	2017-04-28 19:44:11 +00:00
David Barker	b62eef7b6e	Fix encode/decode mismatch with global/warped motion When predicting a 4x4 warp block (either using ZEROMV with global-motion, or the WARPED_CAUSAL motion mode with warped-motion), the warp filter would previously write 4 bytes to the right of the block. This caused encode/decode mismatches when encoding with multiple threads and tile_cols > 1, since in that case we could end up overwriting already-generated pixels from the next tile across. This patch changes the filter so that we only overwrite the intended pixels. Change-Id: I3664b44e872e85aa5ccc0a5781f0f9ad994a5b80	2017-04-28 15:45:07 +00:00
Sean Purser-Haskell	266db85d4a	Limit to 192 filters for warp, clamp index since in some cases index 192 is accessed. Change-Id: I3d65123893663cc7d303056e46934aec153bc35b	2017-04-27 22:50:13 +00:00
Debargha Mukherjee	27f6e66e22	Reduce precision of shear parameters to 16 bits Change-Id: I9cd9362edbb7b642f4b632bf574abfe5b2159ff3	2017-04-10 20:54:20 +00:00
David Barker	521383ae59	Add SSSE3 highbd warp filter Change-Id: Ic3b8508c3364aecff1b2f53c7246a5e381b63018	2017-04-06 22:00:10 +00:00
Debargha Mukherjee	1d18460fab	Reduce precision bits between shears Change-Id: I89e981c9396c7a1ba8051d65036a16692da94d0d	2017-04-05 14:28:00 +00:00
David Barker	838367db1e	Add correctness tests for the SSE2 warp filter Also rename warp_affine() to av1_warp_affine() Change-Id: I945baff6be8a1ea942ce88dfcfa5344af6b3a966	2017-01-19 16:55:58 +00:00
David Barker	1b888f2e9a	Optimize SSE2 warp filter Improve the speed of the warp filter itself by ~30%. This leads to an overall decoder speedup of 5-20%, depending on bitrate, for the global-motion experiment, and a small speedup for warped-motion. Applies a very minor change to the rounding during filter selection (ROUND_POWER_OF_TWO makes slightly more sense here than ROUND_POWER_OF_TWO_SIGNED, and is faster) Change-Id: I3f364221d1ec35a8aac0d2c8b0e427f527d12e43	2017-01-19 16:55:52 +00:00
David Barker	d5dfa96e88	Add SSE2 vectorized warp filter for lowbd End-to-end speed improvements: (measured on tempete_cif.y4m, 20 frames for encoder and all 260 frames for decoder) * GLOBAL_MOTION encoder: ~10% faster * GLOBAL_MOTION decoder: 100-200% faster depending on bitrate * WARPED_MOTION encoder: ~2.5% faster * WARPED_MOTION decoder: ~20-40% faster depending on bitrate The improvement in the GLOBAL_MOTION decoder is particularly large because its runtime is dominated by calls to warp_plane(). This introduces minor changes to the output of the warp filter, but these should be rare. Change-Id: I5813ab9e90311e27587045153c32d400b6b9eb92	2017-01-12 17:14:35 +00:00

20 commits