Commit graph

20 commits

Author SHA1 Message Date
Angie Chiang 81b2e50213 Pass conv_params into warp-related functions
This aims to integrate convolve_round/compound_round
with global_motion.

Change-Id: I1d91ff2de6b075f807eaaaa0a7a66edb2036e57b
2017-06-20 17:42:12 +00:00
David Barker facac4f5f0 Tidy up warp filter
* Simplify the C version of the warp filter to make the intent
  of the code clearer
* Replace saturate_uint() in the C warp filter with an assertion
  that the intermediate values are in-range. This is because they
  should (provably) *never* go out-of-range.
* Add a comment describing the intended hardware architecture
* Miscellaneous comment updates
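
The saturate-to-assert change described above can be sketched as follows. This is only an illustration, not the patch itself: the helper name `store_intermediate` and the `bits` parameter are hypothetical, and the real filter's value ranges are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): store an intermediate filter
 * sum that is expected to fit in an unsigned value of `bits` bits. */
static uint16_t store_intermediate(int32_t sum, int bits) {
  /* A saturating version would silently clamp here:
   *   if (sum < 0) sum = 0;
   *   else if (sum >= (1 << bits)) sum = (1 << bits) - 1;
   * If the range can be proven, an assertion documents the invariant
   * and costs nothing when assertions are compiled out. */
  assert(sum >= 0 && sum < (1 << bits));
  return (uint16_t)sum;
}
```

With assertions enabled, an out-of-range value stops the program instead of being hidden by a clamp, which makes a broken range proof visible during testing.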

Change-Id: I798736f923ece599f22d573d31c5dfccd18b2d0e
2017-05-30 21:37:40 +00:00
Debargha Mukherjee a77ec1c922 Change warp filter to use one less precision bit
Change-Id: Idc7bb686f5751b0457c9f21daac0fa6f4865fd22
2017-05-29 07:09:48 +00:00
David Barker 58616eb0cd Further speedups to warp filter
* Calculate sx4, sy4 by truncation instead of rounding
* Move some repeated calculations out of the filter loop

This is expected to have a roughly neutral effect on BDRATE.
The speedup of each filter (SSE2, lowbd SSSE3, highbd SSSE3) is
7-10%, for a total speedup of 14-18% when considered together
with patches f7a5ee5 and 14b8112.

Change-Id: I692f649202214c7ab53ecf81f81386f1503e2d20
2017-05-16 21:02:12 +00:00
Yaowu Xu 70d9acc1c5 Avoid left shift of negative numbers
Silence warnings by converting the shifts to multiplies.
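
A minimal sketch of the kind of conversion described, assuming a signed 32-bit value. The helper name `scale_by_pow2` is hypothetical and does not come from the patch.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): scale a signed value by
 * 2^bits without left-shifting a possibly negative operand. */
static int32_t scale_by_pow2(int32_t value, int bits) {
  /* `value << bits` is undefined behavior in C when value is negative.
   * Multiplying by (1 << bits) is well defined as long as the product
   * fits in int32_t, and compilers typically emit the same shift. */
  return value * (1 << bits);
}
```

For example, `scale_by_pow2(-3, 4)` yields -48 without triggering a shift-of-negative-value warning.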

Change-Id: Icde8f2df650f740b8e90691ba706a0853be84984
2017-05-14 20:34:04 -07:00
Sean Purser-Haskell 14b8112b42 Extra rounding to let hw use narrower integers.
Change-Id: I175d6ff03f31a2e0d2fe7cd1c3852210d6e0ddf5
2017-05-11 17:30:02 +00:00
David Barker f7a5ee536b More accurate chroma warping
Previously, the projected positions of chroma pixels would effectively
undergo double rounding, since we round both when calculating x4 / y4
and when calculating the filter index. Further, the two roundings
were different: x4 / y4 used ROUND_POWER_OF_TWO_SIGNED, whereas
the filter index uses ROUND_POWER_OF_TWO.

It is slightly more accurate (and faster) to replace the first
rounding by a shift; this is motivated by the fact that
ROUND_POWER_OF_TWO(x >> a, b) == ROUND_POWER_OF_TWO(x, a + b)
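
The identity above can be checked by brute force. The sketch below assumes that `ROUND_POWER_OF_TWO` adds half of the divisor and then shifts right; the macro is redefined locally for illustration, and the function name `rounding_identity_holds` is hypothetical. It covers only non-negative `x` and `b >= 1` (for `b == 0` the left-hand side performs no rounding at all, so the identity does not apply).

```c
#include <stdint.h>

/* Local definition, assumed to match the codec's macro:
 * add half of 2^n, then shift right by n. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + ((1 << (n)) >> 1)) >> (n))

/* Hypothetical check: returns 1 if
 *   ROUND_POWER_OF_TWO(x >> a, b) == ROUND_POWER_OF_TWO(x, a + b)
 * for every x in [0, limit), a in [0, 6] and b in [1, 6]. */
static int rounding_identity_holds(int32_t limit) {
  for (int a = 0; a <= 6; ++a) {
    for (int b = 1; b <= 6; ++b) {
      for (int32_t x = 0; x < limit; ++x) {
        if (ROUND_POWER_OF_TWO(x >> a, b) != ROUND_POWER_OF_TWO(x, a + b))
          return 0;
      }
    }
  }
  return 1;
}
```

By contrast, rounding twice is not equivalent to rounding once: with x = 1 and a = b = 1, `ROUND_POWER_OF_TWO(ROUND_POWER_OF_TWO(1, 1), 1)` is 1, whereas `ROUND_POWER_OF_TWO(1, 2)` is 0. That is the double-rounding error the message refers to.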

Change-Id: Ia52b05745168d0aeb05f0af4c75ff33eee791d82
2017-05-11 11:25:07 +00:00
Sarah Parker 7afb8b758a Compute compound average in warp_plane only for COMPOUND_AVERAGE
This fixes a mismatch which occurs when global/warped motion and
a masked compound type are used together.

Change-Id: I08b2702cdb3b85f8d8817b9286a73951c97cf379
2017-05-05 16:38:21 +00:00
David Barker d8a423c62d Add SSSE3 warp filter + const-ify warp filters
The SSSE3 filter is very similar to the SSE2 filter, but
the horizontal pass is sped up by using the 8x8->16
multiplies added in SSSE3.

Also apply const-correctness to all versions of the filter.

The timings of the existing filters are unchanged, and the
lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter.

Timings per 8x8 block:
lowbd SSE2: 320ns
lowbd SSSE3: 273ns
highbd SSSE3: 300ns

Filter output is unchanged.

Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182
2017-05-04 11:15:26 +01:00
David Barker 4644374134 Remove temporary condition from warp code
Patch https://aomedia-review.googlesource.com/c/10901/ temporarily
disabled the SSE2 warp filter for 4x4 blocks, because of a
data race when the filter was used at the right-hand edge of a
tile in a multithreaded encode.

This patch fixes the data race and re-enables the SSE2 warp filter.

Change-Id: I7058c897ddf538cd10001c5be13b1a1bfe8320fd
2017-05-02 17:46:20 +00:00
Yaowu Xu d74b56c086 Change to use unaligned load
Fixes segfaults due to aligned load of unaligned data.

Change-Id: If0106f2c2e7df6713c8db14cf360eabbb334cbb5
2017-05-02 01:38:58 +00:00
Debargha Mukherjee 79362e3307 Revert "Limit to 192 filters for warp, clamp index since in some cases index 192"
This reverts commit 266db85d4a.

Reason for revert: Reverting to prevent software slowdown. Will be implemented differently in a separate patch.

Change-Id: I386a9661c87d69e22761e5c01507f2f1f968433f
2017-04-28 19:44:11 +00:00
David Barker b62eef7b6e Fix encode/decode mismatch with global/warped motion
When predicting a 4x4 warp block (either using ZEROMV with
global-motion, or the WARPED_CAUSAL motion mode with
warped-motion), the warp filter would previously write
4 bytes to the right of the block.

This caused encode/decode mismatches when encoding with
multiple threads and tile_cols > 1, since in that case
we could end up overwriting already-generated pixels from
the next tile across.

This patch changes the filter so that we only overwrite the
intended pixels.

Change-Id: I3664b44e872e85aa5ccc0a5781f0f9ad994a5b80
2017-04-28 15:45:07 +00:00
Sean Purser-Haskell 266db85d4a Limit to 192 filters for warp, clamp index since in some cases index 192
is accessed.

Change-Id: I3d65123893663cc7d303056e46934aec153bc35b
2017-04-27 22:50:13 +00:00
Debargha Mukherjee 27f6e66e22 Reduce precision of shear parameters to 16 bits
Change-Id: I9cd9362edbb7b642f4b632bf574abfe5b2159ff3
2017-04-10 20:54:20 +00:00
David Barker 521383ae59 Add SSSE3 highbd warp filter
Change-Id: Ic3b8508c3364aecff1b2f53c7246a5e381b63018
2017-04-06 22:00:10 +00:00
Debargha Mukherjee 1d18460fab Reduce precision bits between shears
Change-Id: I89e981c9396c7a1ba8051d65036a16692da94d0d
2017-04-05 14:28:00 +00:00
David Barker 838367db1e Add correctness tests for the SSE2 warp filter
Also rename warp_affine() to av1_warp_affine()

Change-Id: I945baff6be8a1ea942ce88dfcfa5344af6b3a966
2017-01-19 16:55:58 +00:00
David Barker 1b888f2e9a Optimize SSE2 warp filter
Improve the speed of the warp filter itself by ~30%. This leads
to an overall decoder speedup of 5-20%, depending on bitrate,
for the global-motion experiment, and a small speedup for
warped-motion.

Applies a very minor change to the rounding during filter
selection (ROUND_POWER_OF_TWO makes slightly more sense here
than ROUND_POWER_OF_TWO_SIGNED, and is faster)

Change-Id: I3f364221d1ec35a8aac0d2c8b0e427f527d12e43
2017-01-19 16:55:52 +00:00
David Barker d5dfa96e88 Add SSE2 vectorized warp filter for lowbd
End-to-end speed improvements: (measured on tempete_cif.y4m,
20 frames for encoder and all 260 frames for decoder)

* GLOBAL_MOTION encoder: ~10% faster
* GLOBAL_MOTION decoder: 100-200% faster depending on bitrate
* WARPED_MOTION encoder: ~2.5% faster
* WARPED_MOTION decoder: ~20-40% faster depending on bitrate

The improvement in the GLOBAL_MOTION decoder is particularly
large because its runtime is dominated by calls to warp_plane().

This introduces minor changes to the output of the warp filter,
but these should be rare.

Change-Id: I5813ab9e90311e27587045153c32d400b6b9eb92
2017-01-12 17:14:35 +00:00