mozilla/aom - aom

Commit graph

Author	SHA1	Message	Date
James Zern	4950dbceaf	Merge changes from topic 'rm-loopfilter-count-param' * changes: lpf_8_test: remove unneeded function wrapper remove loopfilter 'count' param TODOs split vpx_highbd_lpf_horizontal_16 in two split vpx_lpf_horizontal_16 in two vpx_highbd_lpf_horizontal_4: remove unused count param vpx_highbd_lpf_horizontal_8: remove unused count param vpx_highbd_lpf_vertical_4: remove unused count param vpx_highbd_lpf_vertical_8: remove unused count param vpx_lpf_horizontal_4: remove unused count param vpx_lpf_horizontal_8: remove unused count param vpx_lpf_vertical_4: remove unused count param vpx_lpf_vertical_8: remove unused count param lpf_8_test: add missing dspr2 tests lpf_8_test: add missing vpx_lpf_horizontal_4 tests lpf_8_test: add missing vpx_lpf_vertical_4 tests lpf_8_test: simplify function wrapper generation	2016-02-18 18:47:48 +00:00
Alex Converse	09f9c5d7f9	Better workaround for Bug 1089. Don't initialize first pass costs for a number of symbols where first pass probabilities aren't initialized. This brings a 1.22x first pass speedup. https://bugs.chromium.org/p/webm/issues/detail?id=1089 Change-Id: I97438c357bd88f52f5a15c697031cf0c3cc8f510	2016-02-17 14:46:26 -08:00
James Zern	110d377899	remove loopfilter 'count' param TODOs Change-Id: I25ce7314372ce2f521526ea7864ffc4ab62e4519	2016-02-16 23:14:03 -08:00
James Zern	9b44d9d00f	split vpx_highbd_lpf_horizontal_16 in two replace with vpx_highbd_lpf_horizontal_edge_16 and vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d	2016-02-16 23:13:58 -08:00
James Zern	1b519fb666	split vpx_lpf_horizontal_16 in two replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to avoid passing a count parameter Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271	2016-02-16 22:57:45 -08:00
James Zern	e7a23d703b	vpx_highbd_lpf_horizontal_4: remove unused count param Change-Id: I655a771e1b1a8753be5669ef9348a312ba6cfdbc	2016-02-16 22:57:45 -08:00
James Zern	5171857329	vpx_highbd_lpf_horizontal_8: remove unused count param Change-Id: Iaca71ea3796115d4c2d43563b4e6f3914e21f1bf	2016-02-16 22:57:44 -08:00
James Zern	3c1019e49d	vpx_highbd_lpf_vertical_4: remove unused count param Change-Id: Ic6da723c5cf3cd8127db1f476c3e46ea134cb774	2016-02-16 22:57:44 -08:00
James Zern	72a9f06ac2	vpx_highbd_lpf_vertical_8: remove unused count param Change-Id: Id16f7259897654831d31642c2d5e0bbe5e13416c	2016-02-16 22:57:44 -08:00
James Zern	b1e97c6a25	vpx_lpf_horizontal_4: remove unused count param Change-Id: Iec7d8eda343991f7d7d46931dca17af23c821d11	2016-02-16 22:57:27 -08:00
James Zern	bd5a5bb561	vpx_lpf_horizontal_8: remove unused count param Change-Id: I48741e167a7b09b7c9ad3bfc1c4b88ef1029ae46	2016-02-16 22:54:40 -08:00
James Zern	109a47b342	vpx_lpf_vertical_4: remove unused count param Change-Id: I43a191cb3d42e51e7bca266adfa11c6239a8064c	2016-02-16 14:59:00 -08:00
James Zern	37225744db	vpx_lpf_vertical_8: remove unused count param Change-Id: Ic69406da00afb0f06588e8c0deb2b043952b078c	2016-02-16 14:59:00 -08:00
Marco	3cbc26f31b	vp9-resize: Fix an issue with external dynamic resize. External dynamic resize with swapping width and height was not handled properly. Fix is to re-init loop-filter under certain conditions. Modify unittest to test this case. Without this change test will fail. Relates to: https://bugs.chromium.org/p/webm/issues/detail?id=1140 Change-Id: I7d81ca7fe0783b3bc103a52a7b7cf073a96be26e	2016-02-12 15:06:48 -08:00
James Zern	ecd32d6faa	Merge "Vidyo patch: Optimization for 1-to-2 downsampling and upsampling."	2016-02-05 02:36:03 +00:00
Scott LaVarnway	989c69303d	Vidyo patch: Optimization for 1-to-2 downsampling and upsampling. Change-Id: I9cc9780f506e025aea57485a9e21f0835faf173c	2016-02-04 14:50:26 -08:00
Paul Wilkins	e062eb16fb	Merge "Loop filter search resets on overlay frame."	2016-02-02 14:44:47 +00:00
hui su	5afc4e4c77	Fix some typos. Change-Id: I32aacd014df6c927cf2893dc096cbe6ec7604b9b	2016-01-27 16:12:49 -08:00
Scott LaVarnway	5232326716	VP9: Eliminate MB_MODE_INFO Change-Id: Ifa607dd2bb366ce09fa16dfcad3cc45a2440c185	2016-01-19 16:40:20 -08:00
paulwilkins	733bbab53a	Loop filter search resets on overlay frame. This patch fixes a bug that causes the loop filter search to reset to a low value or zero after each arf overlay frame. We expect the overlay frames to need little or no loop filtering but this should not propagate. Change-Id: I895b28474cf200f20d82793f3de40b60b19579fd	2016-01-19 13:05:15 +00:00
Scott LaVarnway	d4bc17d696	Merge "VP9: inline vp9_use_mv_hp()"	2016-01-14 13:36:40 +00:00
Scott LaVarnway	a85e552d95	VP9: Remove decoder args from find_mv_refs_idx() The decoder does not use this function. Change-Id: Ie67f909c0f4108ef286789c70df867d4b960a780	2016-01-13 13:30:40 -08:00
Scott LaVarnway	de993a847f	VP9: inline vp9_use_mv_hp() Change-Id: Ib275bfc4c29c572d6c70e5ec6dbfc241590d3e3e	2016-01-13 08:02:05 -08:00
Scott LaVarnway	15939cb2d7	Merge "VP9: Eliminate unnecessary nearest/near searches"	2016-01-12 20:00:59 +00:00
Scott LaVarnway	d8aa40634a	VP9: Eliminate unnecessary nearest/near searches Prior to this patch, read_inter_block_mode_info() would find the nearmv and nearestmv for all modes. Now it does not search for ZEROMV modes and breaks out early for NEARMV and NEWMV modes. Change-Id: Ifa7b1eaf58bb03b9c7792ea5012fef477527d0fd	2016-01-12 05:09:06 -08:00
Yaowu Xu	2bd4f44409	Assert no mv clamping for scaled references Under --enable-better-hw-compatibility, this commit adds the asserts that no mv clamping is applied for scaled references, so when built with this configure option, decoder will assert if an input bitstream triggers mv clamping for scaled reference frames. Change-Id: I786e86a2bbbfb5bc2d2b706a31b0ffa8fe2eb0cb	2016-01-05 14:55:05 -08:00
Yaowu Xu	ce6d3f1de4	Merge "Assert no 8x4/4x8 partition for scaled references"	2016-01-05 20:35:46 +00:00
Yaowu Xu	03a021a6fc	Assert no 8x4/4x8 partition for scaled references This commit adds a new configure option: --enable-better-hw-compatibility The purpose of the configure option is to provide information on known hardware decoder implementation bugs, so encoder implementers may choose to implement their encoders in a way to avoid triggering these decoder bugs. The WebM team was made aware that a number of hardware decoders have trouble handling the combination of scaled reference frames and 8x4 or 4x8 partitions. This commit adds asserts to the vp9 decoder, so when built with the above configure option, the decoder can assert if an input bitstream triggers such a decoder bug. Change-Id: I386204cfa80ed16b50ebde57f886121ed76200bf	2016-01-04 18:33:37 -08:00
James Zern	d36659cec7	move vp9_avg to vpx_dsp Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f	2015-12-14 14:42:12 -08:00
Jacky Chen	d9bba21306	Merge "Add vp9_avg_4x4_neon and the unit test."	2015-12-09 06:09:33 +00:00
jackychen	303f144eef	Add vp9_avg_4x4_neon and the unit test. Change-Id: I3ef9a9648841374ed3cc865a02053c14ad821a20	2015-12-08 17:23:36 -08:00
Scott LaVarnway	f0b0b1fe62	VP9: Add ssse3 version of vpx_idct32x32_135_add() Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727	2015-12-02 04:50:46 -08:00
James Zern	fd51d90159	Merge changes Iaf8cbe95,I6748183d,I2a49811d * changes: add vp9_satd_neon fix vp9_satd_sse2 vp9_satd: return an int	2015-11-25 01:48:53 +00:00
James Zern	eb1d0f8d60	add vp9_satd_neon ~60-65% faster at the function level across block sizes Change-Id: Iaf8cbe95731c43fdcbf68256e44284ba51a93893	2015-11-24 16:09:10 -08:00
Alex Converse	4b038ad2ef	Merge "Deduplicate some high bit depth tables"	2015-11-24 18:24:32 +00:00
James Zern	60760f710f	fix vp9_satd_sse2 accumulate satd in 32-bits + add unit test Change-Id: I6748183df3662ddb9d635f9641f9586f2fd38ad5	2015-11-20 14:35:46 -08:00
James Zern	3e0138edb7	vp9_satd: return an int the final sum may use up to 26 bits + add a unit test + disable the sse2 as the result will rollover; this will be fixed in a future commit Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce	2015-11-20 14:35:38 -08:00
paulwilkins	0149fb3d6b	Changes to exhaustive motion search. This change alters the nature and use of exhaustive motion search. Firstly any exhaustive search is preceded by a normal step search. The exhaustive search is only carried out if the distortion resulting from the step search is above a threshold value. Secondly the simple +/- 64 exhaustive search is replaced by a multi stage mesh based search where each stage has a range and step/interval size. Subsequent stages use the best position from the previous stage as the center of the search but use a reduced range and interval size. For example: stage 1: Range +/- 64 interval 4 stage 2: Range +/- 32 interval 2 stage 3: Range +/- 15 interval 1 This process, especially when it follows on from a normal step search, has shown itself to be almost as effective as a full range exhaustive search with step 1 but greatly lowers the computational complexity such that it can be used in some cases for speeds 0-2. This patch also removes a double exhaustive search for sub 8x8 blocks which also contained a bug (the two searches used different distortion metrics). For best quality in my test animation sequence this patch has almost no impact on quality but improves encode speed by more than 5X. Restricted use in good quality speeds 0-2 yields significant quality gains on the animation test of 0.2 - 0.5 db with only a small impact on encode speed. On most clips though the quality gain and speed impact are small. Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa	2015-11-13 10:16:31 +00:00
Geza Lore	5eefd3ebfd	Add AVX vectorized vp9_diamond_search_sad This function now has an AVX intrinsics version which is about 80% faster compared to the C implementation. This provides a 2-4% total speed-up for encode, depending on encoding parameters. The function utilizes 3 properties of the cost function lookup table, constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'. For the joint cost: - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3] For the component costs: - For all i: mvsadcost[0][i] == mvsadcost[1][i] (equal per component cost) - For all i: mvsadcost[0][i] == mvsadcost[0][-i] (Cost function is even) These must hold, otherwise the AVX version of the function cannot be used. Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc	2015-11-11 14:03:47 +00:00
James Zern	30466f26b4	Revert "Add AVX vectorized vp9_diamond_search_sad" This reverts commit `f1342a7b07`. This breaks 32-bit builds: runtime error: load of misaligned address 0xf72fdd48 for type 'const __m128i' (vector of 2 'long long' values), which requires 16 byte alignment + _mm_set1_epi64x is incompatible with some versions of visual studio Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673	2015-11-06 13:15:01 -08:00
Yunqing Wang	57cae22c1e	Merge "Add AVX vectorized vp9_diamond_search_sad"	2015-11-05 20:17:13 +00:00
Geza Lore	f1342a7b07	Add AVX vectorized vp9_diamond_search_sad This function now has an AVX intrinsics version which is about 80% faster compared to the C implementation. This provides a 2-4% total speed-up for encode, depending on encoding parameters. The function utilizes 3 properties of the cost function lookup table, constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'. For the joint cost: - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3] For the component costs: - For all i: mvsadcost[0][i] == mvsadcost[1][i] (equal per component cost) - For all i: mvsadcost[0][i] == mvsadcost[0][-i] (Cost function is even) These must hold, otherwise the AVX version of the function cannot be used. Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6	2015-11-05 10:02:17 +00:00
Alex Converse	246e0eaa71	Deduplicate some high bit depth tables Change-Id: I6977f7d155cc1e81ae2393933893caac6770821f	2015-11-03 15:40:44 -08:00
hui su	e085fb643f	Generate intra prediction reference values only when necessary This can help increase encoding speed substantially. Change-Id: Id0c009146e6e74d9365add71c7b10b9a57a84676	2015-11-02 10:26:50 -08:00
Alex Converse	989193c797	Make the zero handling in extend_to_full_distribution more explicit. The old workaround "p = 0 ? 0 : p -1" is misleading. ?: happens before = assigning back to p truncates to one byte. Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists to work around a first pass bug, so let's make the work around more clear. https://bugs.chromium.org/p/webm/issues/detail?id=1089 Change-Id: I587c44dd61c1f3767543c0126376f881889935af	2015-10-29 14:46:55 -07:00
Alex Converse	663960e757	Revert "Replace the zero handling in extend_to_full_distribution." This reverts commit `7f56cb2978`. It causes uninitialized reads in the first pass setting up later cost tables. Change-Id: I2df498df3f5c03eff359f79edf045aed0c618dc9	2015-10-28 11:51:40 -07:00
Alex Converse	7f56cb2978	Replace the zero handling in extend_to_full_distribution. The old workaround "p = 0 ? 0 : p -1" is misleading. ?: happens before = assigning back to p truncates to one byte. Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists to work around a first pass bug, so let's make the work around more clear. https://code.google.com/p/webm/issues/detail?id=1089 Change-Id: Ia6dcc8922e1acbac0eeca23a4d564a355c489572	2015-10-26 11:29:46 -07:00
Geza Lore	aa8f85223b	Optimize vp9_highbd_block_error_8bit assembly. A new version of vp9_highbd_error_8bit is now available which is optimized with AVX assembly. AVX itself does not buy us too much, but the non-destructive 3 operand format encoding of the 128bit SSEn integer instructions helps to eliminate move instructions. The Sandy Bridge micro-architecture cannot eliminate move instructions in the processor front end, so AVX will help on these machines. Further 2 optimizations are applied: 1. The common case of computing block error on 4x4 blocks is optimized as a special case. 2. All arithmetic is speculatively done on 32 bits only. At the end of the loop, the code detects if overflow might have happened and if so, the whole computation is re-executed using higher precision arithmetic. This case however is extremely rare in real use, so we can achieve a large net gain here. The optimizations rely on the fact that the coefficients are in the range [-(2^15-1), 2^15-1], and that the quantized coefficients always have the same sign as the input coefficients (in the worst case they are 0). These are the same assumptions that the old SSE2 assembly code for the non high bitdepth configuration relied on. The unit tests have been updated to take this constraint into consideration when generating test input data. Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7	2015-10-21 12:30:40 +01:00
Yaowu Xu	568429512e	Add a new enum type vpx_color_range_t to make meaning of color_range obvious. Change-Id: I303582e448b82b3203b497e27b22601cc718dfff	2015-10-16 16:27:18 -07:00
Geza Lore	0134764fa6	Optimization of 8bit block error for high bitdepth If high bit depth configuration is enabled, but encoding in profile 0, the code now falls back on optimized SSE2 assembler to compute the block errors, similar to when high bit depth is not enabled. Change-Id: I471d1494e541de61a4008f852dbc0d548856484f	2015-10-08 14:05:25 -07:00
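
The multi-stage mesh search described in commit 0149fb3d6b can be illustrated with a small sketch. This is not the libvpx implementation: the function name `mesh_search`, the `cost` callback, and the toy distortion metric in the usage note are hypothetical, and the preceding step search and distortion threshold mentioned in the commit message are left out. Only the three (range, interval) stages come from the commit message.

```python
from typing import Callable, List, Tuple

MotionVector = Tuple[int, int]

# (range, interval) per stage, taken from the example in the commit message.
MESH_STAGES: List[Tuple[int, int]] = [(64, 4), (32, 2), (15, 1)]


def mesh_search(
    cost: Callable[[MotionVector], int],
    start: MotionVector,
    stages: List[Tuple[int, int]] = MESH_STAGES,
) -> Tuple[MotionVector, int]:
    """Coarse-to-fine grid search (hypothetical helper, not a libvpx API).

    Each stage scans a square grid around the best position found so far;
    later stages use a smaller range and a smaller interval.
    """
    best = start
    best_cost = cost(best)
    for search_range, interval in stages:
        # The grid centre is fixed for the whole stage.
        center_x, center_y = best
        for dy in range(-search_range, search_range + 1, interval):
            for dx in range(-search_range, search_range + 1, interval):
                candidate = (center_x + dx, center_y + dy)
                candidate_cost = cost(candidate)
                if candidate_cost < best_cost:
                    best, best_cost = candidate, candidate_cost
    return best, best_cost
```

With a toy cost such as `lambda mv: (mv[0] - 37) ** 2 + (mv[1] + 21) ** 2` and a start of `(0, 0)`, the three stages evaluate about 3,100 candidate positions, compared with 16,641 for a full +/- 64 search at interval 1.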


3103 commits