mozilla/aom - aom

Commit graph

Author	SHA1	Message	Date
Yunqing Wang	a3a4a34c60	Merge "vp9_ethread: the tile-based multi-threaded encoder"	2014-12-05 08:23:49 -08:00
Yunqing Wang	eba9c762a1	vp9_ethread: the tile-based multi-threaded encoder Currently, VP9 supports column-tile encoding, which allows a frame to be encoded in multiple column tiles independently. The number of column tiles are set by encoder option "--tile-columns". This provides a way to encode a frame in parallel. Based on previous set of patches, this patch implemented the tile- based multi-threaded encoder. Each thread processes one or more tiles. Usage: For HD clips: --tile-columns=2 --threads=1/2/3/4 While using 4 threads, tests showed that the encoder achieved 2.3X - 2.5X speedup at good-quality speed 3, and 2X speedup at realtime speed 5. Change-Id: Ied987f8f2618b1283a8643ad255e88341733c9d4	2014-12-04 11:21:34 -08:00
Peter de Rivaz	7e40a55ef9	Added high bitdepth sse2 transform functions Also removes some spurious changes in common/vp9_blockd.h which was introduced by a rebase issue between nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit 005d80cd05269a299cd2f7ddbc3d4d8b791aebba) (cherry picked from commit 08d2f548007fd8d6fd41da8ef7fdb488b6485af3) (cherry picked from commit 4230c2306c194c058f56433a5275aa02a2e71d56)	2014-12-02 11:16:24 -08:00
Debargha Mukherjee	e9d9f1adab	Merge "Refactored idct routines and headers"	2014-11-24 12:47:03 -08:00
Peter de Rivaz	3a8c43a479	Refactored idct routines and headers This change is made in preparation for a subsequent patch which adds acceleration for the highbitdepth transform functions. The highbitdepth transform functions attempt to use 16/32bit sse instructions where possible, but fallback to using the C implementations if potential overflow is detected. For this reason the dct routines are made global so they can be called from the acceleration functions in the subsequent patch. Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665 (cherry picked from commit 454342d4e77dbb67f4a3c10f97a57a6fcb46d9a0)	2014-11-24 09:57:40 -08:00
Debargha Mukherjee	02355a4abf	Merge "Added highbitdepth sse2 acceleration for quantize"	2014-11-21 16:08:47 -08:00
Peter de Rivaz	a7b2d09f36	Added highbitdepth sse2 acceleration for quantize Also includes block error. (This patch is mostly cherry picked from commit db7192e0b014a331a1dcb102c8a1148e9f0e1081) Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78	2014-11-19 23:55:19 -08:00
Jingning Han	c42715b721	Enable ssse3 version of vp9_fdct8x8_quant It improves the speed performance of vp9_fdct8x8_quant_sse2 by about 5%. Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097	2014-11-19 22:14:19 -08:00
Peter de Rivaz	48032bfcdb	Added sse2 acceleration for highbitdepth variance Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f (cherry picked from commit d7422b2b1eb9f0011a8c379c2be680d6892b16bc) (cherry picked from commit 6d741e4d76a7d9ece69ca117d1d9e2f9ee48ef8c)	2014-11-14 15:18:53 -08:00
Peter de Rivaz	7eee487c00	Added highbitdepth sse2 SAD acceleration and tests Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f (cherry picked from commit b1a6f6b9cb47eafe0ce86eaf0318612806091fe5)	2014-11-12 14:25:45 -08:00
Yunqing Wang	687c56e802	Merge "SAD32xh and SAD64xh for AVX2"	2014-10-20 12:37:55 -07:00
levytamar82	7045aec00a	SAD32xh and SAD64xh for AVX2 All sad function that process above 32 consecutive elements are optimized for AVX2: vp9_sad64x64 vp9_sad64x32 vp9_sad32x64 vp9_sad32x32 vp9_sad32x16 vp9_sad64x64_avg vp9_sad64x32_avg vp9_sad32x64_avg vp9_sad32x32_avg vp9_sad32x16_avg The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64 vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90% both of them gave and overall ~2.3% user level gain Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd	2014-10-19 13:59:10 -07:00
Alex Converse	7497d2fb23	Add a 32-bit friendly sse2 quantizer. This is based on the 64-bit ssse3 quantizer. 1.1x speedup for screen content at speed 7. Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448	2014-10-14 11:37:41 -07:00
Jim Bankoski	0ce51d823f	experimental : partition using 1/8 x 1/8 image The concept: There's too much noise in source pixels for variance and at low bitrate the reconstructed looks nothing like the source so we have problems getting good partitionings with either. This skirts the issue by using a box blur scaled down version for variance calculations. To compare against source_var_ moved keyframe to be rd based like source_var. Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624	2014-10-07 16:36:14 -07:00
JackyChen	80465dae88	Add SSE2 code and unit test for VP9 denoiser. This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are only 16x16 blocks in denoiser, while in VP9, there are 13 different block sizes. By adding this SSE2 code, the improvement of encoder speed is around 20%(using C code vs using SSE2 code), vary for different clips. The unit test for VP9 denoiser is to confirm that the SSE2 code is bit-exact with the C code. The unit test covers all block size. Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d	2014-10-06 15:27:40 -07:00
Dmitry Kovalev	1f19ebbab6	Replacing vp9_get_mb_ss_sse2 asm implementation with intrinsics. Change-Id: Ib4f5dd733eb2939b108070a01e83da5d9990bac0	2014-09-06 00:10:25 -07:00
Dmitry Kovalev	318fc0c34f	Removing MMX SAD calculation code. Removed functions: * vp9_sad_16x16_mmx * vp9_sad_8x16_mmx * vp9_sad_16x8_mmx * vp9_sad_8x8_mmx * vp9_sad_4x4_mmx Change-Id: Ic5174b93b64d65d846f0c11e72cab149e9472bc3	2014-09-02 14:41:36 -07:00
Dmitry Kovalev	12cd6f421d	Removing variance MMX code. Removed functions: * vp9_mse16x16_mmx * vp9_get_mb_ss_mmx * vp9_get4x4var_mmx * vp9_get8x8var_mmx * vp9_variance4x4_mmx * vp9_variance8x8_mmx * vp9_variance16x16_mmx * vp9_variance16x8_mmx * vp9_variance8x16_mmx They all have SSE2 equivalent. Change-Id: I3796f2477c4f59b35b4828f46a300c16e62a2615	2014-08-29 10:26:42 -07:00
Scott LaVarnway	6f4b8dcdc2	Neon version of vp9_subtract_block() On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~3.2% Change-Id: I8862497264142171b7efc32df1a67714a23539f4	2014-07-31 09:28:06 -07:00
Scott LaVarnway	d4a37db5b8	Neon version of vp9_quantize_fp() On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~12.4% Change-Id: Id29d215acf58bb108489e218a259adf74b4768d7	2014-07-30 09:33:46 -07:00
Scott LaVarnway	521cf7e879	Neon version of vp9_sub_pixel_variance16x16(), vp9_variance16x16(), and vp9_get16x16var(). On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~16.7%. Change-Id: Ib163aa99f56e680194aabe00dacdd7f0899a4ecb	2014-07-30 08:17:32 -07:00
Scott LaVarnway	d19d222db6	Added vp9_fdct8x8_neon(), vp9_fdct8x8_1_neon() On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~3.7%. Change-Id: I428c72c40df82c6d537955e320a8debf99343004	2014-07-29 08:56:05 -07:00
Tim Kopp	9d337d34f2	s/CONFIG_DENOISING/CONFIG_VP9_TEMPORAL_DENOISING This should prevent confusion with the VP8 CONFIG_TEMPORAL_DENOISING and other flags. Change-Id: I1fe4e2977895b7966841d861ab74317ad875b6c8	2014-07-24 13:43:52 -07:00
Scott LaVarnway	696fa52eaa	Added vp9_sad64x64_neon(), vp9_sad32x32_neon() and vp9_sad16x16_neon() On a Nexus 7, vpxenc (in realtime mode, speed -6) reported a performance improvement of ~17%. Change-Id: I91e070cde2973451083d3f3d63b49b7886de9a85	2014-07-16 12:54:46 -07:00
Alex Converse	03c276ea17	Split vp9_rdopt into vp9_rdopt and vp9_rd. vp9_rdopt is for making rd optimal mode decisions. vp9_rd is for all other rd related routines. Anything used outside of making an rd optimal decision belongs in rd. Change-Id: I772a3073f7588bdf139f551fb9810b6864d8e64b	2014-07-02 15:33:33 -07:00
James Zern	75cb82d87a	vp9cx.mk: move avx c files outside of x86inc block same reasoning as: `9f3a0db` vp9_rtcd: correct avx2 references these are all intrinsics, so don't depend on x86inc.asm Change-Id: I915beaef318a28f64bfa5469e5efe90e4af5b827	2014-06-25 12:20:46 -07:00
Tim Kopp	ab8bfb077b	Added skeleton for VP9 denoiser Change-Id: Iccf6ede4c4f85646b0f8daec47050ce93e267c90	2014-06-12 15:12:22 -07:00
Deb Mukherjee	e272273443	Renames x86_64 specific asm files Renames all x86_64 specific assembly files to consistently end in _x86_64.asm. This will be useful for build systems to handle these files differently. All new 64-bit specific assembly files should use the new naming convention. Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536	2014-05-21 13:55:56 -07:00
Yunqing Wang	c661cf0dad	Merge "AVX2 To VP9 Block Error Optimization"	2014-05-15 11:29:29 -07:00
levytamar82	1fbab853c8	AVX2 To VP9 Block Error Optimization vp9_block_error_sse2 can only handle 16 bytes at a time but the function requires to handle a sequence of 32 bytes at a time so each 16 bytes is handled in a different register. With AVX2 optimization the 32 bytes can be handled in one register instead of two in the SSE2 The vp9_block_error was optimized by 85%. The user level was optimized by 1.2% Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd	2014-05-14 11:51:07 -07:00
Alex Converse	b5422fab46	Add an x86inc MMX fwht4x4. Change-Id: Ib0a73d4863478f9b8a00976379d25d2f6ebbb197	2014-05-08 12:01:27 -07:00
Paul Wilkins	33b1c457ed	Revert "Add an MMX fwht4x4" Includes changes that are not compatible with VS windows builds. Amongst other things stdint.h is not supported in VS. This reverts commit `89fbf3de50`. Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd	2014-05-07 12:53:27 +01:00
Alex Converse	89fbf3de50	Add an MMX fwht4x4 7% faster encoding a desktop lossless at RT speed 4. Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64	2014-05-05 15:10:48 -07:00
Dmitry Kovalev	e05b92c0aa	Merge "Removing half-variance asm functions which are not used."	2014-05-01 14:50:45 -07:00
Jingning Han	39761eb5d6	Merge "Enable SSSE3 implementation of 8x8 forward 2D-DCT"	2014-04-30 13:41:36 -07:00
Dmitry Kovalev	94f5491c46	Removing half-variance asm functions which are not used. Corresponding C functions were removed in I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3 Change-Id: I50a5575065a7a9e41904eb2161afd739def927db	2014-04-30 12:21:54 -07:00
Jingning Han	1eaa3a76dc	Enable SSSE3 implementation of 8x8 forward 2D-DCT Assembly implementation of ssse3 8x8 forward 2D-DCT. The current version is turned on only for x86_64. The average unit runtime goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster. This translates into about 1.5% speed-up for pedestrian_area 1080p at speed 2. Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4	2014-04-29 15:49:18 -07:00
Dmitry Kovalev	ef003078e8	Renaming "onyx" to "encoder". Actual renames: vp9_onyx_if.c -> vp9_encoder.c vp9_onyx_int.h -> vp9_encoder.h Change-Id: I80532a80b118d0060518e6c6a0d640e3f411783c	2014-04-22 14:57:05 -07:00
Jim Bankoski	e890c2579b	add a context tree structure to encoder This patch sets up a quad_tree structure (pc_tree) for holding all of pick_mode_context data we use at any square block size during encoding or picking modes. That includes contexts for 2 horizontal and 2 vertical splits, one none, and pointers to 4 sub pc_tree nodes corresponding to split. It also includes a pointer to the current chosen partitioning. This replaces code that held an index for every level in the pick modes array including: sb_index, mb_index, b_index, ab_index. These were used as stateful indexes that pointed to the current pick mode contexts you had at each level stored in the following arrays array ab4x4_context[][][], sb8x4_context[][][], sb4x8_context[][][], sb8x8_context[][][], sb8x16_context[][][], sb16x8_context[][][], mb_context[][], sb32x16[][], sb16x32[], sb32_context[], sb32x64_context[], sb64x32_context[], sb64_context and the partitioning that had been stored in the following: b_partitioning, mb_partitioning, sb_partitioning, and sb64_partitioning. Prior to this patch before doing an encode you had to set the appropriate index for your block size ( switch statement), update it ( up to 3 lookups for the index array value) and then make your call into a recursive function at which point you'd have to call get_context which then had to do a switch statement based on the blocksize, and then up to 3 lookups based upon the block size to find the context to use. With the new code the context for the block size is passed around directly avoiding the extraneous switch statements and multi dimensional array look ups that were listed above. At any level in the search all of the contexts are local to the pc_tree you are working on (in?). In addition in most places code that used to call sub functions and then check if the block size was 4x4 and index was > 0 and return now don't preferring instead to call the right none function on the inside. Change-Id: I06e39318269d9af2ce37961b3f95e181b57f5ed9	2014-04-17 07:30:55 -07:00
Dmitry Kovalev	2fc3a18653	Removing unused vp9_mcomp_x86.h file. We don't use declarations from this file. The real declarations (differently named) are in vp9_rtcd_defs.pl, e.g. vp9_full_search_sad. Change-Id: I73cbf064305710ba20747233cfdbe67366f069a0	2014-04-14 11:32:58 -07:00
Dmitry Kovalev	129cb23c14	Adding vp9_ssim.h file. Change-Id: Ib3b3864a6018c62ac1ea18e30795af74464596cd	2014-04-08 16:08:39 -07:00
Dmitry Kovalev	0a6d5547e2	Adding vp9_speed_features.{h, c}. Change-Id: I7d9874da8ff78a2d7e0cf11073af9c30538bc9a6	2014-03-28 10:30:28 -07:00
Marco Paniconi	2b06bf20ce	Move aq_mode=2 (complexity_aq) to separate file. Change-Id: Iffa45b9b04196c1ded6037622a8644a2500a62de	2014-03-26 18:01:59 -07:00
Jim Bankoski	7d76cc36df	Merge "vp9_write_bit_buffer.h header statics converted to globals"	2014-03-24 14:18:24 -07:00
Yunqing Wang	b458bb7c20	Merge "AVX2 SAD Optimization:"	2014-03-24 10:52:32 -07:00
Jim Bankoski	423590aa63	vp9_write_bit_buffer.h header statics converted to globals Change-Id: I12c29a630da1fbc5508f11b61d182f9b527b3a35	2014-03-24 09:56:06 -07:00
Marco Paniconi	03a9e5edb6	Rename the aq_mode files. Change-Id: Id76a628495c822e23825b66a7589b4a3279680e2	2014-03-21 15:20:59 -07:00
levytamar82	0fa8b668c1	AVX2 SAD Optimization: 2 functions were optimized for avx2 by using full 256 bit register In order to handle 32 elements in parallel instead of only 16 in parallel: 1. vp9_sad32x32x4d 2. vp9_sad64x64x4d The function level gain is 66% and the user level gain is ~1%. Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb	2014-03-21 13:53:32 -07:00
Marco Paniconi	6b83884ba9	In-frame q adjustment for cyclic background refresh. Activated using aq_mode=3. Change-Id: Ied628b9e7bd0e88b0c75790276bca75b19eb5c07	2014-03-18 10:59:21 -07:00
Marco Paniconi	78664081d1	Move svc layer_context to separate file. Change-Id: Ie47c139d48cb18409d71f98f6a5b9eeb9f9437a9	2014-03-13 14:39:45 -07:00
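
Note on commit e890c2579b: the message describes a quad tree (pc_tree) in which each node holds one "none" context, two horizontal and two vertical split contexts, four child pointers, and the currently chosen partitioning. The sketch below is only an illustration of that layout, not the code from the patch. The names PcTree, PickModeCtx, PartType, pc_tree_alloc, pc_tree_count and pc_tree_free are hypothetical, the per-block context is reduced to a placeholder field, and the chosen partitioning is modelled as an enum, which is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder for the per-block mode-selection state (hypothetical). */
typedef struct PickModeCtx {
  int rd_cost; /* stand-in field; the real context holds much more */
} PickModeCtx;

/* Partition choices named in the commit message (enum is an assumption). */
typedef enum { PART_NONE, PART_HORZ, PART_VERT, PART_SPLIT } PartType;

/* One quad-tree node per square block size, as the message describes. */
typedef struct PcTree {
  int block_size;          /* side length in pixels */
  PartType partitioning;   /* currently chosen partitioning */
  PickModeCtx none;        /* context for the unsplit block */
  PickModeCtx horizontal[2];
  PickModeCtx vertical[2];
  struct PcTree *split[4]; /* children; NULL at the smallest size */
} PcTree;

/* Free a node and all of its descendants; accepts NULL. */
static void pc_tree_free(PcTree *node) {
  if (!node) return;
  for (int i = 0; i < 4; ++i) pc_tree_free(node->split[i]);
  free(node);
}

/* Recursively allocate the tree from block_size down to min_size. */
static PcTree *pc_tree_alloc(int block_size, int min_size) {
  assert(block_size >= min_size && min_size > 0);
  PcTree *node = calloc(1, sizeof(*node));
  if (!node) return NULL;
  node->block_size = block_size;
  node->partitioning = PART_NONE;
  if (block_size > min_size) {
    for (int i = 0; i < 4; ++i) {
      node->split[i] = pc_tree_alloc(block_size / 2, min_size);
      if (!node->split[i]) { /* allocation failed: undo partial tree */
        pc_tree_free(node);
        return NULL;
      }
    }
  }
  return node;
}

/* Count nodes, to show that every context is reachable from the root. */
static int pc_tree_count(const PcTree *node) {
  if (!node) return 0;
  int n = 1;
  for (int i = 0; i < 4; ++i) n += pc_tree_count(node->split[i]);
  return n;
}
```

With a layout like this, a recursive search can pass the node for the current block size directly to its callee, which is the lookup saving the commit message claims over the earlier index arrays.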


111 commits
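
Note on commits 0fa8b668c1 and 7045aec00a: both optimize SAD (sum of absolute differences) functions for AVX2 by processing 32 elements per register instead of 16. For readers unfamiliar with the operation, here is a minimal scalar sketch of what a SAD over a block computes. The function name sad_block and its signature are hypothetical and are not the libvpx prototypes, and this plain C loop is not the AVX2 code from those commits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar sum of absolute differences over a width x height block.
 * src and ref point at the top-left pixel of each block; the strides
 * give the distance in bytes between consecutive rows. */
static unsigned int sad_block(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride,
                              int width, int height) {
  unsigned int sad = 0;
  assert(width > 0 && height > 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* uint8_t operands promote to int, so the difference can be negative */
      sad += (unsigned int)abs(src[x] - ref[x]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

A SIMD version replaces the inner loop with wide loads and a packed absolute-difference instruction; the row-by-row structure stays the same.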