mozilla/aom - aom

Commit Graph

Author	SHA1	Message	Date
Yi Luo	5128109508	Fix inv txfm low/high bitdepth selection logic We are going to have several commits to setup new low/high bitdepth data path selection logic. This patch is for inverse transform. Let me summarize the ideas as following. - For low/high bitdepth selection, encoder depends on input configuration, e.g., video sequence bitdepth, profile. Decoder depends on input bitstream. This has nothing to do with compiler/build configuration. - Typical encoder usage for sampling format 4:2:0. 1) 8-bit video sequence: a) --profile=0 Fastest encoding/decoding pipeline on speedup. b) --profile=2 --bit-depth=10 Image pixels are left shifted by 2 bits. It employs 16-bit reference frame buffer and has high calculation precision. It usually enjoys higher compression performance. 2) 10/12-bit video sequence (HDR): --profile=2 --bit-depth=10/12 - Transform coefficient type: Lowbitdepth: int16_t Highbitdepth: int32_t - The type, tran_low_t is still used in codebase, which is int32_t, defining the data path capacity. Naturally, it is high bitdepth. Eventually we shall remove the configuration flags, CONFIG_HIGHBITDEPTH/CONFIG_LOWBITDEPTH, and separate low and high bitdepth data path. Two data paths co-exist in the same build environment. Change-Id: I35c06d4d4f19ebf80d909168fdddbae57c3cc884	2017-06-27 21:20:51 +00:00
Yi Luo	193422e76f	Add avx2 highbd_quantize_b - First pass encoding time reduces ~10.9% on i7-6700 at 100 frames, 1080p. - avx2 works for coeff number >= 8 cases; coeff number < 8 case will be implemented by sse2. - Unit test is added type B/FP/DC. Change-Id: Ibe5b7807c64e6dfc2d59c470ed50a6e8ca94ef7c	2017-06-22 15:52:01 +00:00
Timothy B. Terriberry	5d24b6f049	encoder: Remove 64x upsampled reference buffers They do not handle border extension correctly (interpolation and border extension do not commute unless you upsample into the border), nor do they handle crop dimensions that are not a multiple of 8 (the upsampled version is not sufficiently large), in addition to using massive amounts of memory and being a criminal waste of cache (1 byte used for every 8 bytes fetched). This commit reimplements use_upsampled_references by computing the subpixel samples on the fly. This implementation not only corrects the border handling, but is also faster, while maintaining the same quality. HL AWCY results are basically noise: PSNR \| PSNR HVS \| SSIM \| MS SSIM \| CIEDE 2000 0.0188 \| 0.0187 \| 0.0045 \| 0.0063 \| 0.0228 Change-Id: I7527db9f83b87a7bb8b35342f7e6457cd0bef9cd	2017-06-19 18:50:57 +00:00
Frederic Barbier	d405f8a627	Cleanup dead fwd transform functions Cleanup related wrappers and unit-tests. Change-Id: I2d37a8c80de63dbeaef584e3d5fa842c0b2ee6db	2017-06-08 16:56:59 +00:00
Urvang Joshi	766a389b58	Add a new experiment "rect-intra-pred". Earlier, intra prediction for rectangular blocks was performed by running two steps of prediction on square sub-blocks. With this experiment, we do proper intra prediction for rectangular blocks. This ensures that we make use of all available neighboring pixels especially for directional modes. For this, all the intra predictors were updated to work with rectangular transform block sizes. Performance improvements are small but free of cost: All Intra frames: lowres: -0.126 midres: -0.154 Video Overall: lowres: -0.043 midres: -0.100 [Could not get AWCY results due to a backlog.] BUG=aomedia:551 Change-Id: I7936e91b171d5c246cb0a4ea470a981a013892e6	2017-06-06 16:02:38 +00:00
Thomas Daede	8ea3319ee7	Remove VAR_BASED_PARTITION. BUG=aomedia:526 Change-Id: I5d9b86a36f412ded2d6f20e198d2f4de4f97aaeb	2017-05-30 17:54:55 +00:00
David Barker	0aa39ff054	ext-inter: Vectorize new masked SAD/SSE functions We would expect that these new functions would be slower than the old masked SAD/SSE functions, as they do additional work (blending two inputs and comparing to a third, rather than just comparing two inputs). This is true for the SAD functions, which are about 50% slower (depending on block size and bit depth). However, the sub-pixel SSE functions are comparable to the old speed for the accelerated special cases (xoffset or yoffset = 0 or 4), and are between 40-90% faster for the generic case. Change-Id: I1a296ed8fc9e3edc313a6add516ff76b17cd3e9f	2017-05-26 18:50:20 +00:00
David Barker	f19f35f7fb	ext-inter: Further cleanup * Rename the 'masked_compound_' functions to just 'masked_'. The previous names were intended to be temporary, to distinguish the old and new masked motion search pipelines. But now that the old pipeline has been removed, we can reuse the old names. * Simplify the new ext-inter compound motion search pipeline a bit. * Harmonize names: Rename aom_highbd_masked_compound_sub_pixel_variance* to aom_highbd_8_masked_sub_pixel_variance*, to match the naming of the corresponding non-masked functions Change-Id: I988768ffe2f42a942405b7d8e93a2757a012dca3	2017-05-24 00:02:53 +00:00
David Barker	5d34e6a738	Vectorize high-precision convolve filter Add SSE2 lowbd and SSSE3 highbd versions of the filters introduced in https://aomedia-review.googlesource.com/c/11962/ . These filters are equivalent in speed to the SSE2 implementations of the regular convolve filter. The average time to filter a 64x64 block is: lowbd C: 52us lowbd SSE2: 5.6us highbd C: 53us highbd SSSE3: 5.8us Also add a correctness test based on the warp filter tests. Change-Id: Ia0d81100e8a414bbfc2b5f664d751cf24765299e	2017-05-23 13:22:36 +00:00
David Barker	0f3c94e134	ext-inter: Delete dead code Patches https://aomedia-review.googlesource.com/c/11987/ and https://aomedia-review.googlesource.com/c/11988/ replaced the old masked motion search pipeline with a new one which uses different SAD/SSE functions. This resulted in a lot of dead code. This patch removes the now-dead code. Note that this includes vectorized SAD/SSE functions, which will need to be rewritten at some point for the new pipeline. It also includes the masked_compound_variance_* functions since these turned out not to be used by the new pipeline. To help with the later addition of vectorized functions, the masked_sad/variance_test.cc files are kept but are modified to work with the new functions. The tests are then disabled until we actually have the vectorized functions. Change-Id: I61b686abd14bba5280bed94e1be62eb74ea23d89	2017-05-23 06:13:26 +00:00
Tom Finegan	ba02c24cfe	Remove CONFIG_{DE,EN}CODERS from the CMake build. Use CONFIG_AV1_{DE,EN}CODER to control decoder and encoder support inclusion instead. BUG=aomedia:76,aomedia:508 Change-Id: Ib150ae382b301885589f30d9b6e98d3bfdd1afce	2017-05-22 22:35:37 +00:00
David Barker	c155e018ca	ext-inter: Use joint_motion_search for masked compounds Add functions which take both components of a masked compound and compute the resulting SAD/SSE. Extend joint_motion_search to understand masked compounds, and use it to evaluate NEW_NEWMV modes. Change-Id: I782199a20d119a6c61c6567df157508125ac7ce7	2017-05-18 21:55:59 +00:00
Debargha Mukherjee	28d15c715a	Experimental high precision convolve for Wiener Improves coding efficiency. Change-Id: I7bb12190cdc4581097809a020355cdc8867fc1ad	2017-05-15 19:48:41 +00:00
Ralph Giles	be111b3838	Remove armv6 media-extension assembly. Libvpx dropped armv6 support sometime after the aom fork. We don't intend to support this platform, which is likely too slow in any case. Remove the assembly and intrinsics optimized routines, their tests, cpu feature detection, and rtcd specialization for this instruction set extension. Change-Id: If44ec28e5ddafc6af179c5d1982ac7e81fe54d5e	2017-05-15 15:55:47 +00:00
Yi Luo	40f22ef85b	Partial IDCT 32x32 avx2 - Function level improvement (ms): Functions ssse3 avx2 Percentage idct32x32_1024 794 374 52.9% idct32x32_135 354 169 52.2% idct32x32_34 197 142 27.9% idct32x32_1 n/a 26 n/a - Integrating in default scan order. Change-Id: I84815112b26b8a8cb800281a1cfb1706342af57d	2017-05-11 05:48:44 +00:00
Yi Luo	f6176abb07	Partial IDCT 16x16 avx2 - Function level improvement: functions sse2 avx2 percentage idct16x16_256 365 226 38% idct16x16_38 n/a 136 n/a idct16x16_10 171 110 35% idct16x16_1 34 26 23% - Integrated in AV1 for default scan order. Change-Id: Ieb1a8e730bea9c371ebc0e5f4a748640d8f5e921	2017-05-08 22:18:18 +00:00
Urvang Joshi	e6ca8e8380	Add a new experiment SMOOTH_HV. This experiment extends ALT_INTRA by adding two new modes: smooth horizontal and smooth vertical. Improvement on intra frames in BDRate (PSNR): =============================================== AWCY (high latency): -0.46% (Also, -1.0% or more on PSNR Cb,Cr and APSNR Cb,Cr). AWCY (low latency): -0.43% (Also, -0.88% to -0.94% on PSNR Cb,Cr and APSNR Cb,Cr). Google sets: lowres: -0.454 midres: -0.484 hdres: -0.525 Improvement on video overall in BDRate (PSNR): ================================================ AWCY (high latency): -0.15% Google sets: lowres: -0.085 midres: -0.079 Change-Id: I9f4e7c1b8ded1fe244c72838f336103ccc715d50	2017-05-08 21:09:19 +00:00
Frederic Barbier	4fc8df674c	Cleanup dead high-bitdepth inverse-tx functions This patch removes dead code and prevents future implementations to rely on obsolete transforms. Future optimizations and tests should be based on latest C-functions (av1/common/av1_inv_txfm1d.c) Cleanup related last unit-test callers. BUG=aomedia:442 Change-Id: I24953cc1baf30dd7b720df8a72dd91b356b74cad	2017-04-27 16:14:44 +00:00
Yi Luo	3fcb356e78	Update partial inverse DCT according to VP9 - Partial inverse DCT unit tests have been enhanced. - IDCT x86_64 assembly code has been removed. Change-Id: Ic3bed2c0e70abdfd642a4f74fa969cc672d4795f	2017-04-26 15:57:11 +00:00
James Zern	4a2e3b2dc6	remove remaining refs to aom_highbd_idct8x8_64_add fixes high-bitdepth build: ./libaom.a(aom_dsp_rtcd.c.o): In function `setup_rtcd_internal': ./aom_dsp_rtcd.h:2614: undefined reference to `aom_highbd_idct8x8_64_add_c' missed in: `c756e4d04` Cleanup dead high-bitdepth inverse-tx functions BUG=aomedia:442 Change-Id: I63ee6fc5dbf85fd48efd9ff721868df6fb05eb09	2017-04-25 16:42:22 -07:00
Urvang Joshi	c3bcf3be11	Intra prediction: Remove unused variants. Directional predictors for 45, 63 and 207 angle had 2 or 3 variants each, and only one of them was actually being used. So, removed the C, sse2, ssse3 and neon versions of the unused ones. Updates to the test: - test_intra_pred_speed was testing the unused versions, so changed it to use the version actually used by code. This meant updating some golden MD5 values. - test_intra_pred_speed was NOT filling up bottom-left and top-right pixels randomly, so the predictors using these pixels weren't tested properly. This was fixed. BUG=aomedia:442 Change-Id: I09725d593408b81e0cd636e70a88c28eea5f2222	2017-04-25 07:17:25 +00:00
Yaowu Xu	4ff59b55e6	Cleanup: Remove const for params passed by value BUG=aomedia:448 Change-Id: Ieff977fca8a5033ddef2871a194870f59301ad8f	2017-04-24 23:30:43 +00:00
Sebastien Alaiwan	c6a48a2534	Drop support for CONFIG_EMULATE_HARDWARE This experiment complexifies DSP function dispatch, without bringing any real value (it's non-normative arbitrary behaviour). Moreover, it only has an effect on obsolete transforms, the new ones don't implement this mechanism. Change-Id: Idaccdd0c14ed6b7008cd4f365c7f017ba8ccacf5	2017-04-20 18:49:39 +00:00
Sebastien Alaiwan	71e87847eb	Homogenize configuration option name. Rename '--enable-aom-highbitdepth' to '--enable-highbitdepth' Change-Id: I1de13c3508c30c552532993419d8ace326142ab6	2017-04-12 22:29:11 +00:00
Urvang Joshi	c07b23de18	RTCD defs: refactor intra prediction protos. Change-Id: I0f4576522a07597dbb04089b02ca1fae67075ba4	2017-04-03 23:15:42 +00:00
Urvang Joshi	5ddac0aac8	RTCD defs: Remove empty specialize statements once and for all. A similar cleanup happened before, but the empty statements have since reappeared. I added a check in 'specialize' subroutine to die whenever such an empty specialize call is found, so that config+make would fail. Change-Id: I300ca0f0b077c0aeca8096d6460d8fb1c364d9b9	2017-03-31 16:40:03 +00:00
Yi Luo	9d24735537	High bit depth inter prediction filter AVX2 On i7-6700: - Function level speed improvement: 23%-29% - User level speed improvement: decoder: ~2%-4%. encoder: <1%. Change-Id: I02937a72304c3b356ca41e580352790df391f0a2	2017-03-30 23:12:13 +00:00
Alex Converse	4c5b020472	Make aom_sum_squares_2d_i16 take width and height parameters. SSE2 may be needed for nx4 and 4xn. Change-Id: I3c10112447fdb5fe51a68bc2c6e2f2641b102723	2017-03-30 15:49:22 +00:00
Steinar Midtskogen	b8ff6aaf5d	Add SIMD support for CDEF dering for sse2/ssse3 and neon Change-Id: Ibaaed850ddceba9c3db542eaf4a1c623ce6b412b	2017-03-29 23:47:21 +00:00
Steinar Midtskogen	73aa77c034	Increase parallelism in CLPF SIMD Change-Id: I66cdb67f8a1c2072516a65822dcc838e516ba9d7	2017-03-29 17:53:04 +00:00
Jean-Marc Valin	e9f7742437	Do real chroma RDO search for CDEF Chroma now has a list of strengths too, with the superblock signalling shared between luma and chroma. low-latency, cpu=4: PSNR \| PSNR Cb \| PSNR Cr \| PSNR HVS \| SSIM \| MS SSIM \| CIEDE 2000 -0.0114 \| -1.4626 \| -1.4745 \| -0.0423 \| 0.0430 \| -0.0001 \| -0.7416 Change-Id: I389c77f1d80020f810e45f8502c656ad9d397c8c	2017-03-23 21:57:30 +00:00
Steinar Midtskogen	3c33def72c	Limit line buffer to 6 lines Change-Id: I6fedfa6427865e9a37fbdf9d9c1bf8be55222cba	2017-03-21 21:27:09 +00:00
Steinar Midtskogen	d280a84554	Remove boundary checks in CLPF Change-Id: Icc93783f47fe7fe3aac395aadcc8bbc307dae1fb	2017-03-21 21:27:09 +00:00
Steinar Midtskogen	a9d41e88d2	Merge dering/clpf rdo and filtering * Dering and clpf were merged into a single pass. * 32x32 and 128x128 filter block sizes for clpf were removed. * RDO for dering and clpf merged and improved: - "0" no longer required to be in the strength selection - Dering strength can now be 0, 1 or 2 bits per block LL HL PSNR: -0.04 -0.01 PSNR HVS: -0.27 -0.18 SSIM: -0.15 +0.01 CIEDE 2000: -0.11 -0.03 APSNR: -0.03 -0.00 MS SSIM: -0.18 -0.11 Change-Id: I9f002a16ad218eab6007f90f1f176232443495f0	2017-03-17 19:06:20 +00:00
Steinar Midtskogen	4305e6be8e	CLPF: Add quality dependent damping in the constrain function PSNR YCbCr: -0.17% -0.03% -0.40% APSNR YCbCr: -0.17% -0.02% -0.39% PSNRHVS: -0.06% SSIM: -0.17% MSSSIM: -0.07% CIEDE2000: -0.12% Change-Id: I69a4b6a4e18c22c3930069396540a6fee45cb30d	2017-02-27 08:06:02 +00:00
Jean-Marc Valin	0143513080	Merging the dering and clpf experiments into a single experiment: CDEF The result is identical to enabling both deringing and CLPF Change-Id: I71db5ba9e21fcaf11ad87e94841eaf80be58c0a8	2017-02-18 21:30:54 +00:00
Steinar Midtskogen	4f0b3ed8b8	Retune the CLPF kernel CLPF performance had degraded by about 0.5% over the past six months, which isn't totally surprising since the codec is a moving target. About half of that degradation comes from the improved 7 bit filter coefficients. Therefore, CLPF needs to be retuned for the current codec. This patch makes two (normative) changes to the CLPF kernel: * The clipping function was changed from clamp(x, -s, s) to sign(x) * max(0, abs(x) - max(0, abs(x) - s + (abs(x) >> (bitdepth - 3 - log2(s))))) This adds a rampdown to 0 at -32 and 32 (for 8 bit, -128 & 128 for 10 bit, etc), so large differences are ignored. * 8 taps instead of 6 taps: 1 4 3 13 31 -> 13 31 4 3 1 AWCY results: low delay high delay PSNR: -0.40% -0.47% PSNR HVS: 0.00% -0.11% SSIM: -0.31% -0.39% CIEDE 2000: -0.22% -0.31% APSNR: -0.40% -0.48% MS SSIM: 0.01% -0.12% About 3/4 of the gains come from the new clipping function. Change-Id: Idad9dc4004e71a9c7ec81ba62ebd12fb76fb044a	2017-02-10 23:00:16 +00:00
Steinar Midtskogen	73ad523642	Add support for disabling CLPF on tile boundaries Change-Id: Icb578f9b54c4020effa4b9245e343c1519bd7acb	2017-02-08 06:41:20 +00:00
Johann	cda0b5e46c	highbitdepth + loop restoration: fix build on x86 32 bit When the functions were added in https://aomedia-review.googlesource.com/6545 they were not restricted to x86_64 builds. Fixes "undefined reference to `aom_highbd_convolve8_add_src_sse2'" for --target=x86-linux-gcc Also remove SSE2 specializations from `aom_highbd_convolve8_add_src[_horiz/_vert]`, since those functions don't actually have SSE2 versions (this was left in by accident in the original patch). Change-Id: I9f7d0c11b58b6f5a0e6a1fdaed0f92175bdeab34	2017-01-27 16:36:30 +00:00
Urvang Joshi	bca73c4cb8	Make ALT_INTRA work with CB4X4. Change-Id: Ibc1803c3d149c6a53d1817798d0cab6dc5ab5927	2017-01-24 14:54:42 -08:00
Steinar Midtskogen	83307f33f2	Fix typos in comments Change-Id: Id70b49e2a77c6837da75c684d622ddfe60f3d97e	2017-01-07 10:26:28 +01:00
Steinar Midtskogen	d954f2d77d	Disable unsupported SIMD optimisations for CLPF for 32 bit VS targets VS compiling for 32 bit targets does not support vector types in structs as arguments, which makes the v256 type of the intrinsics hard to support, so optimizations for this target are disabled. Change-Id: I675394cf1aed0cb18a48f21216470867031b30ce	2017-01-07 08:59:56 +00:00
David Barker	be6cc07d82	Add new convolve variant for loop-restoration The convolve filters generated by loop_wiener_filter_tile are not compatible with some existing convolve implementations (they can have coefficients >128, sums of (certain subsets of) coefficients >128, etc.) So we implement a new variant, which takes a filter with 128 subtracted from its central element and which adds an extra copy of the source just before clipping to a pixel (reinstating the 128 we subtracted). This should be easy to adapt from the existing convolve functions, and this patch includes SSE2 highbd and SSSE3 lowbd implementations. Change-Id: I0abf4c2915f0665c49d88fe450dbc77b783f69e1	2017-01-03 17:15:29 +00:00
Jingning Han	cc5bdf4920	Add 2x2 block level variance functions for high bd Change-Id: I38259c4074f77a8941baefbe7585fff2eded6b12	2016-12-20 17:28:13 +00:00
Jingning Han	324b4c6d6a	Add 2x2 intra predictor for high bit-depth Provide primitive modules for cb4x4 mode use. This resolves compiler warnings when both high bit-depth and cb4x4 mode are turned on. Change-Id: If6ecac50578b3e665b602419a0701c3e047ce623	2016-12-20 17:28:13 +00:00
Jingning Han	e2ffaf884d	Add 2x4 and 4x2 variance functions Change-Id: Ic2fbc66e9212da32930c6a8ba1a749e3a37c5b9a	2016-12-15 20:19:19 +00:00
Yi Luo	e98325848d	High bit depth motion search SAD optimization on avx2 - For all blocks with width >= 16. - Add test_count to make the unit tests harder to pass. - Speed testing on 1080p, 100 frames, 5 Mbps, CPU, i7-6700 User level time reduction: baseline: 3.68% baseline + ext-partition: 36.12% Change-Id: I78c5d9ca216f0fd91f1a360dca2190b11fd54a08	2016-12-09 21:14:48 +00:00
Jingning Han	9e7c49fc8a	Add 2x2 variance function Change-Id: I73bcb8ab5727e2d07e34ca35e9e014f3c6f63d56	2016-12-07 05:47:55 +00:00
Jingning Han	7833d2bfbf	Enable 2x2 intra prediction Bring 2x2 intra prediction online for chroma components. Change-Id: Ia56af9101b2a977691bca4156a6dcf89e644b4a7	2016-12-02 01:46:59 +00:00
Yi Luo	9e218747c4	SAD avg and 4D avx2 optimization for ext-partition - User level time reduction <1% on i7-6700 cpu Change-Id: I8f15bde07dddd938df0b065e20ae94109e7b3b5b	2016-11-28 22:42:08 +00:00
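Commit 4f0b3ed8b8 ("Retune the CLPF kernel") replaces clamp(x, -s, s) with a clipping function that ramps back down to 0 for large differences. The sketch below transcribes the formula from that commit message into Python to make the rampdown easy to check by hand. It is not the libaom implementation: the names `clpf_constrain` and `old_clamp` are hypothetical, and it assumes the strength s is a power of two so that log2(s) is an integer.

```python
def old_clamp(x: int, s: int) -> int:
    """Previous behaviour quoted in the commit message: clamp(x, -s, s)."""
    return max(-s, min(s, x))


def clpf_constrain(x: int, s: int, bitdepth: int = 8) -> int:
    """Clipping function as written in the commit message:

        sign(x) * max(0, abs(x) - max(0, abs(x) - s
                                      + (abs(x) >> (bitdepth - 3 - log2(s)))))

    Assumption (not stated in the source): s is a positive power of two.
    """
    if s <= 0 or s & (s - 1):
        raise ValueError("this sketch assumes s is a positive power of two")
    log2_s = s.bit_length() - 1
    shift = bitdepth - 3 - log2_s
    if shift < 0:
        raise ValueError("s is too large for this bit depth")

    mag = abs(x)
    # The amount removed from |x| grows with |x|, so the output rises to
    # about s and then falls back to 0 instead of saturating at s.
    reduction = max(0, mag - s + (mag >> shift))
    out = max(0, mag - reduction)
    return out if x >= 0 else -out


if __name__ == "__main__":
    # Compare the old clamp and the new function for 8-bit input, s = 4.
    for x in (0, 3, 4, 8, 16, 32, 100):
        print(x, old_clamp(x, 4), clpf_constrain(x, 4))
```

With s = 4 at 8 bits the output reaches 0 at x = 32, and with s = 16 at 10 bits it reaches 0 at x = 128, which matches the "-32 and 32 (for 8 bit, -128 & 128 for 10 bit)" rampdown described in the commit message.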

72 commits