mozilla/aom - aom

Commit graph

Author	SHA1	Message	Date
David Barker	6c4af6b7c8	Fix some irregularities in SSE2 variance code Change-Id: I1dcf3bd33645aed3347301149808c157eeb44cad	2017-06-28 00:53:28 +00:00
Debargha Mukherjee	f053cba247	Reduce multiplier precision for warp least squares Includes reordering and other clamping changes, as well as changes to reduce multiplier precision. cam_lowres (60 frames): -0.092% BDRATE improvement in --disable-cdef --disable-global-motion --disable-ext-tx configuration. Change-Id: I0660c45b44fcd5a193534d8dadd1aa1ae5c5e27a	2017-06-27 21:49:32 +00:00
Yi Luo	5128109508	Fix inv txfm low/high bitdepth selection logic We are going to have several commits to setup new low/high bitdepth data path selection logic. This patch is for inverse transform. Let me summarize the ideas as follows. - For low/high bitdepth selection, encoder depends on input configuration, e.g., video sequence bitdepth, profile. Decoder depends on input bitstream. This has nothing to do with compiler/build configuration. - Typical encoder usage for sampling format 4:2:0. 1) 8-bit video sequence: a) --profile=0 Fastest encoding/decoding pipeline on speedup. b) --profile=2 --bit-depth=10 Image pixels are left shifted by 2 bits. It employs 16-bit reference frame buffer and has high calculation precision. It usually enjoys higher compression performance. 2) 10/12-bit video sequence (HDR): --profile=2 --bit-depth=10/12 - Transform coefficient type: Lowbitdepth: int16_t Highbitdepth: int32_t - The type tran_low_t is still used in codebase, which is int32_t, defining the data path capacity. Naturally, it is high bitdepth. Eventually we shall remove the configuration flags, CONFIG_HIGHBITDEPTH/CONFIG_LOWBITDEPTH, and separate low and high bitdepth data path. Two data paths co-exist in the same build environment. Change-Id: I35c06d4d4f19ebf80d909168fdddbae57c3cc884	2017-06-27 21:20:51 +00:00
Yaowu Xu	d43d6777a6	quantize.c: convert to int before apply sign This change makes the conversions similar to those in av1_quantize.c, and fix ubsan warnings shown in nightly tests. Change-Id: I90851a80dcb9f052a32bf22199fd9ef8ff927725	2017-06-26 15:49:37 -07:00
James Zern	284c883029	aom_dsp.cmake: add highbd_quantize_intrin_avx2.c added in: `193422e76` Add avx2 highbd_quantize_b Change-Id: Ie4ba48042ffd36d69d2bf200bba12a1d924c8f9c	2017-06-26 21:36:59 +00:00
Lester Lu	ad8290b8e6	New experiment: LGT In previous ADSTs, DST-7 and DST-4 are used for length 4 and length 8/16/32, respectively. In this LGT experiment we explore transforms between DST-4 and DST-7. When CONFIG_LGT flag is on, adst4 and adst8 are replaced by lgt4 and lgt8, the intermediate transforms with pre-chosen parameters. The LGTs applied here are lgt4_160 and lgt8_170, where the numbers mean the self-loop weights times 100. The associated values for DST-7 and DST-4 are 100 and 200. ovr_psnr: lowres: -0.140 midres: -0.131 hdres: -0.078 These changes are not applied to the highbd scenario in the current version. Change-Id: I20600456da8766528b2b6b11aa28801e70af498e	2017-06-26 19:11:25 +00:00
Steinar Midtskogen	079acac180	Silence warnings in VS BUG=aomedia:615 Change-Id: I827e857d310020705a5292ef8fe817bc042d8dd0	2017-06-22 20:01:25 +00:00
Yi Luo	193422e76f	Add avx2 highbd_quantize_b - First pass encoding time reduces ~10.9% on i7-6700 at 100 frames, 1080p. - avx2 works for coeff number >= 8 cases; coeff number < 8 case will be implemented by sse2. - Unit test is added type B/FP/DC. Change-Id: Ibe5b7807c64e6dfc2d59c470ed50a6e8ca94ef7c	2017-06-22 15:52:01 +00:00
Tom Finegan	78516fca4e	Build static libaom without internal deps in CMake. Change the internal lib targets so that external apps need link only libaom instead of all internal library targets and libaom. BUG=aomedia:76,aomedia:609 Change-Id: I38862fcd90cb585300b6b23e8558f78a1934750f	2017-06-20 19:57:02 +00:00
Tom Finegan	84f2d796c8	Add shared library support to the CMake build. This is enabled via: $ cmake path/to/aom -DBUILD_SHARED_LIBS=1 Currently supports only Linux and MacOS targets. Symbol visibility is handled by exports.cmake and its helpers exports_sources.cmake and generate_exports.cmake. Some sweeping changes were required to properly support shared libs and control symbol visibility: - Object libraries are always linked privately into static libraries. - Static libraries are always linked privately into each other in the many cases where the CMake build merges multiple library targets. - aom_dsp.cmake now links all its targets into the aom_dsp static library target, and privately links aom_dsp into the aom target. - av1.cmake now links all its targets into the aom_av1 static library target, and privately links in aom_dsp and aom_scale as well. It then privately links aom_av1 into the aom target. - The aom_mem, aom_ports, aom_scale, and aom_util targets are now static libs that are privately linked into the aom target. - In CMakeLists.txt libyuv and libwebm are now privately linked into app targets. - The ASM and intrinsic library functions in aom_optimization.cmake now both require a dependent target argument. This facilitates the changes noted above regarding new privately linked static library targets for ASM and intrinsics sources. BUG=aomedia:76,aomedia:556 Change-Id: I4892059880c5de0f479da2e9c21d8ba2fa7390c3	2017-06-20 19:24:53 +00:00
Jingning Han	71adf5292a	Revert "Clamp inverse transform coefficients" This reverts commit `79b78b7d47`. The transform coefficient range needs some more tuning. Before we finalize on that front, directly applying clamping would cause multiple unit test failure issues. Hence revert this Cl temporarily. BUG=aomedia:612 Change-Id: I1dd8680dee17289801c4a209275f05a498355c8e	2017-06-19 21:50:48 +00:00
Timothy B. Terriberry	5d24b6f049	encoder: Remove 64x upsampled reference buffers They do not handle border extension correctly (interpolation and border extension do not commute unless you upsample into the border), nor do they handle crop dimensions that are not a multiple of 8 (the upsampled version is not sufficiently large), in addition to using massive amounts of memory and being a criminal waste of cache (1 byte used for every 8 bytes fetched). This commit reimplements use_upsampled_references by computing the subpixel samples on the fly. This implementation not only corrects the border handling, but is also faster, while maintaining the same quality. HL AWCY results are basically noise: PSNR | PSNR HVS | SSIM | MS SSIM | CIEDE 2000 0.0188 | 0.0187 | 0.0045 | 0.0063 | 0.0228 Change-Id: I7527db9f83b87a7bb8b35342f7e6457cd0bef9cd	2017-06-19 18:50:57 +00:00
Sebastien Alaiwan	79b78b7d47	Clamp inverse transform coefficients When --enable-coefficient-range-checking isn't specified, clamp the coefficient at each stage. This doesn't change the decoder behaviour for existing AV1 streams. However, some AV1 bitstreams that would have been rejected by the decoder as illegal (range check failure) are now legal bitstreams. There is no impact on video quality. BUG=aomedia:30 Change-Id: Ifa01186bae6bfe5d7712298e33d964c20f88435e	2017-06-16 23:26:26 +00:00
Tom Finegan	3613c51767	Sync CMake build with the configure build. - Added: CONFIG_COLORSPACE_HEADERS CONFIG_SPEED_REFS CONFIG_LGT CONFIG_SBL_SYMBOL - Removed: CONFIG_RECT_INTRA_PRED - Changed, 0 => 1: CONFIG_EXT_INTER CONFIG_INTERINTRA CONFIG_WEDGE CONFIG_COMPOUND_SEGMENT 1 => 0: CONFIG_ONE_SIDED_COMPOUND BUG=aomedia:76 Change-Id: If9ebd068d0014386ec25d91226a577c591f5a774	2017-06-16 06:56:48 +00:00
Sebastien Alaiwan	4770189d2b	Remove dead macro Change-Id: I6a582b85f535d2dbeb4b6af46fc45357d56f2f2b	2017-06-14 20:13:39 +00:00
Jingning Han	105eecf4c6	Resolve compiler warning when highbd is off The highbd_clip_pixel_add() function is generalized to be used in the regular 8 bit path. Move its definition outside the highbd experimental flag. This resolves the compiler warning in unit tests when high bit-depth is turned off. Change-Id: I90a744adb2381c9bf8476aa2a2bd0c87d9afdf57	2017-06-11 05:22:58 +00:00
David Barker	dab3e99b27	Fix Windows x86 build with --enable-ext-inter The Windows calling convention pushes any __m128i type arguments after the 3rd (4th on x86-64) onto the stack. But on x86, stack-allocated arguments are not guaranteed to be aligned to a multiple of their natural alignment, leading to compile errors. We fix this by making the functions which take >3 __m128i arguments instead take pointers. Since the functions are marked INLINE, the extra memory operations should optimize out. BUG=aomedia:587 Change-Id: I0cb2831fd12aded6f2821c037365386e6183ba5c	2017-06-09 18:31:47 +00:00
Thomas Davies	92aa22a8a3	AOM_QM: Use 8-bit matrices and fix 2x2 transform sizes. 2x2 transforms are now hidden behind the CHROMA_2X2 macro, not the CB4X4 macro. Change-Id: I5d73c679fba486ccda98fa8dbb804a3902df6c8d	2017-06-09 16:43:52 +00:00
Frederic Barbier	d405f8a627	Cleanup dead fwd transform functions Cleanup related wrappers and unit-tests. Change-Id: I2d37a8c80de63dbeaef584e3d5fa842c0b2ee6db	2017-06-08 16:56:59 +00:00
Sarah Parker	31c66502fa	Remove deprecated high-bitdepth functions This unifies the codepath for high-bitdepth transforms and deletes all calls to the old deprecated versions. This required reworking the way 1d configurations are combined in order to support rectangular transforms. There is one remaining codepath that calls the deprecated 4x4 hbd transform from encoder/encodemb.c. I need to take a closer look at what is happening there and will leave that for a followup since this change has already gotten so large. lowres 10 bit: -0.035% lowres 12 bit: 0.021% BUG=aomedia:524 Change-Id: I34cdeaed2461ed7942364147cef10d7d21e3779c	2017-06-08 01:26:44 +00:00
Urvang Joshi	766a389b58	Add a new experiment "rect-intra-pred". Earlier, intra prediction for rectangular blocks was performed by running two steps of prediction on square sub-blocks. With this experiment, we do proper intra prediction for rectangular blocks. This ensures that we make use of all available neighboring pixels especially for directional modes. For this, all the intra predictors were updated to work with rectangular transform block sizes. Performance improvements are small but free of cost: All Intra frames: lowres: -0.126 midres: -0.154 Video Overall: lowres: -0.043 midres: -0.100 [Could not get AWCY results due to a backlog.] BUG=aomedia:551 Change-Id: I7936e91b171d5c246cb0a4ea470a981a013892e6	2017-06-06 16:02:38 +00:00
Tom Finegan	6f9dfa5141	Sync CMake build defaults with the configure build. - Added: CONFIG_ONE_SIDED_COMPOUND CONFIG_VAR_REFS - Removed: CONFIG_SUB8X8_MC CONFIG_EC_MULTISYMBOL CONFIG_DAALA_EC CONFIG_LOWDELAY_COMPOUND - Changed, 0 => 1: CONFIG_VAR_TX CONFIG_EC_SMALLMUL CONFIG_CHROMA_SUB8X8 CONFIG_LOOPFILTERING_ACROSS_TILES CONFIG_TEMPMV_SIGNALING BUG=aomedia:76 Change-Id: Ia010abeaf079d8c6318a5a540e9354d5455ce826	2017-06-02 16:46:51 +00:00
Tom Finegan	17ccaec4bb	Add include guards to CMake files used as includes. BUG=aomedia:76 Change-Id: Ie34025f31a89f4991d03d5ecf03c6f6f5ab7b0a1	2017-06-02 16:43:58 +00:00
Ryan Lei	17905edfe0	Integrate parallel_deblocking with CB4x4 This change makes the parallel deblocking experiment work with cb4x4. The inner loop processes every 4x4 block. Change-Id: I86adb3d7b6d67a91ccc12aab29da9bfb8c522cf1	2017-06-02 16:09:30 +00:00
Thomas Daede	8ea3319ee7	Remove VAR_BASED_PARTITION. BUG=aomedia:526 Change-Id: I5d9b86a36f412ded2d6f20e198d2f4de4f97aaeb	2017-05-30 17:54:55 +00:00
Debargha Mukherjee	11cf46f4af	High precision Wiener filter rework Implements the high precision Wiener filter with an offset to reduce the error due to saturation without increasing the number of bits needed for intermediate precision. Also turns the high precision filter on. Change-Id: I34037a5746a6a89c5fce67753c1b027749085edf	2017-05-27 01:20:14 +00:00
David Barker	0aa39ff054	ext-inter: Vectorize new masked SAD/SSE functions We would expect that these new functions would be slower than the old masked SAD/SSE functions, as they do additional work (blending two inputs and comparing to a third, rather than just comparing two inputs). This is true for the SAD functions, which are about 50% slower (depending on block size and bit depth). However, the sub-pixel SSE functions are comparable to the old speed for the accelerated special cases (xoffset or yoffset = 0 or 4), and are between 40-90% faster for the generic case. Change-Id: I1a296ed8fc9e3edc313a6add516ff76b17cd3e9f	2017-05-26 18:50:20 +00:00
Cheng Chen	60f59618ab	Function parameter type correction Make function parameter and pass in value the same type. Change-Id: Ie2172b99b4cda81ac1d51f7ef1018bb9d4f55016	2017-05-26 01:28:04 +00:00
Nathan E. Egge	476c63c1dd	Remove the DAALA_EC experiment. This patch forces DAALA_EC on by default and removes the dkbool coder. Change-Id: Icd2ff08efd7bf467adf554344111473cb357adf8	2017-05-25 18:31:31 +00:00
Tom Finegan	c156224a40	Support CONFIG_LOOP_RESTORATION in the CMake build. BUG=aomedia:76 Change-Id: Ifabdedd5e027e1efd87ba8ca1bbf0af06481bb5c	2017-05-25 16:48:41 +00:00
Tom Finegan	1738c1091f	aom_dsp: Remove empty CMake target. Change-Id: I6e79d605818806005b65197745ca1bb66efb3890	2017-05-25 16:48:41 +00:00
Urvang Joshi	1624765821	tx64x64: Fix build when highbitdepth is on. This was being worked around by forcing highbitdepth to be off when enabling tx64x64. With the fixes, removed the work-around. Change-Id: I3102f9e17d4037af96a9eff418c5af6a97fd740c	2017-05-25 03:36:36 +00:00
Tom Finegan	378d652fc6	Remove CONFIG_{DE,EN}CODERS from the build system. Use CONFIG_AV1_DECODER and CONFIG_AV1_ENCODER instead. Change-Id: I33d83aa6d31067d0db7a972d36927dc49c420f81	2017-05-24 19:08:40 +00:00
David Barker	f19f35f7fb	ext-inter: Further cleanup * Rename the 'masked_compound_' functions to just 'masked_'. The previous names were intended to be temporary, to distinguish the old and new masked motion search pipelines. But now that the old pipeline has been removed, we can reuse the old names. * Simplify the new ext-inter compound motion search pipeline a bit. * Harmonize names: Rename aom_highbd_masked_compound_sub_pixel_variance* to aom_highbd_8_masked_sub_pixel_variance*, to match the naming of the corresponding non-masked functions Change-Id: I988768ffe2f42a942405b7d8e93a2757a012dca3	2017-05-24 00:02:53 +00:00
David Barker	5d34e6a738	Vectorize high-precision convolve filter Add SSE2 lowbd and SSSE3 highbd versions of the filters introduced in https://aomedia-review.googlesource.com/c/11962/ . These filters are equivalent in speed to the SSE2 implementations of the regular convolve filter. The average time to filter a 64x64 block is: lowbd C: 52us lowbd SSE2: 5.6us highbd C: 53us highbd SSSE3: 5.8us Also add a correctness test based on the warp filter tests. Change-Id: Ia0d81100e8a414bbfc2b5f664d751cf24765299e	2017-05-23 13:22:36 +00:00
David Barker	0f3c94e134	ext-inter: Delete dead code Patches https://aomedia-review.googlesource.com/c/11987/ and https://aomedia-review.googlesource.com/c/11988/ replaced the old masked motion search pipeline with a new one which uses different SAD/SSE functions. This resulted in a lot of dead code. This patch removes the now-dead code. Note that this includes vectorized SAD/SSE functions, which will need to be rewritten at some point for the new pipeline. It also includes the masked_compound_variance_* functions since these turned out not to be used by the new pipeline. To help with the later addition of vectorized functions, the masked_sad/variance_test.cc files are kept but are modified to work with the new functions. The tests are then disabled until we actually have the vectorized functions. Change-Id: I61b686abd14bba5280bed94e1be62eb74ea23d89	2017-05-23 06:13:26 +00:00
Tom Finegan	ba02c24cfe	Remove CONFIG_{DE,EN}CODERS from the CMake build. Use CONFIG_AV1_{DE,EN}CODER to control decoder and encoder support inclusion instead. BUG=aomedia:76,aomedia:508 Change-Id: Ib150ae382b301885589f30d9b6e98d3bfdd1afce	2017-05-22 22:35:37 +00:00
Nathan E. Egge	cd5395184e	Replace EC_MULTISYMBOL with DAALA_EC || ANS. Change-Id: Ia0eb3a3694fdbe9d33548ff4014b704b2f3db86a	2017-05-20 00:48:56 +00:00
David Barker	c155e018ca	ext-inter: Use joint_motion_search for masked compounds Add functions which take both components of a masked compound and compute the resulting SAD/SSE. Extend joint_motion_search to understand masked compounds, and use it to evaluate NEW_NEWMV modes. Change-Id: I782199a20d119a6c61c6567df157508125ac7ce7	2017-05-18 21:55:59 +00:00
hui su	efdad1f400	Add NELEMENTS() macro in aom_dsp/aom_dsp_common.h Change-Id: Ia8da431c7a1faa43e130ce71da9561957c5556e7	2017-05-17 18:03:06 +00:00
Yi Luo	aaa65f2460	Correct function signature for Visual Studio - There would be VS build warning on unaligned formal parameter. Change-Id: I6e122c4fec2505ef3458e4bdf218d3cd30bb494f	2017-05-16 18:26:15 +00:00
Debargha Mukherjee	28d15c715a	Experimental high precision convolve for Wiener Improves coding efficiency. Change-Id: I7bb12190cdc4581097809a020355cdc8867fc1ad	2017-05-15 19:48:41 +00:00
Ralph Giles	be111b3838	Remove armv6 media-extension assembly. Libvpx dropped armv6 support sometime after the aom fork. We don't intend to support this platform, which is likely too slow in any case. Remove the assembly and intrinsics optimized routines, their tests, cpu feature detection, and rtcd specialization for this instruction set extension. Change-Id: If44ec28e5ddafc6af179c5d1982ac7e81fe54d5e	2017-05-15 15:55:47 +00:00
Yi Luo	d1fb415ff2	Fix uninitialized __m256i vector warning BUG=aomedia:529 Change-Id: I9fccd0d29d100c92152d33a74dc3df8b7d256bcb	2017-05-12 19:13:40 +00:00
Yaowu Xu	eec78cc986	Prevent call to non-existing error_handler BUG=aomedia:121 Change-Id: Iec2e98c9f80ef3a01476234e637c135ff0513efd	2017-05-12 18:08:01 +00:00
Yi Luo	40f22ef85b	Partial IDCT 32x32 avx2 - Function level improvement (ms): Functions ssse3 avx2 Percentage idct32x32_1024 794 374 52.9% idct32x32_135 354 169 52.2% idct32x32_34 197 142 27.9% idct32x32_1 n/a 26 n/a - Integrating in default scan order. Change-Id: I84815112b26b8a8cb800281a1cfb1706342af57d	2017-05-11 05:48:44 +00:00
Urvang Joshi	f695d6596b	smooth_pred: 1D weights array to use less memory. As the block sizes are powers of two, we can index into the weights array as sm_weights_array[bs] now. This uses 2 * MAX_BLOCK_DIM memory, instead of NUM_BLOCK_DIMS * MAX_BLOCK_DIM earlier. Change-Id: I55bcedc188b8ed7def719c4d002c1fe2ec5e1b7f	2017-05-10 22:00:00 +00:00
Yi Luo	165adf8ec6	Use saturation addition to do rounding for avx2 IDCT - Found this bug when increasing unit test number to 10000. - Unit test is therefore also updated. Change-Id: I938e96f6ebd35ae1bd8affebf8665e1da49a324b	2017-05-09 22:16:50 +00:00
Sebastien Alaiwan	e13a11f34f	Fix warning about unused functions Change-Id: Ia6707cf50441f757fb053daeae85fb2d0c9b135e	2017-05-09 10:54:57 +02:00
Yi Luo	f6176abb07	Partial IDCT 16x16 avx2 - Function level improvement: functions sse2 avx2 percentage idct16x16_256 365 226 38% idct16x16_38 n/a 136 n/a idct16x16_10 171 110 35% idct16x16_1 34 26 23% - Integrated in AV1 for default scan order. Change-Id: Ieb1a8e730bea9c371ebc0e5f4a748640d8f5e921	2017-05-08 22:18:18 +00:00

372 commits