mozilla/aom - aom

Commit Graph

Author	SHA1	Message	Date
Timothy B. Terriberry	f6c807c5a1	ec_smallmul: Convert CDFs to iCDFs. Hoists the iCDF conversion outside of the daala code. We directly store 32768 - cdf[i] in each cdf, to avoid having to convert the whole array every time a symbol is coded. This works with ec_multisymbol, new_tokenset, and ec_adapt. Compared to Change-Id Idbbd3743e9189146cb519d5b984bdabd69e3f4c0, this improves decoder runtimes by 1.15% at QP=55 and 2.64% at QP=20. The overall slowdown of ec_smallmul is now 0.12% at QP=55 and 0.44% at QP=20. Encoder output should not change, and all streams should remain decodable without decoder changes. Change-Id: I06b8b75b667bb1bc4ddffc78f895e48a09f4c578	2017-04-18 18:47:29 +00:00
Yi Luo	a4d879923f	Move width branch out of height loop - AVX2 Copy and average functions are faster, Copy function: ~4%-57% Avg function: ~17%-54% Change-Id: Ib1732cd90eb353379ef50ecbb1e207860969f1c3	2017-04-18 18:00:35 +00:00
Yushin Cho	27acc47869	Skip adding zero signal to prediction with DC only idct If DC only idct gives zero, then we can skip the steps which add zero signal to predicted signal. DC only idct cases will occur more frequently at lower bit rates. Similar changes can be done for C version of high bit depth idct functions. Change-Id: I53af22904568f7043091710da70ca8299bf361c5	2017-04-17 20:43:28 +00:00
Timothy B. Terriberry	d5b89d0d07	ec_smallmul: Simplify binary read/write. This should be the same number of operations as the non-ec_smallmul version (though ideally we'd use the real 15-bit probability natively). Encoder output should not change, and all streams should remain decodable without decoder changes. Change-Id: I2998a77a02f566cd0c82c415395637acf49b5a97	2017-04-14 19:07:43 +00:00
Timothy B. Terriberry	ead52876d6	daala_ec: Convert the decoder to use iCDFs This only changes the internal coding engine. We convert CDFs into iCDFs at the "bool" reader <-> daala_ec boundary. Decoder output should not change. Change-Id: I483dfe3e5588d2038c3c7ec4cd5ba62d6699b920	2017-04-14 19:07:43 +00:00
Timothy B. Terriberry	881f109bf3	daala_ec: Invert the internal state of the decoder This removes one subtraction from the CDF search loop (reducing the dependency chain for reading from the CDF) at the cost of one increment and decrement during renormalization (easily absorbed by the reorder buffer). There should be no change in decoded output. Change-Id: Ia7905bb8ca7c5d4ab73f23ccc61bcd3432349aa2	2017-04-14 19:07:43 +00:00
Timothy B. Terriberry	41b4f75b87	daala_ec: Convert the encoder to use iCDFs This only changes the internal coding engine. We convert CDFs into iCDFs at the "bool" writer <-> daala_ec boundary. Encoder output should not change, and all streams should remain decodable without decoder changes. Change-Id: Id3ac7352926497bf6f7bc371ab9bc76e9a3569d5	2017-04-14 19:07:43 +00:00
Timothy B. Terriberry	033e53688f	daala_ec: Remove non-dyadic functions. Encoder output should not change, and all streams should remain decodable without decoder changes. Change-Id: Id1f1b0f2f02c3b46f150a93c451bf48abd0782ca	2017-04-14 19:07:43 +00:00
Ryan Lei	dd6fa06a06	update parallel_deblocking experiment with more filter tap options this change adds the following filter tap options: 1. add options to replace 15 tap filter with 9 or 11 tap filter 2. force chroma plane to only use maximum 7 tap filter above options are disabled by default Change-Id: Iab90a613210c1adaf4475976e9ed7e78ac30803b	2017-04-14 18:49:07 +00:00
Sebastien Alaiwan	e5728a955a	Simplify coefficient range checking Deduplicate implementations of check_range, and deduplicate the call to aom_read_bit. Change-Id: I63b023758248717125e4df6d1c382d4c517bae84	2017-04-14 16:35:24 +00:00
Tom Finegan	6c86ace0a5	Fix CONFIG_HIGHBITDEPTH in cmake. Broken since `9d247355` when aom_dsp/x86/highbd_convolve_avx2.c was added to aom_dsp.mk. Change-Id: Ide6779209a546e1bf84a4997c0cdcf3b2bc2b92b	2017-04-14 03:51:19 +00:00
Tom Finegan	d148c9637c	Add mips32 support to the cmake build. Requires use of new cmake toolchain file: $ cmake path/to/aom -DCMAKE_TOOLCHAIN_FILE=path/to/aom/build/cmake/toolchains/mips32-linux-gcc.cmake DSPR2 and MSA are supported via addition of -DENABLE_DSPR2=1 and -DENABLE_MSA=1 respectively. Note that the latter requires the addition of -DMIPS_CPU=p5600. BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76 Change-Id: Idf7d7f2daecf18cc45b834166eaf34ee9f414d49	2017-04-13 13:24:33 -07:00
Steinar Midtskogen	1b2b739bd2	Add s8 -> s16 unpack intrinsics Change-Id: Iec22c6442c55a5908d858766ff6dfb8bff69835d	2017-04-13 07:48:44 +00:00
Sebastien Alaiwan	71e87847eb	Homogenize configuration option name. Rename '--enable-aom-highbitdepth' to '--enable-highbitdepth' Change-Id: I1de13c3508c30c552532993419d8ace326142ab6	2017-04-12 22:29:11 +00:00
Timothy B. Terriberry	b1c5760ed8	Add ec_smallmul experiment. This reduces the multiplier width of daala_ec from 16x15->31 to 8x15->23, which reduces hardware latency by an estimated 20% (and area for this module by an estimated 40%). These are the smallest logical changes required to achieve this, but the approach will be optimized significantly in subsequent commits. When enabled: ec_smallmul1c_base@2017-03-08T00:49:01.830Z -> ec_smallmul1c@2017-03-08T00:49:45.091Z PSNR \| PSNR Cb \| PSNR Cr \| PSNR HVS \| SSIM \| MS SSIM \| CIEDE 2000 0.0203 \| 0.0203 \| 0.0204 \| 0.0203 \| 0.0203 \| 0.0203 \| 0.0202 Change-Id: Idbbd3743e9189146cb519d5b984bdabd69e3f4c0	2017-04-12 20:17:54 +00:00
Sarah Parker	f178329191	Add gm parameter coding based on ref parameters Change-Id: Ic2344a6475b967fa07f70b3ffad2714de657bb49	2017-04-07 22:39:42 +00:00
James Zern	859931ed9d	variance_neon: sync variance() w/c,sse2 removes some unnecessary casts and adds a few explicit uint32 ones for larger sizes to quiet -Wshorten-64-to-32 warnings ported from libvpx: e372bfd5a variance_neon: sync variance() w/c,sse2 Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518	2017-04-06 22:21:47 +00:00
Yunqing Wang	1e64e70b5f	Fix forward transform compilation errors BUG=aomedia:395 Change-Id: I381f16e66e2540c9bf24727abf8915a3850dcc92	2017-04-04 16:25:55 +00:00
Urvang Joshi	c07b23de18	RTCD defs: refactor intra prediction protos. Change-Id: I0f4576522a07597dbb04089b02ca1fae67075ba4	2017-04-03 23:15:42 +00:00
Steinar Midtskogen	a2fa9ee3a4	Improve SSE2 implementation of v64_abs_s8 and v128_abs_s8 Change-Id: I5243432106c2456f1220adb9d8f24ae5e4249748	2017-04-03 08:59:23 +02:00
Steinar Midtskogen	6033fb853d	Add v64_abs_s8, v128_abs_s8 and v256_abs_s8 Change-Id: I529509e4e997ba123799a3a581d20624d75cf582	2017-04-02 21:45:46 +02:00
Steinar Midtskogen	9b8444a17c	Add v64_ssub_u16, v128_ssub_u16 and v256_ssub_u16 Change-Id: I60543913cbd8dc5cad524ab74697227f9e93836e	2017-04-02 02:04:11 +00:00
Urvang Joshi	5ddac0aac8	RTCD defs: Remove empty specialize statements once and for all. A similar cleanup happened before, but the empty statements have since reappeared. I added a check in 'specialize' subroutine to die whenever such an empty specialize call is found, so that config+make would fail. Change-Id: I300ca0f0b077c0aeca8096d6460d8fb1c364d9b9	2017-03-31 16:40:03 +00:00
Alex Converse	29608d84af	variance: Add odd size sse functions Change-Id: I5eb7870d4b1b83bb907e539528f27f80a42e2fad	2017-03-31 16:25:44 +00:00
Yi Luo	9d24735537	High bit depth inter prediction filter AVX2 On i7-6700: - Function level speed improvement: 23%-29% - User level speed improvement: decoder: ~2%-4%. encoder: <1%. Change-Id: I02937a72304c3b356ca41e580352790df391f0a2	2017-03-30 23:12:13 +00:00
Alex Converse	4c5b020472	Make aom_sum_squares_2d_i16 take width and height parameters. SSE2 may be needed for nx4 and 4xn. Change-Id: I3c10112447fdb5fe51a68bc2c6e2f2641b102723	2017-03-30 15:49:22 +00:00
Steinar Midtskogen	b8ff6aaf5d	Add SIMD support for CDEF dering for sse2/ssse3 and neon Change-Id: Ibaaed850ddceba9c3db542eaf4a1c623ce6b412b	2017-03-29 23:47:21 +00:00
Steinar Midtskogen	73aa77c034	Increase parallelism in CLPF SIMD Change-Id: I66cdb67f8a1c2072516a65822dcc838e516ba9d7	2017-03-29 17:53:04 +00:00
Tom Finegan	5e9d15f34a	aom_dsp: Fix cmake build. Add missing sources: - binary_codes_{reader,writer}.{c,h} Change-Id: I4eda322a4c07a61f9da7370fda3163f726ab531c	2017-03-28 23:33:53 +00:00
Debargha Mukherjee	47748b5692	Adds binary code lib for coding various symbols Adds a variable length binary code library for coding various symbols for typical use in headers. The main codes implemented are: 1. Coding a symbol from an n-ary alphabet using a quasi-uniform code. 2. A bilevel code for coding symbols from an n-ary alphabet based on a reference value for the symbol also taken from the same alphabet. The code has two steps. If the symbol is close to the reference a shorter code is used, while if it is farther away a longer code is used. 3. A finite (terminated) subexponential code that codes a symbol from an n-ary alphabet using subexp parameter k. 4. A finite (terminated) subexponential code that codes a symbol from an n-ary alphabet using subexp parameter k, based on a given reference also taken from the same alphabet. This code essentially reorders the values before using the same code as 3. Also adds corresponding encoder side functions to count the number of bits used. These codes will be subsequently used for more efficient encoding of loop-restoration parameters and global motion parameters. Change-Id: I28c82b611925c1ab17f544c48c4b1287930764b7	2017-03-27 21:25:38 +00:00
Jingning Han	8e67c05f53	Fix tree to cdf index mapping This fixes the mis-aligned cdf model derived from tree based model. It resolves the compression performance regression in dual filter, intra mode, inter mode, and transform block type coding, when ec-multisymbol is enabled by default. With dual filter enabled, the performance regression was 3.6% loss for lowres. This fix brings the performance gains back to 1% gains. Change-Id: I80f5485386045908c152c9c11eeacbc650f1e324	2017-03-25 00:15:28 +00:00
Jean-Marc Valin	e9f7742437	Do real chroma RDO search for CDEF Chroma now has a list of strengths too, with the superblock signalling shared between luma and chroma. low-latency, cpu=4: PSNR \| PSNR Cb \| PSNR Cr \| PSNR HVS \| SSIM \| MS SSIM \| CIEDE 2000 -0.0114 \| -1.4626 \| -1.4745 \| -0.0423 \| 0.0430 \| -0.0001 \| -0.7416 Change-Id: I389c77f1d80020f810e45f8502c656ad9d397c8c	2017-03-23 21:57:30 +00:00
Alex Converse	fa16041c40	Fix Wundef errors in simd intrinsics Change-Id: I551eda906c96fac77125e10e6f71e9a6edca5baf	2017-03-23 19:08:20 +00:00
Alex Converse	64d7ef6746	Fix Wundef warnings inside the codec Change-Id: I2f4a5c836905b089b91b211368bf3a0dea682b75	2017-03-23 15:42:52 +00:00
Tom Finegan	5afa1922b1	Add arm64 iOS support to the cmake build. BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76 Change-Id: I6c246c0a5399e34e0eeeadc98de21cbf5b850af6	2017-03-22 01:02:52 +00:00
Steinar Midtskogen	3c33def72c	Limit line buffer to 6 lines Change-Id: I6fedfa6427865e9a37fbdf9d9c1bf8be55222cba	2017-03-21 21:27:09 +00:00
Steinar Midtskogen	d280a84554	Remove boundary checks in CLPF Change-Id: Icc93783f47fe7fe3aac395aadcc8bbc307dae1fb	2017-03-21 21:27:09 +00:00
Thomas Davies	5dad9a8943	EC_ADAPT: minor simplification to adaptation mechanism. This removes an instruction from the HW path. It also improves BDR by 0.02% on all metrics (AWCY, High Latency, objective-1-fast). Change-Id: I9f8a86871e1c0db4a0704dee297acd6977abcbe4	2017-03-20 23:06:38 +00:00
Tom Finegan	97d29ea7a4	Add support for armv7s to cmake aom_dsp build. - Add function add_gas_asm_library() to handle conversion of asm sources and creation of custom dependencies. - Uses add_asm_library() to create the library build. - Add aom_dsp_common_neon_intrinsics target for the neon intrinsics. BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76 Change-Id: Ifd99fbd69998a79613e0f5b61003a47973a804bc	2017-03-19 01:47:14 +00:00
Steinar Midtskogen	95f1c2ab6e	Fix -fsanitize=integer warnings in v64_intrinsics_c.h Change-Id: I8134c0ac4bd18478b266a0058e00bc6b1e6f8e9e	2017-03-17 20:27:58 +00:00
Steinar Midtskogen	6c79576556	Fix ubsan warnings BUG=aomedia:376 Change-Id: Ief69f220ec5b6cf15443f872ad2f9a63336c185d	2017-03-17 20:27:58 +00:00
Steinar Midtskogen	a9d41e88d2	Merge dering/clpf rdo and filtering * Dering and clpf were merged into a single pass. * 32x32 and 128x128 filter block sizes for clpf were removed. * RDO for dering and clpf merged and improved: - "0" no longer required to be in the strength selection - Dering strength can now be 0, 1 or 2 bits per block LL HL PSNR: -0.04 -0.01 PSNR HVS: -0.27 -0.18 SSIM: -0.15 +0.01 CIEDE 2000: -0.11 -0.03 APSNR: -0.03 -0.00 MS SSIM: -0.18 -0.11 Change-Id: I9f002a16ad218eab6007f90f1f176232443495f0	2017-03-17 19:06:20 +00:00
Thomas Davies	028b57f563	EC_ADAPT: Perform backwards updates directly on CDFs. The initial CDF for each frame is stored in the frame context. CDFs for actual coding are stored in the tile structures, and these are what get adapted. The initial CDF is replaced by an average CDF derived from these tile CDFs. This is carried forward to future frames when backward adaptation is on. CDFs are no longer set from the 8 bit probabilities in backwards adaptation. For now, 8 bit probabilities are maintained for use in the encoder and for symbols which do not have a CDF. Change-Id: I106b30510bfad1fa57d077f7702acc1864378a09	2017-03-15 09:31:58 +00:00
Yushin Cho	d080f4152d	Fix broken build with accounting Change-Id: I50267aa39d4d2857b48cbea0cbc8a7608489ebd7	2017-03-14 22:04:46 +00:00
Timothy B. Terriberry	561eb7cdc6	daala_ec: Remove dead code. Change-Id: Ief9581c8060132f20ca81f4c1be15e2772b6c9eb	2017-03-14 18:36:14 +00:00
Thomas Davies	f7f87ff2e6	Add a symbol decode call count to accounting. This keeps track of how many calls have been made to read symbols or bits. A given syntax element may make multiple calls to symbol decoding functions, and these variables keep track of the entropy decoding engine throughput. Change-Id: Iab3a720cbfe68f8d5ca3e4c415f7baa683b24268	2017-03-10 20:09:01 +00:00
Urvang Joshi	ee7ee7f49f	SMOOTH_PRED: Use get_msb() to get log2 of block dimension. Apart from being inefficient, the floating point operation log2() was resulting in an assertion failure due to an unrelated floating point exception that happens earlier. Related: update the MD5s in test_intra_pred_speed to fix that failure too. BUG=aomedia:384 Change-Id: I18dc0733e880bac21b3d07ad874f8ae341f59f06	2017-03-10 00:26:58 +00:00
Steinar Midtskogen	6d2f3c2a9e	Fix some potential warnings on unused functions and implicit cast Change-Id: I216935236d0f5073c4f975977572c558cf892328	2017-03-07 11:35:10 +01:00
Urvang Joshi	4d5bbbd907	SMOOTH_PRED: Use 8-bit weights for real. Use 255 instead of 256, to restrict to 8-bits. Only noise level differences in performance. AWCY: High Latency Low Latency All Keyframes -0.01 -0.01 Video overall -0.01 -0.07 Google Set: All KF Video lowres -0.005 -0.029 midres -0.008 0.028 hdres -0.010 -0.022 Note: By moving from 18-bit to 8-bit and then cutting off at 255 (this change), the overall effect is noise level too (neutral or better). Change-Id: I9f2852023015e36c01203bafe486ec400b2ba46f	2017-03-06 20:11:50 +00:00
Tom Finegan	6598843c00	aom_dsp.mk: Remove redundant CONFIG_PVQ section. Same sources as CONFIG_AV1 section. Combined them, and fixed a comment. Change-Id: I5273849143c0c92a506deeb9241a761e5ee125d3	2017-03-03 17:02:03 +00:00
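
Note on commit f6c807c5a1: its message describes storing 32768 - cdf[i] in place of each CDF entry, so the array need not be converted every time a symbol is coded. The following is a minimal Python sketch of that idea only, not the aom implementation. The names CDF_TOTAL, cdf_to_icdf and symbol_probability are hypothetical, and the three-symbol CDF is made-up example data.

```python
# Illustrative sketch only; not taken from the aom source tree.
# Assumption: CDF entries are cumulative counts on a 15-bit scale (total 32768),
# with the last entry equal to 32768.
CDF_TOTAL = 32768


def cdf_to_icdf(cdf):
    """Return the inverse CDF, where each entry is 32768 - cdf[i]."""
    return [CDF_TOTAL - c for c in cdf]


def symbol_probability(icdf, symbol):
    """Probability mass of `symbol` (out of 32768), read directly from an iCDF."""
    # In iCDF form the values decrease; the implicit entry before index 0 is 32768.
    upper = CDF_TOTAL if symbol == 0 else icdf[symbol - 1]
    return upper - icdf[symbol]


# Hypothetical three-symbol distribution with probabilities 1/4, 1/2, 1/4.
cdf = [8192, 24576, 32768]
icdf = cdf_to_icdf(cdf)
probabilities = [symbol_probability(icdf, s) for s in range(len(icdf))]
```

Converting once and keeping the inverted form is what lets the per-symbol path skip the subtraction; the sketch shows that symbol probabilities remain recoverable from the stored values.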


293 commits