Commit Graph

293 Commits

Author SHA1 Message Date
Timothy B. Terriberry f6c807c5a1 ec_smallmul: Convert CDFs to iCDFs.
Hoists the iCDF conversion outside of the daala code.
We directly store 32768 - cdf[i] in each cdf, to avoid having to
convert the whole array every time a symbol is coded.

This works with ec_multisymbol, new_tokenset, and ec_adapt.

Compared to Change-Id Idbbd3743e9189146cb519d5b984bdabd69e3f4c0,
this improves decoder runtimes by 1.15% at QP=55 and 2.64% at
QP=20.

The overall slowdown of ec_smallmul is now 0.12% at QP=55 and
0.44% at QP=20.

Encoder output should not change, and all streams should remain
decodable without decoder changes.

Change-Id: I06b8b75b667bb1bc4ddffc78f895e48a09f4c578
2017-04-18 18:47:29 +00:00
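
The f6c807c5a1 message above describes storing 32768 - cdf[i] directly, so that no conversion is needed each time a symbol is coded. A minimal sketch of that one-time conversion follows, assuming a 15-bit CDF whose last entry is 32768; the name cdf_to_icdf and its signature are hypothetical and are not taken from the libaom sources.

```c
#include <assert.h>
#include <stdint.h>

#define CDF_TOTAL 32768 /* 15-bit probability scale quoted in the commit message */

/* Convert a CDF to an inverse CDF in place: icdf[i] = 32768 - cdf[i].
 * Hypothetical helper for illustration; not a libaom function. */
static void cdf_to_icdf(uint16_t *cdf, int nsyms) {
  /* Assumption: the CDF is normalised so its last entry equals the total. */
  assert(nsyms > 0 && cdf[nsyms - 1] == CDF_TOTAL);
  for (int i = 0; i < nsyms; i++) {
    cdf[i] = (uint16_t)(CDF_TOTAL - cdf[i]);
  }
}
```

Doing this once when a table is built, rather than on every coded symbol, is the hoisting the message refers to.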
Yi Luo a4d879923f Move width branch out of height loop
- AVX2 Copy and average functions are faster,
  Copy function: ~4%-57%
  Avg function:  ~17%-54%

Change-Id: Ib1732cd90eb353379ef50ecbb1e207860969f1c3
2017-04-18 18:00:35 +00:00
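
The idea named in the a4d879923f subject line, testing the width once instead of on every row, can be sketched in plain C. This is only an illustration of the loop restructuring: the commit itself concerns AVX2 code, and the function name copy_block and the chosen widths are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a w x h block of bytes. The width is tested once, outside the row
 * loop, so each loop body contains no branch on w.
 * Hypothetical helper; the special-cased widths are assumptions. */
static void copy_block(const uint8_t *src, int src_stride, uint8_t *dst,
                       int dst_stride, int w, int h) {
  assert(w > 0 && h > 0);
  if (w == 8) {
    for (int y = 0; y < h; y++) {
      memcpy(dst, src, 8); /* fixed size lets the compiler emit one wide move */
      src += src_stride;
      dst += dst_stride;
    }
  } else if (w == 16) {
    for (int y = 0; y < h; y++) {
      memcpy(dst, src, 16);
      src += src_stride;
      dst += dst_stride;
    }
  } else {
    for (int y = 0; y < h; y++) {
      memcpy(dst, src, (size_t)w); /* generic fallback for other widths */
      src += src_stride;
      dst += dst_stride;
    }
  }
}
```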
Yushin Cho 27acc47869 Skip adding zero signal to prediction with DC only idct
If DC only idct gives zero, then we can skip the steps which
add zero signal to predicted signal.
DC only idct cases will occur more frequently at lower bit rates.

Similar changes can be done for C version of high bit depth idct functions.

Change-Id: I53af22904568f7043091710da70ca8299bf361c5
2017-04-17 20:43:28 +00:00
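
The early exit described in 27acc47869 can be sketched as follows. The parameter dc stands for the already inverse-transformed DC value, and the 8-bit pixel format, the square block shape, and the name dc_only_add are assumptions made for illustration rather than details from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an integer to the 8-bit pixel range. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Add a constant DC residual to an n x n prediction block.
 * Hypothetical helper; `dc` is assumed to be the final per-pixel residual. */
static void dc_only_add(int dc, uint8_t *dest, int stride, int n) {
  assert(n > 0);
  if (dc == 0) return; /* adding zero would leave the prediction unchanged */
  for (int y = 0; y < n; y++) {
    for (int x = 0; x < n; x++) {
      dest[x] = clip_pixel(dest[x] + dc);
    }
    dest += stride;
  }
}
```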
Timothy B. Terriberry d5b89d0d07 ec_smallmul: Simplify binary read/write.
This should be the same number of operations as the non-ec_smallmul
version (though ideally we'd use the real 15-bit probability
natively).

Encoder output should not change, and all streams should remain
decodable without decoder changes.

Change-Id: I2998a77a02f566cd0c82c415395637acf49b5a97
2017-04-14 19:07:43 +00:00
Timothy B. Terriberry ead52876d6 daala_ec: Convert the decoder to use iCDFs
This only changes the internal coding engine. We convert CDFs into
iCDFs at the "bool" reader <-> daala_ec boundary.

Decoder output should not change.

Change-Id: I483dfe3e5588d2038c3c7ec4cd5ba62d6699b920
2017-04-14 19:07:43 +00:00
Timothy B. Terriberry 881f109bf3 daala_ec: Invert the internal state of the decoder
This removes one subtraction from the CDF search loop (reducing the
dependency chain for reading from the CDF) at the cost of one
increment and decrement during renormalization (easily absorbed by
the reorder buffer).

There should be no change in decoded output.

Change-Id: Ia7905bb8ca7c5d4ab73f23ccc61bcd3432349aa2
2017-04-14 19:07:43 +00:00
Timothy B. Terriberry 41b4f75b87 daala_ec: Convert the encoder to use iCDFs
This only changes the internal coding engine. We convert CDFs into
iCDFs at the "bool" writer <-> daala_ec boundary.

Encoder output should not change, and all streams should remain
decodable without decoder changes.

Change-Id: Id3ac7352926497bf6f7bc371ab9bc76e9a3569d5
2017-04-14 19:07:43 +00:00
Timothy B. Terriberry 033e53688f daala_ec: Remove non-dyadic functions.
Encoder output should not change, and all streams should remain
decodable without decoder changes.

Change-Id: Id1f1b0f2f02c3b46f150a93c451bf48abd0782ca
2017-04-14 19:07:43 +00:00
Ryan Lei dd6fa06a06 update parallel_deblocking experiment with more filter tap options
this change adds the following filter tap options:
1. add options to replace 15 tap filter with 9 or 11 tap filter
2. force chroma plane to only use maximum 7 tap filter

above options are disabled by default

Change-Id: Iab90a613210c1adaf4475976e9ed7e78ac30803b
2017-04-14 18:49:07 +00:00
Sebastien Alaiwan e5728a955a Simplify coefficient range checking
Deduplicate implementations of check_range, and deduplicate the call
to aom_read_bit.

Change-Id: I63b023758248717125e4df6d1c382d4c517bae84
2017-04-14 16:35:24 +00:00
Tom Finegan 6c86ace0a5 Fix CONFIG_HIGHBITDEPTH in cmake.
Broken since 9d247355 when aom_dsp/x86/highbd_convolve_avx2.c was
added to aom_dsp.mk.

Change-Id: Ide6779209a546e1bf84a4997c0cdcf3b2bc2b92b
2017-04-14 03:51:19 +00:00
Tom Finegan d148c9637c Add mips32 support to the cmake build.
Requires use of new cmake toolchain file:
$ cmake path/to/aom -DCMAKE_TOOLCHAIN_FILE=path/to/aom/build/cmake/toolchains/mips32-linux-gcc.cmake

DSPR2 and MSA are supported via addition of -DENABLE_DSPR2=1 and
-DENABLE_MSA=1 respectively. Note that the latter requires the addition
of -DMIPS_CPU=p5600.

BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76

Change-Id: Idf7d7f2daecf18cc45b834166eaf34ee9f414d49
2017-04-13 13:24:33 -07:00
Steinar Midtskogen 1b2b739bd2 Add s8 -> s16 unpack intrinsics
Change-Id: Iec22c6442c55a5908d858766ff6dfb8bff69835d
2017-04-13 07:48:44 +00:00
Sebastien Alaiwan 71e87847eb Homogenize configuration option name.
Rename '--enable-aom-highbitdepth' to '--enable-highbitdepth'

Change-Id: I1de13c3508c30c552532993419d8ace326142ab6
2017-04-12 22:29:11 +00:00
Timothy B. Terriberry b1c5760ed8 Add ec_smallmul experiment.
This reduces the multiplier width of daala_ec from 16x15->31 to
8x15->23, which reduces hardware latency by an estimated 20% (and
area for this module by an estimated 40%).

These are the smallest logical changes required to achieve this,
but the approach will be optimized significantly in subsequent
commits.

When enabled:

ec_smallmul1c_base@2017-03-08T00:49:01.830Z ->
 ec_smallmul1c@2017-03-08T00:49:45.091Z

  PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
0.0203 |  0.0203 |  0.0204 |   0.0203 | 0.0203 |  0.0203 |     0.0202

Change-Id: Idbbd3743e9189146cb519d5b984bdabd69e3f4c0
2017-04-12 20:17:54 +00:00
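
The multiplier widths quoted in b1c5760ed8 (16x15->31 versus 8x15->23) can be illustrated with a small sketch. It assumes a 16-bit range value and a 15-bit probability, and it approximates the product by keeping only the top 8 bits of the range. Whether libaom uses exactly this expression is an assumption, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Full-width scaling: 16-bit range times 15-bit probability,
 * a product of up to 31 bits, scaled back down by 15 bits. */
static uint32_t scale_full(uint32_t rng, uint32_t prob) {
  assert(rng >= 32768 && rng < 65536 && prob < 32768);
  return (rng * prob) >> 15;
}

/* Reduced-width scaling: only the top 8 bits of the range are multiplied,
 * so the product needs at most 23 bits. Dropping the low 8 bits means the
 * result can be slightly smaller than the full-width one, never larger. */
static uint32_t scale_small(uint32_t rng, uint32_t prob) {
  assert(rng >= 32768 && rng < 65536 && prob < 32768);
  return ((rng >> 8) * prob) >> 7;
}
```

The small loss of precision in the reduced-width product is consistent with the slight rate change reported in the commit's metrics table.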
Sarah Parker f178329191 Add gm parameter coding based on ref parameters
Change-Id: Ic2344a6475b967fa07f70b3ffad2714de657bb49
2017-04-07 22:39:42 +00:00
James Zern 859931ed9d variance_neon: sync variance*() w/c,sse2
removes some unnecessary casts and adds a few explicit uint32 ones for
larger sizes to quiet -Wshorten-64-to-32 warnings

ported from libvpx:
e372bfd5a variance_neon: sync variance*() w/c,sse2

Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518
2017-04-06 22:21:47 +00:00
Yunqing Wang 1e64e70b5f Fix forward transform compilation errors
BUG=aomedia:395

Change-Id: I381f16e66e2540c9bf24727abf8915a3850dcc92
2017-04-04 16:25:55 +00:00
Urvang Joshi c07b23de18 RTCD defs: refactor intra prediction protos.
Change-Id: I0f4576522a07597dbb04089b02ca1fae67075ba4
2017-04-03 23:15:42 +00:00
Steinar Midtskogen a2fa9ee3a4 Improve SSE2 implementation of v64_abs_s8 and v128_abs_s8
Change-Id: I5243432106c2456f1220adb9d8f24ae5e4249748
2017-04-03 08:59:23 +02:00
Steinar Midtskogen 6033fb853d Add v64_abs_s8, v128_abs_s8 and v256_abs_s8
Change-Id: I529509e4e997ba123799a3a581d20624d75cf582
2017-04-02 21:45:46 +02:00
Steinar Midtskogen 9b8444a17c Add v64_ssub_u16, v128_ssub_u16 and v256_ssub_u16
Change-Id: I60543913cbd8dc5cad524ab74697227f9e93836e
2017-04-02 02:04:11 +00:00
Urvang Joshi 5ddac0aac8 RTCD defs: Remove empty specialize statements once and for all.
A similar cleanup happened before, but the empty statements have since
reappeared. I added a check in 'specialize' subroutine to die whenever
such an empty specialize call is found, so that config+make would fail.

Change-Id: I300ca0f0b077c0aeca8096d6460d8fb1c364d9b9
2017-03-31 16:40:03 +00:00
Alex Converse 29608d84af variance: Add odd size sse functions
Change-Id: I5eb7870d4b1b83bb907e539528f27f80a42e2fad
2017-03-31 16:25:44 +00:00
Yi Luo 9d24735537 High bit depth inter prediction filter AVX2
On i7-6700:
- Function level speed improvement: 23%-29%
- User level speed improvement:
   decoder: ~2%-4%.
   encoder: <1%.

Change-Id: I02937a72304c3b356ca41e580352790df391f0a2
2017-03-30 23:12:13 +00:00
Alex Converse 4c5b020472 Make aom_sum_squares_2d_i16 take width and height parameters.
SSE2 may be needed for nx4 and 4xn.

Change-Id: I3c10112447fdb5fe51a68bc2c6e2f2641b102723
2017-03-30 15:49:22 +00:00
Steinar Midtskogen b8ff6aaf5d Add SIMD support for CDEF dering for sse2/ssse3 and neon
Change-Id: Ibaaed850ddceba9c3db542eaf4a1c623ce6b412b
2017-03-29 23:47:21 +00:00
Steinar Midtskogen 73aa77c034 Increase parallelism in CLPF SIMD
Change-Id: I66cdb67f8a1c2072516a65822dcc838e516ba9d7
2017-03-29 17:53:04 +00:00
Tom Finegan 5e9d15f34a aom_dsp: Fix cmake build.
Add missing sources:
- binary_codes_{reader,writer}.{c,h}

Change-Id: I4eda322a4c07a61f9da7370fda3163f726ab531c
2017-03-28 23:33:53 +00:00
Debargha Mukherjee 47748b5692 Adds binary code lib for coding various symbols
Adds a variable length binary code library for
coding various symbols for typical use in headers.

The main codes implemented are:
1. Coding a symbol from an n-ary alphabet using a
quasi-uniform code.
2. A bilevel code for coding symbols from an n-ary
alphabet based on a reference value for the symbol
also taken from the same alphabet.
The code has two steps. If the symbol is close to
the reference a shorter code is used, while if it is
farther away a longer code is used.
3. A finite (terminated) subexponential code that codes
a symbol from an n-ary alphabet using subexp parameter k.
4. A finite (terminated) subexponential code that codes
a symbol from an n-ary alphabet using subexp parameter k,
based on a given reference also taken from the same
alphabet. This code essentially reorders the values
before using the same code as 3.

Also adds corresponding encoder side functions to count
the number of bits used.

These codes will be subsequently used for more efficient
encoding of loop-restoration parameters and global motion
parameters.

Change-Id: I28c82b611925c1ab17f544c48c4b1287930764b7
2017-03-27 21:25:38 +00:00
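
Item 1 in the 47748b5692 message, the quasi-uniform code for an n-ary alphabet, can be sketched by its code length alone. The construction below is the usual one for such codes (some symbols get l-1 bits and the rest get l bits); whether it matches the library's exact bit layout is an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the most significant set bit, i.e. floor(log2(n)) for n > 0. */
static int msb_index(uint32_t n) {
  assert(n != 0);
  int log = 0;
  while (n >>= 1) log++;
  return log;
}

/* Number of bits a quasi-uniform code spends on symbol v from an
 * alphabet of n symbols: the first m symbols use l - 1 bits, the
 * remaining n - m use l bits. Hypothetical helper for illustration. */
static int quniform_bits(uint32_t n, uint32_t v) {
  assert(n > 0 && n <= (1u << 16) && v < n);
  if (n == 1) return 0;                /* a single symbol needs no bits */
  const int l = msb_index(n - 1) + 1;  /* smallest l with 2^l >= n */
  const uint32_t m = (1u << l) - n;    /* count of short codewords */
  return v < m ? l - 1 : l;
}
```

For n = 5 this gives three 2-bit codewords and two 3-bit codewords, which exactly fills the code space; for a power of two every symbol gets the same length.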
Jingning Han 8e67c05f53 Fix tree to cdf index mapping
This fixes the mis-aligned cdf model derived from tree based
model. It resolves the compression performance regression in
dual filter, intra mode, inter mode, and transform block type
coding, when ec-multisymbol is enabled by default.

With dual filter enabled, the performance regression was a 3.6%
loss for lowres. This fix restores a 1% gain.

Change-Id: I80f5485386045908c152c9c11eeacbc650f1e324
2017-03-25 00:15:28 +00:00
Jean-Marc Valin e9f7742437 Do real chroma RDO search for CDEF
Chroma now has a list of strengths too, with the superblock signalling
shared between luma and chroma.

low-latency, cpu=4:

   PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
-0.0114 | -1.4626 | -1.4745 |  -0.0423 | 0.0430 | -0.0001 |    -0.7416

Change-Id: I389c77f1d80020f810e45f8502c656ad9d397c8c
2017-03-23 21:57:30 +00:00
Alex Converse fa16041c40 Fix Wundef errors in simd intrinsics
Change-Id: I551eda906c96fac77125e10e6f71e9a6edca5baf
2017-03-23 19:08:20 +00:00
Alex Converse 64d7ef6746 Fix Wundef warnings inside the codec
Change-Id: I2f4a5c836905b089b91b211368bf3a0dea682b75
2017-03-23 15:42:52 +00:00
Tom Finegan 5afa1922b1 Add arm64 iOS support to the cmake build.
BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76

Change-Id: I6c246c0a5399e34e0eeeadc98de21cbf5b850af6
2017-03-22 01:02:52 +00:00
Steinar Midtskogen 3c33def72c Limit line buffer to 6 lines
Change-Id: I6fedfa6427865e9a37fbdf9d9c1bf8be55222cba
2017-03-21 21:27:09 +00:00
Steinar Midtskogen d280a84554 Remove boundary checks in CLPF
Change-Id: Icc93783f47fe7fe3aac395aadcc8bbc307dae1fb
2017-03-21 21:27:09 +00:00
Thomas Davies 5dad9a8943 EC_ADAPT: minor simplification to adaptation mechanism.
This removes an instruction from the HW path. It also improves
BDR by 0.02% on all metrics (AWCY, High Latency,
objective-1-fast).

Change-Id: I9f8a86871e1c0db4a0704dee297acd6977abcbe4
2017-03-20 23:06:38 +00:00
Tom Finegan 97d29ea7a4 Add support for armv7s to cmake aom_dsp build.
- Add function add_gas_asm_library() to handle conversion of asm
  sources and creation of custom dependencies.
- Uses add_asm_library() to create the library build.
- Add aom_dsp_common_neon_intrinsics target for the neon intrinsics.

BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76

Change-Id: Ifd99fbd69998a79613e0f5b61003a47973a804bc
2017-03-19 01:47:14 +00:00
Steinar Midtskogen 95f1c2ab6e Fix -fsanitize=integer warnings in v64_intrinsics_c.h
Change-Id: I8134c0ac4bd18478b266a0058e00bc6b1e6f8e9e
2017-03-17 20:27:58 +00:00
Steinar Midtskogen 6c79576556 Fix ubsan warnings
BUG=aomedia:376

Change-Id: Ief69f220ec5b6cf15443f872ad2f9a63336c185d
2017-03-17 20:27:58 +00:00
Steinar Midtskogen a9d41e88d2 Merge dering/clpf rdo and filtering
* Dering and clpf were merged into a single pass.
* 32x32 and 128x128 filter block sizes for clpf were removed.
* RDO for dering and clpf merged and improved:
  - "0" no longer required to be in the strength selection
  - Dering strength can now be 0, 1 or 2 bits per block

              LL    HL
PSNR:       -0.04 -0.01
PSNR HVS:   -0.27 -0.18
SSIM:       -0.15 +0.01
CIEDE 2000: -0.11 -0.03
APSNR:      -0.03 -0.00
MS SSIM:    -0.18 -0.11

Change-Id: I9f002a16ad218eab6007f90f1f176232443495f0
2017-03-17 19:06:20 +00:00
Thomas Davies 028b57f563 EC_ADAPT: Perform backwards updates directly on CDFs.
The initial CDF for each frame is stored in
the frame context. CDFs for actual coding are
stored in the tile structures, and these are
what get adapted. The initial CDF is replaced
by an average CDF derived from these tile CDFs.
This is carried forward to future frames when
backward adaptation is on.

CDFs are no longer set from the 8 bit probabilities
in backwards adaptation.

For now, 8 bit probabilities are maintained for
use in the encoder and for symbols which do not
have a CDF.

Change-Id: I106b30510bfad1fa57d077f7702acc1864378a09
2017-03-15 09:31:58 +00:00
Yushin Cho d080f4152d Fix broken build with accounting
Change-Id: I50267aa39d4d2857b48cbea0cbc8a7608489ebd7
2017-03-14 22:04:46 +00:00
Timothy B. Terriberry 561eb7cdc6 daala_ec: Remove dead code.
Change-Id: Ief9581c8060132f20ca81f4c1be15e2772b6c9eb
2017-03-14 18:36:14 +00:00
Thomas Davies f7f87ff2e6 Add a symbol decode call count to accounting.
This keeps track of how many calls have been made
to read symbols or bits. A given syntax element
may make multiple calls to symbol decoding functions,
and these variables keep track of the entropy
decoding engine throughput.

Change-Id: Iab3a720cbfe68f8d5ca3e4c415f7baa683b24268
2017-03-10 20:09:01 +00:00
Urvang Joshi ee7ee7f49f SMOOTH_PRED: Use get_msb() to get log2 of block dimension.
Apart from being inefficient, the floating point operation log2()
was resulting in an assertion failure due to an unrelated floating
point exception that happens earlier.

Related: update the MD5s in test_intra_pred_speed to fix that failure
too.

BUG=aomedia:384

Change-Id: I18dc0733e880bac21b3d07ad874f8ae341f59f06
2017-03-10 00:26:58 +00:00
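
The replacement described in ee7ee7f49f, taking log2 of a block dimension with an integer most-significant-bit lookup instead of floating-point log2(), can be sketched portably. The function below is a stand-in written for illustration and is not the get_msb() implementation from the codebase.

```c
#include <assert.h>

/* Integer floor(log2(n)) for n > 0, found by locating the highest set bit.
 * For power-of-two block dimensions this is the exact log2.
 * Hypothetical stand-in; uses no floating point. */
static int ilog2_floor(unsigned int n) {
  assert(n != 0);
  int log = 0;
  while (n >>= 1) log++;
  return log;
}
```

Because block dimensions are powers of two, the result can be used directly as a shift amount, which avoids both the cost of log2() and any floating-point exception state.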
Steinar Midtskogen 6d2f3c2a9e Fix some potential warnings on unused functions and implicit cast
Change-Id: I216935236d0f5073c4f975977572c558cf892328
2017-03-07 11:35:10 +01:00
Urvang Joshi 4d5bbbd907 SMOOTH_PRED: Use 8-bit weights for real.
Use 255 instead of 256, to restrict to 8-bits.

Only noise level differences in performance.

AWCY:
                   High Latency     Low Latency
  All Keyframes    -0.01            -0.01
  Video overall    -0.01            -0.07

Google Set:
                  All KF            Video
lowres            -0.005            -0.029
midres            -0.008             0.028
hdres             -0.010            -0.022      

Note: By moving from 18-bit to 8-bit and then
cutting off at 255 (this change), the overall effect is
noise level too (neutral or better).

Change-Id: I9f2852023015e36c01203bafe486ec400b2ba46f
2017-03-06 20:11:50 +00:00
Tom Finegan 6598843c00 aom_dsp.mk: Remove redundant CONFIG_PVQ section.
Same sources as CONFIG_AV1 section. Combined them, and fixed
a comment.

Change-Id: I5273849143c0c92a506deeb9241a761e5ee125d3
2017-03-03 17:02:03 +00:00