Hoists the iCDF conversion outside of the daala code.
We directly store 32768 - cdf[i] in each cdf, to avoid having to
convert the whole array every time a symbol is coded.
This works with ec_multisymbol, new_tokenset, and ec_adapt.
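For illustration only, the stored form described above can be sketched as follows. The names cdf_to_icdf and CDF_TOTAL are hypothetical and not taken from the code base; the sketch assumes a 15-bit probability scale in which the last CDF entry is 32768.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 15-bit probability scale: the final CDF entry equals 32768. */
#define CDF_TOTAL 32768

/* Hypothetical helper: convert a CDF to its inverse form (iCDF) in place,
   storing 32768 - cdf[i] in each entry. Doing this once, when the table is
   built, avoids converting the whole array for every coded symbol. */
static void cdf_to_icdf(uint16_t *cdf, int nsyms) {
  assert(nsyms > 0);
  for (int i = 0; i < nsyms; i++) {
    assert(cdf[i] <= CDF_TOTAL);
    cdf[i] = (uint16_t)(CDF_TOTAL - cdf[i]);
  }
}
```

With this layout the last entry of every table is 0 rather than 32768.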
Compared to Change-Id Idbbd3743e9189146cb519d5b984bdabd69e3f4c0,
this improves decoder runtimes by 1.15% at QP=55 and 2.64% at
QP=20.
The overall slowdown of ec_smallmul is now 0.12% at QP=55 and
0.44% at QP=20.
Encoder output should not change, and all streams should remain
decodable without decoder changes.
Change-Id: I06b8b75b667bb1bc4ddffc78f895e48a09f4c578
If the DC-only IDCT yields zero, we can skip the steps that
add a zero residual to the predicted signal.
DC-only IDCT cases occur more frequently at lower bit rates.
Similar changes can be made to the C versions of the high bit depth IDCT functions.
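A minimal C sketch of the early exit, assuming 8-bit pixels and a square block. The function add_dc_residual and its return value are hypothetical, written only to show the shape of the check, and are not the functions touched by this change.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a reconstructed value to the 8-bit pixel range. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical DC-only reconstruction: dc_residual is the single value the
   DC-only inverse transform produces for every pixel of the block.
   Returns 1 if the block was modified, 0 if the add was skipped. */
static int add_dc_residual(uint8_t *dest, int stride, int size,
                           int dc_residual) {
  assert(size > 0 && stride >= size);
  /* Adding zero leaves the prediction unchanged, so skip the loop. */
  if (dc_residual == 0) return 0;
  for (int r = 0; r < size; r++) {
    for (int c = 0; c < size; c++) {
      dest[r * stride + c] = clip_pixel(dest[r * stride + c] + dc_residual);
    }
  }
  return 1;
}
```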
Change-Id: I53af22904568f7043091710da70ca8299bf361c5
This should be the same number of operations as the non-ec_smallmul
version (though ideally we'd use the real 15-bit probability
natively).
Encoder output should not change, and all streams should remain
decodable without decoder changes.
Change-Id: I2998a77a02f566cd0c82c415395637acf49b5a97
This only changes the internal coding engine. We convert CDFs into
iCDFs at the "bool" reader <-> daala_ec boundary.
Decoder output should not change.
Change-Id: I483dfe3e5588d2038c3c7ec4cd5ba62d6699b920
This removes one subtraction from the CDF search loop (reducing the
dependency chain for reading from the CDF) at the cost of one
increment and decrement during renormalization (easily absorbed by
the reorder buffer).
There should be no change in decoded output.
Change-Id: Ia7905bb8ca7c5d4ab73f23ccc61bcd3432349aa2
This only changes the internal coding engine. We convert CDFs into
iCDFs at the "bool" writer <-> daala_ec boundary.
Encoder output should not change, and all streams should remain
decodable without decoder changes.
Change-Id: Id3ac7352926497bf6f7bc371ab9bc76e9a3569d5
Encoder output should not change, and all streams should remain
decodable without decoder changes.
Change-Id: Id1f1b0f2f02c3b46f150a93c451bf48abd0782ca
This change adds the following filter tap options:
1. Add options to replace the 15-tap filter with a 9-tap or 11-tap filter.
2. Force the chroma planes to use at most a 7-tap filter.
The above options are disabled by default.
Change-Id: Iab90a613210c1adaf4475976e9ed7e78ac30803b
Requires use of new cmake toolchain file:
$ cmake path/to/aom -DCMAKE_TOOLCHAIN_FILE=path/to/aom/build/cmake/toolchains/mips32-linux-gcc.cmake
DSPR2 and MSA are supported via addition of -DENABLE_DSPR2=1 and
-DENABLE_MSA=1 respectively. Note that the latter requires the addition
of -DMIPS_CPU=p5600.
BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76
Change-Id: Idf7d7f2daecf18cc45b834166eaf34ee9f414d49
This reduces the multiplier width of daala_ec from 16x15->31 to
8x15->23, which reduces hardware latency by an estimated 20% (and
area for this module by an estimated 40%).
These are the smallest logical changes required to achieve this,
but the approach will be optimized significantly in subsequent
commits.
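The width arithmetic can be illustrated with the sketch below. It is not the code from this change: scale_full and scale_small are hypothetical names, and the choice to keep only the top 8 bits of a 16-bit range is an assumption made to show why the product fits in 23 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Full-width scaling: 16-bit range times 15-bit probability.
   The intermediate product needs up to 31 bits. */
static uint32_t scale_full(uint16_t rng, uint16_t p15) {
  assert(p15 < 32768);
  return ((uint32_t)rng * p15) >> 15;
}

/* Reduced-width scaling: only the top 8 bits of the range are multiplied,
   so the intermediate product needs at most 23 bits. The result is an
   approximation of scale_full(). */
static uint32_t scale_small(uint16_t rng, uint16_t p15) {
  assert(p15 < 32768);
  return ((uint32_t)(rng >> 8) * p15) >> 7;
}
```

The two functions agree when the low 8 bits of the range are zero and otherwise differ by a small amount, which is consistent with a small but nonzero rate change from the narrower multiplier.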
When enabled:
ec_smallmul1c_base@2017-03-08T00:49:01.830Z ->
ec_smallmul1c@2017-03-08T00:49:45.091Z
  PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
0.0203 |  0.0203 |  0.0204 |   0.0203 | 0.0203 |  0.0203 |     0.0202
Change-Id: Idbbd3743e9189146cb519d5b984bdabd69e3f4c0
Removes some unnecessary casts and adds a few explicit uint32 casts for
larger sizes to quiet -Wshorten-64-to-32 warnings.
Ported from libvpx:
e372bfd5a variance_neon: sync variance*() w/c,sse2
Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518
A similar cleanup happened before, but the empty statements have since
reappeared. I added a check in the 'specialize' subroutine to die whenever
such an empty specialize call is found, so that config+make fails.
Change-Id: I300ca0f0b077c0aeca8096d6460d8fb1c364d9b9
Adds a variable-length binary code library for
coding various symbols, typically for use in headers.
The main codes implemented are:
1. Coding a symbol from an n-ary alphabet using a
quasi-uniform code.
2. A bilevel code for coding symbols from an n-ary
alphabet based on a reference value for the symbol
also taken from the same alphabet.
The code has two steps. If the symbol is close to
the reference a shorter code is used, while if it is
farther away a longer code is used.
3. A finite (terminated) subexponential code that codes
a symbol from an n-ary alphabet using subexp parameter k.
4. A finite (terminated) subexponential code that codes
a symbol from an n-ary alphabet using subexp parameter k,
based on a given reference also taken from the same
alphabet. This code essentially reorders the values
before using the same code as 3.
Also adds corresponding encoder side functions to count
the number of bits used.
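As a sketch of code 1 and its bit-counting counterpart, the function below computes the length of a quasi-uniform codeword. The name count_quniform_bits and the 65536 alphabet limit are assumptions for illustration; the construction is the standard one in which the first m = 2^l - n values use l - 1 bits and the rest use l bits, with l = ceil(log2(n)).

```c
#include <assert.h>

/* Smallest l such that 2^l >= n, i.e. ceil(log2(n)) for n >= 1. */
static int ceil_log2(unsigned n) {
  int l = 0;
  while ((1u << l) < n) l++;
  return l;
}

/* Hypothetical bit counter for a quasi-uniform code over the alphabet
   {0, ..., n - 1}: returns the number of bits used to code symbol v. */
static int count_quniform_bits(unsigned n, unsigned v) {
  assert(n >= 1 && n <= 65536 && v < n);
  if (n == 1) return 0; /* a single-symbol alphabet needs no bits */
  const int l = ceil_log2(n);
  const unsigned m = (1u << l) - n; /* number of short codewords */
  return v < m ? l - 1 : l;
}
```

For n = 5 this gives three 2-bit codewords and two 3-bit codewords, and for a power-of-two alphabet it reduces to a fixed-length code.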
These codes will be subsequently used for more efficient
encoding of loop-restoration parameters and global motion
parameters.
Change-Id: I28c82b611925c1ab17f544c48c4b1287930764b7
This fixes the misaligned CDF model derived from the tree-based
model. It resolves the compression performance regression in
dual filter, intra mode, inter mode, and transform block type
coding, when ec-multisymbol is enabled by default.
With dual filter enabled, the regression was a 3.6% loss for
lowres. This fix brings the result back to a 1% gain.
Change-Id: I80f5485386045908c152c9c11eeacbc650f1e324
This removes an instruction from the HW path. It also improves
BDR by 0.02% on all metrics (AWCY, High Latency,
objective-1-fast).
Change-Id: I9f8a86871e1c0db4a0704dee297acd6977abcbe4
- Add function add_gas_asm_library() to handle conversion of asm
sources and creation of custom dependencies.
- Use add_asm_library() to create the library build.
- Add aom_dsp_common_neon_intrinsics target for the neon intrinsics.
BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76
Change-Id: Ifd99fbd69998a79613e0f5b61003a47973a804bc
* Dering and clpf were merged into a single pass.
* 32x32 and 128x128 filter block sizes for clpf were removed.
* RDO for dering and clpf merged and improved:
  - "0" no longer required to be in the strength selection
  - Dering strength can now be 0, 1 or 2 bits per block
             LL     HL
PSNR:        -0.04  -0.01
PSNR HVS:    -0.27  -0.18
SSIM:        -0.15  +0.01
CIEDE 2000:  -0.11  -0.03
APSNR:       -0.03  -0.00
MS SSIM:     -0.18  -0.11
Change-Id: I9f002a16ad218eab6007f90f1f176232443495f0
The initial CDF for each frame is stored in
the frame context. CDFs for actual coding are
stored in the tile structures, and these are
what get adapted. The initial CDF is replaced
by an average CDF derived from these tile CDFs.
This is carried forward to future frames when
backward adaptation is on.
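The averaging step might look like the sketch below. The function name, the flat tile-major layout, and round-to-nearest division are assumptions for illustration, not details stated by this change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical averaging of per-tile adapted CDFs into the frame context.
   tile_cdfs is laid out tile-major: entry i of tile t is at
   tile_cdfs[t * nsyms + i]. Rounding to nearest is an assumption. */
static void average_tile_cdfs(const uint16_t *tile_cdfs, int num_tiles,
                              int nsyms, uint16_t *frame_cdf) {
  assert(num_tiles > 0 && nsyms > 0);
  for (int i = 0; i < nsyms; i++) {
    uint32_t sum = 0;
    for (int t = 0; t < num_tiles; t++) sum += tile_cdfs[t * nsyms + i];
    frame_cdf[i] =
        (uint16_t)((sum + (uint32_t)num_tiles / 2) / (uint32_t)num_tiles);
  }
}
```

Because every tile CDF is non-decreasing and the same rounding is applied to each entry, the averaged CDF is also non-decreasing.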
CDFs are no longer set from the 8-bit probabilities
in backward adaptation.
For now, 8-bit probabilities are maintained for
use in the encoder and for symbols which do not
have a CDF.
Change-Id: I106b30510bfad1fa57d077f7702acc1864378a09
This keeps track of how many calls have been made
to read symbols or bits. A given syntax element
may make multiple calls to symbol decoding functions,
and these variables keep track of the entropy
decoding engine throughput.
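A toy sketch of this kind of accounting follows. ToyReader and its functions are hypothetical, and the reader pulls raw bits from a buffer rather than running the real entropy decoder. It only shows how one higher-level read can account for several lower-level calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader with call counters (not the real decoder state). */
typedef struct {
  const uint8_t *buf;
  size_t num_bits;        /* total bits available in buf */
  size_t bit_pos;         /* next bit to read, MSB first */
  unsigned bit_calls;     /* calls to toy_read_bit() */
  unsigned literal_calls; /* calls to toy_read_literal() */
} ToyReader;

static int toy_read_bit(ToyReader *r) {
  assert(r->bit_pos < r->num_bits);
  r->bit_calls++;
  const int bit = (r->buf[r->bit_pos >> 3] >> (7 - (r->bit_pos & 7))) & 1;
  r->bit_pos++;
  return bit;
}

/* One literal read issues `bits` separate bit reads. */
static int toy_read_literal(ToyReader *r, int bits) {
  assert(bits >= 0 && bits <= 16);
  r->literal_calls++;
  int v = 0;
  for (int i = 0; i < bits; i++) v = (v << 1) | toy_read_bit(r);
  return v;
}
```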
Change-Id: Iab3a720cbfe68f8d5ca3e4c415f7baa683b24268
Apart from being inefficient, the floating point operation log2()
was resulting in an assertion failure due to an unrelated floating
point exception that happens earlier.
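The message does not show the replacement. One common integer-only alternative to log2(), given here purely as an assumed example with a hypothetical name, is a shift loop that returns the position of the highest set bit:

```c
#include <assert.h>
#include <stdint.h>

/* Integer floor(log2(n)) for n > 0, with no floating-point operations. */
static int floor_log2_u32(uint32_t n) {
  assert(n > 0);
  int l = 0;
  while (n >>= 1) l++;
  return l;
}
```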
Related: update the MD5s in test_intra_pred_speed to fix that failure
too.
BUG=aomedia:384
Change-Id: I18dc0733e880bac21b3d07ad874f8ae341f59f06
Use 255 instead of 256, to restrict to 8 bits.
Only noise level differences in performance.
AWCY:
                High Latency  Low Latency
All Keyframes   -0.01         -0.01
Video overall   -0.01         -0.07
Google Set:
          All KF   Video
lowres    -0.005   -0.029
midres    -0.008    0.028
hdres     -0.010   -0.022
Note: The overall effect of moving from 18-bit to 8-bit and then
cutting off at 255 (this change) is also noise level
(neutral or better).
Change-Id: I9f2852023015e36c01203bafe486ec400b2ba46f