Zero, one, and two or more coded as one symbol (head).
Remaining tokens coded as a tail symbol.
The pareto CDF distribution is adjusted to cover tokens from
two onwards.
Change-Id: I98b33fab6b9f52690f6ad618ac55e725a97be056
- Added comments for some tables and #defines for clarity.
- Renamed some variables to ensure we use "color_index" instead of
"color" for palette color index related variables.
Change-Id: Ica95a26e0f171a41a3259c8e6b3b891b8cd10151
This commit makes the daala-ec work in the cb4x4 mode. As compared
to --enable-experimental, --enable-experimental --enable-cb4x4
improves the coding performance by:
lowres 2.6%
midres 1.2%
Change-Id: Ifee6f011c80364492c4a547513d24eb2958b5a56
Now that we have a small number of contexts (5), use hash multipliers
(instead of base 11), so that color context hash is within a small
range. This allows us to use a lookup table to get color context
instead of a for loop.
Output bitstreams are bit-exact, so no change in compression.
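A minimal sketch of the lookup-table idea, not the libaom code. The neighbor
weights {2, 1, 2}, the multipliers {1, 2, 2}, the table contents, and all
names below are illustrative assumptions, chosen so that the five possible
score patterns hash to distinct small values:

```c
#include <assert.h>

/* Illustrative assumptions: 3 neighbors weighted {2, 1, 2} (left, top-left,
 * top), per-color scores sorted in descending order, multipliers {1, 2, 2}. */
#define NUM_NEIGHBORS 3
#define MAX_HASH 8

static const int kHashMultipliers[NUM_NEIGHBORS] = { 1, 2, 2 };
/* Hash -> context. -1 marks hashes not expected under these assumptions. */
static const int kContextLookup[MAX_HASH + 1] = { -1, -1, 0, -1, -1,
                                                  4,  3,  2, 1 };

/* scores[] must be sorted in descending order, unused entries set to 0. */
static int color_context_from_scores(const int scores[NUM_NEIGHBORS]) {
  int hash = 0;
  for (int i = 0; i < NUM_NEIGHBORS; ++i) {
    hash += scores[i] * kHashMultipliers[i];
  }
  assert(hash >= 0 && hash <= MAX_HASH);
  assert(kContextLookup[hash] >= 0);
  return kContextLookup[hash];
}
```

With these assumed weights, the score patterns {5}, {4,1}, {3,2}, {2,2,1} and
{2} hash to 5, 6, 7, 8 and 2, so a 9-entry table replaces the search loop.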
Change-Id: I8cd8c893048c2fc6b22ccbd56f652d11486e2ee9
This reduces the complexity in a number of ways:
- We need just 3 neighbors instead of 4.
- Possible contexts reduce from 16 to 5.
- On hardware side, getting the contexts for a whole block will be more
parallelizable.
At the same time, compression performance improves very slightly:
- Screen-content set (videos) (Google): BDRate improved by 0.32
- screenshots set (images) (AWCY): PSNR improved by 0.62:
https://arewecompressedyet.com/?job=palette_withTR2%402017-01-27T21%3A30%3A28.890Z&job=palette_noTR2%402017-01-27T21%3A41%3A34.312Z
Change-Id: Ie84ca32f05d55ad481a51c2d3abc579468597189
Cherry-pick Daala e248823a
Getting rid of the DCT in od_compute_dist_8x8()
Replacing the DCT and frequency weighting by a filter
Change-Id: Icc3a46e5dbb561e4e3b00fa6c2290d54299c05cb
This commit fixes the encoding/decoding mismatch issue when
ext-partition and ext-partition-type are both turned on in cb4x4
mode.
BUG=aomedia:336
Change-Id: I4d6ad5863c9d3bc8e3a41c259b8b39f130164790
Adjusts the value by 1 to make sure that the center tap
of the Wiener filter does not drop below 0.
BUG=aomedia:315
Change-Id: I41c3a2eb3f36dd49072a4873a995003d18f94ece
Introduced in I745ca032f313c5041aacc98c03ae4bfc33d840de.
Stride should be plane_block_width and width should be cols,
sanity check: cols <= plane_block_width.
Change-Id: Ic5128e94a909e498010c92fef2013da8df6d6d85
Change the list of search offsets searched when ext-partition-types
is on for square block_sizes. This is because the VERTICAL_A and
HORIZONTAL_A partitions are incompatible with the default list.
BUG=AOMEDIA:141
Change-Id: I884c45c3d11039b7dcb72336a928362f926473ed
If part of a block falls outside right and/or bottom image boundary,
then only store tokens for the part of it within the boundary.
Also, consider only the part of the block within the boundary when
calculating the number of colors in the image, deciding the base
colors for palette, RD calculation etc.
The part of color map corresponding to pixels outside the image
boundary is padded with color indices copied from same row/column.
This behavior is similar to how pixels outside the boundary are padded.
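The padding step can be sketched as follows. This is an illustration under
assumed names (pad_color_map, valid_w, valid_h are hypothetical), not the
actual encoder code:

```c
#include <assert.h>
#include <stdint.h>

/* Pads a block_w x block_h color index map whose top-left valid_w x valid_h
 * region holds real data. Columns right of the boundary copy the last valid
 * index in the same row; rows below the boundary copy the last valid row. */
static void pad_color_map(uint8_t *map, int stride, int block_w, int block_h,
                          int valid_w, int valid_h) {
  assert(valid_w > 0 && valid_w <= block_w && block_w <= stride);
  assert(valid_h > 0 && valid_h <= block_h);
  for (int r = 0; r < valid_h; ++r) {
    uint8_t *row = map + r * stride;
    for (int c = valid_w; c < block_w; ++c) row[c] = row[valid_w - 1];
  }
  const uint8_t *last = map + (valid_h - 1) * stride;
  for (int r = valid_h; r < block_h; ++r) {
    uint8_t *row = map + r * stride;
    for (int c = 0; c < block_w; ++c) row[c] = last[c];
  }
}
```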
For screen_content set, this improves compression performance by
0.038 overall. One clip, in particular, has a significant gain of 0.8.
Change-Id: I745ca032f313c5041aacc98c03ae4bfc33d840de
Creates the PVQ_SKIP_TYPE enum to encapsulate the different types of
skipping that can be signaled by PVQ (i.e. skip: AC, DC or both).
There is no impact on the bitstream. However, the decoder will now emit
an internal error if the decoded skip flag is out of range. The
block_skip variable is also renamed to ac_dc_coded as it stores the same
information.
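A sketch of what such an enum and its range check might look like. The
enumerator names, their values, and the helper functions are assumptions made
for illustration only:

```c
#include <assert.h>

/* Enumerator names and values are assumptions for this sketch. */
typedef enum {
  PVQ_SKIP = 0x0,   /* neither DC nor AC is coded */
  DC_CODED = 0x1,   /* only DC is coded */
  AC_CODED = 0x2,   /* only AC is coded */
  AC_DC_CODED = 0x3 /* both DC and AC are coded */
} PVQ_SKIP_TYPE;

/* A decoder would report an internal error when this returns 0. */
static int pvq_skip_type_is_valid(int decoded) {
  return decoded >= PVQ_SKIP && decoded <= AC_DC_CODED;
}

static int dc_is_coded(PVQ_SKIP_TYPE t) { return (t & DC_CODED) != 0; }
static int ac_is_coded(PVQ_SKIP_TYPE t) { return (t & AC_CODED) != 0; }
```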
Change-Id: Ib2aadaf99dc1736ea392ae5ed8948c3cdc12da9b
Fixes a mismatch issue with ext-inter+motion-var+warped-motion
due to unset num_proj_ref values.
BUG=aomedia:311
Change-Id: I042551f6c53e8cc005f2133704a03b243c98c12a
only expose the static functions needed in the test file to avoid link
errors for e.g., av1_fht4x4_c
Change-Id: I35111d322f30bc2bfc57b32c11f691f0717cfaba
Now that https://aomedia-review.googlesource.com/#/c/6729/
has been merged, build_intra_predictors_for_interintra() is
now redundant, so replace it by a direct call to
av1_predict_intra_block() and remove the old function.
Reset rect_interintra back to 1.
To do this, we need to make the intra predictor take a
BLOCK_SIZE instead of a TX_SIZE. This is because we need to
be able to predict 32x64 and 64x32 blocks, but there is no
TX_32X64 or TX_64X32.
No effect on output or performance.
Change-Id: I8c185a211c97a85012cc54ec293c785a693608ed
Resolve the broken coding pipeline in ext-inter experiment when
cb4x4 mode is enabled. Turn off rectangular inter-intra mode.
This needs some more work to hook up. Given that it gives fairly
limited coding performance gains, disable it for the moment.
BUG=aomedia:309
Change-Id: I9b406df6183f75697bfd4eed5125a6e9436d84b0
This follows the naming for the other frame types, and allows libaom
to be compiled against other libraries that also #define NONE.
Change-Id: Ic2e2814587bbc5ea67385a9af775396d29b7dde0
* The restriction on the parameter 'delta' was too strict, so we
loosen it (delta only ever gets multiplied by -4, ... , 4,
whereas beta gets multiplied by -7, ..., 7)
* Correct a comment about the border clamping
* Fix an issue with the test case
Change-Id: I30e55203455ba6e419b5a8b646151a6d1fd5cc3b
This commit adds a new experiment, Daala's distortion function,
which is designed to better approximate perceptual distortion
in 8x8 pixel blocks.
This experiment is expected to work best with PVQ.
It measures the variance of overlapped 4x4 regions in the 8x8 area,
then uses these variances to scale the MSE of weighted frequency domain
distortion of 8x8 block.
Since AV1 calculates distortion in blocks as small as 4x4, it is not possible to
directly replace the existing distortion functions of AV1,
such as dist_block() and block_rd_txf().
Hence, there have been substantial changes in order to apply
Daala's 8x8 distortion function.
The Daala distortion function is applied
after all 4x4 tx blocks in an 8x8 block are encoded (during RDO),
in the following two cases:
1) intra/inter sub8x8 predictions and
2) 4x4 transform with prediction size >= 8.
To enable this experiment, add '--enable-daala-dist' with configure.
TODO: Significant tuning of parameters is required since the function
originally came from Daala, so most parameters would not work
correctly outside Daala.
The fact that chroma distortion is added to the distortion of AV1's RDO is
also critical since Daala's distortion function is applied to luma only
and chroma continues to use MSE.
Change-Id: If35fdd3aec7efe401f351ba1c99891ad57a3d957
This commit resolves an enc/dec mismatch issue when both filter-intra
and cb4x4 modes are enabled.
BUG=aomedia:253
Change-Id: I4026d93c00a819f2ce69aedba9d34a774319acbf
This commit enables the adaptive scan order system to support
rectangular transform block sizes. It resolves the coding failure
when rect-tx or var-tx are enabled.
BUG=aomedia:143
Change-Id: Ic565284e811e3f7e0ebf2e08fb3748257ce8a049
Fix an encoding failure issue when var-tx is enabled, while ext-tx
and rect-tx are disabled. This doesn't change coding statistics
when all are enabled.
Change-Id: I4b32387a0a1497380980f8087832aaf6467cdcbe
This commit makes ext-tx and rect-tx experiments supported in the
cb4x4 mode. It resolves an enc/dec mismatch issue when all the
transform experiments are enabled.
The coding gains are
        ext-tx + rect-tx   cb4x4   vartx   total
lowres  4.0%               2.3%    0.5%    6.9%
The encoding speed is about the same when cb4x4 and vartx are
further enabled.
BUG=aomedia:139
Change-Id: I3fdabc6d5de23ceb78ac0751a9bf7332ebc0a3ac
Properly determine and use horizontal and vertical masks
for loop filtering when rectangular transforms are used.
Fixes an intermittent mismatch issue and improves coding
efficiency.
BDRATE results for ext-tx + rect-tx:
lowres: -3.739% (up from -3.443%)
midres: -3.366% (up from -3.006%)
Change-Id: If26fa14261f3893662eb1245f0b876d68513247c
By turning on CONVOLVE_POST_ROUNDING, in the compound inter
prediction mode, FILTER_BITS rounding is moved after the summation
of two predictions.
Note that the post rounding is only applied on non-sub8x8 block
PSNR BDRate
lowres -0.808% -0.673%
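A simplified single-stage illustration of why rounding after the summation
loses less precision. FILTER_BITS = 7 and the rounding macro follow common
libaom conventions; the two helper functions are hypothetical and ignore the
2-D, multi-stage structure of the real convolve:

```c
#include <assert.h>

#define FILTER_BITS 7
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Round each prediction by FILTER_BITS first, then average the two. */
static int compound_pre_rounding(int acc0, int acc1) {
  const int p0 = ROUND_POWER_OF_TWO(acc0, FILTER_BITS);
  const int p1 = ROUND_POWER_OF_TWO(acc1, FILTER_BITS);
  return ROUND_POWER_OF_TWO(p0 + p1, 1);
}

/* Sum the unrounded filter outputs, then round once. */
static int compound_post_rounding(int acc0, int acc1) {
  return ROUND_POWER_OF_TWO(acc0 + acc1, FILTER_BITS + 1);
}
```

For unrounded accumulators 12864 and 12800 the exact average is 100.25.
Pre-rounding yields 101, while post-rounding yields 100.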
Change-Id: Ib91304e6122c24d832a582ab9f5757d33eac876c
Instead of returning skip, av1_pvq_encode_helper and od_pvq_encode now
return ac_dc_coded. This gives more information on whether the DC part
or the AC part was skipped.
Although it is possible to obtain ac_dc_coded from the pvq_info struct,
this struct is not always used, in which case the information was lost.
This change does not impact the bitstream.
Change-Id: Ie303de915f74e8da384f822332eb1aa27f677bd3
This commit fixes an enc/dec mismatch issue in ext-partition-type
in the cb4x4 mode.
BUG=aomedia:137
Change-Id: I19f538a967a6059a40b1668eed076bc315b46149
This commit resolves the coding pipeline breakage when ext-partition
and cb4x4 are both enabled.
BUG=aomedia:138
Change-Id: Ic17da68af80d7a66565b0e1c69b895be27282a9a
If _d == 0 we are already off to the UB races due to out of bounds
access in OD_DIVU_SMALL_CONSTS.
Change-Id: I55a76c51483885bbb38667f14836be9830e130a8
The warp filter for the (0,1) case is changed to use a real
8-tap filter.
Improves coding efficiency.
BDRATE on lowres:
-0.772% (up from -0.633%) with --enable-global-motion
-1.124% (up from -1.001%) with --enable-warped-motion
Change-Id: I296efe36dbc72a7af74773b71b445f19a2aa7205
rand_r() isn't visible by default with -std=c99. This can be changed to
_POSIX_SOURCE after -std=c99 is enabled; the portability of rand_r()
can be addressed in a future change.
BUG=aomedia:111
Change-Id: Id540f7f4a70007f70585261814b6fb09925fb32b
+ M_SQRT2 / M_SQRT1_2 to keep the daala diffs down
adapted from:
ebb9b28 Move math.h fills to odintrin.h.
these aren't visible by default with -std=c99.
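A minimal sketch of such fills, assuming the usual guard-and-define pattern.
The helper scale_by_sqrt2() is hypothetical and only shows a use:

```c
#include <assert.h>
#include <math.h>

/* M_SQRT2 and M_SQRT1_2 are POSIX extensions, not ISO C99, so provide
 * fallbacks when <math.h> does not define them. */
#ifndef M_SQRT2
#define M_SQRT2 1.41421356237309504880
#endif
#ifndef M_SQRT1_2
#define M_SQRT1_2 0.70710678118654752440
#endif

/* Hypothetical helper that relies on the constant being available. */
static double scale_by_sqrt2(double x) { return x * M_SQRT2; }
```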
BUG=aomedia:111
Change-Id: Iaa65986f35d914bf92c8c49a8211e0e6864c64e4
model_rd_for_sb() can quickly compute an approximated RD cost. We
use the estimated RD cost to skip running full RD for some bad
mode candidates.
This only affects keyframe encoding. Observed 22% encoding time
reduction, and 0.03% compression loss.
Change-Id: I793f1eda98d67e8da9bc1648dcf272222b30a556
Improve the speed of the warp filter itself by ~30%. This leads
to an overall decoder speedup of 5-20%, depending on bitrate,
for the global-motion experiment, and a small speedup for
warped-motion.
Applies a very minor change to the rounding during filter
selection (ROUND_POWER_OF_TWO makes slightly more sense here
than ROUND_POWER_OF_TWO_SIGNED, and is faster)
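For reference, a sketch of how the two rounding macros differ, using their
usual definitions. The wrapper functions are hypothetical. The plain form
needs no branch, which is why it is cheaper:

```c
#include <assert.h>

#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))
#define ROUND_POWER_OF_TWO_SIGNED(value, n)           \
  (((value) < 0) ? -ROUND_POWER_OF_TWO(-(value), (n)) \
                 : ROUND_POWER_OF_TWO((value), (n)))

/* Rounds half-way cases toward +infinity. For negative inputs this relies on
 * an arithmetic right shift, which is implementation-defined in C. */
static int round_plain(int v, int n) { return ROUND_POWER_OF_TWO(v, n); }

/* Rounds half-way cases away from zero, at the cost of a sign test. */
static int round_signed(int v, int n) {
  return ROUND_POWER_OF_TWO_SIGNED(v, n);
}
```

The two agree for non-negative inputs and differ only on negative half-way
cases such as -5 / 2.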
Change-Id: I3f364221d1ec35a8aac0d2c8b0e427f527d12e43
* Use the same function for domaintxfmrf in both highbd and lowbd
cases
* Move an assertion out of a loop in
apply_selfguided_restoration_highbd, to match the lowbd case
No change to output, but a decoder speed improvement of ~3.5%
(roughly independent of bitrate) with loop-restoration on a
10bpp sample.
Change-Id: I970a3bb8f1c6b0ac60aa4a6fe4e7f54d1e6c1452
This commit makes the motion-var support cb4x4 mode. It resolves
the encoding failure issue when both experiments are enabled.
BUG=aomedia:136
Change-Id: I2fa963d62cbdd24cc54d5a95d02f2dc226e6d2d0
Offset the default probability set in motion_var to account for
the added block sizes in cb4x4 mode.
Change-Id: I18d90fda1678fad2fc738036e0d9caff6ac894b7
This commit adds 2x2 transform block scan order to make the
adaptive scan order support cb4x4 mode.
BUG=aomedia:135
Change-Id: Ic8c3ae9ed65d577df629524b617b386b5e799d4c
When the segment feature is on, the frame-level cm->tx_mode can be set to
ONLY_4X4 only if all segments are lossless. Otherwise, bugs occur
when xd->lossless[i] is 0 and xd->lossless[0] is 1.
Also fix the condition of coding tx_type, which should be on when
the qindex of current segment is > 0.
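The two conditions can be sketched as below. The function names are
hypothetical, and the sketch assumes the lossless[] array is filled in for
every segment:

```c
#include <assert.h>

#define MAX_SEGMENTS 8

/* Returns 1 only when every segment is lossless. Only then may the
 * frame-level transform mode be forced to 4x4-only. */
static int all_segments_lossless(const int lossless[MAX_SEGMENTS]) {
  for (int i = 0; i < MAX_SEGMENTS; ++i) {
    if (!lossless[i]) return 0;
  }
  return 1;
}

/* Per the text above: tx_type is coded when the segment's qindex is > 0. */
static int should_code_tx_type(int segment_qindex) {
  return segment_qindex > 0;
}
```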
BUG=aomedia:106
BUG=aomedia:104
Change-Id: Ic076083bb78b3b99a6f7d17ec82ee402c64bcc52
We need uint16_t buf for storing no-rounding prediction.
Add uint16_t buf in conv_params for that.
This CL lets us avoid changing the interface of the convolve functions.
Change-Id: I079fad911327f40ffb98e17c73e7638b1719c975
Separate prediction code and parameter generating code.
This will not change bitstream statistics.
Change-Id: I194480166d3f8641592e53683029be1d466cfba9
In the final round of encoding of each superblock, the encoder goes through
each prediction block to check whether ncobmc mode is better than
non-overlapped prediction. Note that causal obmc mode is dumped here.
PSNR gain (MOTION_VAR + NCOBMC): -2.845% lowres
Change-Id: Ibe504f7f1882446a08ba426e1e9824bca73bf655
While encoding a key frame with quantizer = 0 and aq-mode = 1,
for some segment_ids, the quantizer got modified and could be
> 0, and lossless[segment_id] might be 0 or 1 depending on the
segment_id. Namely, blocks with lossless[segment_id] = 0 were
allowed to choose transform sizes other than 4x4. This conflicted
with tx_mode which was a frame-level decision. In this patch,
the transform search condition was modified so that the transform
choice was consistent with tx_mode of that frame.
BUG=aomedia:104
Change-Id: Ia39127b5dee129283a133cf5e4000da62d9e0f1c
At the edges of the picture only a subset of partitions are legal. Add
new contexts for these borders so they don't distort the probabilities of
the interior of the image where all partitions are legal.
Only include one context for each block size of each border direction
because so few blocks fall into these contexts to begin with.
objective-1-fast:
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.0294 | -0.0911 | -0.2382 | -0.0481 | -0.0441 | -0.0450 | -0.0454
derf144: -0.135
lowres: -0.124
midres: -0.076
hdres: -0.078
Change-Id: I909b98eebb7e49273cde90154c8408febe334158
Adds a few options to make the compound mask lightly dependent on
the two predictors.
Also adds high bit depth support
Change-Id: If57b6e8ddd140e0c00fd9d4738927d37225091cb
Beside above and left positions, additional above-left,
above-right, and bottom-left positions are added as
neighbor candidates.
In av1_update_neighbors, two available positions will be picked as
context neighbors.
The picking priority is
above -> left -> above-left -> above-right -> bottom-left
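A sketch of the priority-based selection. The enum, the function name, and the
availability array are hypothetical; this is not the av1_update_neighbors
code:

```c
#include <assert.h>

/* Candidate positions listed in picking priority order. */
enum {
  NB_ABOVE,
  NB_LEFT,
  NB_ABOVE_LEFT,
  NB_ABOVE_RIGHT,
  NB_BOTTOM_LEFT,
  NB_COUNT
};

/* Picks up to two available candidates in priority order and returns how
 * many were picked. */
static int pick_context_neighbors(const int available[NB_COUNT],
                                  int picked[2]) {
  int n = 0;
  for (int i = 0; i < NB_COUNT && n < 2; ++i) {
    if (available[i]) picked[n++] = i;
  }
  return n;
}
```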
Change-Id: I82eaf0b23d0189caaea008ecc86776492886a05b
When both GLOBAL_MOTION and WARPED_MOTION are enabled, identify
the neighbors using global motion, and generate correct projection
samples, from which the local warped motion is estimated.
Change-Id: I13556a49649208e6f4d30bc570a41074aabc8ae6
In order to use mvs from a future block in obmc, we first send mbmi
info for the entire superblock, and then call another recursion to
handle the coeffs and recon.
Note: this change is currently not compatible with SUPERTX, later I
will move detoken and recon for supertx to a proper place
Change-Id: I19ab77fa137f53a370e68ea777f70d0306e3e303
Use a round flag in ConvolveParams to indicate if the destination buffer
has the result rounded by FILTER_BITS or not.
This CL is part of the goal of reducing interpolation rounding error in
compound prediction mode.
Change-Id: I49e522a89a67a771f5a6e7fbbc609e97923aecb6
In order to reduce the code complexity for handling parameter
coding and recon separately for each 64x64 in non-causal obmc
experiment, we break them down to two steps calling separate
functions, one for params, the other dealing with coefficients
and recon(decoder side).
Note: actually the non-causal prediction can use the original
syntax, but right now in the decoder coeff detoken and recon are
heavily nested.
Change-Id: I72d9c42ab8f38b57850d6b0481551893f1702822
End-to-end speed improvements: (measured on tempete_cif.y4m,
20 frames for encoder and all 260 frames for decoder)
* GLOBAL_MOTION encoder: ~10% faster
* GLOBAL_MOTION decoder: 100-200% faster depending on bitrate
* WARPED_MOTION encoder: ~2.5% faster
* WARPED_MOTION decoder: ~20-40% faster depending on bitrate
The improvement in the GLOBAL_MOTION decoder is particularly
large because its runtime is dominated by calls to warp_plane().
This introduces minor changes to the output of the warp filter,
but these should be rare.
Change-Id: I5813ab9e90311e27587045153c32d400b6b9eb92
- Witness the following user-level speedup on AV1 baseline:
Encoding time reduction: 4.26%
Decoding time reduction: 25.35%
Change-Id: Ideaf3cd473ad45ed9256c80d5a5daed0a6e098cf
Fix the corner case and use the right rate cost update for rd_debug.
This makes var-tx pass the rd_debug test.
Change-Id: Ib0fbd2d73030c0d150222c6b7c2dfffc0c6af085
Make the transform block partition context model support the
rectangular transform block size partition. The coding gains
from cb4x4 and var-tx are:
cb4x4 + var-tx
lowres 4.3%
midres 2.6%
Change-Id: I6cc1413fbf6d7707ca7fd24300623a3f0118be7c
The generic coder now uses the AOM entropy coder API and no longer
needs to include the entenc.h and entdec.h headers.
Change-Id: I213acb5b6bd8a3fe60dc096b83d76ae72315e9de
The functions aom_encode_pvq_split() and aom_decode_pvq_split() code
the rest value as raw bits using the od_ec_enc_bits() and
od_ec_dec_bits() functions.
These functions code bits in the reverse order from the aom_write_literal()
and aom_read_literal() functions, so both the encoder and decoder must
be changed at the same time.
This commit has no impact on metrics but is a bitstream change.
Change-Id: Iee79777f35aebbb23043a7efa7fe439af70348ba
The functions aom_laplace_encode_special() and
aom_laplace_decode_special() code the rest value as raw bits using the
od_ec_enc_bits() and od_ec_dec_bits() functions.
These functions code bits in the reverse order from the aom_write_literal()
and aom_read_literal() functions, so both the encoder and decoder must
be changed at the same time.
This commit has no impact on metrics but is a bitstream change.
Change-Id: I428d5a83dd108c3a54f3c1dbae2c7fd5e59f5726
Cherry-pick Daala da7896a7
Remove double negation and added a comment explaining that this is used
for visualization. This change does not alter the bitstream.
Change-Id: I2a01ed292cc5cfa4e1bfdbc08251da6bd2c27158
Cherry-pick Daala 85433214
Fully order the pvq search candidates
For portable and stable sorting, break ties.
Large differences in output were observed between AWCY and an OS X
machine because of the platform qsort implementation.
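A minimal sketch of a fully ordered comparator. The candidate struct and its
fields are hypothetical, not Daala's actual types. Since qsort() is not
required to be stable, breaking ties on the original index makes the result
identical on every platform:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  double score; /* sort key */
  int index;    /* original position, used only to break ties */
} candidate;

static int compare_candidates(const void *a, const void *b) {
  const candidate *ca = (const candidate *)a;
  const candidate *cb = (const candidate *)b;
  if (ca->score < cb->score) return -1;
  if (ca->score > cb->score) return 1;
  /* Equal scores: fall back to the unique index so no two items tie. */
  return (ca->index > cb->index) - (ca->index < cb->index);
}

static void sort_candidates(candidate *c, size_t n) {
  qsort(c, n, sizeof(*c), compare_candidates);
}
```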
Change-Id: I294dd2e167c1e0464c7f61f32d60ab478341446e
While sorting, preserving the order of the rest of the list when moving
an element to the top of list makes hardware implementation much simpler.
Compression performance is roughly the same: overall, average performance on
the screen-content set is in fact 0.137% better than before.
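An order-preserving move-to-front can be sketched as below. The function name
and the int element type are assumptions for illustration:

```c
#include <assert.h>
#include <string.h>

/* Moves list[pos] to the front and shifts list[0..pos-1] down by one,
 * so the relative order of all other elements is unchanged. */
static void move_to_front(int *list, int n, int pos) {
  assert(pos >= 0 && pos < n);
  const int v = list[pos];
  memmove(list + 1, list, (size_t)pos * sizeof(*list));
  list[0] = v;
}
```

Compared with swapping the chosen element with the head, this is a plain
shift with no data-dependent reordering of the remaining entries.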
Bug=aom:127
Change-Id: Id1aa1e90254b44eae9133b47bca8f853f6a62c6b
Rename encode_inter_mb_segment() so that it tells readers
that the function is only used for sub8x8 case.
Change-Id: I2d86d9efaf0e1e96446d9e2dec8a8d97772489a7
In encode_inter_mb_segment(), when BLOCK_8X4 or BLOCK_4X8 is
passed, the nested loop inside it iterates always twice.
(For BLOCK_4X4, loop iterates only once because encode_inter_mb_segment()
is called for each of 4X4 block.)
Then, the k for 1st iteration is always zero, and the k for 2nd
iteration is always (idy * 2 + idx) with either idy == 1 or idx == 1
depending on the sb_type.
Using "+=" there could mislead readers expecting that
the # of iterations is more.
And probably using simple assignment would be more proper here.
Change-Id: I7a11255eca13403bc090ba4f0cd4785db9f0e541
Change the od_decode_cdf_adapt() function to take an aom_reader
struct instead of an od_ec_dec struct.
Rename od_decode_cdf_adapt() to aom_decode_cdf_adapt().
Change-Id: I0713d2f56acfea3f67f1b4087c0feee77c2e25cb
Change the laplace_decode_special() function to take an aom_reader
struct instead of an od_ec_dec struct.
Rename laplace_decode_special() to aom_laplace_decode_special().
Change-Id: I137ae9a4df3fb0fd0b54dea09f787f70a7d287f5
Replace the passed in bit accounting string from OD_ACCOUNTING with the
current function name as ACCT_STR in preparation for the migration to
CONFIG_ACCOUNTING.
Change-Id: Ib9946232b37cacfd88f6ff914b99e91c3d7b650e
Slight improvement in midres and hdres sets of 0.02% and 0.0.9%
respectively.
This is also a better design anyways.
Change-Id: I15b60b8836070a2132641e5b1d8e9f68df426c08
Disable warped motion mode when the model parameters are out of the
range of the new interpolation algorithm.
Performance: 1.1% lowres (was 1.2%)
Change-Id: I947ce3fd07e0d574d66333c1a729e85ba0294b4a
Separate the aom_read_cdf() functionality from aom_read_symbol() which
can optionally adapt the cdf when run with --enable-ec_adapt.
Change-Id: I5446d6402835dfcf68d3462a2bd8835704fe6603
Change the od_encode_cdf_adapt() function to take an aom_writer
struct instead of an od_ec_enc struct.
Rename od_encode_cdf_adapt() to aom_encode_cdf_adapt().
Change-Id: I00de05b8b7428f67139c234160ab9aaf8900f967
Change the od_laplace_encode_special() function to take an aom_writer
struct instead of an od_ec_enc struct.
Rename od_laplace_encode_special() to aom_laplace_encode_special().
Change-Id: Ieba63c8519d363081124a11e633b437adccfa500
Separate the aom_write_cdf() functionality from aom_write_symbol() which
can optionally adapt the cdf when run with --enable-ec_adapt.
Change-Id: Ibc58690eddb647d69f08d72f0f0712779aab11d1
This large function is solely used for the RDO search for
inter prediction mode. It would be helpful for readers if its name
tells that whole function is used for inter mode decision only.
Change-Id: Ida366b142b7129bf89498227d186c54341c3af5e
In this case, calculating the shear parameters fails
with a divide-by-zero error. So disable the new filter
in this case.
We also temporarily remove the asserts blocking use
of the old filter with debugging enabled.
Change-Id: I788ff51c3bc1d841eab1099881cc3b55038ae342
* Change the behaviour of search_wiener at borders to match
the behaviour of the Wiener filter itself
* Reorder the calculation in compute_stats, saving ~5% of
encode time at low bitrates (tested on bus_cif.y4m at 200kbps)
Change-Id: I5f649d77fd66584451aaf37697ce9c9af69524e4
* Optimize the self-guided and domaintxfmrf filters
* Save 576KiB of buffers in the encoder and decoder
* Disable self-guided filter for videos whose width or
height is < 5, in order to help simplify the filter.
This results in an overall 30-40% improvement in decoder
speed with loop-restoration enabled (depending on source
and bitrate), with no effect on video quality, *except* for
videos with width or height < 5 pixels.
Change-Id: Ide9181118ec3a63a0335338f316505b08df2d831
Fix an intricacy due to interactions between cb4x4 and var-tx that
sets frame header away from tx_mode_select. This resolves a rare
enc/dec mismatch issue.
Change-Id: I6981f21f7e6f04f2a47ef32f744f83a8fd34355b
The bit accounting was broken when refactoring portions of PVQ to use the
aom_reader / aom_writer API because the daala_ec calls were using
OD_ACCOUNTING instead of CONFIG_ACCOUNTING.
This fixes them so that bit accounting will still work with pvq while
the full port to --enable-accounting is in review.
Change-Id: I99e6b6debc716f1a6780116d5602085f7a2bb827
This commit reworks the transform block partition context update
to support cb4x4 mode in the recursive transform block partition.
It resolves the remaining enc/dec mismatch issue when both cb4x4
and var-tx are turned on.
Change-Id: I850d121204fe4c68e81488f1d2848c570d9d08b9
Enables Wiener-based loop restoration only for the UV
frames. The selfguided and domaintransform filters do not
work very well for UV components, hence they are disabled.
For each UV frame a single set of Wiener parameters is
sent. They are applied tile-wise, but all tiles use the
same parameters.
BDRATE (Global PSNR) results:
-----------------------------
lowres: -1.266% (up from -0.666%, good improvement)
midres: -1.815% (up from -1.792%, tiny improvement)
Tiling on UV components will be explored subsequently.
Change-Id: Ib5be93121c4e88e05edf3c36c46488df3cfcd1e2
The functions generic_encode() and generic_decode() code the lsb values
as raw bits using the od_ec_enc_bits() and od_ec_dec_bits() functions.
These functions code bits in the reverse order from the aom_write_literal()
and aom_read_literal() functions, so both the encoder and decoder must
be changed at the same time.
This commit has no impact on metrics but is a bitstream change.
Change-Id: I83546e2d4b73c28a7f269ddc850742df53d227ce
Delete the unused od_laplace_decode(), od_laplace_decode_vector(), and
laplace_decode_vector_delta() functions.
Change-Id: Iec581e8cdb0bc9cac9199c09486891500c707c03
Change the od_decode_band_pvq_splits() and od_decode_pvq_split()
functions to take an aom_reader struct instead of an od_ec_dec struct.
Rename od_decode_band_pvq_splits() to aom_decode_band_pvq_splits() and
od_decode_pvq_split() to aom_decode_pvq_split().
Change-Id: I5979b32977377e1541c609a13242852e5cfab233
Change the od_decode_pvq_codeword() function to take an aom_reader
struct instead of an od_ec_dec struct.
Rename od_decode_pvq_codeword() to aom_decode_pvq_codeword().
Change-Id: I9fc2dda28a6169cb04410e822070991f3bcbc25a
Change the pvq_decode_partition() function to take an aom_reader struct
instead of an od_ec_dec struct.
Change-Id: I7247aaa0be3eedd336371ba677dc2d9f16f27d20
Use the generic AOM entropy decoder in the daala_dec_ctx struct.
This is done in preparation for migrating other entropy coder calls to
use the more generic entropy coding API.
Change-Id: I473a278174195401bcf35730fb5db7eb368b097a
Change the od_encode_band_pvq_splits() and od_encode_pvq_split()
functions to take an aom_writer struct instead of an od_ec_enc struct.
Rename od_encode_band_pvq_splits() to aom_encode_band_pvq_splits() and
od_encode_pvq_split() to aom_encode_pvq_split().
Change-Id: I72e6684e032f4c8f9f9133c6102f870830001712
Change the od_encode_pvq_codeword() function to take an aom_writer
struct instead of an od_ec_enc struct.
Rename od_encode_pvq_codeword() to aom_encode_pvq_codeword().
Change-Id: I1254eca06291740770a4371dc01c78c12e613c3a
Change the pvq_encode_partition() function to take an aom_writer struct
instead of an od_ec_enc struct.
Change-Id: I459d31c600467958c9a1cbebd632fec05e01f534
Delete the unused od_laplace_encode(), od_laplace_encode_vector(), and
laplace_encode_vector_delta() functions.
Change-Id: I92e393836c0ba4e5149b2565e7142a161c44c612
Use the generic AOM entropy encoder in the daala_enc_ctx struct.
This is done in preparation for migrating other entropy coder calls to
use the more generic entropy coding API.
Change-Id: Id627d12402a397bcb21d48d896c0de249d4d8657
Change the od_decode_cdf_adapt_q15() function to take an aom_reader
struct instead of an od_ec_dec struct.
Rename od_decode_cdf_adapt_q15() to aom_decode_cdf_adapt_q15().
Change-Id: I72315c6e89d689e232c53a99a7d4e0f9cdcfbd0c
Change the od_encode_cdf_adapt_q15() function to take an aom_writer
struct instead of an od_ec_enc struct.
Rename od_encode_cdf_adapt_q15() to aom_encode_cdf_adapt_q15().
Change-Id: I631af7be4b553fbb10a4c72e1958aa48a4c8245a
* Remove some unused variables
* Reduce need for casts by typing intermediate buffers appropriately
* Avoid copying data which is never modified; use the original data
instead.
* Reduce number of intermediate buffers required, saving allocations
of 576KiB in the decoder and ~1MiB in the encoder
No effect on performance
Change-Id: I55243904dd8e818fb6d43fa431903736475d23ff
Similarly to the refactoring of PVQ codes for 4x4 intra,
instead of calling tx and pvq_encode_helper() in 4x4 inter,
av1_xform_quant() is called.
This commit gives no change in metrics.
Change-Id: Ib69efb00ed5a5b2254478bf5db5a19d9dac12b3b
This commit adds a new experiment to allow disabling of loop filtering
on tile boundaries. It is implemented by adding a syntax field
"loopfilter_across_tiles_enabled" into the uncompressed frame header.
If it is set to 0, the decoder and encoder will disable loop filtering for
block edges that are also tile boundaries.
Change-Id: Ib80bfd82d49c74f1ba46ae18ceedb30704ac8aa5
In the 4x4 intra search for RDO, the AV1 code was changed a while ago to
call av1_xform_quant(), but the PVQ path was not and instead called
txfm and pvq_encode_helper(), which duplicated code
and made maintenance and testing harder.
This refactor also fixes a long-standing bug
that we could not find before refactoring.
PSNR   PSNR-HVS  SSIM   FAST-SSIM  CIEDE 2000  MS-SSIM
-2.77  -2.62     -2.90  -4.07      -2.94       -2.63
Change-Id: I6e526123a64af810897962d11d53028719e82e16
If --enable-entropy-stats is on, the aggregate counts for each
frame are written out to a file named counts.stt.
Change-Id: I0c73ab872183a9dbd6d767a8c6f0642c5c117253
The convolve filters generated by loop_wiener_filter_tile
are not compatible with some existing convolve implementations
(they can have coefficients >128, sums of (certain subsets of)
coefficients >128, etc.)
So we implement a new variant, which takes a filter with 128
subtracted from its central element and which adds an extra copy
of the source just before clipping to a pixel (reinstating the
128 we subtracted). This should be easy to adapt from the existing
convolve functions, and this patch includes SSE2 highbd and
SSSE3 lowbd implementations.
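A 1-D scalar sketch of the variant, assuming a 7-tap window, FILTER_BITS = 7
and 8-bit pixels. The function names are hypothetical and this is not the
SIMD code from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS 7
#define TAPS 7 /* assumed window size for this sketch */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Reference: full filter whose taps sum to 128 (center tap may be large). */
static uint8_t convolve_full(const uint8_t *src, const int16_t *filter) {
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += filter[k] * src[k];
  return clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS));
}

/* Variant: the filter has 128 subtracted from its center tap, and the center
 * source pixel is added back just before clipping. Assumes an arithmetic
 * right shift for negative sums. */
static uint8_t convolve_add_src(const uint8_t *src, const int16_t *reduced) {
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += reduced[k] * src[k];
  return clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS) + src[TAPS / 2]);
}
```

Because 128 * src is an exact multiple of 1 << FILTER_BITS, removing it before
the rounding shift and adding src afterwards gives the same result as the full
filter while keeping every coefficient small.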
Change-Id: I0abf4c2915f0665c49d88fe450dbc77b783f69e1
The bmi structure for sub8x8 block is deprecated in the cb4x4 mode.
Always fetch the transform type from coding block's mode_info
structure directly.
Change-Id: I8df8536e1a1723b292600018c4843e5fcc025284
This commit allows the sub8x8 blocks to compose and filter their
chroma components for supertx in cb4x4 mode. The coding gains of
supertx and cb4x4 are largely additive:
        supertx  cb4x4  cb4x4 + supertx
lowres  -1.0%    -2.7%  -3.64%
midres  -0.8%    -1.3%  -2.10%
Change-Id: Ie7d09f6fceb36ce375e56773728f05dd628786fe
This makes the cb4x4 mode support supertx experiment. It resolves
the enc/dec mismatch issue when both experiments are turned on.
Change-Id: If3f70fb26862b4ea95d73f7030f86a399051e21e
With PVQ, the dst buffer should be initialized as zero
before av1_inv_txfm_add_*() is called.
This bug seems to have been introduced while resolving conflicts
when nextgenv2 was merged.
BD-Rate change:
                 PSNR   PSNR-HVS  SSIM   CIEDE 2000  MS SSIM
subset1-mono     -0.25  -0.25     -0.23  -0.26       -0.23
objective1-fast  -0.17  -0.26     -0.14  -0.04       -0.18
Change-Id: I7c6b793ba0aa5f1e3d419312cbbe5c207a68f1f8
This commit fixes the 2x2 transform system setups for high bit-
depth setting. It enables the cb4x4 mode to support high bit-depth
process. The coding performance is improved over high bit-depth +
ref-mv:
lowres 2.5%
midres 1.2%
Change-Id: I351f9d72bdc7e15b2bd00e94286b98966a295e6d
* Fix a bug in warp_erroradv introduced by previous patch
* Add highbd version of the new warp filter
Change-Id: I791d3a97baf86f0cbfc72880776848f93df6daa6
When enable_optimize_b is false in av1_encode_intra_block_plane the
entropy contexts were never initialized.
No changes on metrics for objective-1-fast when no experiment is
enabled.
Change-Id: Ic68913f6400d2becbaec3cc14214a0257530ed0b
This commit allows the dynamic motion vector referencing system to
scale its search range according to the coding block size. This
provides higher search resolution for smaller size coding unit.
The cb4x4 mode improves the compression performance across all the
test sets:
        avg   low   mid   high
lowres  2.8%  2.4%  3.1%  3.0%
midres  1.3%  0.3%  1.8%  2.7%
hdres   0.9%  0.5%  1.4%  1.5%
Change-Id: I1bc501506a9f2f06071c5274391f6bd053b235a7
Use the proper scaling factor to decide if a block is sitting on
the frame border. This refactor does not change the coding
statistics of the code base. It fixes an enc/dec mismatch issue
due to out of boundary memory access in the cb4x4 mode.
Change-Id: Ia1e999c0f4e4ef10aac6120e69c1fb10a738dd4d
Refactor the fill_token_cost() function to automatically compute
the token cost array for all transform block sizes.
Change-Id: I2f44c9c08fb169bc14282ba48bce23577b1ab184
This commit makes the encoder properly account for all transform
block sizes when combining statistics from encoding threads.
Change-Id: I010acd3b247dc890f63756d3d1436b1fb52ea2d9
This uses a segmentation mask (which is temporarily computed arbitrarily)
to blend predictors in compound prediction. The mask will be computed
using a color segmentation in a followup patch.
Change-Id: I2d24cf27a8589211f8a70779a5be2d61746406b9
This function corrected for the fact that the old bilateral and
Wiener filters would not write to the outermost 3 pixels of the
destination. Now that the bilateral filter has been removed and
the Wiener filter has been rewritten, this is no longer necessary.
No effect on performance
Change-Id: I3f3b0a759bdb9ff1e2407affe963388e76a9c9e6
Failure brought by 45dc597a
Also harmonize the high-bit-depth and regular versions
of directional intra prediction.
Change-Id: I7ed6602ccbfb53470cb7e9d8f428b17a860ca596
When PVQ is on, we reencode at the end of choose_tx_size_type_from_rd to
get the entropy contexts right, previously this was done using
txfm_rd_in_plane but this is different from the encodes done in the loop
which use txfm_yrd, the result is that rd_stats is set incorrectly at
the end of choose_tx_size_type_from_rd when PVQ is on.
Results on objective-1-fast with --limit=5:
PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
-0.5803 | -1.0598 | -1.4565 | -0.3377 | -0.8153 | -0.5934 | -0.9943
See https://goo.gl/Hvv0E2
Change-Id: Iccc7b0afaff849f959a0084eb48dbb838bc3cb1a
This commit enables the 4x4 level block partition search. It turns
on the 4x4 level coding block unit.
Change-Id: I7251db10176fd6c4f853604d263170721252dd4f
This is the same change as a94997aa90, it
has to be applied again as it was accidentally removed in the merge of
nextgenv2 (f883b42cab).
Change-Id: Ic9c47766e9e7d189885ce2c774b92d1796a9a574
Includes:
Some cleanups/refactoring
Better buffer management.
Some preps for future chrominance restoration.
Change-Id: Ia264b8989b5f4a53c0764ed3e8258ddc212723fc
This commit makes the rate-distortion optimization search of a
given block size support 4x4 level coding block unit.
Change-Id: I0149c3576af929bf2feb1c40850b53b21b3dca71
Support 4x4 level coding block context_tree. This would make the
leaf nodes redundant. Need to remove those after cb4x4 mode is
stable.
Change-Id: Ida33eddbca384a949bb0bf46b7dabaadcab42542
When both directions pick the sharp filter, the horizontal direction uses
the 12-tap sharp filter and the vertical direction uses the 8-tap sharp filter.
BDRate performance drops slightly.
BDRate
lowres -0.083%
midres -0.073%
hdres -0.016%
Change-Id: I6dc075af98f6b4fae558827424a7dd8f38d56503
Remove the use case of bmi->as_mode in cb4x4 mode. Its function is
covered by 4x4 level mode_info.
Change-Id: I04abc1b7a0a97c12c3b6fddc1f16f7045512772e
This commit moves a number of large buffers from stack to heap to fix
crashes due to stack overflow.
Change-Id: I9d1592e4f6dbfa18a475d0fc5674f6d3632f39ed
Turned off, by default.
TODO: Daala's distortion function should be added
to make activity masking fully work.
Note that the PVQ QM matrix (i.e. the scaler for each band of a
transform block) is calculated on the decoder side in exactly the same
way as in the encoder. In Daala, this matrix is written to the bitstream
and the decoder does not generate it.
Activity masking can be turned on by setting the flag below to 1:
Change-Id: I44bfb905cb4e0cad6aa830a4c355cd760a993ffe