Currently, trellis optimization is performed in
av1_tx_block_rd_b when var-tx is enabled even when DISABLE_TRELLISQ_SEARCH
is set to 1.
The drop in performance when DISABLE_TRELLISQ_SEARCH is set to 1 is
1.8% on lowres.
Change-Id: I89e26d4d4f57944db11b528d0e10048ae650d8a1
Support the transform block kernel coding for rectangular
transform block size in var-tx. This integrates txk-sel with
var-tx.
Change-Id: I9a8edd84812168f56c79b78cc9af34f6304b1d54
Define the syntax and entropy coding templates for
NCOBMC_ADAPT_WEIGHT. The actual values of the default
probabilities and the index tree structure need to
be fine tuned.
In this experiment all mv's in a superblock are sent
first as in the ncobmc case.
Change-Id: I68d50d3d27346c2847ea449a1168c6a99fbb4d3d
This change does not impact the bitstream; it changes how the distortion
is summed when evaluating alpha. The sum is still taken over the entire
partition. However, instead of iterating over the entire surface all at
once, CfL now iterates over each transform block. This is in light of
future work to compute alpha over transform blocks and not prediction
blocks.
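A minimal sketch of why the sum is unchanged, assuming row-major buffers and square transform blocks that tile the partition exactly. The function names are hypothetical and do not reproduce the CfL code:

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences over a w x h region (hypothetical helper). */
static int64_t sse_region(const uint8_t *src, const uint8_t *pred, int stride,
                          int w, int h) {
  int64_t sum = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int d = src[r * stride + c] - pred[r * stride + c];
      sum += (int64_t)d * d;
    }
  }
  return sum;
}

/* Same total, accumulated one transform block at a time.
 * Assumes tx_size divides both w and h. */
static int64_t sse_by_tx_block(const uint8_t *src, const uint8_t *pred,
                               int stride, int w, int h, int tx_size) {
  assert(tx_size > 0 && w % tx_size == 0 && h % tx_size == 0);
  int64_t sum = 0;
  for (int r = 0; r < h; r += tx_size) {
    for (int c = 0; c < w; c += tx_size) {
      const int off = r * stride + c;
      sum += sse_region(src + off, pred + off, stride, tx_size, tx_size);
    }
  }
  return sum;
}
```

Both functions visit every pixel exactly once, so they return the same value for any partition that the transform blocks tile completely.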
Results on Subset1 (compared to 9c6f854 with CfL)
  PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
Change-Id: Ic7b72201d29ad6b2527748e35b212bec515e3bdb
Patch b1bedf5f73 converted the three writes in bitstream.c that
specify an extended transform from using av1_write_token (encoded with
probability trees) to aom_write_symbol (encoded with CDFs).
That patch fixed up the two reads in decodemv.c but didn't fix up the
corresponding read in decodeframe.c. This patch does so.
The patch also fixes up a write of a (non-extended) transform when not
CONFIG_EXT_TX and the corresponding read.
Change-Id: Ibf5dcfcf3e7122f08dd0ef8616fb0ecddb95d99a
Support transform block level kernel selection in the recursive
transform block partitioning search.
Change-Id: I511c39705ee636b0c9fabbe4720fe5a9764b964a
mi[0] is not set properly when encoding all mvs in a
super-block first. After this patch NCOBMC can function
properly.
Change-Id: I149a50184c4823c0d3b82b6b21c7608e639668e6
The var-tx has its own suite of tx size/type RD search functions,
which recursively split the partition into square tx blocks.
Daala-dist requires access to 8x8 pixels (both decoded and predicted)
since it measures the distortion over multiples of 8x8 pixels.
Thus, if a tx block is smaller than 8x8, it waits until all of the sub8x8
blocks are RD searched (with MSE), then replaces the MSE of the 8x8 pixels
with the distortion that daala-dist calculates for those 8x8 pixels.
It is also applied to luma pixels only.
Change-Id: Ic4891e89b4ef05cf880aa26781d2d06ccf3142de
In previous ADSTs, DST-7 and DST-4 are used for length 4 and length
8/16/32, respectively. In this LGT experiment we explore transforms
between DST-4 and DST-7. When CONFIG_LGT flag is on, adst4 and adst8
are replaced by lgt4 and lgt8, the intermediate transforms with
pre-chosen parameters.
The LGTs applied here are lgt4_160 and lgt8_170, where the numbers
mean the self-loop weights times 100. The associated values for DST-7
and DST-4 are 100 and 200.
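One way to read the self-loop weights is through a graph Laplacian. The sketch below assumes a path graph with unit edge weights and a single self-loop on the first node. That graph model, the function name, and LGT_MAX_N are assumptions made for illustration and are not taken from the patch:

```c
#include <assert.h>

#define LGT_MAX_N 8 /* hypothetical upper bound for this sketch */

/* Fill the n x n generalized Laplacian of a path graph with unit edge
 * weights and a self-loop of weight w on the first node. */
static void lgt_laplacian(double L[LGT_MAX_N][LGT_MAX_N], int n, double w) {
  assert(n >= 2 && n <= LGT_MAX_N);
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) L[i][j] = 0.0;
  }
  for (int i = 0; i < n; ++i) {
    /* Each node has one edge to every existing neighbour. */
    const int degree = (i > 0) + (i < n - 1);
    L[i][i] = degree;
    if (i > 0) L[i][i - 1] = -1.0;
    if (i < n - 1) L[i][i + 1] = -1.0;
  }
  L[0][0] += w; /* the self-loop only changes the first diagonal entry */
}
```

Under this model, w = 1.0 and w = 2.0 give the tridiagonal matrices usually associated with DST-7 and DST-4, and w = 1.6 would correspond to lgt4_160.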
ovr_psnr:
lowres: -0.140
midres: -0.131
hdres: -0.078
These changes are not applied to the highbd scenario in the
current version.
Change-Id: I20600456da8766528b2b6b11aa28801e70af498e
- For invisible pixels, av1_daala_dist() simply uses the source pixels for dst.
- Added av1_daala_dist_diff(), which takes the diff signal as input instead of dst.
- Refactored the daala_dist code so that av1_daala_dist() and _diff()
are called inside av1's distortion calculation functions, pixel_sse() and
sum_squares_visible().
Change-Id: Id857db52fe19856d92c46a9e84ac2962c01ae045
This reverts commit a3d70911c3.
Reason for revert: this was fixed in cb63767 which moved the
definition to a different line causing this change to merge
cleanly, resulting in a duplicate.
Change-Id: I2d8763f0e2af320f043a1417ba33e82f82163592
- First-pass encoding time is reduced by ~10.9% on i7-6700
at 100 frames, 1080p.
- The avx2 version handles coeff number >= 8; the coeff number < 8
case will be implemented in sse2.
- A unit test is added for types B/FP/DC.
Change-Id: Ibe5b7807c64e6dfc2d59c470ed50a6e8ca94ef7c
Previously, for blocks >= 8x8 with tx < 8x8,
we skipped setting the early-exit flag in block_rd_txfm() because
the distortion for a sub8x8 tx block comes from MSE while the reference
(best) comes from daala-dist.
However, not setting the early-exit flag turned out to be the reason
for a regression in the MSE probe mode of daala-dist, because
it loses the chance to set rd_stats properly.
There is still a small regression, about 0.05% PSNR BD-rate,
which seems to occur when a tx block in a partition chooses
the skip rd_cost because it is smaller than the non-skip rd_cost and
sets the early-exit flag to 0 (so it does not exit). The daala-dist applied
to the whole partition cannot access that information and can only choose
between two kinds of rd_cost:
1) all tx blocks are skipped (even if a tx block has non-zero coeffs), with 0 bits
2) the sum of the final distortion of all tx blocks (i.e. non-zero coeffs decoded)
and the bits to encode the coeffs.
Change-Id: I2ec69972aa1f22d465293cb9e8d5e18ef2c6f7f3
Adds an option bit in the bitstream syntax to allow chroma to
have a restoration tile size that is coupled to luma based on
the subsampling of the color components.
This is meant to ease encoder hardware implementation.
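A minimal sketch of what such coupling could look like. The helper name, the use of the smaller of the two subsampling shifts, and the right shift are all assumptions for illustration and are not confirmed by this message:

```c
#include <assert.h>

/* Hypothetical helper: derive the chroma restoration tile size from the
 * luma tile size. When the option bit is set, the chroma size is scaled
 * down by the chroma subsampling; otherwise it equals the luma size. */
static int chroma_restoration_tilesize(int luma_tilesize, int ss_x, int ss_y,
                                       int coupled_bit) {
  assert(luma_tilesize > 0 && ss_x >= 0 && ss_y >= 0);
  const int s = ss_x < ss_y ? ss_x : ss_y; /* assumed: use the smaller shift */
  return coupled_bit ? luma_tilesize >> s : luma_tilesize;
}
```

With 4:2:0 content (ss_x = ss_y = 1) and the bit set, a 256-sample luma tile would map to a 128-sample chroma tile, so both planes cover the same picture area per tile.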
Change-Id: Ic3cc2b68c0f33701ed3ff2fe19cf57cd864da67f
cb4x4 itself should not require these sizes.
This simplifies compatibility with other experiments, since we can
first make them work with cb4x4 (which is now on by default), and
then worry about chroma_sub8x8 and chroma_2x2 (which is not) in
separate steps.
Encoder and decoder output should remain unchanged.
Change-Id: Iff2a5494cab3b7d96f881e8bd9cd4bf18c817cfa
When writing the compressed header, prob_diff_update() was called
for compound_type_prob[] for every defined block size, even though
luma never uses block sizes smaller than 4x4.
This fixes is_any_masked_compound_used() and
is_interinter_compound_used() to properly return 0 for chroma-only
block sizes, and then uses these functions to guard the probability
updates in write_compressed_header() and read_compressed_header(),
the same way the actual compound type values are guarded in
read_inter_block_mode_info() and pack_inter_mode_mvs().
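The guard can be sketched as follows, assuming a hypothetical block-size enum in which the chroma-only sizes come first. The enum and helper names are illustrative and do not reproduce the libaom definitions; the real predicates likely apply further conditions:

```c
#include <assert.h>

/* Hypothetical block-size list; sizes below BLK_4X4 are chroma-only. */
typedef enum {
  BLK_2X2, BLK_2X4, BLK_4X2, BLK_4X4, BLK_8X8, BLK_16X16, BLK_SIZES
} BlkSize;

/* Returns 0 for chroma-only sizes, which luma never uses. */
static int compound_allowed(BlkSize bsize) {
  assert(bsize < BLK_SIZES);
  return bsize >= BLK_4X4;
}

/* Counts the per-block-size probability updates that a header writer or
 * reader would visit when both use the same guard. */
static int count_guarded_updates(void) {
  int updates = 0;
  for (int b = 0; b < BLK_SIZES; ++b) {
    if (!compound_allowed((BlkSize)b)) continue; /* skip chroma-only sizes */
    ++updates;
  }
  return updates;
}
```

Because the writer and the reader iterate with the same predicate, they skip the same entries and stay in sync.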
Change-Id: Ib521cf53f9ec166ef634609c8b47c5814b6a9ff5
Without tempmv-signaling configured, using the previous frame's MVs
requires that the last frame was a show frame. With tempmv-signaling
configured, cm->last_show_frame is not checked when calculating
use_prev_frame_mvs. This patch adds that check and resolves mismatches
seen with random resizing and random superres.
Includes a couple of fixes too: cm's last_width, last_height, and
last_show_frame were updated under different conditions. Now they're all
updated at the same time.
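A minimal sketch of the check and of the combined update, using a hypothetical struct that holds only the fields needed here. Apart from last_show_frame, last_width, and last_height, the conditions and field names are assumptions:

```c
#include <assert.h>

/* Hypothetical subset of the common frame state. */
typedef struct {
  int width, height;
  int last_width, last_height;
  int last_show_frame;
  int error_resilient_mode; /* assumed additional condition */
  int last_intra_only;      /* assumed additional condition */
} FrameState;

/* Previous-frame MVs are usable only if the last frame was shown and has
 * the same dimensions as the current frame. */
static int use_prev_frame_mvs(const FrameState *cm) {
  assert(cm != 0);
  return !cm->error_resilient_mode && cm->width == cm->last_width &&
         cm->height == cm->last_height && !cm->last_intra_only &&
         cm->last_show_frame;
}

/* Update all "last frame" fields together so they cannot get out of step. */
static void update_last_frame_state(FrameState *cm, int show_frame) {
  cm->last_width = cm->width;
  cm->last_height = cm->height;
  cm->last_show_frame = show_frame;
}
```

Updating the three fields in one place means a resized or hidden frame cannot leave a stale last_show_frame next to fresh dimensions.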
Change-Id: Ibdfb196cb6e9d002fd57cb4df10a899b60faac00
Motion refinement was added to warped motion, which requires
rate_mv_bmc to be declared for warped motion.
BUG=aomedia:613
Change-Id: I74dfc396f915a5cc4599bfbdccad758fa630505f
CfL performs an extra loop iteration during luma mode selection. Recent
changes have broken the extra iteration, so the previous approach is removed.
The new approach adds the extra iteration right before uv parameter
selection. Note that if the best luma intra mode already has
worse RD performance than the best inter mode found so far (if any),
then the entire chroma intra search is skipped, including the extra
iteration.
Results on Subset1 (compared to 3e18e4a with CfL)
   PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
-0.3090 | -2.7271 | -2.3521 |  -0.3369 | -0.3463 | -0.3525 |    -1.1868
Change-Id: If67b0badd2c8ea25c61685483d39d622c1729b18
Updates to intra-edge experiment
- Convert VP9-style intra pred to Ext-intra style
- Upsample edge predictors by 2x based on angle and edge size
BD-rate, 1-kf AWCY
360p: -0.11%
720p: -0.54%
1080p: -0.96%
Change-Id: Ib73805d31d5d286e607a7ee7470fcbdf11edbbff
Extract the computation of the luma reconstructed average out of cfl_load
and into cfl_compute_average. The reconstructed luma average is stored
in the CFL_CONTEXT to avoid computing it for each transform block and
for each plane.
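The idea can be sketched as follows, with a hypothetical context struct standing in for CFL_CONTEXT. The field names, the rounding, and the function name are assumptions and do not reproduce the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the CfL context. */
typedef struct {
  const uint8_t *y_pix; /* reconstructed luma, row-major */
  int stride;
  int width, height;
  int y_average;        /* cached result, shared by both chroma planes */
} CflSketchCtx;

/* Compute the rounded luma average once and store it in the context. */
static void sketch_compute_average(CflSketchCtx *cfl) {
  assert(cfl->width > 0 && cfl->height > 0);
  int sum = 0;
  for (int r = 0; r < cfl->height; ++r) {
    for (int c = 0; c < cfl->width; ++c) {
      sum += cfl->y_pix[r * cfl->stride + c];
    }
  }
  const int n = cfl->width * cfl->height;
  cfl->y_average = (sum + n / 2) / n; /* assumed rounding to nearest */
}
```

Each transform block of each chroma plane can then read y_average from the context instead of re-summing the luma pixels.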
Results on subset1 (compared to 803bea2 with CfL)
   PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
-0.0474 | -0.1486 | -0.2931 |  -0.0358 | -0.0397 | -0.0127 |    -0.1162
Change-Id: I9e34af0fe5961ce8dbe70cb80aea2a16221d0d92
The precomputed upsampled references do not handle border extension correctly (interpolation and
border extension do not commute unless you upsample into the
border), nor do they handle crop dimensions that are not a multiple
of 8 (the upsampled version is not sufficiently large), in addition
to using massive amounts of memory and being a criminal waste of
cache (1 byte used for every 8 bytes fetched).
This commit reimplements use_upsampled_references by computing the
subpixel samples on the fly. This implementation not only corrects
the border handling, but is also faster, while maintaining the
same quality.
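A minimal sketch of computing a subpixel sample on the fly, assuming 1/8-pel positions and bilinear weights for brevity (the codec's actual interpolation filters are longer). All names are hypothetical. Integer coordinates are clamped to the frame before filtering, so interpolation runs on the border-extended signal:

```c
#include <assert.h>
#include <stdint.h>

static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Floor division by 8 that is also correct for negative positions. */
static int floor_div8(int v) { return v >= 0 ? v / 8 : -((-v + 7) / 8); }

/* Sample a w x h frame at (x8 / 8, y8 / 8), given in 1/8-pel units. */
static uint8_t sample_subpel(const uint8_t *frame, int stride, int w, int h,
                             int x8, int y8) {
  assert(w > 0 && h > 0);
  const int x0 = floor_div8(x8), y0 = floor_div8(y8);
  const int fx = x8 - 8 * x0, fy = y8 - 8 * y0; /* fractions in 0..7 */
  /* Border extension: clamp the integer sample positions to the frame. */
  const int cx0 = clamp_int(x0, 0, w - 1), cx1 = clamp_int(x0 + 1, 0, w - 1);
  const int cy0 = clamp_int(y0, 0, h - 1), cy1 = clamp_int(y0 + 1, 0, h - 1);
  const int p00 = frame[cy0 * stride + cx0], p01 = frame[cy0 * stride + cx1];
  const int p10 = frame[cy1 * stride + cx0], p11 = frame[cy1 * stride + cx1];
  const int sum = (8 - fx) * (8 - fy) * p00 + fx * (8 - fy) * p01 +
                  (8 - fx) * fy * p10 + fx * fy * p11;
  return (uint8_t)((sum + 32) >> 6); /* weights sum to 64 */
}
```

Only the few integer samples around the requested position are fetched, and no upsampled copy of the frame has to be stored.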
HL AWCY results are basically noise:
  PSNR | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
0.0188 |   0.0187 | 0.0045 |  0.0063 |     0.0228
Change-Id: I7527db9f83b87a7bb8b35342f7e6457cd0bef9cd
The ext-comp-refs tool adds uni-directional compound reference
prediction. In detail, three pairs of uni-directional compound references
are added for comp ref prediction:
(LAST_FRAME, LAST2_FRAME),
(LAST_FRAME, GOLDEN_FRAME), and
(BWDREF_FRAME, ALTREF_FRAME).
This new tool will eventually replace one-sided-compound, merging
the two coding tools into one.
It achieves coding gains of -0.35% ~ -0.55% in BD-rate compared against
the AV1 baseline with the default experiments on, but without
one-sided-compound. It achieves coding gains of -0.2% ~ -0.3% when
one-sided-compound is on. The gains are larger at higher
resolutions.
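The three uni-directional pairs listed above can be tabulated as in the sketch below. The numeric enum values, the LAST3_FRAME entry, and the helper name are assumptions for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Reference frame labels; the numeric values here are assumed. */
typedef enum {
  LAST_FRAME = 1,
  LAST2_FRAME,
  LAST3_FRAME,
  GOLDEN_FRAME,
  BWDREF_FRAME,
  ALTREF_FRAME
} RefFrame;

/* The three uni-directional compound pairs named in this message. */
static const RefFrame kUniCompRefs[][2] = {
  { LAST_FRAME, LAST2_FRAME },
  { LAST_FRAME, GOLDEN_FRAME },
  { BWDREF_FRAME, ALTREF_FRAME },
};

/* Returns 1 if (ref0, ref1) is one of the uni-directional pairs. */
static int is_uni_comp_ref_pair(RefFrame ref0, RefFrame ref1) {
  assert(ref0 >= LAST_FRAME && ref0 <= ALTREF_FRAME);
  assert(ref1 >= LAST_FRAME && ref1 <= ALTREF_FRAME);
  const size_t n = sizeof(kUniCompRefs) / sizeof(kUniCompRefs[0]);
  for (size_t i = 0; i < n; ++i) {
    if (kUniCompRefs[i][0] == ref0 && kUniCompRefs[i][1] == ref1) return 1;
  }
  return 0;
}
```

In each pair both references lie on the same temporal side of the current frame, which is what makes the prediction uni-directional.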
Change-Id: Icbdb16e97b96aaebaf2213f5f72d5331e2e358eb
Although this does not fully convert var-tx to using
av1_block_dist(), it does make it use the same distortion functions
av1_block_dist() uses: pixel_sse() and sum_squares_visible().
Change-Id: I1173bc6941a3b895381b9fcb73b533b5afc31aab
Small change to calculate the encode size for scale checking using the
av1_calculate_scaled_size function used elsewhere instead of calculating
it in place. Done for consistency's sake.
Change-Id: I72626b729477e28e868cf9028ea4537267a12413