Support the transform block kernel coding for rectangular
transform block size in var-tx. This integrates txk-sel with
var-tx.
Change-Id: I9a8edd84812168f56c79b78cc9af34f6304b1d54
Define the syntax and entropy coding templates for
NCOBMC_ADAPT_WEIGHT. The actual values of the default
probabilities and the index tree structure need to
be fine tuned.
In this experiment all mv's in a superblock are sent
first as in the ncobmc case.
Change-Id: I68d50d3d27346c2847ea449a1168c6a99fbb4d3d
This change does not impact the bitstream; it changes how the distortion
is summed when evaluating alpha. The sum is still taken over the entire
partition. However, instead of iterating over the entire surface all at
once, CfL now iterates over each transform block. This is in light of
future work to compute alpha over transform blocks and not prediction
blocks.
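
The equivalence described above can be sketched as follows. This is a
minimal illustration, not the CfL code itself: region_sse() and
partition_sse_by_tx_block() are hypothetical names, the distortion is
assumed to be a plain sum of squared errors, and square transform blocks
that tile the partition exactly are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared errors over a w x h region of two 8-bit planes. */
int64_t region_sse(const uint8_t *src, const uint8_t *pred, int stride, int w,
                   int h) {
  int64_t sse = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int d = src[r * stride + c] - pred[r * stride + c];
      sse += d * d;
    }
  }
  return sse;
}

/* Same total, accumulated one tx_size x tx_size transform block at a time.
 * Assumption: the partition dimensions are multiples of tx_size. */
int64_t partition_sse_by_tx_block(const uint8_t *src, const uint8_t *pred,
                                  int stride, int w, int h, int tx_size) {
  assert(tx_size > 0 && w % tx_size == 0 && h % tx_size == 0);
  int64_t sse = 0;
  for (int r = 0; r < h; r += tx_size) {
    for (int c = 0; c < w; c += tx_size) {
      const int offset = r * stride + c;
      sse += region_sse(src + offset, pred + offset, stride, tx_size, tx_size);
    }
  }
  return sse;
}
```

Because every pixel belongs to exactly one transform block, both
functions return the same sum, which is consistent with the all-zero
deltas reported below.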
Results on Subset1 (compared to 9c6f854 with CfL)
  PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
Change-Id: Ic7b72201d29ad6b2527748e35b212bec515e3bdb
Includes reordering and other clamping changes, as well as
changes to reduce multiplier precision.
cam_lowres (60 frames): -0.092% BDRATE improvement in
--disable-cdef --disable-global-motion --disable-ext-tx
configuration.
Change-Id: I0660c45b44fcd5a193534d8dadd1aa1ae5c5e27a
We are going to have several commits to set up the new low/high
bitdepth data path selection logic. This patch is for the inverse
transform. The ideas are summarized as follows.
- For low/high bitdepth selection, encoder depends on
input configuration, e.g., video sequence bitdepth,
profile. Decoder depends on input bitstream. This has
nothing to do with compiler/build configuration.
- Typical encoder usage for sampling format 4:2:0.
1) 8-bit video sequence:
a) --profile=0
Fastest encoding/decoding pipeline.
b) --profile=2 --bit-depth=10
Image pixels are left shifted by 2 bits. It
employs 16-bit reference frame buffer and has high
calculation precision. It usually enjoys higher
compression performance.
2) 10/12-bit video sequence (HDR):
--profile=2 --bit-depth=10/12
- Transform coefficient type:
Lowbitdepth: int16_t
Highbitdepth: int32_t
- The type tran_low_t is still used in the codebase. It is
int32_t, which defines the data path capacity, so it is
naturally high bitdepth.
Eventually we shall remove the configuration flags
CONFIG_HIGHBITDEPTH/CONFIG_LOWBITDEPTH and separate the
low and high bitdepth data paths. The two data paths co-exist
in the same build environment.
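
A minimal sketch of the run-time selection and of case 1b) above. The
function names are hypothetical and do not come from the codebase; the
only facts taken from this message are that the choice depends on the
configured bit depth rather than on build flags, and that 8-bit input
coded at 10 bits is left shifted by 2 into a 16-bit buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run-time switch: the data path follows the coded bit depth
 * (from encoder configuration or from the bitstream), not a build flag. */
int use_highbitdepth_path(int coded_bit_depth) {
  assert(coded_bit_depth == 8 || coded_bit_depth == 10 ||
         coded_bit_depth == 12);
  return coded_bit_depth > 8;
}

/* Hypothetical helper for case 1b): copy 8-bit samples into a 16-bit
 * buffer, left shifted up to the coded bit depth (2 bits for 10-bit). */
void upshift_8bit_input(const uint8_t *src, uint16_t *dst, size_t n,
                        int coded_bit_depth) {
  assert(coded_bit_depth >= 8 && coded_bit_depth <= 12);
  const int shift = coded_bit_depth - 8;
  for (size_t i = 0; i < n; ++i) dst[i] = (uint16_t)(src[i] << shift);
}
```

With coded_bit_depth set to 10, an 8-bit sample of 255 becomes 1020 in
the 16-bit buffer.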
Change-Id: I35c06d4d4f19ebf80d909168fdddbae57c3cc884
Patch b1bedf5f73 converted the three writes in bitstream.c that
specify an extended transform from using av1_write_token (encoded with
probability trees) to aom_write_symbol (encoded with CDFs).
That patch fixed up the two reads in decodemv.c but didn't fix up the
corresponding read in decodeframe.c. This patch does so.
The patch also fixes up a write of a (non-extended) transform when not
CONFIG_EXT_TX and the corresponding read.
Change-Id: Ibf5dcfcf3e7122f08dd0ef8616fb0ecddb95d99a
Support transform block level kernel selection in the recursive
transform block partitioning search.
Change-Id: I511c39705ee636b0c9fabbe4720fe5a9764b964a
mi[0] is not set properly when encoding all mvs in a
super-block first. After this patch NCOBMC can function
properly.
Change-Id: I149a50184c4823c0d3b82b6b21c7608e639668e6
This change makes the conversions similar to those in av1_quantize.c,
and fixes ubsan warnings shown in nightly tests.
Change-Id: I90851a80dcb9f052a32bf22199fd9ef8ff927725
The var-tx has its own suite of tx size/type RD search functions,
which recursively split the partition into square tx blocks.
The Daala-dist requires access to 8x8 pixels (both decoded and predicted)
since it measures the distortion in multiples of 8x8 pixels.
Thus, if a tx block is smaller than 8x8, it waits until all of the sub8x8
blocks are RD searched (with MSE), then replaces the MSE of the 8x8 pixels
with daala-dist's calculated distortion for those 8x8 pixels.
It is also applied to luma pixels only.
Change-Id: Ic4891e89b4ef05cf880aa26781d2d06ccf3142de
In previous ADSTs, DST-7 and DST-4 are used for length 4 and length
8/16/32, respectively. In this LGT experiment we explore transforms
between DST-4 and DST-7. When CONFIG_LGT flag is on, adst4 and adst8
are replaced by lgt4 and lgt8, the intermediate transforms with
pre-chosen parameters.
The LGTs applied here are lgt4_160 and lgt8_170, where the numbers
mean the self-loop weights times 100. The associated values for DST-7
and DST-4 are 100 and 200.
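
The role of the self-loop weight can be sketched as follows. This
message does not spell out the graph, so the sketch assumes the usual
line-graph construction: n nodes joined by unit-weight edges, one
self-loop on the first node, and the transform taken as the eigenbasis
of the resulting Laplacian. line_graph_laplacian() is a hypothetical
name, and the eigendecomposition itself is not shown.

```c
#include <assert.h>

#define LGT_MAX_N 32

/* Fill L (n x n, row-major) with the generalized Laplacian of an n-node
 * line graph with unit edge weights and a self-loop of weight
 * self_loop_x100 / 100 on the first node (assumed construction). */
void line_graph_laplacian(int n, int self_loop_x100, double *L) {
  assert(n >= 2 && n <= LGT_MAX_N);
  assert(self_loop_x100 >= 0);
  for (int i = 0; i < n * n; ++i) L[i] = 0.0;
  for (int i = 0; i < n; ++i) {
    /* Degree: 1 for the two end nodes, 2 for interior nodes. */
    L[i * n + i] = (i == 0 || i == n - 1) ? 1.0 : 2.0;
    if (i > 0) L[i * n + (i - 1)] = -1.0;
    if (i + 1 < n) L[i * n + (i + 1)] = -1.0;
  }
  /* The self-loop only changes the first diagonal entry. */
  L[0] += self_loop_x100 / 100.0;
}
```

Under this assumption, a value of 100 gives a first diagonal entry of 2
and 200 gives 3, while lgt4_160 would correspond to 2.6, in between the
two.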
ovr_psnr:
lowres: -0.140
midres: -0.131
hdres: -0.078
These changes are not applied to the highbd scenario in the
current version.
Change-Id: I20600456da8766528b2b6b11aa28801e70af498e
- For invisible pixels, av1_daala_dist() simply uses source pixels for dst.
- Added av1_daala_dist_diff(), which takes the diff signal instead of dst.
- Refactored daala_dist code so that av1_daala_dist() and _diff()
are called inside av1's distortion calculation functions, pixel_sse() and
sum_squares_visible().
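
The first item can be illustrated with a small sketch. The helper name
fill_invisible_with_source() is hypothetical and is not the function
changed by this patch; it only shows why substituting source pixels
makes the invisible area contribute no error.

```c
#include <assert.h>
#include <stdint.h>

/* Overwrite every dst pixel outside the visible_w x visible_h area with
 * the co-located source pixel, so that it contributes zero error to any
 * distortion computed over the full w x h block. */
void fill_invisible_with_source(const uint8_t *src, int src_stride,
                                uint8_t *dst, int dst_stride, int w, int h,
                                int visible_w, int visible_h) {
  assert(visible_w >= 0 && visible_w <= w);
  assert(visible_h >= 0 && visible_h <= h);
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      if (r >= visible_h || c >= visible_w)
        dst[r * dst_stride + c] = src[r * src_stride + c];
    }
  }
}
```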
Change-Id: Id857db52fe19856d92c46a9e84ac2962c01ae045
This patch changes the motion vector scaling and clamping to be
slightly more accurate (removing an occasional 1px offset due to
multiple roundings) and fixes the border clamping when scaling
frames.
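
How multiple roundings can produce an off-by-one result is sketched
below. This is an illustration only: the Q14 scale factor, the function
names, and the choice of which terms are rounded separately are all
assumptions, not the code touched by this patch. The sketch is limited
to non-negative inputs.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_SHIFT 14 /* hypothetical Q14 fixed-point scale factor */

/* Multiply v by scale_q14 / 2^14, rounding half up. Non-negative v only. */
int64_t scale_round(int64_t v, int32_t scale_q14) {
  assert(v >= 0 && scale_q14 >= 0);
  return (v * scale_q14 + (1 << (SCALE_SHIFT - 1))) >> SCALE_SHIFT;
}

/* Rounds the block position and the motion vector separately. */
int64_t scaled_pos_two_roundings(int pos, int mv, int32_t scale_q14) {
  return scale_round(pos, scale_q14) + scale_round(mv, scale_q14);
}

/* Rounds once, after adding the position and the motion vector. */
int64_t scaled_pos_one_rounding(int pos, int mv, int32_t scale_q14) {
  return scale_round((int64_t)pos + mv, scale_q14);
}
```

With a scale of 1.5 (24576 in Q14), pos = 1 and mv = 1, the two-rounding
variant yields 4 while the single-rounding variant yields 3, the nearest
representable result for an exact value of 3.0.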
Change-Id: I032dc0b87854eebafa58f1f803981e23c8cc2d9b
Avoids mixing accesses to ctx->pending_cx_data
with serialization logic.
"index_sz" is deduced from the write position,
instead of being redundantly computed.
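
Deducing the size from the write position can be sketched as follows.
The byte layout here (marker byte, 4-byte little-endian frame sizes,
marker byte) and the name write_index() are hypothetical and are not the
actual bitstream format; the point is only that the returned size comes
from the pointer difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDEX_MAX_FRAMES 8
#define INDEX_MAX_BYTES (2 + 4 * INDEX_MAX_FRAMES)

/* Serialize a hypothetical index into dst and return its size in bytes.
 * The size is derived from the final write position, not precomputed. */
size_t write_index(uint8_t *dst, size_t dst_cap, const uint32_t *frame_sizes,
                   int frame_count) {
  assert(frame_count > 0 && frame_count <= INDEX_MAX_FRAMES);
  assert(dst_cap >= INDEX_MAX_BYTES); /* worst-case bound, not the size */
  uint8_t *p = dst;
  const uint8_t marker = (uint8_t)(0xc0 | (frame_count - 1));
  *p++ = marker;
  for (int i = 0; i < frame_count; ++i) {
    for (int b = 0; b < 4; ++b) *p++ = (uint8_t)(frame_sizes[i] >> (8 * b));
  }
  *p++ = marker;
  return (size_t)(p - dst);
}
```

The caller passes only the destination buffer and receives the number
of bytes written, so the serialization logic does not need to touch the
surrounding context state.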
Change-Id: Ic14f93886da61acc1735fbbe4f787e45a4ca79eb
This reverts commit a3d70911c3.
Reason for revert: this was fixed in cb63767 which moved the
definition to a different line causing this change to merge
cleanly, resulting in a duplicate.
Change-Id: I2d8763f0e2af320f043a1417ba33e82f82163592
0835e7b80 left out the required changes to aomenc.c for the KF numerator
arguments for resize and superres to work. This patch adds them.
Change-Id: I350b01c8b187188de5313fffaa15c1ec9f052469
NCOBMC_ADAPT_WEIGHT allows using different interpolation kernels
to combine overlapped predictions generated using mvs in the
neighboring blocks.
This experiment builds on top of MOTION_VAR and might conflict
with WARPED_MOTION during development, so it is only effective
when MOTION_VAR is on and WARPED_MOTION is off.
Change-Id: I4f1b6e55b6146ed443955751c09bfa22ef2f33e8
The special path in build_inter_predictors for chroma blocks
corresponding to sub8x8 luma blocks would always fetch the
scale factors from 'xd', which correspond (at least in the
decoder) to the references of the block for which is_chroma_reference
returns true.
The correct behaviour is to fetch the scale factors from 'ref_buf',
which corresponds to the references of the block currently being
predicted.
This patch fixes some encode/decode mismatches which were caused
by the above behaviour when the various reference frames had
different sizes.
Change-Id: I48a0a167ea25d47d08018016cf8b77885b3b5d6b
This normalizes these tests with the regular variance ones, both in
implementation and in test list output.
Change-Id: Iaa549f2e2a054d716c24f5a64baf700747c55295
- First pass encoding time is reduced by ~10.9% on i7-6700
at 100 frames, 1080p.
- avx2 handles the coeff number >= 8 cases; the coeff number < 8
case will be implemented with sse2.
- A unit test is added for types B/FP/DC.
Change-Id: Ibe5b7807c64e6dfc2d59c470ed50a6e8ca94ef7c
Previously, for blocks >= 8x8 with tx < 8x8,
we skipped setting the early-exit flag in block_rd_txfm() because
the distortion for a sub8x8 tx block is from MSE but the reference (best)
is from daala-dist.
However, not setting the early-exit flag turned out to be the reason
for a regression in the MSE probe mode of daala-dist, because
it loses the chance to set rd_stats properly.
On the other hand, there is still a small regression, about 0.05% psnr
bd-rate, which seems to occur when a tx block in a partition has chosen
the skipped rd_cost, since it is smaller than the non-skip rd_cost, and
has set the early-exit flag to 0 (so, no exit), while the daala-dist
applied to the whole partition cannot access the same info and can only
choose from two kinds of rd_costs:
1) all tx blocks are skipped (even if a tx block has non-zero coeff) and 0 bits
2) sum of final distortion of all tx blocks (i.e. non-zero coeff decoded)
and bits to encode coeffs.
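
The partition-level choice between these two costs can be sketched as
below. The cost model (distortion plus lambda times rate) is a
simplified, hypothetical stand-in for the encoder's RD cost, and both
function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified cost: distortion plus lambda-weighted rate. */
int64_t rd_cost(int64_t lambda, int64_t rate_bits, int64_t dist) {
  assert(lambda >= 0 && rate_bits >= 0 && dist >= 0);
  return dist + lambda * rate_bits;
}

/* Returns 1 when option 1) (all tx blocks skipped, 0 bits) costs no more
 * than option 2) (coded coefficients: final distortion plus coeff bits). */
int partition_prefers_skip(int64_t lambda, int64_t dist_all_skipped,
                           int64_t dist_coded, int64_t rate_coded_bits) {
  const int64_t skip_cost = rd_cost(lambda, 0, dist_all_skipped);
  const int64_t coded_cost = rd_cost(lambda, rate_coded_bits, dist_coded);
  return skip_cost <= coded_cost;
}
```

Because the decision is made once for the whole partition, a mix in
which only some tx blocks are skipped is not among the candidates, which
matches the limitation described above.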
Change-Id: I2ec69972aa1f22d465293cb9e8d5e18ef2c6f7f3