Commit graph

1753 commits

Author SHA1 Message Date
Yaowu Xu 5e8007fe01 align buffers to avoid Segmentation fault
BUG=aomedia:579

Change-Id: I41731a71e25db429a78512a72afed10b6929da70
2017-06-28 12:52:50 -07:00
Yi Luo 0f4195c218 Fwd txfm and quantizer HBD/LBD data paths co-exist
Change-Id: Iaae46d0735539b8b8daf9faac81c2a3434838020
2017-06-28 17:40:28 +00:00
Sarah Parker 5680a01d5b Extend DISABLE_TRELLISQ_SEARCH to include calls from var-tx
Currently, trellis optimization is performed in
av1_tx_block_rd_b when var-tx is enabled even when DISABLE_TRELLISQ_SEARCH
is set to 1.

The drop in performance when DISABLE_TRELLISQ_SEARCH is set to 1 is
1.8% on lowres

Change-Id: I89e26d4d4f57944db11b528d0e10048ae650d8a1
2017-06-28 15:25:04 +00:00
Nathan E. Egge 50f1911192 Add the av1_cost_tokens_from_cdf() function.
Change-Id: I148f8c7045d179c0a1ba7f1fe33b859f66bfc7f3
2017-06-28 15:10:17 +00:00
Frederic Barbier 05b45e6b79 Avoid use of deprecated inv-txfm in encoder
Change-Id: Ie721eaf58d0716e340b9ebdff9fd215cfe0c3c2a
2017-06-28 12:04:56 +00:00
Thomas Davies 894cc81312 NEW_MULTISYMBOL: adapt comp_ref and bwd_ref.
Change-Id: I711cd173af501ba955e889d1e2205125615a99fd
2017-06-28 09:50:17 +00:00
Jingning Han 243b66bc32 Support rectangular tx_type coding in var-tx
Support the transform block kernel coding for rectangular
transform block size in var-tx. This integrates txk-sel with
var-tx.

Change-Id: I9a8edd84812168f56c79b78cc9af34f6304b1d54
2017-06-28 04:07:09 +00:00
Wei-Ting Lin 85a8f70c5c ncobmc_adapt_weight: Add bitstream syntax
Define the syntax and entropy coding templates for
NCOBMC_ADAPT_WEIGHT. The actual values of the default
probabilities and the index tree structure need to
be fine tuned.

In this experiment all mv's in a superblock are sent
first as in the ncobmc case.

Change-Id: I68d50d3d27346c2847ea449a1168c6a99fbb4d3d
2017-06-27 22:30:13 +00:00
Todd Nguyen 302d097096 Add experiment bgsprite
Work in progress to generate ARF with stitched background image.

Change-Id: I2fea75bbe5ac6f713f53eb5825776dadfc1d98c5
2017-06-27 22:28:19 +00:00
Luc Trudeau 8fb4c9e730 [CFL] Sum Alpha Distortion Over Transform Block
This change does not impact the bitstream; it changes how the distortion
is summed when evaluating alpha. The sum is still taken over the entire
partition. However, instead of iterating over the entire surface all at
once, CfL now iterates over each transform block. This is in light of
future work to compute alpha over transform blocks and not prediction
blocks.

Results on Subset1 (compared to 9c6f854 with CfL)

  PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000

Change-Id: Ic7b72201d29ad6b2527748e35b212bec515e3bdb
2017-06-27 22:02:28 +00:00
Sarah Parker 784596d5f1 Remove redundant checks in read/write_tx_type
Change-Id: I348dfd1f8b555306a3fb625bf2303f9cd2649e07
2017-06-27 15:14:24 +00:00
Rupert Swarbrick 580943a78d supertx: Read and write transforms with CDF in decode_partition
Patch b1bedf5f73 converted the three writes in bitstream.c that
specify an extended transform from using av1_write_token (encoded with
probability trees) to aom_write_symbol (encoded with CDFs).

That patch fixed up the two reads in decodemv.c but didn't fix up the
corresponding read in decodeframe.c. This patch does so.

The patch also fixes up a write of a (non-extended) transform when not
CONFIG_EXT_TX and the corresponding read.

Change-Id: Ibf5dcfcf3e7122f08dd0ef8616fb0ecddb95d99a
2017-06-27 10:03:01 +00:00
Jingning Han e3b81bcf6a Rework recursive transform block partition search
Support transform block level kernel selection in the recursive
transform block partitioning search.

Change-Id: I511c39705ee636b0c9fabbe4720fe5a9764b964a
2017-06-27 02:35:37 +00:00
Wei-Ting Lin 1d46d90319 Fix token-encoding errors in NCOBMC
mi[0] is not set properly when encoding all mvs in a
super-block first. After this patch NCOBMC can function
properly.

Change-Id: I149a50184c4823c0d3b82b6b21c7608e639668e6
2017-06-27 00:24:15 +00:00
Yushin Cho 8ab875d63e daala-dist: high bit depth support
Change-Id: Idafef140d3425a9a9f66cb8864a804c4d2a89a70
2017-06-26 21:01:27 +00:00
Yushin Cho 0474912c38 Fix daala-dist for var-tx
The var-tx has its own suite of tx size/type RD search functions,
which recursively split the partition into square tx blocks.

The Daala-dist requires access to 8x8 pixels (both decoded and predicted)
since it measures the distortion in multiples of 8x8 pixels.
Thus, if tx block is smaller than 8x8, it waits until all of sub8x8 blocks
are RD searched (with MSE) then replaces the MSE of 8x8 pixels with
daala-dist's calculated distortion for 8x8 pixels.

It is also applied to luma pixels only.

Change-Id: Ic4891e89b4ef05cf880aa26781d2d06ccf3142de
2017-06-26 19:54:41 +00:00
Lester Lu ad8290b8e6 New experiment: LGT
In previous ADSTs, DST-7 and DST-4 are used for length 4 and length
8/16/32, respectively. In this LGT experiment we explore transforms
between DST-4 and DST-7. When CONFIG_LGT flag is on, adst4 and adst8
are replaced by lgt4 and lgt8, the intermediate transforms with
pre-chosen parameters.

The LGTs applied here are lgt4_160 and lgt8_170, where the numbers
mean the self-loop weights times 100. The associated values for DST-7
and DST-4 are 100 and 200.

ovr_psnr:
lowres: -0.140
midres: -0.131
hdres: -0.078

These changes are not applied to the highbd scenario in the
current version.

Change-Id: I20600456da8766528b2b6b11aa28801e70af498e
2017-06-26 19:11:25 +00:00
Yushin Cho 75b0100431 Fix daala_dist to handle visible pixels only
- For invisible pixels, av1_daala_dist() simply uses source pixels for dst.
- Added av1_daala_dist_diff(), which takes the diff signal as input instead of dst.

- Refactored daala_dist code so that av1_daala_dist() and _diff()
are called inside av1's distortion calculation functions, pixel_sse() and
sum_squares_visible().

Change-Id: Id857db52fe19856d92c46a9e84ac2962c01ae045
2017-06-26 18:47:20 +00:00
Debargha Mukherjee 6f0566e7b1 Fix some valgrind errors in loop-restoration
BUG=aomedia:623

Change-Id: I158072895adb8a9f5f177b8146f3beec265d7406
2017-06-24 00:36:50 -07:00
Sebastien Alaiwan d94476d1c4 Fix double definition of 'pd'
This reverts commit a3d70911c3.

Reason for revert: this was fixed in cb63767 which moved the
definition to a different line causing this change to merge
cleanly, resulting in a duplicate.

Change-Id: I2d8763f0e2af320f043a1417ba33e82f82163592
2017-06-24 06:37:22 +00:00
James Zern d937cdbef9 x86: _mm_set_epi64x -> _mm_{set_epi32,cvtsi32_si128}
_mm_set_epi64x is incompatible with Visual Studio x86 configurations

Change-Id: I7986e43d0471699553affeefabae66a512d9d46a
2017-06-24 02:37:59 +00:00
James Zern 88896734ea {decodeframe,rdopt}.c: fix asserts with strings
lead with '0 &&' to avoid string to bool conversion warnings

BUG=aomedia:621

Change-Id: I2cd6618377f9ed94f4d9dbc252f6f5cfc81efea4
2017-06-24 00:48:17 +00:00
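The '0 &&' idiom named in 88896734ea can be illustrated with a minimal sketch. The helper `tx_size_log2` and its values are hypothetical and are not taken from decodeframe.c or rdopt.c; only the shape of the assert is the point.

```c
#include <assert.h>

/* Hypothetical helper: maps a square transform size to its log2. */
static int tx_size_log2(int size) {
  switch (size) {
    case 4: return 2;
    case 8: return 3;
    case 16: return 4;
    case 32: return 5;
    default:
      /* A bare string literal as the assert condition is always true and
       * can trigger string-to-bool conversion warnings. Leading with 0
       * makes the condition false while keeping the message visible in
       * the assertion failure output. */
      assert(0 && "Invalid transform size");
      return -1;
  }
}
```

In a build with NDEBUG defined the assert compiles away, so the sketch still returns -1 for unsupported sizes.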
Angie Chiang bd99b38c7f Pass mbmi into get_scan()
This is to facilitate future experiment related to adapt_scan

Change-Id: I51628f3df81bd82db7f8f553d13da0ee5792d7d9
2017-06-24 00:38:20 +00:00
Yushin Cho a3d70911c3 Fix compile warning
Fixed the compile warning when both global-motion
and warped-motion are disabled.

Change-Id: Ie3ac036fc6c0a15e54a56427452682d7ea7864db
2017-06-24 00:30:30 +00:00
Debargha Mukherjee afe7c5fd7a Do not find transformation for very few points
Adds check to make sure that find transformation functions are
never called for 0 points.

Change-Id: I2d7cf40aace535b1d708d6189aea9c1e0f7c281b
2017-06-23 13:36:49 +00:00
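A guard of the kind described in afe7c5fd7a might look like the sketch below. The function `find_translation_sketch` is hypothetical and much simpler than any real model-fitting code; the commit places the check so that the fitting functions are never called with 0 points, whereas this sketch folds the check into the function itself for brevity.

```c
/* Hypothetical fit: estimates a translation as the mean of per-point
 * offsets. Returns 1 on success and 0 when there is nothing to fit. */
static int find_translation_sketch(const double *dx, const double *dy,
                                   int num_points, double *out_x,
                                   double *out_y) {
  /* With zero points there is no model to estimate, and the averages
   * below would divide by zero. */
  if (num_points <= 0) return 0;

  double sum_x = 0.0;
  double sum_y = 0.0;
  for (int i = 0; i < num_points; ++i) {
    sum_x += dx[i];
    sum_y += dy[i];
  }
  *out_x = sum_x / num_points;
  *out_y = sum_y / num_points;
  return 1;
}
```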
Yaowu Xu 9180b6e89e Prevent divide-by-zero
Change-Id: Id22615d461bf16272d1b2e2c72ae7e00db8bcb5c
2017-06-22 19:55:13 +00:00
Yaowu Xu bdda9d4e9d convert to int before apply sign
avoids overflow of unsigned integer.

Change-Id: Ic92974b508bb0cd6fc680203ffa6cff14d644ff7
2017-06-22 19:53:31 +00:00
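The idea in bdda9d4e9d, converting to int before applying the sign, can be sketched as follows. The helper `apply_sign` is hypothetical, and the sketch assumes the magnitude fits in an int.

```c
/* Hypothetical helper: returns +mag or -mag as a signed int.
 * Assumes mag <= INT_MAX. */
static int apply_sign(unsigned int mag, int is_negative) {
  /* Negating `mag` directly would happen in unsigned arithmetic and wrap
   * around (with a 32-bit unsigned int, -7u is 4294967289). Converting
   * to int first keeps the negation in signed arithmetic. */
  const int value = (int)mag;
  return is_negative ? -value : value;
}
```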
Jingning Han cb63767421 Fix compiler warning in joint_motion_search
Avoid compiler warning when global-motion is off.

Change-Id: Ie6a0d3e4efc0e06b263e8c8c0c0dc153738c3804
2017-06-22 19:07:52 +00:00
Zoe Liu a56f916e85 Add entropy stats dump out for individual frame context type
Change-Id: Id0cd184e8b3cea085ecc3adbc7fea7bb765c7986
2017-06-22 16:22:00 +00:00
Yi Luo 193422e76f Add avx2 highbd_quantize_b
- First pass encoding time reduces ~10.9% on i7-6700
  at 100 frames, 1080p.
- avx2 works for coeff number >= 8 cases; coeff number < 8
  case will be implemented by sse2.
- Unit tests are added for types B/FP/DC.

Change-Id: Ibe5b7807c64e6dfc2d59c470ed50a6e8ca94ef7c
2017-06-22 15:52:01 +00:00
Yushin Cho 04eb9594f1 Fix daala-dist, rd tx search
Previously, for block >=8x8, and tx < 8x8,
we skipped setting the early-exit flag in block_rd_txfm() because
distortion for sub8x8 tx block is from MSE but reference (best)
is from daala-dist.
However, not setting early-exit flag turned out to be the reason
for a regression in MSE probe mode of daala-dist because
it loses the chance to set rd_stats properly.

On the other hand, there is still a small regression, about 0.05% PSNR BD-rate,
which seems to occur in the case that a tx block in a partition has chosen
the skipped rd_cost since it is smaller than non-skip rd_cost and
set the early-exit flag to 0 (so, not exit), but the daala-dist applied
to the whole partition cannot access the same info but can choose from
two kinds of rd_costs:
1) all tx blocks are skipped (even if a tx block has non-zero coeff) and 0 bits
2) sum of final distortion of all tx blocks (i.e. non-zero coeff decoded)
and bits to encode coeffs.

Change-Id: I2ec69972aa1f22d465293cb9e8d5e18ef2c6f7f3
2017-06-22 15:16:10 +00:00
Yaowu Xu a0cc9aa816 Add missing accumulation cross threads
BUG=aomedia:618

Change-Id: Ie96ccc363462a28527c99a72e97b7acaf2ab0ff8
2017-06-22 01:50:28 +00:00
Debargha Mukherjee 84f567c725 Add chroma tilesize option in loop-restoration
Adds an option bit in the bitstream syntax to allow chroma to
have restoration tilesize that is coupled to luma based on
subsampling of the color components.

This is meant to ease encoder hardware implementation.

Change-Id: Ic3cc2b68c0f33701ed3ff2fe19cf57cd864da67f
2017-06-21 22:37:18 +00:00
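One way to read the coupling described in 84f567c725 is sketched below. The function name, the parameter names, and the choice of the smaller of the two subsampling shifts are all assumptions made for illustration; the commit message only states that an option bit couples the chroma tile size to luma based on subsampling.

```c
/* Hypothetical derivation of the chroma restoration tile size.
 * ss_x / ss_y are the chroma subsampling shifts (1 for 4:2:0 in both
 * directions); coupled_bit is the option bit read from the bitstream. */
static int chroma_restoration_tilesize(int luma_tilesize, int ss_x, int ss_y,
                                       int coupled_bit) {
  /* Assumption: use the smaller shift so the tile is never shrunk more
   * than the less-subsampled direction allows. */
  const int shift = ss_x < ss_y ? ss_x : ss_y;
  return coupled_bit ? (luma_tilesize >> shift) : luma_tilesize;
}
```

With a luma tile size of 256 and 4:2:0 subsampling, the coupled case gives 128 and the uncoupled case keeps 256.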
Timothy B. Terriberry 81ec2619f8 cb4x4: Move sub-4X4 block sizes behind chroma flags.
cb4x4 itself should not require these sizes.

This simplifies compatibility with other experiments, since we can
first make them work with cb4x4 (which is now on by default), and
then worry about chroma_sub8x8 and chroma_2x2 (which are not) in
separate steps.

Encoder and decoder output should remain unchanged.

Change-Id: Iff2a5494cab3b7d96f881e8bd9cd4bf18c817cfa
2017-06-21 21:31:26 +00:00
Timothy B. Terriberry 4a81001bb7 ext_inter: Skip compound type probs. for small block sizes.
When writing the compressed header, prob_diff_update() was called
for compound_type_prob[] for every defined block size, even though
luma never uses block sizes smaller than 4x4.

This fixes is_any_masked_compound_used() and
is_interinter_compound_used() to properly return 0 for chroma-only
block sizes, and then uses these functions to guard the probability
updates in write_compressed_header() and read_compressed_header(),
the same way the actual compound type values are guarded in
read_inter_block_mode_info() and pack_inter_mode_mvs().

Change-Id: Ib521cf53f9ec166ef634609c8b47c5814b6a9ff5
2017-06-21 19:57:35 +00:00
Fergus Simpson 2b4ea11a8d Use last_show_frame in use_prev_frame_mvs calc
Without tempmv-signaling configured, using the previous frame's MVs
requires that the last frame was a show frame. With tempmv-signaling
configured, cm->show_last_frame is not checked when calculating
use_prev_frame_mvs. This patch adds that check and resolves mismatches
seen with random resizing and random superres.

Also includes a couple of fixes: cm's last_width, last_height, and
last_show_frame were updated under different conditions. Now they're all
updated at the same time.

Change-Id: Ibdfb196cb6e9d002fd57cb4df10a899b60faac00
2017-06-21 16:49:36 +00:00
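The check described in 2b4ea11a8d can be sketched roughly as below. The struct `FrameStateSketch` and both functions are hypothetical stand-ins for the codec's common state. The dimension comparison is an assumption added here because the commit mentions resizing mismatches; the commit message itself only states the last_show_frame requirement and that the three "last" fields are now updated together.

```c
/* Hypothetical subset of per-frame state. */
typedef struct {
  int width, height;           /* current frame size */
  int last_width, last_height; /* size of the previous frame */
  int last_show_frame;         /* 1 if the previous frame was shown */
} FrameStateSketch;

/* Previous-frame MVs are usable only if the last frame was shown and,
 * by assumption in this sketch, had the same dimensions. */
static int can_use_prev_frame_mvs(const FrameStateSketch *f) {
  return f->last_show_frame && f->width == f->last_width &&
         f->height == f->last_height;
}

/* Update all "last" fields in one place so they cannot drift apart. */
static void finish_frame(FrameStateSketch *f, int shown) {
  f->last_width = f->width;
  f->last_height = f->height;
  f->last_show_frame = shown;
}
```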
Yunqing Wang 562a39370c Declare rate_mv_bmc in warped motion
Motion refinement was added in warped motion, which required the
declaration of rate_mv_bmc in warped motion.

BUG=aomedia:613

Change-Id: I74dfc396f915a5cc4599bfbdccad758fa630505f
2017-06-20 21:27:42 +00:00
Yi Luo 6faf349a3a Add high bit depth fast path quantizer avx2
- User level encoder timer reduction ~4.3% with
  following testing: 1080p, 10-bit, 4Mbps, 4 frames,
  profile=2, i7-6700.

Change-Id: Ib4a579d10cbd705cb7b1c4f0d619159a76bb34d7
2017-06-20 21:03:29 +00:00
David Michael Barr 23198661a6 [CFL] drop skip logic, always write alpha
Results on Subset 1 (Compared to a0f8c145 with CfL)

  PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
0.0677 | -0.3359 | -0.2115 |   0.0529 | 0.0735 |  0.0495 |    -0.0907

Change-Id: Ib61ff862e8cfbdf0c693a4eba5f2712a6e9ab819
Signed-off-by: David Michael Barr <b@rr-dav.id.au>
2017-06-20 16:40:44 +00:00
Luc Trudeau 14fc50452d [CFL] RDO Loop Rework
CfL performs an extra loop iteration during luma mode selection. Recent
changes have broken the extra iteration. Remove previous approach.

The new approach adds the extra iteration right before uv parameter
selection. Notably, if the best luma intra mode already has
worse RD performance than the best inter mode found so far (if any),
then the entire chroma intra search is skipped, including the extra
iteration.

Results on Subset1 (compared to 3e18e4a with CfL)

   PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
-0.3090 | -2.7271 | -2.3521 |  -0.3369 | -0.3463 | -0.3525 |    -1.1868

Change-Id: If67b0badd2c8ea25c61685483d39d622c1729b18
2017-06-20 01:39:16 +00:00
Joe Young 830d4ce495 [intra-edge] Convert 4x4 VP9 to ext-intra; upsample edge samples
Updates to intra-edge experiment

- Convert VP9-style intra pred to Ext-intra style
- Upsample edge predictors by 2x based on angle and edge size

BD-rate, 1-kf AWCY
  360p:  -0.11%
  720p:  -0.54%
  1080p: -0.96%

Change-Id: Ib73805d31d5d286e607a7ee7470fcbdf11edbbff
2017-06-19 22:01:28 +00:00
Luc Trudeau 3e18e4aeaa [CFL] Compute Luma Average Over Partition Unit
Extract the computation of the luma reconstructed average out of cfl_load
and into cfl_compute_average. The reconstructed luma average is stored
in the CFL_CONTEXT to avoid computing it for each transform block and
for each plane.

Results on subset1 (compared to 803bea2 with CfL)
   PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
-0.0474 | -0.1486 | -0.2931 |  -0.0358 | -0.0397 | -0.0127 |    -0.1162

Change-Id: I9e34af0fe5961ce8dbe70cb80aea2a16221d0d92
2017-06-19 20:10:14 +00:00
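The caching idea in 3e18e4aeaa, computing the reconstructed luma average once and keeping it in the context, might be sketched as follows. The struct and function names carry a `_sketch` suffix because they are hypothetical, and the lazy "valid" flag and round-to-nearest division are assumptions; the commit only says the average is moved into cfl_compute_average and stored in CFL_CONTEXT.

```c
/* Hypothetical context holding the cached luma average. */
typedef struct {
  int y_avg;       /* cached average of reconstructed luma */
  int y_avg_valid; /* nonzero once y_avg has been computed */
} CflContextSketch;

/* Returns the luma average for the partition, computing it on first use
 * and reusing the stored value for later transform blocks and planes. */
static int cfl_luma_average_sketch(CflContextSketch *ctx,
                                   const unsigned char *luma, int width,
                                   int height, int stride) {
  if (!ctx->y_avg_valid) {
    int sum = 0;
    for (int r = 0; r < height; ++r)
      for (int c = 0; c < width; ++c) sum += luma[r * stride + c];
    const int n = width * height;
    ctx->y_avg = (sum + n / 2) / n; /* assumed rounding: to nearest */
    ctx->y_avg_valid = 1;
  }
  return ctx->y_avg;
}
```

A caller would reset `y_avg_valid` to 0 when moving to a new partition unit.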
Timothy B. Terriberry 5d24b6f049 encoder: Remove 64x upsampled reference buffers
They do not handle border extension correctly (interpolation and
border extension do not commute unless you upsample into the
border), nor do they handle crop dimensions that are not a multiple
of 8 (the upsampled version is not sufficiently large), in addition
to using massive amounts of memory and being a criminal waste of
cache (1 byte used for every 8 bytes fetched).

This commit reimplements use_upsampled_references by computing the
subpixel samples on the fly. This implementation not only corrects
the border handling, but is also faster, while maintaining the
same quality.

HL AWCY results are basically noise:
    PSNR | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
  0.0188 |   0.0187 | 0.0045 |  0.0063 |     0.0228

Change-Id: I7527db9f83b87a7bb8b35342f7e6457cd0bef9cd
2017-06-19 18:50:57 +00:00
Debargha Mukherjee 887069f3cd Fix a bug for non 420 formats and some refactoring
BUG=aomedia:607

Change-Id: I5a5fb893f0237e7ca6e0d807e825f8d4e26949b2
2017-06-19 17:32:11 +00:00
Zoe Liu c082bbcbad Add new coding tool of ext-comp-refs
The tool of ext-comp-refs adds the uni-directional compound reference
prediction. In detail, three pairs of uni-directional compound references
are added for the comp ref prediction:
(LAST_FRAME, LAST2_FRAME),
(LAST_FRAME, GOLDEN_FRAME), and
(BWDREF_FRAME, ALTREF_FRAME).

This new tool of ext-comp-refs will eventually replace
one-sided-compound, merging the two coding tools into one.

It achieves -0.35% ~ -0.55% coding gains in BDRate, compared against
AV1 baseline with the default experiments on, but without
one-sided-compound. It achieves -0.2% ~ -0.3% coding gains when
one-sided-compound is on. It achieves larger gains on higher
resolution.

Change-Id: Icbdb16e97b96aaebaf2213f5f72d5331e2e358eb
2017-06-19 16:38:00 +00:00
Zoe Liu 0c634c704b Unify the checking on compound mode prediction
Change-Id: Id9c025febf21aeb67cbc719f585661b715bdb9ce
2017-06-19 16:30:35 +00:00
Sarah Parker 345366acce Add macro to disable trellis optimization in rdopt
Turning off the trellis optimization gives a performance
drop of 0.726% on the lowres set.

Change-Id: I4fdd1e20fb6f671162cd32b3abe699cd2aee1919
2017-06-19 14:39:24 +00:00
Di Chen 5658662222 Support two scanning passes for rd_pick_partition.
Reset xd->mi and x->mbmi_ext for the superblock after the first
scanning pass.

Change-Id: Iae9142ff2b1a2b576f54dc545b58fe37c97cecac
2017-06-17 15:10:26 -07:00
Timothy B. Terriberry d62e2a3a11 var_tx: Remove custom distortion calculations.
Although this does not fully convert var-tx to using
av1_block_dist(), it does make it use the same distortion functions
av1_block_dist() uses: pixel_sse() and sum_squares_visible().

Change-Id: I1173bc6941a3b895381b9fcb73b533b5afc31aab
2017-06-17 18:35:37 +00:00
Fergus Simpson b0157aa638 frame_superres: Use calculate_scaled_size for scale check
Small change to calculate the encode size for scale checking using the
av1_calculate_scaled_size function used elsewhere instead of calculating
it in place. Done for consistency's sake.

Change-Id: I72626b729477e28e868cf9028ea4537267a12413
2017-06-16 22:15:08 +00:00