Commit graph

535 commits

Author SHA1 Message Date
Yaowu Xu ed04e82a04 Merge branch 'master' into nextgenv2
Conflicts:
	vp10/common/scan.c
	vp9/common/vp9_pred_common.c
	vp9/decoder/vp9_decoder.c

Change-Id: Id559d98ea676da15d60ed464ddb6c48d3eed1111
2016-04-18 15:15:05 -07:00
Angie Chiang caf066f845 Merge changes I67543d36,I763f2924 into nextgenv2
* changes:
  Reduce shift in txfm8x8
  Let txfm's constant bit be the same for each stage
2016-04-18 19:40:33 +00:00
Angie Chiang d72560e10d Merge "Fit adst/dct's stage range into 32-bit in bd12" into nextgenv2 2016-04-18 18:40:28 +00:00
Yue Chen 16a99e967c Merge "Optimization for EXT_INTER + OBMC combination" into nextgenv2 2016-04-17 18:54:33 +00:00
Yue Chen 321794c4d5 Optimization for EXT_INTER + OBMC combination
In the rd loop, check the perf of obmc, whose mv is copied from regular
inter predictor, when wedge interinter is better than regular inter
(previously this forced allow_obmc = 0). The condition of the early
termination before this step is relaxed to avoid skipping too many obmc
predictions. The rates of the overhead are properly calculated for these tools.

The logic of the bitstream syntax:
(a single ref) the interintra flag is sent first, only if it is 0, we
send the obmc flag;
(compound refs) the obmc flag is sent first, only if it is 0, we send
the wedge interinter flag

Coding gain
lowres: 0.428% (2.287%->2.715%)

Change-Id: I5f3a34640b398e313cbf84235c9fe2073eb2173f
2016-04-15 17:03:20 -07:00
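A minimal sketch of the flag ordering described in the 321794c4d5 message above. The function and flag names are hypothetical, chosen only for illustration, and are not taken from the libvpx source.

```python
def flags_to_signal(is_compound, use_interintra=False, use_obmc=False,
                    use_wedge_interinter=False):
    """Return (flag_name, bit) pairs in the order the message describes.

    All names here are illustrative, not libvpx identifiers.
    """
    if not is_compound:
        # Single reference: interintra first, obmc only if interintra is 0.
        out = [("interintra", int(use_interintra))]
        if not use_interintra:
            out.append(("obmc", int(use_obmc)))
        return out
    # Compound references: obmc first, wedge interinter only if obmc is 0.
    out = [("obmc", int(use_obmc))]
    if not use_obmc:
        out.append(("wedge_interinter", int(use_wedge_interinter)))
    return out
```

In both cases the second flag is conditional on the first being 0, so at most two flags are sent per block for these tools.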
Jingning Han 4d503d1043 Remove duplicated TxfmFunc declarations
Change-Id: If3876610a1fbce0988cc21ea917596bbb467df93
2016-04-15 12:03:21 -07:00
Angie Chiang 0a715add2e Reduce shift in txfm8x8
Change-Id: I67543d365cbef3c3e113f01660ae8cb744cc556d
2016-04-14 19:12:22 -07:00
Angie Chiang dfa532cc2a Let txfm's constant bit be the same for each stage
Change-Id: I763f2924afca526db371231bca18b38879bdf793
2016-04-14 15:46:54 -07:00
Angie Chiang 02d23fbbf4 Fit adst/dct's stage range into 32-bit in bd12
Change-Id: Ie428c6f0655873de3e77e844a2f2e4203cf47dff
2016-04-14 15:44:05 -07:00
Jingning Han 525995a3d9 Apply motion vector precision check to candidate mv
This avoids repeatedly checking the candidate motion vector
precision level at the decoder end. The compression performance
varies at 0.01% level.

Change-Id: I4a88e95decd900d0cac9a0c2e70ba43ef7ecac38
2016-04-14 09:44:41 -07:00
Hui Su 436a6cc4e7 Merge "ext-tx: use raster scan order for identity transform" into nextgenv2 2016-04-13 23:52:35 +00:00
Angie Chiang 716f0ea3cf Merge changes I92819356,I50b5a313,I807e60c6,I8a8df9fd into nextgenv2
* changes:
  Branch dct to new implementation for bd12
  Change dct32x32's range
  Fit dct's stage range into 32-bit when bitdepth is 12
  Pass tx_type into get_tx_scale
2016-04-13 23:24:41 +00:00
hui su b72aa72a90 ext-tx: use raster scan order for identity transform
coding gain of ext-tx:
screen_content 12.73% -> 13.05%

Change-Id: I5fc8cf0db84c3e56dd3cb7675e1d81c9c575bc57
2016-04-13 09:42:43 -07:00
Geza Lore c50aaf3049 Make ext-refs respect encoding flags.
The VP8_EFLAG_NO_UPD_LAST and VP8_EFLAG_NO_REF_LAST flags can be
passed to the encoder to signal that it should not update/reference
the LAST ref frame when encoding the current frame. With
--enable-ext-refs turned on, the new LAST2 LAST3 and LAST4 ref frames
could still be used or updated, which causes the
  VP10/ErrorResilienceTestLarge.DropFramesWithoutRecovery/{0,1,2}
tests to fail.

With this patch, if --enable-ext-refs is used, then
VP8_EFLAG_NO_UPD_LAST and VP8_EFLAG_NO_REF_LAST also apply to the
new LAST2, LAST3 and LAST4 ref frames, as well as the LAST ref frame.

Change-Id: If482b1c09bbaf914eca8e0348a2367bff261661d
2016-04-13 12:03:58 +01:00
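A rough sketch of the behaviour the c50aaf3049 message describes: with ext-refs enabled, the two flags cover the whole LAST family rather than LAST alone. The bit values below are placeholders for illustration only (the real constants are defined in the libvpx headers), and the function name is hypothetical.

```python
# Placeholder bit values; NOT the real libvpx constants.
NO_REF_LAST = 1 << 0
NO_UPD_LAST = 1 << 1

LAST_FAMILY = ("LAST", "LAST2", "LAST3", "LAST4")

def restricted_refs(flags, ext_refs):
    """List which reference frames may not be referenced or updated."""
    # Without ext-refs only LAST exists; with it, the flags cover all four.
    family = LAST_FAMILY if ext_refs else LAST_FAMILY[:1]
    return {
        "no_reference": list(family) if flags & NO_REF_LAST else [],
        "no_update": list(family) if flags & NO_UPD_LAST else [],
    }
```

For example, `restricted_refs(NO_REF_LAST, ext_refs=True)` lists all four LAST-family frames under `"no_reference"` and nothing under `"no_update"`.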
Angie Chiang 027d12b7d6 Merge changes I359aa49c,Ic8ca5afb into nextgenv2
* changes:
  Generalize txfm scale in highbd quantizer
  Parameterize transform scale for quantizer
2016-04-12 18:02:05 +00:00
Debargha Mukherjee 648538959d Merge "Use reduced transform set for 16x16" into nextgenv2 2016-04-11 23:32:29 +00:00
Debargha Mukherjee c4da5d500e Use reduced transform set for 16x16
Speed increase for ext-tx by 20% for a BDRATE drop of 0.26%.
The ext-tx expt becomes -2.66% BDRATE (reduced from -2.92%) for
the lowres set.

It turns out that reducing the set of transforms for intra from
12 to 5 makes very little difference in coding performance (~0.04%).
Most of the performance drop comes from the reduction in transform
set for inter. Currently there is a provision to control that with
a macro.

Change-Id: I7de05527bf72f96acc1e0ab8a74a849da0a141e5
2016-04-11 13:04:41 -07:00
Debargha Mukherjee 9930a00ed7 Merge "Refactor PC_TREE root handling." into nextgenv2 2016-04-09 13:33:53 +00:00
hui su f94d699c09 Changes to scan order neighbors
-Fix some bugs in row_scan and col_scan. In some cases, the above
or left neighbor was not considered even though it is available.

-When above or left neighbor is not available, try using the
top-left, top-right or bottom-left neighbor.

Compression improvement:
lowres   0.20%
midres   0.16%
hdres    0.20%

Change-Id: If521665589c7f29277b8e9223f21f4a8bf3fef39
2016-04-08 11:08:57 -07:00
hui su b76118b736 Reformat scan order neighbors
Change-Id: Iafcd080612012b08f3cbff45335c12f434543f38
2016-04-08 10:50:13 -07:00
Geza Lore f2be4f6058 Refactor PC_TREE root handling.
Change-Id: Id8b16c1b18bd6f909e72aae3fd582dd3503c88c6
2016-04-08 17:01:00 +01:00
hui su 69c7ad3407 Correct comments for scan order neighbors
Change-Id: I5e2dc39bf0ee8e501e4dd358be2e92ae50934593
2016-04-07 11:07:21 -07:00
Geza Lore 454989ff32 Make superblock size variable at the frame level.
The uncompressed frame header contains a bit to signal whether the
frame is encoded using 64x64 or 128x128 superblocks. This can vary
between any 2 frames.

vpxenc gained the --sb-size={64,128,dynamic} option, which allows the
configuration of the superblock size used (default is dynamic). 64/128
will force the encoder to always use the specified superblock size.
Dynamic would enable the encoder to choose the sb size for each
frame, but this is not implemented yet (dynamic does the same as 128
for now).

Constraints on tile sizes depend on the superblock size, the following
is a summary of the current bitstream syntax and semantics:

If both --enable-ext-tile is OFF and --enable-ext-partition is OFF:
     The tile coding in this case is the same as VP9. In particular,
     tiles have a minimum width of 256 pixels and a maximum width of
     4096 pixels. The tile width must be multiples of 64 pixels
     (except for the rightmost tile column). There can be a maximum
     of 64 tile columns and 4 tile rows.

If --enable-ext-tile is OFF and --enable-ext-partition is ON:
     Same constraints as above, except that tile width must be
     multiples of 128 pixels (except for the rightmost tile column).

There is no change in the bitstream syntax used for coding the tile
configuration if --enable-ext-tile is OFF.

If --enable-ext-tile is ON and --enable-ext-partition is OFF:
     This is the new large scale tile coding configuration. The
     minimum/maximum tile width and height are 64/4096 pixels. Tile
     width and height must be multiples of 64 pixels. The uncompressed
     header contains two 6 bit fields that hold the tile width/height
     in units of 64 pixels. The maximum number of tile rows/columns
     is only limited by the maximum frame size of 65536x65536 pixels
     that can be coded in the bitstream. This yields a maximum of
     1024x1024 tile rows and columns (of 64x64 tiles in a 65536x65536
     frame).

If both --enable-ext-tile is ON and --enable-ext-partition is ON:
     Same applies as above, except that in the bitstream the 2 fields
     containing the tile width/height are in units of the superblock
     size, and the superblock size itself is also coded in the bitstream.
     If the uncompressed header signals the use of 64x64 superblocks,
     then the tile width/height fields are 6 bits wide and are in units
     of 64 pixels. If the uncompressed header signals the use of 128x128
     superblocks, then the tile width/height fields are 5 bits wide and
     are in units of 128 pixels.

The above is a summary of the bitstream. The user interface to vpxenc
(and the equivalent encoder API) behaves as follows:

If --enable-ext-tile is OFF:
     No change in the user interface. --tile-columns and --tile-rows
     specify the base 2 logarithm of the desired number of tile columns
     and tile rows. The actual number of tile rows and tile columns,
     and the particular tile width and tile height are computed by the
     codec ensuring all of the above constraints are respected.

If --enable-ext-tile is ON, but --enable-ext-partition is OFF:
     No change in the user interface. --tile-columns and --tile-rows
     specify the WIDTH and HEIGHT of the tiles in units of 64 pixels.
     The valid values are in the range [1, 64] (which corresponds to
     [64, 4096] pixels in increments of 64).

If both --enable-ext-tile is ON and --enable-ext-partition is ON:
     If --sb-size=64 (default):
         The user interface is the same as in the previous point.
         --tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
         in units of 64 pixels, in the range [1, 64] (which corresponds
         to [64, 4096] pixels in increments of 64).
     If --sb-size=128 or --sb-size=dynamic:
         --tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
         in units of 128 pixels in the range [1, 32] (which corresponds
         to [128, 4096] pixels in increments of 128).

Change-Id: Idc9beee1ad12ff1634e83671985d14c680f9179a
2016-04-07 10:34:25 +01:00
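The parameter rules in the 454989ff32 message above can be sketched as follows. This is an illustration under stated assumptions, not vpxenc's actual code: the function names are hypothetical, `sb_size` is passed as 64 or 128 (the message says dynamic currently behaves like 128), and how a width value is mapped into the header field is not stated in the message.

```python
# 65536-pixel frames split into 64-pixel tiles give the quoted limit.
MAX_TILES_PER_DIMENSION = 65536 // 64

def tile_param_meaning(param, ext_tile, sb_size=64):
    """Interpret a --tile-columns or --tile-rows value.

    Returns ("count", n) for a desired tile count, or ("pixels", n)
    for a tile width/height in pixels.
    """
    if not ext_tile:
        # Base-2 logarithm of the desired number of tile columns/rows;
        # the codec may adjust the actual count to meet its constraints.
        return ("count", 1 << param)
    if sb_size not in (64, 128):
        raise ValueError("sb_size must be 64 or 128")
    max_units = 4096 // sb_size  # 64 units of 64 px, or 32 units of 128 px
    if not 1 <= param <= max_units:
        raise ValueError(f"value must be in [1, {max_units}]")
    return ("pixels", param * sb_size)

def header_field_bits(sb_size):
    """Bits needed to tell apart every allowed tile size in superblock units."""
    return (4096 // sb_size - 1).bit_length()
```

The field widths fall out of the ranges: 64 possible sizes need 6 bits and 32 possible sizes need 5 bits, which matches the widths quoted in the message.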
Debargha Mukherjee de3d15bb2c Merge "Refactoring and cosmetic changes to ext-inter expt" into nextgenv2 2016-04-06 01:19:06 +00:00
Debargha Mukherjee 0fc82ea1cf Refactoring and cosmetic changes to ext-inter expt
Change-Id: Icd457480744b7734b3c412c9fed43be738373334
2016-04-05 15:16:18 -07:00
Angie Chiang ff8c490b9a Branch dct to new implementation for bd12
Change-Id: I9281935653aacce22ac3100f79fb956c249e2bf3
2016-04-04 12:40:10 -07:00
Angie Chiang f1060f5bc4 Change dct32x32's range
Bitdepth 10/12:
Fit coefficient range into 32 bits
Fit coefficient * const range into 32 bits

Bitdepth 8:
Fit coefficient range into 16 bits
Fit coefficient * constant range into 32 bits

Change-Id: I50b5a3132e8a9f5155c971ab0f6eb52876d2b5ca
2016-04-04 11:21:11 -07:00
Angie Chiang 39b3c025fa Fit dct's stage range into 32-bit when bitdepth is 12
Change-Id: I807e60c6dcacc50c087adcbdb1df022f8541efc5
2016-04-04 11:13:44 -07:00
Geza Lore f0290cd127 Refactor get_partition to be universal.
Change-Id: I3a2fe4073bb94c5afc24d9274e6edcdb3aed934f
2016-04-04 15:22:25 +01:00
Geza Lore e0dbfdeedc Minor refactoring of partition type processing.
Change-Id: Idcb1e94298d4b7d8832d285548ec2d2ced4b2988
2016-04-04 14:51:10 +01:00
Debargha Mukherjee 2fba8189de Merge "Loopfilter fix" into nextgenv2 2016-04-01 17:48:09 +00:00
Angie Chiang 9f879b3c5f Merge "change vp10_fwd_txfm2d_#x#_sse2 to vp10_fwd_txfm2d_#x#_sse4_1" into nextgenv2 2016-04-01 17:25:23 +00:00
Angie Chiang 2c2b9bd455 Merge "Remove redundant code from vp10_fwd_txfm2d.c" into nextgenv2 2016-04-01 17:25:13 +00:00
Angie Chiang 1b755039c6 Merge "Simplify rounding in vp10_[fwd/inv]_txfm[1/2]d_#x#" into nextgenv2 2016-04-01 17:24:50 +00:00
Angie Chiang 0a9eedfbef Merge "Add vp10_fwd_txfm2d_sse2" into nextgenv2 2016-04-01 17:24:34 +00:00
Debargha Mukherjee f7457f5e89 Loopfilter fix
Fixes mismatch introduced in
https://chromium-review.googlesource.com/#/c/336645

Change-Id: I15cded221c18dbf87b5029bc464e975d5c7c40e3
2016-03-31 19:57:42 -07:00
Yaowu Xu a416d5bd2d Fix a build issue
Change-Id: Ifdb32c487632098496bf59fcc76c518f8f0426d2
2016-03-31 16:06:24 -07:00
Debargha Mukherjee 2a6389bb8b Merge "Fix interpolation values and decouple interintra" into nextgenv2 2016-03-31 21:47:10 +00:00
Debargha Mukherjee 2be211e971 Fix interpolation values and decouple interintra
Decouples interintra modes and probability models from regular
intra modes, to enable creating/optimizing new interintra modes.
Also, fixes interpolation values for 128x128 interintra and obmc.

Change-Id: I5c2016db49b8f029164e5fe84c6274d4e02ff90e
2016-03-31 12:12:51 -07:00
Geza Lore 10232eda8e Refactor loopfilter level arrays to 2D.
Change-Id: Id20526d0b6d1371dc9f45cb8b5f24b6974da7bc4
2016-03-31 15:52:12 +01:00
Geza Lore 511da8cbe5 Rename MI_BLOCK_SIZE and MI_MASK macros.
Rename MI_BLOCK_SIZE.* -> MAX_MIB_SIZE.* (MIB is for MI Block).
Rename MI_MASK.* -> MAX_MIB_MASK.*

There are no functional changes.

This is in preparation for coding the superblock size at the frame
level, which will require some of these constants to become variables.
The new names better reflect future semantics, and hence make the code
clearer.

Change-Id: Iee08d97554cf4cc16a5dc166a3ffd1ab91529992
2016-03-31 09:57:41 +01:00
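A prefix rename like the one in the 511da8cbe5 message above could be scripted roughly as below. This is only a sketch of the pattern "MI_BLOCK_SIZE.* -> MAX_MIB_SIZE.*"; the helper name is hypothetical and the commit does not say how the rename was actually carried out.

```python
import re

# Each pattern matches the old prefix at a word boundary, so any suffix
# (the ".*" in the commit message) is carried over unchanged.
RENAMES = [
    (re.compile(r"\bMI_BLOCK_SIZE"), "MAX_MIB_SIZE"),
    (re.compile(r"\bMI_MASK"), "MAX_MIB_MASK"),
]

def rename_macros(source):
    """Apply both prefix renames to a string of source code."""
    for pattern, replacement in RENAMES:
        source = pattern.sub(replacement, source)
    return source
```

Because the new names do not contain the old prefixes, running the function twice gives the same result as running it once.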
Hui Su cce6688c31 Merge "Set block size upper bound for Palette mode" into nextgenv2 2016-03-31 00:23:11 +00:00
Angie Chiang c7c40d2329 Generalize txfm scale in highbd quantizer
Change-Id: I359aa49c09b244e0d44ebd09442e365a3d22556c
2016-03-30 15:25:26 -07:00
Angie Chiang 25520d8dc3 change vp10_fwd_txfm2d_#x#_sse2 to vp10_fwd_txfm2d_#x#_sse4_1
Timings for 20k runs of each function are as follows.

Note that the vp10_highbd_fdct#x#_sse2 version is the 16-bit version
plus a range check.

The rest are 32-bit versions.

vp10_fwd_txfm2d_4x4_c (2 ms)
vp10_fwd_txfm2d_8x8_c (9 ms)
vp10_fwd_txfm2d_16x16_c (45 ms)
vp10_fwd_txfm2d_32x32_c (233 ms)

vp10_fwd_txfm2d_4x4_sse4_1 (2 ms)
vp10_fwd_txfm2d_8x8_sse4_1 (3 ms)
vp10_fwd_txfm2d_16x16_sse4_1 (16 ms)
vp10_fwd_txfm2d_32x32_sse4_1 (80 ms)

vp10_highbd_fdct4x4_c (1 ms)
vp10_highbd_fdct8x8_c (3 ms)
vp10_highbd_fdct16x16_c (17 ms)
highbd_fdct32x32_c (160 ms)

vp10_highbd_fdct4x4_sse2 (0 ms)
vp10_highbd_fdct8x8_sse2 (2 ms)
vp10_highbd_fdct16x16_sse2 (8 ms)
highbd_fdct32x32_sse2 (105 ms)

Change-Id: I24daf1e0d4d66e91e4ce61ef71cefa7b70ee90ce
2016-03-30 15:25:26 -07:00
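For reference, the C versus SSE4.1 speedup implied by the vp10_fwd_txfm2d timings in the 25520d8dc3 message above can be computed as below. The numbers are copied from the message; the dictionary layout and function name are just illustrative. The timings have millisecond resolution, so the ratios for the small sizes are coarse.

```python
# (C time in ms, SSE4.1 time in ms) for vp10_fwd_txfm2d, from the message.
FWD_TXFM2D_MS = {
    "4x4": (2, 2),
    "8x8": (9, 3),
    "16x16": (45, 16),
    "32x32": (233, 80),
}

def speedups(timings):
    """Return C-time / SIMD-time for each transform size."""
    return {size: c_ms / simd_ms for size, (c_ms, simd_ms) in timings.items()}

if __name__ == "__main__":
    for size, ratio in speedups(FWD_TXFM2D_MS).items():
        print(f"{size}: {ratio:.2f}x")
```

At this resolution the 4x4 case shows no measurable gain, while the larger sizes come out at roughly 3x.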
Angie Chiang c75f64780b Remove redundant code from vp10_fwd_txfm2d.c
Change-Id: I87ae5e93957616c0f5160a4f679e42f77092c33f
2016-03-30 15:25:26 -07:00
Angie Chiang f2b311f580 Simplify rounding in vp10_[fwd/inv]_txfm[1/2]d_#x#
Change-Id: I24ce46e157dc5b9c0d75000a1a48e9c136ed4ee1
2016-03-30 15:25:26 -07:00
Angie Chiang 11d2bb5429 Add vp10_fwd_txfm2d_sse2
Change-Id: Idfbe3c7f5a7eb799c03968171006f21bf3d96091
2016-03-30 15:25:26 -07:00
Angie Chiang 64413a6ca7 Parameterize transform scale for quantizer
This is to facilitate changing transform scale later

Change-Id: Ic8ca5afba57d2489ebd191ccc40c1b31605a0d8c
2016-03-30 15:25:26 -07:00
hui su cbb8be769d Set block size upper bound for Palette mode
Avoid buffer overflow in case of such new experiments as
128 x 128 superblock size.

Change-Id: Ib775f3925a85fc87227c0ddd9b6a6110a12ef196
2016-03-30 14:39:44 -07:00
Debargha Mukherjee 8d3a4aa891 Some fixes/speed-ups on inter-intra part of ext-inter
Fixes an issue with rectangular inter-intra blocks.
Includes various other refactoring and cleanups to enable fast mixing
of inter and intra predictors.
Uses only the best single inter reference so far for the inter-intra
search.

About 30% speed-up with a 0.1% hit in performance.

This is part one of an overhaul of the ext-inter experiment. To be
continued in subsequent patches.

Change-Id: Id10ee100c78c6e00009a3a4f930a4435ef403a95
2016-03-30 14:39:29 -07:00
Debargha Mukherjee 91707ac79e Merge "Extend superblock size to 128x128 pixels." into nextgenv2 2016-03-30 20:55:32 +00:00
Geza Lore 552d5cd715 Extend superblock size to 128x128 pixels.
If --enable-ext-partition is used at build time, the superblock size
(sometimes also referred to as coding unit (CU) size) is extended to
128x128 pixels.

Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a
2016-03-30 18:23:06 +01:00
Debargha Mukherjee e467627f33 Merge "Fix for ext_interp experiment" into nextgenv2 2016-03-30 14:44:39 +00:00
Yaowu Xu 37241e6f95 Merge "Merge branch 'masterbase' into nextgenv2" into nextgenv2 2016-03-29 16:05:53 +00:00
Julia Robson 068e799459 Fix for ext_interp experiment
Amends previous commit to also handle subsampling correctly.
Change ID of prev commit: I6b07e6cf9b287ba4b5bd6599af4a7412e50b3bdc

Was causing occasional failures for 422 streams due to accessing
elements beyond the extent of the bmi array.

Change-Id: I37ebabf4c01ca84bcd1851428172bdf753805d98
2016-03-29 16:09:49 +01:00
Yaowu Xu c810740c36 Merge branch 'masterbase' into nextgenv2
Conflicts:
	vp9/encoder/vp9_encoder.c
	vpx_dsp/x86/convolve.h

Change-Id: I60c3532936bedd796a75dfe78245a95ec21e2e55
2016-03-28 17:44:28 -07:00
Angie Chiang 4144a11552 Merge "Use vp10_[fwd/inv]_txfm2d_add_32x32 for bd 10" into nextgenv2 2016-03-28 19:20:48 +00:00
Hui Su 14f2d03b4b Merge "Fix assertion fail in build_intra_predictors" into nextgenv2 2016-03-28 18:14:47 +00:00
Angie Chiang 33833aefdd Merge "Use vp10_[fwd/inv]_txfm2d_add_#x# for bd 10" into nextgenv2 2016-03-28 18:11:47 +00:00
Angie Chiang 46b234478f Use vp10_[fwd/inv]_txfm2d_add_32x32 for bd 10
Change-Id: I996c48a90d7d71b52594a91a35cb8712c7fc212e
2016-03-28 11:08:40 -07:00
Alex Converse 72e29c3a73 Merge changes I3c72a2d8,I9905f3a8 into nextgenv2
* changes:
  Add pluggable bitwriters.
  Add pluggable bitreaders.
2016-03-28 16:59:18 +00:00
hui su f24b91c9e1 Fix assertion fail in build_intra_predictors
Change-Id: Id6683b9593b52aa0d159f8f013782d9e0bd07206
2016-03-28 09:37:54 -07:00
Alex Converse efd566ff93 Add pluggable bitreaders.
This will make the code change for a pure ANS experiment manageable.

Change-Id: I9905f3a89f492a4346860463a72fa8c52aac4c8e
2016-03-25 11:02:41 -07:00
Yunqing Wang bdcc14051b Recover tile coding performance
After porting tile coding from VP9 to VP10, some performance
degradation was seen because of the difference between VP9 and
VP10 baselines. This patch disabled some features in VP10 while
tile coding is turned on. Also, an encoder control API was added
back for this use case.

Change-Id: I8f736db8388408a8cc35320a2f80abb02906571c
2016-03-25 09:05:25 -07:00
Geza Lore 490ba1ad25 Port large scale tile coding features from nextgen.
If configured with --enable-ext-tile, the codec uses an alternative
tile coding syntax in the bitstream. Changes include:
 - The maximum number of tile rows and columns is extended to 1024
   each.
 - The minimum tile width/height is 64 pixels (1 superblock).
 - A tile copy mode is added where a tile directly reuses the coded
   data of a previous tile.
 - The meaning of the tile-columns and tile-rows codec parameters is
   overloaded to mean tile-width and tile-height in units of 64
   pixels.
 - All tiles should now be independent, including rows within the
   same columns, so large scale parallel, or independent decoding is
   possible.
 - vpxdec also gained the options to decode only a particular tile,
   tile row, or tile column.

Changes without --enable-ext-tile:
 - All tiles should now be independent, including rows within the
   same columns, so large scale parallel, or independent decoding is
   possible.
 - vpxenc default tile configuration changed to use 1 tile column.

Change-Id: I0cd08ad550967ac18622dae5e98ad23d581cb33e
2016-03-24 09:26:05 +00:00
Jingning Han 1fcb5fc755 Refactor motion vector residual coding process
This commit separates the predicted motion vector from the nearestmv
motion vector in the coding process for both regular and sub8x8
block sizes.

Change-Id: I703490513b0194e6669ebf719352db015facb3e1
2016-03-23 12:10:38 -07:00
Angie Chiang d9a0cbb1b7 Use vp10_[fwd/inv]_txfm2d_add_#x# for bd 10
Change-Id: Ie35bdbd7aafae693e3106d7ccbbdd8e65ee8800c
2016-03-23 12:05:12 -07:00
Yi Luo deb33056d1 Merge "Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode - Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1 intrinsics optimization. - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(), and fdct4x4_sse4_1(). - Used logic right shift to avoid coeff memory write/read. - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only. - Improved overall encoding performance >2.3% for 50 frames sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12, --bit-depth=12, 50 frames. - Unit test passed." into nextgenv2 2016-03-23 18:30:40 +00:00
Hui Su daf2fb42e6 Merge "Add "entropy" experiment" into nextgenv2 2016-03-23 17:50:57 +00:00
Alex Converse b5454b245a Merge "Add some ANS helpers needed to replace the vpx bool coder with pure ANS." into nextgenv2 2016-03-23 16:21:58 +00:00
Yi Luo 977dccd12c Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode
- Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
  intrinsics optimization.
- Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
  and fdct4x4_sse4_1().
- Used logic right shift to avoid coeff memory write/read.
- Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
- Improved overall encoding performance >2.3% for 50 frames
  sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12,
  --bit-depth=12, 50 frames.
- Unit test passed.

Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004
2016-03-23 09:13:45 -07:00
Debargha Mukherjee 7a3bae768e Merge "Porting ext_partition experiment from nextgen" into nextgenv2 2016-03-23 04:58:38 +00:00
Alex Converse 6b9cb8c489 Add some ANS helpers needed to replace the vpx bool coder with pure ANS.
Change-Id: I32b63fca020c410cef16e93379b4e6e281ccbccd
2016-03-22 16:23:23 -07:00
Yue Chen 2613b5e9d6 Merge "Refactor prediction functions of OBMC" into nextgenv2 2016-03-22 21:06:16 +00:00
Julia Robson 5cce322a09 Porting ext_partition experiment from nextgen
This has been ported under ext_partition_types because it is due
to be combined with the coding_unit_size experiment which is
already being ported under ext_partition

Change-Id: I47af869ae123ddf0aa99160dac644059d14266ee
2016-03-22 12:29:01 -07:00
Angie Chiang 9d380d8872 Merge "mv vp10_fwd_txfm2d_#x# into vp10_rtcd.h" into nextgenv2 2016-03-22 01:07:56 +00:00
Angie Chiang 063e965d7d Merge "Passing TXFM_TYPE instead of func pointer" into nextgenv2 2016-03-22 01:07:42 +00:00
Jingning Han 4df51c8de4 Merge "Refactor sub8x8 reference motion vector search function" into nextgenv2 2016-03-22 00:07:45 +00:00
Jingning Han bfdcccd8a1 Merge "Rework the DRL syntax entropy coding system" into nextgenv2 2016-03-22 00:07:36 +00:00
Yue Chen 2e3f77316d Refactor prediction functions of OBMC
Merge the functions that generate prediction by above/left predictors
for the encoder and the decoder.

Change-Id: I57e53a8f2eb8d3028c4ed0c9abdcbf00503f95a0
2016-03-21 17:04:13 -07:00
Debargha Mukherjee 1b17559327 Adds 1D transforms for ADST/FlipADST to make 16
Makes a set of 16 transforms total, adding all 1D
combinations of ADST and FlipADST, and removing all DST
transforms.

lowres, midres both improve by about 0.1% and hdres by
-0.378% in BDRATE but with fewer transforms that are also
simpler.

Further experiments to continue later.

Change-Id: I7348a4c0e12078fdea5ae3a2d36a89a319ffcc6e
2016-03-21 11:19:36 -07:00
Angie Chiang abd447e339 mv vp10_fwd_txfm2d_#x# into vp10_rtcd.h
Change-Id: Iad7352698786791b0fd7c005a7edfd1724b71599
2016-03-21 10:51:54 -07:00
Angie Chiang 40ef86f27d Passing TXFM_TYPE instead of func pointer
This is to facilitate sse2 implementation

Change-Id: Id2f53e83c5508c4445d9b1bba00a649cb4da6b74
2016-03-21 10:50:59 -07:00
Jingning Han 66df6e7c7f Refactor sub8x8 reference motion vector search function
Rework the interface to allow the codec to store the reference motion
vector list information for coding process.

Change-Id: I47e26587f6c0808655e4626f316ec7614a7ad8ed
2016-03-21 10:02:08 -07:00
Jingning Han 5c9d315572 Rework the DRL syntax entropy coding system
This commit re-designs the probability model for the syntax elements
of the dynamic motion vector referencing system.

Change-Id: Icfb8203c7e8f64e10e99f5890e25e6f6b15fe5d1
2016-03-21 09:52:33 -07:00
Geza Lore efe7d4e5a2 Refactor mbmi->inter_tx_size to 2D array.
This is in preparation of increasing the superblock size.

Change-Id: I9197e397399fbe8aec1178a45ea0337dd90412d7
2016-03-18 15:30:09 +00:00
Angie Chiang ed2514a22c add dct 64x64 transform
Change-Id: I131c4d1216cd156e520b8a91c4438c2d3c6602cb
2016-03-16 19:37:21 -07:00
hui su 83b47af18d Add "entropy" experiment
This patch added two features to improve entropy coding efficiency
for coefficient tokens.

1. Choose 1 of 4 default probability tables based on q-index for
key-frames.
It is ported from nextgen branch:
https://chromium-review.googlesource.com/#/c/280586/

2. Do backward update after each superblock (64X64) row using
subframe token counts.

Coding gain: 0.1% on lowres; 0.42% on midres; 0.36% on hdres.
Much larger gain for key-frames: 2.6%, 2.3%, 1.7%.

Design doc: go/huisu-entropy

Change-Id: Ia3b6a615636be09247d70e4c520405637561532b
2016-03-16 11:55:50 -07:00
Geza Lore c2005c578b Factor out zeroing above and left context.
Change-Id: I6e5d8cff869c7415a924f845c9e6ccaabe2b7a9b
2016-03-16 13:08:29 +00:00
Debargha Mukherjee dcbbb81605 Merge "Refactor 1D transforms" into nextgenv2 2016-03-15 19:08:07 +00:00
Debargha Mukherjee cb37db126e Merge "Fix copy/zero macros." into nextgenv2 2016-03-15 17:45:31 +00:00
Jingning Han b00aa8f216 Merge "Turn off 32x32 transform type selection" into nextgenv2 2016-03-15 16:59:37 +00:00
Geza Lore a301f5e03d Fix copy/zero macros.
Change-Id: I2df3b6ecd35406ee05c2aa4e49be779e73e1bdc6
2016-03-15 10:57:58 +00:00
Debargha Mukherjee 9b88762b17 Refactor 1D transforms
In preparation for adding more 1D variants with ADST/FlipADST/etc.

BDRATE actually improves by 0.21% on lowres.

Change-Id: I2fa4720c69fe001fa666119a284dfc6b17fffab2
2016-03-14 22:30:09 -07:00
Jingning Han a2c87a3dda Turn off 32x32 transform type selection
Temporarily disable transform type selection for 32x32 transform
block size. This speeds up the encoding process. For bus at CIF
150 frames, the encoding time goes from 896s -> 762s (11% faster).
The compression performance for lowres set is improved by 0.15%,
and -0.029% for hdres.

Change-Id: If239b272970eb302150bec13b8cf192fbe045332
2016-03-14 11:31:03 -07:00
Debargha Mukherjee e38e2ad86e Merge "Fix an overflow in highbitdepth loop restoration" into nextgenv2 2016-03-11 21:48:37 +00:00
Hui Su f0e0a7e7e9 Merge "Complete (mostly) migration of palette mode" into nextgenv2 2016-03-11 19:52:41 +00:00
Debargha Mukherjee 7ea59de69c Fix an overflow in highbitdepth loop restoration
Change-Id: Ie20cd35a4c96443c0de234d2cf097187a70ec8dd
2016-03-11 11:48:24 -08:00
Hui Su f7fbc54bd1 Merge "Fix compiler warnings" into nextgenv2 2016-03-11 19:47:39 +00:00
hui su 8fce4b8543 Fix compiler warnings
Change-Id: I00314ec296e8368f1239a556b3a55feac9cec7ae
2016-03-11 10:13:08 -08:00
Jingning Han 68d9a14e9f Merge "Enable hybrid 1-D/2-D transform coding for highbd setting" into nextgenv2 2016-03-11 18:09:11 +00:00
hui su 78b0bd0a0d Complete (mostly) migration of palette mode
Coding gain on screen_content is 12.2% (was 6.6%).

Some features such as frame-level color buffer, adaptive
entropy coding, are coming in future patches.

Change-Id: I2658cf5ec0cbb02cff685475759f3b68c9807697
2016-03-11 09:56:21 -08:00
Debargha Mukherjee ce4b35d510 Merge "Adds compound wedge prediction modes" into nextgenv2 2016-03-10 17:44:45 +00:00
Jingning Han c453ae53d0 Enable hybrid 1-D/2-D transform coding for highbd setting
This commit enables the hybrid 1-D/2-D transform coding scheme for
high bit-depth setting. It improves the compression performance of
ext-tx experiment by 0.98% for lowres_all set.

Change-Id: Ic27f5037f2c36b095a93b9f15dbae34bdcdf00aa
2016-03-10 08:58:07 -08:00
Debargha Mukherjee f34deab243 Adds compound wedge prediction modes
Incorporates wedge compound prediction modes.

Change-Id: Ie73b54b629105b9dcc5f3763be87f35b09ad2ec7
2016-03-10 07:19:54 -08:00
Yi Luo 431e35913e Merge "Implemented DST 16x16 SSE2 intrinsics optimization" into nextgenv2 2016-03-09 22:27:44 +00:00
Jingning Han 240ae9729e Merge "Add horizontal and vertical scan order for 1-D transform" into nextgenv2 2016-03-09 20:47:06 +00:00
Jingning Han e0413094fb Add horizontal and vertical scan order for 1-D transform
This commit enables the 1-D transform to use Manhattan grid vertical
and horizontal scan order for transform coefficient entropy coding.

Enabled in inter prediction mode, the hybrid 1D/2D transform coding
scheme outperforms the 2D-DCT based coding system used in VP9 by
lowres_all  1.7%
hdres_all   1.4%

As one coding option, in addition to the existing 17 other transform
types in ext-tx experiment, the 1D/2D hybrid transform improves
the coding gains:
lowres_all  2.2% -> 3.0%

Change-Id: I9cefa9d9e38224546d0afd67feecd9f8d4a16ab0
2016-03-09 10:58:23 -08:00
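One plausible reading of the horizontal and vertical scan orders mentioned in the e0413094fb message above is a plain row-by-row and column-by-column walk over the transform block. The sketch below shows that reading only; it is an assumption about what the terms mean, the function names are hypothetical, and the actual tables in the codec may differ.

```python
def horizontal_scan(n):
    """Visit an n x n block row by row; entries are raster indices."""
    return [r * n + c for r in range(n) for c in range(n)]

def vertical_scan(n):
    """Visit an n x n block column by column; entries are raster indices."""
    return [r * n + c for c in range(n) for r in range(n)]
```

Under this reading the horizontal scan coincides with raster order, and the vertical scan is its transpose; both visit every coefficient exactly once.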
hui su 954e560f9e Refactor entropy coding of transform size
No performance change.

Change-Id: If35125fed909d89235b303514f77a33183bb36b3
2016-03-08 16:46:00 -08:00
Yi Luo 50a164a1f6 Implemented DST 16x16 SSE2 intrinsics optimization
- Implemented fdst16_sse2(), fdst16_8col() against C version: fdst16().
- Turned on 7 DST related hybrid txfm types in vp10_fht16x16_sse2().
- Replaced vp10_fht10x10_c() with vp10_fht16x16_sse2() in
  fwd_txfm_16x16().
- Added vp10_fht16x16_sse2() unit test against C version:
  vp10_fht16x16_c() (--gtest_filter=*VP10Trans16x16*).
- Unit test passed.
- Speed improvement: 2.4%, 3.2%, 3.2%, for city_cif.y4m, garden_sif.y4m,
  and mobile_cif.y4m.

Change-Id: Ib30a67ce5d5964bef143d588d0f8fa438be8901f
2016-03-08 14:56:38 -08:00
Yaowu Xu 28eb784e46 Fix several MSVC compiler warning/errors
Change-Id: Iccaacee9b7a66b016b5747a3902c236888ad4ba1
2016-03-07 17:00:03 -08:00
Alex Converse 76d4fdd391 Merge "ANS: Switch from PDFs to CDFs." into nextgenv2 2016-03-07 20:51:45 +00:00
Debargha Mukherjee 6adfba7c0f Merge "Make sharp filter 10 tap and makes sharp2 sharper" into nextgenv2 2016-03-07 19:51:42 +00:00
Jingning Han 79c5a533cd Merge "Hybrid 1-D/2-D transform coding" into nextgenv2 2016-03-07 19:15:44 +00:00
Jingning Han a8dc9694a4 Hybrid 1-D/2-D transform coding
This commit enables a hybrid 1-D/2-D transform coding scheme and
the accompanying entropy coding system. It currently uses hybrid
1-D/2-D DCT transform coding. It provides coding performance gains:

lowres_all  0.55%
hdres_all   0.43%

Change-Id: I2b30dcafd21eb2bb3371f6e854cbab440a4dfa78
2016-03-07 09:27:46 -08:00
Hui Su 5e5bef6c18 Merge "Cleanup in get_uv_tx_size" into nextgenv2 2016-03-05 07:42:26 +00:00
hui su c3c1c6f405 Cleanup in get_uv_tx_size
Change-Id: Ia2aa7558f9f53da7dff970b30fe0a94958159ffb
2016-03-04 16:53:19 -08:00
Yue Chen 10cdeab42a Fix a bug in obmc prediction
For left side obmc, the input of the mask function is corrected as
the column coordinate.
Also, minor fixes for a compiler warning.

Change-Id: Ia981ef443d5b0285a93d73e5c7ab83f8c3a23464
2016-03-04 15:54:14 -08:00
Jingning Han 351ca31238 Merge "Apply mv precision check to reference mv candidate" into nextgenv2 2016-03-04 16:54:27 +00:00
Debargha Mukherjee 7d2618bc70 Make sharp filter 10 tap and makes sharp2 sharper
There is a ~0.1% gain.

Various experiments with different kinds of windowing functions to
follow.

Change-Id: I0787fddca53607ab39e53f919066839301938e68
2016-03-03 12:01:55 -08:00
Alex Converse 6bbbe31656 ANS: Switch from PDFs to CDFs.
Make the RANS implementation operate on cumulative distribution
functions rather than individual probability distribution functions.
CDFs have shown themselves more flexible to work with.

Reduces decoding memory usage from scaling O(num_distributions *
symbol_resolution) to O(num_distributions).

No bitstream change. This is purely an implementation change.

Change-Id: I4e18d3a0a3d37a36a61487c3d778f9d088b0b374
2016-03-03 09:32:54 +00:00
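To illustrate the PDF-to-CDF idea in the 6bbbe31656 message above: a cumulative table lets a decoder find the symbol for a given slot by searching a short array, instead of keeping one lookup entry per slot of the probability resolution. The sketch below is a generic illustration with hypothetical names and toy numbers, not the libvpx rANS code.

```python
from bisect import bisect_right
from itertools import accumulate

def pdf_to_cdf(pdf):
    """Turn per-symbol frequencies into cumulative starts, plus the total."""
    return [0] + list(accumulate(pdf))

def symbol_from_cdf(cdf, value):
    """Find the symbol whose range [cdf[s], cdf[s + 1]) contains value."""
    if not 0 <= value < cdf[-1]:
        raise ValueError("value outside the distribution's range")
    return bisect_right(cdf, value) - 1

# Toy distribution with a total of 8 slots: symbol 0 owns slots 0..3,
# symbol 1 owns 4..5, symbol 2 owns 6 and symbol 3 owns 7.
toy_cdf = pdf_to_cdf([4, 2, 1, 1])
```

The CDF has one entry per symbol plus one, regardless of how fine the probability resolution is, which is the kind of memory saving the message points to.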
Jingning Han 13fb7c1b88 Apply mv precision check to reference mv candidate
This allows the codec to use effective motion vector as the candidate
to produce the reference motion vector list.

Change-Id: Ib90be705fe28200c13376d6d7741800a61f13043
2016-03-02 20:14:07 -08:00
hui su ebc6e058db Fix a bug in vp10_predict_intra_block
Avoid mistakenly setting "have_right" as 0 for UV channel in blocks
of width no larger than 8.

Change-Id: Ic2b031e32f967a23fd118a052bf9edd7d5a3abe6
2016-03-02 11:22:09 -08:00
Hui Su 90fe1cffbf Merge "Fix a couple of minor bugs in vp10_has_right and vp10_has_bottom" into nextgenv2 2016-03-02 00:33:38 +00:00
hui su 935a837c01 Fix a couple of minor bugs in vp10_has_right and vp10_has_bottom
The above-right and left-bottom pixels were sometimes not used even
though they were available. Results on lowres_all and hdres_all are
mostly neutral.

Change-Id: Ic13533dd498442ad5592b83bb5fabf053cc8e8f0
2016-03-01 10:09:04 -08:00
Angie Chiang 7667733991 Update obmc counts in multithread mode
Change-Id: I0743e00dad9d36a87870c480922f5ae904bd5c9d
2016-02-29 17:09:02 -08:00
Debargha Mukherjee db084506d8 A build fix and some other cosmetic changes
Fixes some issues introduced by a merge of two patches.
Also decouples the temporal interpolation filter from the switchable
filters for now for ease of experimentation with both separately.

Change-Id: If1c7c08adf00e0cf818fe8d0d3656c26ea65eb32
2016-02-29 10:20:52 -08:00
Debargha Mukherjee 48589e8d07 Merge "Some refactoring and cleanups of interp filter" into nextgenv2 2016-02-29 15:55:48 +00:00
Jingning Han dca86af8f4 Merge "Unify frame border extension operation" into nextgenv2 2016-02-27 01:22:03 +00:00
Debargha Mukherjee bab2912b5e Some refactoring and cleanups of interp filter
Includes various cosmetic changes and refactoring including
naming the sharp filters differently (since they are no longer
8-tap).

Change-Id: Ida5a19ca0daa9f6a64a6734394c685b2a4a2564a
2016-02-26 15:42:49 -08:00
Jingning Han d1d11fc6dd Unify frame border extension operation
This commit unifies the encoder and decoder border extension and
motion compensated prediction process. Remove the decoder specific
flow to simplify the development flow.

Change-Id: I9c43bbe6d7c017e6da2db6a62c5bf3d0af7ccfce
2016-02-26 13:58:53 -08:00
Geza Lore 7ded038af5 Port interintra experiment from nextgen.
The interintra experiment, which combines an inter prediction and an
intra prediction, has been ported from the nextgen branch. The
experiment is merged into ext_inter, so there is no separate configure
option to enable it.

Change-Id: I0cc20cefd29e9b77ab7bbbb709abc11512320325
2016-02-26 13:01:51 -08:00
Debargha Mukherjee 3287f5519e Merge "Hooks to use 32x32 masked transforms for ext-tx" into nextgenv2 2016-02-26 20:54:37 +00:00
Jingning Han 2b7196a8bb Merge "Use sharp filter for alter reference frame generation" into nextgenv2 2016-02-26 16:24:59 +00:00
Jingning Han 83ecafbd95 Merge "Enable context based motion vector entropy coding" into nextgenv2 2016-02-26 16:24:49 +00:00
Jingning Han 72eda13e50 Use sharp filter for alter reference frame generation
This commit uses the 12-tap sharp filter to generate the alternate
reference frame. It improves the compression performance by
derf    0.45%
hevcmr  0.35%
stdhd   0.79%

No encoding time change is observed.

Change-Id: Ia5dc26d5aae6b9b0cb782e5a28dc5066eeeb2ec8
2016-02-25 14:20:38 -08:00
Debargha Mukherjee da2d4a7afc Hooks to use 32x32 masked transforms for ext-tx
Adds hooks to use 32x32 ext-tx. Also adds scan orders for the masked
transforms for 32x32.
Set the macro USE_MSKTX_FOR_32X32 to 1 in blockd.h to support 32x32 masked
transforms for ext-tx.

Change-Id: Ie6564830266651fcafae2d536c274dafd664ce17
2016-02-24 13:08:37 -08:00
Debargha Mukherjee ad574d4008 Merge "Some fixes in reconintra" into nextgenv2 2016-02-24 20:25:25 +00:00
Yaowu Xu aa6c754635 Merge remote-tracking branch 'webm/master' into nextgenv2 2016-02-24 10:53:17 -08:00
Debargha Mukherjee 3ef0db078e Some fixes in reconintra
Change-Id: I0b0fa7c9853ce12d39ee21829686b308154b2c61
2016-02-24 10:49:35 -08:00
Geza Lore 44dba01f3e Rename above and left offset variables.
These variable names were legacy from a previous version of this
function and in the current version they were confusingly backwards.

Change-Id: I4f6c1628f296fd5b650fd9c5e2d56d7daf66a3f6
2016-02-24 17:39:48 +00:00
Jingning Han 47bc2a5741 Enable context based motion vector entropy coding
This commit enables a context based motion vector entropy coding
conditioned on dynamic reference motion vector list. This (along with
the previous CL) improves the coding gains due to dynamic motion
vector referencing based entropy coding:
derf   0.1%
hevcmr 0.2%
stdhd  0.7%
hevchr 0.4%

No encoding time change was observed.

Change-Id: I179c723844079195f6952a12582996a3ca9e9914
2016-02-24 09:02:32 -08:00
Alex Converse 05f33142f5 Merge "Port "Better workaround for Bug 1089." to vp10 (nextgenv2)." into nextgenv2 2016-02-23 17:53:57 +00:00
Angie Chiang 5340d1424d Merge "Merge 12sharp filter into ext-interp" into nextgenv2 2016-02-23 01:26:23 +00:00
Angie Chiang e4af6a42a7 Merge 12sharp filter into ext-interp
Change-Id: I7df48e7f3b57f212798ef4be86f28aed928fc3e0
2016-02-22 16:26:38 -08:00
Angie Chiang 94493e606d Merge "Fix 12 TAP convolution bug" into nextgenv2 2016-02-22 19:03:06 +00:00
Alex Converse 9fce131de8 Port "Better workaround for Bug 1089." to vp10 (nextgenv2).
Don't initialize first pass costs for a number of symbols where first
pass probabilities aren't initialized.

As a side effect, an illegal read in the ANS experiment is fixed.

https://bugs.chromium.org/p/webm/issues/detail?id=1089

Change-Id: I97438c357bd88f52f5a15c697031cf0c3cc8f510
2016-02-22 10:19:03 -08:00
Jingning Han 1f984a5a63 Merge "Vectorize motion vector probability models" into nextgenv2 2016-02-22 17:37:29 +00:00
Jingning Han 682dad0ec7 Merge "Store predicted motion vectors" into nextgenv2 2016-02-22 17:14:05 +00:00
Angie Chiang 1e403064b9 Fix 12 TAP convolution bug
Previously, we did 12-tap interpolation even when there was no sub-pixel
offset. This could cause a bug because the decoder doesn't extend the
border when there is no sub-pixel offset. In this situation, if we still
interpolate, we access a border extension that doesn't exist, causing a
memory error.

Change-Id: I55b879722f0a10c5d13261bd9617a75c826a2418
2016-02-19 19:31:38 -08:00
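The fix described in commit 1e403064b9 amounts to taking a plain copy path when the motion vector has no fractional part. Below is a minimal one-dimensional sketch of that idea, not the libvpx implementation: `predict_row`, its parameters, and the placeholder 12-tap kernel are all assumptions for illustration (the kernel is a half-pel averaging filter padded to 12 taps, not the actual sharp filter coefficients). The explicit bounds check stands in for the memory error mentioned in the commit message.

```python
# Placeholder 12-tap kernel summing to 128 (hypothetical, not libvpx's filter).
PLACEHOLDER_TAPS = [0, 0, 0, 0, 0, 64, 64, 0, 0, 0, 0, 0]


def predict_row(ref, start, width, subpel, taps=PLACEHOLDER_TAPS):
    """Predict `width` pixels beginning at integer position `start` in `ref`.

    subpel == 0 means the motion vector is full-pel, so the pixels are copied
    and nothing outside [start, start + width) is read. Otherwise the filter
    reads len(taps) // 2 - 1 pixels to the left and len(taps) // 2 pixels to
    the right of each output position, which requires an extended border.
    """
    if subpel == 0:
        return list(ref[start:start + width])

    left = len(taps) // 2 - 1
    out = []
    for x in range(width):
        acc = 0
        for k, tap in enumerate(taps):
            idx = start + x + k - left
            if not 0 <= idx < len(ref):
                # Stands in for reading a border extension that was never built.
                raise IndexError("filter footprint leaves the reference buffer")
            acc += tap * ref[idx]
        out.append((acc + 64) >> 7)  # round and divide by 128
    return out
```

Calling `predict_row` with `subpel == 0` on an unpadded buffer succeeds, while a non-zero `subpel` on the same buffer raises, which mirrors why interpolating without a border extension is unsafe.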