Commit graph

394 commits

Author SHA1 Message Date
Angie Chiang ff8c490b9a Branch dct to new implementation for bd12
Change-Id: I9281935653aacce22ac3100f79fb956c249e2bf3
2016-04-04 12:40:10 -07:00
Angie Chiang f1060f5bc4 Change dct32x32's range
Bitdepth 10/12:
Fit coefficient range into 32 bits
Fit coefficient * const range into 32 bits

Bitdepth 8:
Fit coefficient range into 16 bits
Fit coefficient * constant range into 32 bits

Change-Id: I50b5a3132e8a9f5155c971ab0f6eb52876d2b5ca
2016-04-04 11:21:11 -07:00
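The range constraint described in this commit can be illustrated with a minimal sketch. It only checks whether a value, or a coefficient multiplied by a constant, is representable as a signed two's-complement integer of a given width. The function names and the sample values are hypothetical and are not taken from the dct32x32 code.

```python
def fits_in_signed_bits(value: int, bits: int) -> bool:
    """Return True if value fits a signed two's-complement integer of `bits` width."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi


def product_fits(coeff: int, const: int, bits: int = 32) -> bool:
    """Check that coeff * const stays inside the given signed width."""
    return fits_in_signed_bits(coeff * const, bits)


# Illustrative values only (assumptions, not the constants used by the codec):
# a 16-bit coefficient times a 14-bit constant still fits into 32 bits.
sample_coeff = (1 << 15) - 1
sample_const = 1 << 13
sample_ok = product_fits(sample_coeff, sample_const)
```

A check of this kind is useful at each transform stage, because an intermediate product can overflow even when the coefficient itself is in range.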
Angie Chiang 39b3c025fa Fit dct's stage range into 32-bit when bitdepth is 12
Change-Id: I807e60c6dcacc50c087adcbdb1df022f8541efc5
2016-04-04 11:13:44 -07:00
Angie Chiang c7c40d2329 Generalize txfm scale in highbd quantizer
Change-Id: I359aa49c09b244e0d44ebd09442e365a3d22556c
2016-03-30 15:25:26 -07:00
Angie Chiang 64413a6ca7 Parameterize transform scale for quantizer
This is to facilitate changing transform scale later

Change-Id: Ic8ca5afba57d2489ebd191ccc40c1b31605a0d8c
2016-03-30 15:25:26 -07:00
Angie Chiang 25520d8dc3 change vp10_fwd_txfm2d_#x#_sse2 to vp10_fwd_txfm2d_#x#_sse4_1
The time taken to run each function 20k times is as follows.

Note that the vp10_highbd_fdct#x#_sse2 version is the
16-bit version plus a range check.

The rest are 32-bit versions.

vp10_fwd_txfm2d_4x4_c (2 ms)
vp10_fwd_txfm2d_8x8_c (9 ms)
vp10_fwd_txfm2d_16x16_c (45 ms)
vp10_fwd_txfm2d_32x32_c (233 ms)

vp10_fwd_txfm2d_4x4_sse4_1 (2 ms)
vp10_fwd_txfm2d_8x8_sse4_1 (3 ms)
vp10_fwd_txfm2d_16x16_sse4_1 (16 ms)
vp10_fwd_txfm2d_32x32_sse4_1 (80 ms)

vp10_highbd_fdct4x4_c (1 ms)
vp10_highbd_fdct8x8_c (3 ms)
vp10_highbd_fdct16x16_c (17 ms)
highbd_fdct32x32_c (160 ms)

vp10_highbd_fdct4x4_sse2 (0 ms)
vp10_highbd_fdct8x8_sse2 (2 ms)
vp10_highbd_fdct16x16_sse2 (8 ms)
highbd_fdct32x32_sse2 (105 ms)

Change-Id: I24daf1e0d4d66e91e4ce61ef71cefa7b70ee90ce
2016-03-30 15:25:26 -07:00
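As a rough reading of the timings in this commit message, the following sketch computes the speedup of the sse4_1 versions of vp10_fwd_txfm2d over the C versions. The timing values are copied from the message above; the dictionary layout and the helper name are assumptions made for illustration.

```python
# (C time in ms, sse4_1 time in ms) for vp10_fwd_txfm2d, from the commit message.
timings_ms = {
    "4x4": (2, 2),
    "8x8": (9, 3),
    "16x16": (45, 16),
    "32x32": (233, 80),
}


def speedup(c_ms: float, simd_ms: float) -> float:
    """Ratio of C time to SIMD time; larger means the SIMD version is faster."""
    return c_ms / simd_ms


speedups = {size: speedup(c, simd) for size, (c, simd) in timings_ms.items()}

for size, ratio in speedups.items():
    print(f"{size}: {ratio:.1f}x")
```

The timer resolution is 1 ms (one sse2 entry is reported as 0 ms), so the ratios for the small block sizes should be treated as coarse.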
Angie Chiang c75f64780b Remove redundant code from vp10_fwd_txfm2d.c
Change-Id: I87ae5e93957616c0f5160a4f679e42f77092c33f
2016-03-30 15:25:26 -07:00
Angie Chiang f2b311f580 Simplify rounding in vp10_[fwd/inv]_txfm[1/2]d_#x#
Change-Id: I24ce46e157dc5b9c0d75000a1a48e9c136ed4ee1
2016-03-30 15:25:26 -07:00
Angie Chiang 11d2bb5429 Add vp10_fwd_txfm2d_sse2
Change-Id: Idfbe3c7f5a7eb799c03968171006f21bf3d96091
2016-03-30 15:25:26 -07:00
Debargha Mukherjee 91707ac79e Merge "Extend superblock size to 128x128 pixels." into nextgenv2 2016-03-30 20:55:32 +00:00
Geza Lore 552d5cd715 Extend superblock size to 128x128 pixels.
If --enable-ext-partition is used at build time, the superblock size
(sometimes also referred to as coding unit (CU) size) is extended to
128x128 pixels.

Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a
2016-03-30 18:23:06 +01:00
Debargha Mukherjee e467627f33 Merge "Fix for ext_interp experiment" into nextgenv2 2016-03-30 14:44:39 +00:00
Yaowu Xu 37241e6f95 Merge "Merge branch 'masterbase' into nextgenv2" into nextgenv2 2016-03-29 16:05:53 +00:00
Julia Robson 068e799459 Fix for ext_interp experiment
Amends previous commit to also handle subsampling correctly.
Change ID of prev commit: I6b07e6cf9b287ba4b5bd6599af4a7412e50b3bdc

Was causing occasional failures for 422 streams due to accessing
elements beyond the extent of the bmi array.

Change-Id: I37ebabf4c01ca84bcd1851428172bdf753805d98
2016-03-29 16:09:49 +01:00
Yaowu Xu c810740c36 Merge branch 'masterbase' into nextgenv2
Conflicts:
	vp9/encoder/vp9_encoder.c
	vpx_dsp/x86/convolve.h

Change-Id: I60c3532936bedd796a75dfe78245a95ec21e2e55
2016-03-28 17:44:28 -07:00
Angie Chiang 4144a11552 Merge "Use vp10_[fwd/inv]_txfm2d_add_32x32 for bd 10" into nextgenv2 2016-03-28 19:20:48 +00:00
Hui Su 14f2d03b4b Merge "Fix assertion fail in build_intra_predictors" into nextgenv2 2016-03-28 18:14:47 +00:00
Angie Chiang 33833aefdd Merge "Use vp10_[fwd/inv]_txfm2d_add_#x# for bd 10" into nextgenv2 2016-03-28 18:11:47 +00:00
Angie Chiang 46b234478f Use vp10_[fwd/inv]_txfm2d_add_32x32 for bd 10
Change-Id: I996c48a90d7d71b52594a91a35cb8712c7fc212e
2016-03-28 11:08:40 -07:00
Alex Converse 72e29c3a73 Merge changes I3c72a2d8,I9905f3a8 into nextgenv2
* changes:
  Add pluggable bitwriters.
  Add pluggable bitreaders.
2016-03-28 16:59:18 +00:00
hui su f24b91c9e1 Fix assertion fail in build_intra_predictors
Change-Id: Id6683b9593b52aa0d159f8f013782d9e0bd07206
2016-03-28 09:37:54 -07:00
Alex Converse efd566ff93 Add pluggable bitreaders.
This will make the code change for a pure ANS experiment manageable.

Change-Id: I9905f3a89f492a4346860463a72fa8c52aac4c8e
2016-03-25 11:02:41 -07:00
Yunqing Wang bdcc14051b Recover tile coding performance
After porting tile coding from VP9 to VP10, some performance
degradation was seen because of the difference between VP9 and
VP10 baseline. This patch disabled some features in VP10 while
tile coding is turned on. Also, an encoder control API was added
back for this use case.

Change-Id: I8f736db8388408a8cc35320a2f80abb02906571c
2016-03-25 09:05:25 -07:00
Geza Lore 490ba1ad25 Port large scale tile coding features from nextgen.
If configured with --enable-ext-tile, the codec uses an alternative
tile coding syntax in the bitstream. Changes include:
 - The maximum number of tile rows and columns is extended to 1024
   each.
 - The minimum tile width/height is 64 pixels (1 superblock).
 - A tile copy mode is added where a tile directly reuses the coded
   data of a previous tile.
 - The meaning of the tile-columns and tile-rows codec parameters is
   overloaded to mean tile-width and tile-height in units of 64
   pixels.
 - All tiles should now be independent, including rows within the
   same columns, so large scale parallel, or independent decoding is
   possible.
 - vpxdec also gained the options to decode only a particular tile,
   tile row, or tile column.

Changes without --enable-ext-tile:
 - All tiles should now be independent, including rows within the
   same columns, so large scale parallel, or independent decoding is
   possible.
 - vpxenc default tile configuration changed to use 1 tile column.

Change-Id: I0cd08ad550967ac18622dae5e98ad23d581cb33e
2016-03-24 09:26:05 +00:00
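The overloaded tile parameters described above can be sketched as follows: a tile size given in units of 64 pixels is converted to pixels, and the resulting number of tiles along one frame dimension is checked against the 1024 limit. The function name, the ceiling rounding for a partial last tile, and the error handling are assumptions, not the codec's actual behavior.

```python
SUPERBLOCK_PIXELS = 64    # from the commit: tile sizes are in units of 64 pixels
MAX_TILES_PER_DIM = 1024  # from the commit: at most 1024 tile rows and columns


def tile_count(frame_pixels: int, tile_size_units: int) -> int:
    """Number of tiles along one dimension for a tile size given in 64-pixel units.

    Assumes the last tile may be partial (ceiling division).
    """
    if frame_pixels < 1 or tile_size_units < 1:
        raise ValueError("frame size and tile size must be positive")
    tile_pixels = tile_size_units * SUPERBLOCK_PIXELS
    count = -(-frame_pixels // tile_pixels)  # ceiling division
    if count > MAX_TILES_PER_DIM:
        raise ValueError("too many tiles along this dimension")
    return count


# A 1920-pixel-wide frame with tile-width 4 (256 pixels) gives 8 tile columns.
columns_1080p = tile_count(1920, 4)
```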
Jingning Han 1fcb5fc755 Refactor motion vector residual coding process
This commit separates the predicted motion vector from the nearestmv
motion vector in the coding process for both regular and sub8x8
block sizes.

Change-Id: I703490513b0194e6669ebf719352db015facb3e1
2016-03-23 12:10:38 -07:00
Angie Chiang d9a0cbb1b7 Use vp10_[fwd/inv]_txfm2d_add_#x# for bd 10
Change-Id: Ie35bdbd7aafae693e3106d7ccbbdd8e65ee8800c
2016-03-23 12:05:12 -07:00
Yi Luo deb33056d1 Merge "Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode" into nextgenv2 2016-03-23 18:30:40 +00:00
Hui Su daf2fb42e6 Merge "Add "entropy" experiment" into nextgenv2 2016-03-23 17:50:57 +00:00
Alex Converse b5454b245a Merge "Add some ANS helpers needed to replace the vpx bool coder with pure ANS." into nextgenv2 2016-03-23 16:21:58 +00:00
Yi Luo 977dccd12c Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode
- Set up function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
  intrinsics optimization.
- Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
  and fdct4x4_sse4_1().
- Used logical right shift to avoid coeff memory write/read.
- Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
- Improved overall encoding performance >2.3% for 50 frames
  sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12,
  --bit-depth=12, 50 frames.
- Unit test passed.

Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004
2016-03-23 09:13:45 -07:00
Debargha Mukherjee 7a3bae768e Merge "Porting ext_partition experiment from nextgen" into nextgenv2 2016-03-23 04:58:38 +00:00
Alex Converse 6b9cb8c489 Add some ANS helpers needed to replace the vpx bool coder with pure ANS.
Change-Id: I32b63fca020c410cef16e93379b4e6e281ccbccd
2016-03-22 16:23:23 -07:00
Yue Chen 2613b5e9d6 Merge "Refactor prediction functions of OBMC" into nextgenv2 2016-03-22 21:06:16 +00:00
Julia Robson 5cce322a09 Porting ext_partition experiment from nextgen
This has been ported under ext_partition_types because it is due
to be combined with the coding_unit_size experiment which is
already being ported under ext_partition

Change-Id: I47af869ae123ddf0aa99160dac644059d14266ee
2016-03-22 12:29:01 -07:00
Angie Chiang 9d380d8872 Merge "mv vp10_fwd_txfm2d_#x# into vp10_rtcd.h" into nextgenv2 2016-03-22 01:07:56 +00:00
Angie Chiang 063e965d7d Merge "Passing TXFM_TYPE instead of func pointer" into nextgenv2 2016-03-22 01:07:42 +00:00
Jingning Han 4df51c8de4 Merge "Refactor sub8x8 reference motion vector search function" into nextgenv2 2016-03-22 00:07:45 +00:00
Jingning Han bfdcccd8a1 Merge "Rework the DRL syntax entropy coding system" into nextgenv2 2016-03-22 00:07:36 +00:00
Yue Chen 2e3f77316d Refactor prediction functions of OBMC
Merge the functions that generate prediction by above/left predictors
for the encoder and the decoder.

Change-Id: I57e53a8f2eb8d3028c4ed0c9abdcbf00503f95a0
2016-03-21 17:04:13 -07:00
Debargha Mukherjee 1b17559327 Adds 1D transforms for ADST/FlipADST to make 16
Makes a set of 16 transforms total, adding all 1D
combinations of ADST and FlipADST, and removing all DST
transforms.

lowres, midres both improve by about 0.1% and hdres by
-0.378% in BDRATE but with fewer transforms that are also
simpler.

Further experiments to continue later.

Change-Id: I7348a4c0e12078fdea5ae3a2d36a89a319ffcc6e
2016-03-21 11:19:36 -07:00
Angie Chiang abd447e339 mv vp10_fwd_txfm2d_#x# into vp10_rtcd.h
Change-Id: Iad7352698786791b0fd7c005a7edfd1724b71599
2016-03-21 10:51:54 -07:00
Angie Chiang 40ef86f27d Passing TXFM_TYPE instead of func pointer
This is to facilitate sse2 implementation

Change-Id: Id2f53e83c5508c4445d9b1bba00a649cb4da6b74
2016-03-21 10:50:59 -07:00
Jingning Han 66df6e7c7f Refactor sub8x8 reference motion vector search function
Rework the interface to allow the codec to store the reference
motion vector list information for the coding process.

Change-Id: I47e26587f6c0808655e4626f316ec7614a7ad8ed
2016-03-21 10:02:08 -07:00
Jingning Han 5c9d315572 Rework the DRL syntax entropy coding system
This commit re-designs the probability model for the syntax elements
of the dynamic motion vector referencing system.

Change-Id: Icfb8203c7e8f64e10e99f5890e25e6f6b15fe5d1
2016-03-21 09:52:33 -07:00
Geza Lore efe7d4e5a2 Refactor mbmi->inter_tx_size to 2D array.
This is in preparation of increasing the superblock size.

Change-Id: I9197e397399fbe8aec1178a45ea0337dd90412d7
2016-03-18 15:30:09 +00:00
Angie Chiang ed2514a22c add dct 64x64 transform
Change-Id: I131c4d1216cd156e520b8a91c4438c2d3c6602cb
2016-03-16 19:37:21 -07:00
hui su 83b47af18d Add "entropy" experiment
This patch added two features to improve entropy coding efficiency
for coefficient tokens.

1. Choose 1 of 4 default probability tables based on q-index for
key-frames.
It is ported from nextgen branch:
https://chromium-review.googlesource.com/#/c/280586/

2. Do backward update after each superblock (64x64) row using
subframe token counts.

Coding gain: 0.1% on lowres; 0.42% on midres; 0.36% on hdres.
Much larger gain for key-frames: 2.6%, 2.3%, 1.7%.

Design doc: go/huisu-entropy

Change-Id: Ia3b6a615636be09247d70e4c520405637561532b
2016-03-16 11:55:50 -07:00
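Feature 1 of this commit, choosing one of four default probability tables from the q-index, could look roughly like the sketch below. The commit message does not give the actual thresholds, so the equal-width split of the 0..255 q-index range and the function name are purely hypothetical.

```python
QINDEX_RANGE = 256  # q-index values run from 0 to 255
NUM_TABLES = 4      # from the commit: 1 of 4 default probability tables


def default_table_index(q_index: int) -> int:
    """Map a q-index to a table index in 0..3.

    Hypothetical equal-width buckets; the real thresholds are not stated
    in the commit message.
    """
    if not 0 <= q_index < QINDEX_RANGE:
        raise ValueError("q_index out of range")
    return q_index * NUM_TABLES // QINDEX_RANGE


# Each bucket covers 64 consecutive q-index values under this assumption.
bucket_edges = [default_table_index(q) for q in (0, 63, 64, 127, 128, 191, 192, 255)]
```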
Geza Lore c2005c578b Factor out zeroing above and left context.
Change-Id: I6e5d8cff869c7415a924f845c9e6ccaabe2b7a9b
2016-03-16 13:08:29 +00:00
Debargha Mukherjee dcbbb81605 Merge "Refactor 1D transforms" into nextgenv2 2016-03-15 19:08:07 +00:00
Debargha Mukherjee cb37db126e Merge "Fix copy/zero macros." into nextgenv2 2016-03-15 17:45:31 +00:00