Commit graph

71 commits

Author SHA1 Message Date
Alex Converse 2be9ea610f Use INTER_ALL for VAR based partitions for screencast material.
This offers 25% more compression on my HD screencast testset.

Change-Id: I85eaef95fd8f2e03e326443e9514482b2ee35cef
2014-08-05 15:23:50 -07:00
Jingning Han ca2dcb7fed Chessboard pattern partition search
This commit enables a chessboard pattern constrained partition
search for 720p and above resolutions. The scheme applies a stricter
partition search to alternate blocks based on the partition range of
their above/left neighboring blocks, as well as that of the
collocated blocks in the previous frame. It is currently turned
on at 16x16 block size level. The chessboard pattern is flipped
per coding frame.

The speed 3 runtime is reduced:
park_joy_1080p, 652832 ms -> 607738 ms (7% speed-up)
pedestrian_area_1080p, 215998 ms -> 200589 ms (8% speed-up)

The compression performance is changed:
hd     -0.223%
stdhd  -0.295%

Change-Id: I2d4d123ae89f7171562f618febb4d81789575b19
2014-07-30 10:32:41 -07:00
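
The chessboard selection described in commit ca2dcb7fed can be illustrated with a small sketch. It is not the libvpx code: the function names are hypothetical, and which colour of square receives the stricter search is an assumption. It only shows how a pattern over 16x16 block positions can be flipped on every coded frame.

```python
def use_strict_search(block_row, block_col, frame_index):
    """Return True for blocks on the 'strict' squares of the chessboard.

    Adding frame_index to the parity flips the pattern on every coded frame.
    Which parity gets the stricter search is an assumption of this sketch.
    """
    return (block_row + block_col + frame_index) % 2 == 1


def pattern(rows, cols, frame_index):
    """Render the pattern as text: 'S' = stricter search, '.' = regular search."""
    return [
        "".join("S" if use_strict_search(r, c, frame_index) else "."
                for c in range(cols))
        for r in range(rows)
    ]


if __name__ == "__main__":
    for frame in (0, 1):
        print("frame", frame)
        print("\n".join(pattern(2, 4, frame)))
```

Because the pattern alternates per frame, a block position that gets the stricter search in one frame gets the regular search in the next.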
Jingning Han 54ad09586c Enable chessboard inter prediction filter type search
This commit enables a chessboard pattern prediction filter type
search scheme for rate-distortion optimization speed-up. For the
inferred motion vector modes, the encoder can re-use its above/left
neighbor blocks' prediction filter type and skip a full test on
all possible filter types. This operation is turned on and off
alternately in a chessboard pattern.

It is turned on in speed 3. For test clip pedestrian 1080p, the
runtime is reduced from 231500 ms -> 221700 ms. The compression
performance is changed:
derf:  -0.147%
yt:    -0.134%
hd:    -0.079%
stdhd: -0.220%

Change-Id: I1912f278e7576c2dc632688e3ad7a257410c605a
2014-07-22 16:49:03 -07:00
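
A rough sketch of the decision described in commit 54ad09586c: on the "reuse" squares of the chessboard, a block coded with an inferred motion vector mode tests only the filter types already chosen by its above/left neighbours instead of every filter type. The function name, the filter labels, and the fallback when no neighbour is available are assumptions made for illustration, not the encoder's actual logic.

```python
# Filter labels used only as placeholders for this sketch.
ALL_FILTERS = ("EIGHTTAP", "EIGHTTAP_SMOOTH", "EIGHTTAP_SHARP")


def candidate_filters(is_inferred_mv, on_reuse_square, above_filter, left_filter):
    """Return the list of filter types the encoder would test for one block.

    above_filter / left_filter are the neighbours' filter types, or None
    when the neighbour is unavailable (e.g. at a frame edge).
    """
    if is_inferred_mv and on_reuse_square:
        reused = [f for f in (above_filter, left_filter) if f is not None]
        if reused:
            # Drop duplicates while keeping order (above first, then left).
            return list(dict.fromkeys(reused))
    # Otherwise fall back to the full filter type search.
    return list(ALL_FILTERS)
```

With both neighbours using the same filter, only one filter type is tested; on the other squares the full search still runs, which limits how far a poor decision can propagate.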
Jingning Han ffd948bbd5 Turn on adaptive pred filter scheme for sub8x8 below 720p
For sequences of resolution below 720p, the encoder will check
intra prediction modes and inter prediction modes from LAST_FRAME.
This commit turns on adaptive prediction filter scheme for sub8x8
blocks, where inter prediction modes are enabled. For the test
sequence bus at CIF, the speed 2 runtime goes down from 17879 ms
to 16783 ms, i.e., a 6% speed-up. The compression performance of
the derf set changes by -0.128%.

Change-Id: I01d5321a5ceab4e0666ac5be56c52d896c7a8d45
2014-07-21 16:22:56 -07:00
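
The runtime figures quoted in commit ffd948bbd5 (17879 ms to 16783 ms) can be checked with a short calculation. Two conventions for "speed-up" are common, and the commit messages do not say which one is used, so both are shown; the function names are illustrative.

```python
def time_reduction_percent(old_ms, new_ms):
    """Percentage of the original runtime that was saved."""
    return 100.0 * (old_ms - new_ms) / old_ms


def throughput_gain_percent(old_ms, new_ms):
    """Percentage increase in work done per unit time."""
    return 100.0 * (old_ms / new_ms - 1.0)


if __name__ == "__main__":
    old_ms, new_ms = 17879, 16783
    print(round(time_reduction_percent(old_ms, new_ms), 2))   # about 6.13
    print(round(throughput_gain_percent(old_ms, new_ms), 2))  # about 6.53
```

For small changes the two measures are close, but they diverge as the runtime difference grows, so it helps to state which one a reported percentage refers to.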
Yaowu Xu 51c60a891e make default_interp_filter choice a speed feature
This commit changed the hard-coded DEFAULT_INTERP_FILTER to a speed
feature with the same default value: SWITCHABLE.

Change-Id: I7f54f40f1bd3f5277841d04b85db7a84e47313f1
2014-07-16 14:28:51 -07:00
Yaowu Xu faa686bb1b Added a rt speed 12
We target this speed to achieve similar encoding speed and better
compression than vp8 rt mode with cpu-used at -12.

Change-Id: Ic1bb4371c81a17ea80e83459c1cbf4c09a3498e8
2014-07-15 16:46:22 -07:00
Jingning Han b957439c87 Fix a potential invalid memory access in non-RD coding flow
This commit fixes a potential out-of-boundary memory access due to
the use of reuse_inter_pred_sby in the non-RD coding flow. It
resolves the corresponding asan error.

Change-Id: Iff605f5921230966990013541cd855d698810922
2014-07-11 15:50:43 -07:00
Yunqing Wang a581da218e Remove repetitive code in mcomp.c
Deleted vp9_find_best_sub_pixel_comp_tree(), and combined it in
vp9_find_best_sub_pixel_tree().

Change-Id: Ifb25763c8b19822df5537cc1daa76ce88dc3b056
2014-07-09 14:50:50 -07:00
Yunqing Wang 9bd3be69a4 Adjust full-pixel search method in real-time mode
Use FAST_HEX in speed 5 and 6, which covers more points than
FAST_DIAMOND and improves motion search quality.

At speed 6, RTC set borg tests showed slight quality gain (psnr
gain: 0.143%, ssim gain: 0.226%). No noticeable encoding speed
change.

Change-Id: Ifa62875d9a52ee382ec494f271382bb77d8c67bf
2014-07-09 12:56:25 -07:00
Jingning Han f6bf614b2f Merge "Re-design quantization process for 32x32 transform block" 2014-07-09 11:55:26 -07:00
Jingning Han 9ad1b9fc67 Re-design quantization process for 32x32 transform block
This commit enables a new quantization process for 32x32 2D-DCT
transform coefficient blocks. It improves the compression
performance of speed 5 by 1.4%. The overall compression gain of
speed 5 due to the new quantization scheme is 4.7%. It also includes
the SSSE3 implementation of the 32x32 quantization process.

Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b
2014-07-08 16:55:28 -07:00
Alex Converse f60a1178c6 Cleanup motion search speed features.
* Replace max_step_search_steps with constant MAX_MVSEARCH_STEPS
* Fold (reduce_first_step_size + speed > 5) into reduce_first_step_size
  replacing uses of reduce_first_step_size that don't add the speed
  check with zero.

Change-Id: Iae46395dbf3eaca138bf4d18b838a9e364b5a198
2014-07-07 10:08:45 -07:00
Yaowu Xu 92a6db7928 Added a speed feature controlling a motion search parameter
This commit added a speed feature to control the step_param used in
full pixel motion search. The intention is to reduce the search
steps for high speed real time coding.

Change-Id: I21d2f0105c2b647783a6688615da7fcf2b6d670b
2014-07-02 09:30:43 -07:00
Yaowu Xu 82fd084b35 Merge "Re-design quantization process" 2014-07-01 19:04:01 -07:00
Jingning Han 9ac2f66320 Re-design quantization process
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.

The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.

Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
2014-07-01 17:00:07 -07:00
Yunqing Wang f31ff029df Elevate NEWMV mode checking threshold in real time
The current threshold is kind of low, and in many cases NEWMV
mode is checked but not picked as the best mode. This patch
adds a speed feature to increase the NEWMV threshold, so that
fewer partition mode checks go on to check NEWMV. This feature
is enabled for speed 6 and 7.

Rtc set borg tests showed:
1. Speed 6, overall psnr: -0.088%, ssim: -1.339%;
   Average speedup on rtc set is 11.1%.
2. Speed 7, overall psnr: -0.505%, ssim: -2.320%
   Average speedup on rtc set is 12.9%.

Change-Id: I953b849eeb6e0d5a1f13eacba30c14204472c5be
2014-07-01 14:50:39 -07:00
Yunqing Wang dee5782f93 Enable encode breakout in real time
For real time speed 7, once encode breakout is on (i.e., encoding
setting --static-thresh=1), a proper encode breakout threshold
is set to speed up the encoder.

With --static-thresh=1, the RTC set borg test showed a slight overall
psnr loss of 0.162%, but ssim gain of 0.287%. The average speedup
on RTC set is 6%, and for some clips, the speedup can be 10+%.

Change-Id: Id522d9ce779ff7c699936d13d0c47083de4afb85
2014-06-30 10:41:12 -07:00
Yunqing Wang 9d41313e4b Decide the partitioning threshold from the variance histogram
Before encoding a frame, calculate and store each 16x16 block's
variance of source difference between last and current frame.
Find partitioning threshold T for the frame from its variance
histogram, and then use T to make partition decisions.

Compared with fixed 16x16 partitioning, the rtc set test showed an
overall psnr gain of 3.242%, and ssim gain of 3.751%. The best
psnr gain is 8.653%.

The overall encoding speed didn't change much. It got faster for
some clips (for example, a 12% speedup for vidyo1), and a little
slower for others.

Also, a minor modification was made in datarate unit test.

Change-Id: Ie290743aa3814e83607b93831b667a2a49d0932c
2014-06-30 09:36:23 -07:00
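
The idea in commit 9d41313e4b can be sketched as follows: compute the variance of the frame difference for each block, build a histogram of those variances, derive a threshold T from it, and compare each block against T. The commit message does not say how T is read off the histogram, so the rule used here (the upper edge of the bin where the cumulative count reaches a given fraction of blocks), the bin width, and all function names are assumptions for illustration only.

```python
def block_variances(prev, cur, block=16):
    """Variance of (cur - prev) for each block x block tile.

    prev and cur are equal-size frames given as lists of rows of pixel values.
    """
    height, width = len(cur), len(cur[0])
    variances = []
    for r0 in range(0, height - block + 1, block):
        for c0 in range(0, width - block + 1, block):
            diffs = [cur[r][c] - prev[r][c]
                     for r in range(r0, r0 + block)
                     for c in range(c0, c0 + block)]
            mean = sum(diffs) / len(diffs)
            variances.append(sum((d - mean) ** 2 for d in diffs) / len(diffs))
    return variances


def threshold_from_histogram(variances, bin_width=10.0, fraction=0.5):
    """Hypothetical rule: T is the upper edge of the first histogram bin at
    which the cumulative block count reaches `fraction` of all blocks."""
    if not variances:
        return 0.0
    bins = {}
    for v in variances:
        idx = int(v // bin_width)
        bins[idx] = bins.get(idx, 0) + 1
    target = fraction * len(variances)
    seen = 0
    for idx in sorted(bins):
        seen += bins[idx]
        if seen >= target:
            return (idx + 1) * bin_width
    return (max(bins) + 1) * bin_width


def should_split(variance, threshold):
    """Blocks whose difference variance exceeds T are partitioned further."""
    return variance > threshold
```

Deriving T per frame lets the partition decision adapt to how much the content changed since the last frame, rather than relying on one fixed block size.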
Yaowu Xu d0cb273e04 Allow encoder to set lpf level to 0
As a way to speed-up rtc encoding at speed 7.

Change-Id: Ie36a010392cf7b741dc130df21a4e733622a75b7
2014-06-27 15:23:41 -07:00
Yaowu Xu 3f92b7b994 Added a new speed 7 in rt mode
To experiment with different speed/quality compromises.

Change-Id: Ia9d4b85243554d620498a327da37c356e752b07f
2014-06-27 13:29:09 -07:00
Jingning Han 5a3e3c6d3f Adaptive txfm size selection depending on residual sse/variance
This commit enables an adaptive transform size selection method
for speed -6. It uses the largest transform size when the sse is more
than 4 times the variance, i.e., most energy is compacted in the
DC coefficient. Otherwise, use the default TX_8X8. It improves
the compression efficiency for rtc set of speed -6 by 0.8%, no
speed change observed.

Change-Id: Ie6ed1e728ff7bf88ebe940a60811361cdd19969c
2014-06-26 16:00:42 -07:00
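
The selection rule in commit 5a3e3c6d3f can be written as a small sketch. It assumes "variance" means the block's sum of squared residuals minus the squared sum divided by the pixel count, so that sse minus variance is the DC energy; that definition, the helper names, and the string labels are assumptions of this example rather than the encoder's code.

```python
TX_8X8 = "TX_8X8"  # default transform size named in the commit message


def sse_and_variance(residual):
    """Return (sse, variance) for a flat list of residual samples.

    Assumed convention: variance = sse - (sum of samples)^2 / N,
    so sse - variance is the energy carried by the DC coefficient.
    """
    n = len(residual)
    sse = sum(d * d for d in residual)
    total = sum(residual)
    return sse, sse - total * total / n


def select_tx_size(sse, variance, largest_tx_size):
    """Use the largest transform when sse exceeds 4x the variance."""
    if sse > 4 * variance:
        return largest_tx_size
    return TX_8X8
```

Under this convention, sse > 4 * variance means more than three quarters of the residual energy sits in the DC term, which is the case where a single large transform represents the block cheaply.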
Jingning Han 2aa50eafb2 Make non-RD intra mode search txfm size dependent
This commit fixes the potential issue in the non-RD mode decision
flow that only checks part of the block to estimate the cost. It
was due to the use of fixed transform size, in replacing the
largest transform block size. This commit enables per transform
block cost estimation of the intra prediction mode in the non-RD
mode decision.

Change-Id: I14ff92065e193e3e731c2bbf7ec89db676f1e132
2014-06-25 18:52:18 -07:00
Yunqing Wang bccc785f63 Merge "Reuse inter prediction result in real-time speed 6" 2014-06-25 08:18:33 -07:00
Yunqing Wang 0aae100076 Reuse inter prediction result in real-time speed 6
In real-time speed 6, no partition search is done. The inter
prediction results obtained during mode picking can be reused in the
subsequent encoding process. A speed feature reuse_inter_pred_sby
is added to enable the reuse only in speed 6.

This patch doesn't change encoding result. RTC set tests showed
that the encoding speed gain is 2% - 5%.

Change-Id: I3884780f64ef95dd8be10562926542528713b92c
2014-06-24 12:46:33 -07:00
Paul Wilkins 8160a26fa0 Fix some bugs in multi-arf
Fix some bugs relating to the use of buffers
in the overlay frames.

Fix bug where a mid sequence overlay was
propagating large partition and transform sizes into
the subsequent frame because of:
  sf->last_partitioning_redo_frequency  > 1 and
  sf->tx_size_search_method == USE_LARGESTALL

Change-Id: Ibf9ef39a5a5150f8cbdd2c9275abb0316c67873a
2014-06-24 13:07:48 +01:00
Jingning Han 48b8ce21f0 Merge "Allow key frame more flexibility in mode search" 2014-06-20 09:38:02 -07:00
Jingning Han c99a8fd7c8 Allow key frame more flexibility in mode search
This commit allows the key frame to search through more prediction
modes and more flexible block sizes. No speed change observed. The
coding performance for rtc set is improved by 1.7% for speed -5 and
3.0% for speed -6.

Change-Id: Ifd1bc28558017851b210b4004f2d80838938bcc5
2014-06-19 14:47:12 -07:00
Yunqing Wang 55834d42cc Modify non-rd intra mode checking
Speed 6 uses small tx size, namely 8x8. max_intra_bsize needs to
be modified accordingly to ensure valid intra mode checking.
Borg test on RTC set showed an overall PSNR gain of 0.335% in speed
-6.

This also changes speed -5 encoding by allowing DC_PRED checking
for block32x32. Borg test on RTC set showed a slight PSNR gain of
0.145%, and no noticeable speed change.

Change-Id: I1502978d8fbe265b3bb235db0f9c35ba0703cd45
2014-06-18 11:38:44 -07:00
Dmitry Kovalev 4ff1a614f1 Adding MV_SPEED_FEATURES struct.
Moving all motion vector related speed parameters from SPEED_FEATURES to
MV_SPEED_FEATURES.

Change-Id: I3e9af0039c7162f8671878c5920bce3cb256a84e
2014-06-12 14:15:27 -07:00
Dmitry Kovalev 22368479c0 Merge "Removing chessboard_index from SPEED_FEATURES." 2014-06-10 10:53:53 -07:00
Yunqing Wang b04d766800 Use small transform size in non-rd real-time mode
In non-rd real-time mode, choosing a smaller transform size in
encoding gives better video quality and a good speed gain compared with
choosing a larger transform size. This patch sets the tx size search
method to ALLOW_8X8, which is better than using 4x4 or other
larger sizes.

Borg tests on rtc set at speed 6 showed significant gain on quality.
PSNR gain: 11.034% and SSIM gain: 15.466%.

The speed gain is 5% - 12% for <720p clips, and 2% - 7% for
720p clips.

Change-Id: If4dc74ed2df359346b059f47fb73b4a0193ec548
2014-06-09 08:26:50 -07:00
Dmitry Kovalev 923c30a174 Removing chessboard_index from SPEED_FEATURES.
This is not a speed feature, adding inline function instead.

Change-Id: Ia48c41802eec9e92cf990339d724097279695c9a
2014-06-05 18:17:54 -07:00
Dmitry Kovalev bd0bb363bd Removing lossless field from VP9EncoderConfig.
Right now there is just one place to check: xd->lossless and for the first
pass there is a function is_lossless_requested().

Change-Id: I949a6834e64ce51e422e2892f097f2b871b5429a
2014-06-03 12:52:49 -07:00
Dmitry Kovalev 5132e6da1a Merge "Converting disable_inter_mode_mask to inter_mode_mask." 2014-05-31 00:08:45 -07:00
Dmitry Kovalev 403719963e Converting disable_inter_mode_mask to inter_mode_mask.
Making this consistent with intra mode masks: you need to specify
allowed inter/intra modes to use.

Change-Id: Iaecd28bf79047259707d8e7a59a57bb7b856383e
2014-05-29 12:25:41 -07:00
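
The change in commit 403719963e, from a mask of disabled inter modes to a mask of allowed ones, amounts to inverting the bits within the set of valid modes. The sketch below uses the inter mode names only as labels; the bit positions and helper names are hypothetical and do not reflect the encoder's actual enum values.

```python
# Hypothetical bit assignments for illustration only.
NEARESTMV = 1 << 0
NEARMV = 1 << 1
ZEROMV = 1 << 2
NEWMV = 1 << 3
INTER_ALL = NEARESTMV | NEARMV | ZEROMV | NEWMV


def to_allowed_mask(disable_mask):
    """Convert an old-style 'disabled modes' mask into an 'allowed modes' mask."""
    return INTER_ALL & ~disable_mask


def is_allowed(inter_mode_mask, mode):
    """A mode is searched only if its bit is set in the allowed mask."""
    return bool(inter_mode_mask & mode)
```

With the allowed-mask convention, a setting such as `NEARESTMV | ZEROMV` states directly which modes are searched, matching how the intra mode masks are specified.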
Dmitry Kovalev 26bdf26ddc Consistent names for intra mask flags.
Change-Id: Ibdd5255d37200fb8a1d50f71a2a49c6089ae21e7
2014-05-29 12:11:02 -07:00
Dmitry Kovalev d262cda524 Making speed checks consistent in set_rt_speed_feature().
Change-Id: Id3d0a49836fe996b806707d29a8130acf9d7ea0e
2014-05-29 11:11:50 -07:00
Alex Converse b9c24dfa23 Always partition check after keyframe (rt speed 5)
Prevents too small partitions from being copied to the next frame.

Change-Id: I4b97c30b27d06051574d54aaaca5434407a0c9ff
2014-05-22 16:51:06 -07:00
Yaowu Xu 04cf82fb04 Merge "Enable various thresholds of motion detection" 2014-05-22 09:09:42 -07:00
Yaowu Xu 3bda7ec1ba Enable various thresholds of motion detection
This commit enables the encoder to adjust the motion detection
speed threshold based on picture size. In addition, cpu-used 1 now
does a partition search every other frame instead of every third
frame for low resolution inputs.

The change has no quality/speed impact for 720p and above. Tests
showed the change increases encoding time by 3% to 6% for
cpu-used 2 encoding of 360p sequences. It also has a compression
gain of about 0.3%.

For cpu-used 2, the change resolved some very disturbing visual
artifacts in certain sequences when large block partitionings and
transforms are used as a result of copying the partition from a
previous frame.

Change-Id: Ic7fd22508cdb811d4ca935655adbf20109286cfa
2014-05-21 12:08:56 -07:00
Yunqing Wang b91b146d1d Add static-threshold skipping in non-rd mode
Added a skipping test in non-rd inter-mode. After interpolation
prediction step, the residuals are tested to see if they will be
quantized to 0 based on modeling between spatial domain and
frequency domain.

Set static-thresh to 800 for >=720p and 300 for <720p, rtc set
tests showed
1. Speed 5, psnr: -0.514%; ssim: -1.748%;
   speedup on related clips: 5% - 11%
2. Speed 6, psnr: -0.628%; ssim: -1.637%;
   speedup on related clips: 4% - 9%

Change-Id: I62fbf26bc043ecd2b584f255f1a4ee5ab52bfcf3
2014-05-19 11:47:13 -07:00
Jingning Han ace194a059 Merge "Chessboard pattern prediction filter type search in non-RD coding" 2014-04-23 12:48:27 -07:00
Jingning Han 8969f7c892 Chessboard pattern prediction filter type search in non-RD coding
This commit introduces a chessboard pattern search for the prediction
filter type search. It runs an extensive search in alternate blocks and
allows the remaining blocks to refer to the coding decisions of their nearby
neighbors.

For pedestrian 1080p at 4000 kbps, the runtime of speed -5 goes down
from 43990 ms to 42200 ms. The overall compression performance for
RTC set is changed by -1.37%.

Change-Id: Icfe220c49451cda796f0ca91d935c9ed01e56c9d
2014-04-23 10:41:07 -07:00
Dmitry Kovalev ef003078e8 Renaming "onyx" to "encoder".
Actual renames:
  vp9_onyx_if.c -> vp9_encoder.c
  vp9_onyx_int.h -> vp9_encoder.h

Change-Id: I80532a80b118d0060518e6c6a0d640e3f411783c
2014-04-22 14:57:05 -07:00
Yaowu Xu d928b34efe Allow full RD TX size search for GF/ALT at speed 2
For speed 3 and above, such search is only allowed at speed 3.
The change helped cif and stdhd set by 1.2% and .7% in compression,
but increased the encoding time by around 5%.

Change-Id: Ifa4832327f1c1bef3decb032ceb769cbf50e059f
2014-04-21 12:31:46 -07:00
Dmitry Kovalev 07f86d0944 Renaming VP9_CONFIG to VP9EncoderConfig.
Change-Id: Id48edd12c6f649c82113128491ef6ea7410e93b2
2014-04-18 11:01:36 -07:00
Dmitry Kovalev 2c8c1f5370 Replacing cpu_used with speed in VP9_CONFIG.
Change-Id: I86b85b5c11388e84a48f8936330c0d920df5d1f0
2014-04-16 18:31:42 -07:00
Dmitry Kovalev 617a367c54 Merge "Consistent mode names." 2014-04-15 22:59:37 -07:00
Dmitry Kovalev e58ea39fd0 Merge "Using anonymous enum instead of macros." 2014-04-15 10:25:14 -07:00
Dmitry Kovalev c1981bdda0 Using anonymous enum instead of macros.
Change-Id: I5ed360585dae2c9fea6c32058dbfb8ec07700677
2014-04-14 15:11:13 -07:00