Handle the non-420 case and set uv_width.
This is needed to get the correct colorspace information out of
vp9e_get_preview().
Change-Id: I62ce118cd7082708d812deb0843c1be87582e0fe
This commit removes the use of best_mv in the decoding process. This
variable can be replaced with nearest_mv. It saves a few cycles on
assigning the values for best_mv.
Change-Id: Ic183f9c1fb615c54efd7e6ccfedcf09d493435e4
This commit setups a test framework for real-time coding. It enables
a light motion search for non-RD mode decision purpose.
Change-Id: I8bec656331539e963c2b685a70e43e0ae32a6e9d
Calculate the skip_coeff as part of the encode process, rather than
checking the eobs after the fact with another pass.
Change-Id: Ib41b139e96a97dee30e4b993b4cc53d86337128d
Fixes assert that fails occasionally on small values of
max-key frame intervals. Also, adds a small change on
updating frames_to_key for frame drops.
Change-Id: Icc2b33b25e3e4ced7e49f8db73e0a887ef9c99e0
Applies an upper limit on burst bitrate for any
frame. This is to insure that typical encodes for YT
do not produce frames that are so large that they
risk stalling HW implementations. Such frames
could also cause playback problems in SW.
For now the limit is set at 250 bits per MB for 1080P
and larger (with the 1080P limit used for smaller frames).
Setting maxQ, constant quality mode or targeting a
very high bandwidth will have precedence over this limit.
Change-Id: Ie6f776c38b06ac7cec043d034085f4b79ee46a38
Reference frame masking helped good quality mode to gain about 5% in
encoding speed, this commit enable it for rt mode to gain the speed
improvement.
In addition, this commit move the speed feature setup to a separate
function.
Change-Id: I015e8f78bbb21dd43ae183b9b9355bea2ccda9c5
To reduce pulsing we now allow an arf just before forced key frames
and at the end of a clip or section (which may be stitched to
another clip or section). However, this does not make sense for
key frames arising from real scene cuts.
Change from original patch reflects other recent changes in regard
to alignment of gf/arf and kf groups.
Change-Id: I074a91d1207e9b3e28085af982f6718aa599775f
Introducing calc_psnr() which calculates psnr between two yv12 buffers.
Previously we incorrectly used width/height instead of
crop_width/crop_height to calculate number of samples -- fixed.
Change-Id: Iecda01980555de55ad347e0276e6641c793fa56c
Added comments to explain what the various speed features do, and removed
1 that was clearly unused.
Change-Id: Icd37a536072ddafedbfaefcecbe48979f6d10faf
This funtion initializes buffer pointers and first stage motion vector
prediction. It will be needed by both regular rate-distortion
optimization loop and the non-RD mode decision. Hence move its
declaration in vp9_rdopt.h
Change-Id: I64e8b6316c9d05f20756a62721533a2e4d158235
This commit allows encoder to compare the SAD cost associated with
the best motion vector predictor, per frame. If one reference frame
has this cost more than 4 times of the best SAD cost given by other
reference frames, skip NEARESTMV, NEARMV, ZEROMV mode check of this
reference frame.
This setting is turned on in speed 2 and above. Compression quality
change in speed 2:
derf -0.014%
yt -0.097%
hd -0.023%
stdhd 0.046%
It reduces the speed 2 runtime of test sequences:
pedestrian_area_1080p 4000 kbps 310763 ms -> 303595 ms
bluesky_1080p 6000 kbps 259852 ms -> 251920 ms
Change-Id: I7f59cf79503d51836d61d56d50dc5bdf0e502e22
Under a configuration change, where the bitrate suddenly decreases,
the buffer level may be larger than maximum allowed (for that first frame to be encoded after change_config).
This change keeps it clipped to its maximum level.
Change-Id: I4d0b5b3d1fd8148600dd39e02bd630c9464baba5
1. Made speed choices to be progressive
2. Adjusted rt speed settings to achieve better speed/quality
Overall, rt-5 gained 2.5% in compression/quality, encoding time of 720p
niklas clip goes from 137,052ms to 121,874ms
Change-Id: Ia6e7e1e15225395a868a2f1059c3db8e266e1600
This commit further optimizes SSE2 operations in the second 1-D
inverse 16x16 DCT, with (<10) non-zero coefficients. The average
runtime of this module goes down from 779 cycles -> 725 cycles.
Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
my optimization include:
-processing 2x8 elements in one 128 bit register instead of processing
8 elements in one 128 bit register.
-removing unecessary loads.
This optimization gives between 2.4% user level gain for 480p input
and 1.6% user level gain for 720p.
This Optimization done only for 64bit.
Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
This commit is the first patch optimizing SSE2 implementation of inverse
16x16 DCT with <10 non-zero coefficients. It focused on the first 1-D (row)
transformation. It exploits the fact that only top-left 4x4 block contains
non-zero coefficients, in a 2-D inverse 16x16 DCT with <10 coeffients.
The average runtime of idct16x16_10 unit is reduced from
883 cycles -> 779 cycles (12% faster).
For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
down from 310651 ms -> 305910 ms. The decoding speed goes up from
80.37 fps -> 80.87 fps.
Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
Optimizing the variance functions: vp9_variance16x16, vp9_variance32x32,
vp9_variance64x64, vp9_variance32x16, vp9_variance64x32,
vp9_mse16x16 by migrating to AVX2
some of the functions were optimized by processing 32 elements instead of 16.
some of the functions were optimized by processing 2 loop strides of 16
elements in a single 256 bit register
This optimization gives between 2.4% - 2.7% user level performance gain
and 42% function level gain.
Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d
Fix miss alignment of the frames contributing to the
error score and bit allocation for gf/arf groups.
Initial results slightly +.
Change-Id: Ie508bdcfdac52e592d48e1f13e01b3551b523deb
Some cleanups on frames_to_key, frames_since_key.
Also removes the unused fixed_q parameters in vp9.
Change-Id: If8743a32c71de30a8d17136477b53d607a7acda8
The previous implementation stops motion vector prediction test when
the zero motion vector appears for the second time. This commit fixes
it by simply skipping the second time check on zero mv and continuing
on to next mv candidate.
It slightly improves stdhd in speed 2 by 0.06% on average. Most static
sequences are not affected. A few hard ones, like jet, ped, and riverbed
were improved by 0.1 - 0.2%.
Change-Id: Ia8d4e2ffb7136669e8ad1fb24ea6e8fdd6b9a3c1
The feature undergoes prior assumption that the recursive partition
size search from 4x4 to 64x64, hence utilizing information from small
blocks to determine early termination in large block rate-distortion
optimization search. The current codebase is now going from top down.
The previous function might go with not properly initialized values,
hence removed.
Tested on pedestrian_area_1080p at 4000 kbps running under speed 2.
No visible difference in runtime observed.
Change-Id: I553df415c6191413762db7ae34e8790c71d8118e
This patch sets frame types correctly in the new
vp9_get_second_pass_params() function called prior
to encode_frame_to_data_rate() function, so that the
latter function can just work with what is passed to
it. This will allow multiple vp9_get_second_pass_params()
to be created for various encode strategies without
messing with the core encode function.
There is no difference in derf and yt. stdhd/hd are pending.
Change-Id: I70dfb97e9f497e9cee04052e0e8e0c2892eab0c3
This commit adds input/output ports for IDCT8_1D macro function to
provide more flexibility in variable use. It allows to skip several
buffer swap operations.
Change-Id: I21f3450509537322293043b3281bfd3949868677
Adding RefBuffer to simplify reference buffer management. The struct has a
pointer to image data and scale factors relative to the current frame.
Change-Id: If38eb1491ff687cc11428aee339f3e052e2c5d9e
This commit merges the initial buffer swap operations in idct8_1d_sse2
into the array transpose step, hence reducing number of instructions
therein.
Change-Id: I219f6f50813390d2ec3ee37eecf2a4a2b44ae479
This commit optimizes the SSE2 implmentation of idct8x8_10. It exploits
the fact that only top-left 4x4 block contains non-zero coefficients,
and hence reduces the instructions needed.
The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles,
estimated by averaging over 100000 runs. For pedestrian_area_1080p 300
frames coded at 4000kbps, the average decoding speed goes up from
79.3 fps to 79.7 fps.
Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180
In two pass encodes bits are allocated to each frame
according to a modified error score for the frame as a
fraction of the modified error score for the clip or section.
Previously a minimum rate per frame was reserved and
subtracted from the bits allocatable by the two pass code.
The vbr max section rate was enforced by clipping the
actual number of bits allocated.
In this patch the min and max vbr rates are enforced
instead by clipping the modified error scores for each frame
rather than the number of bits allocated.
Small gains for all test sets (psnr and SSIM) ranging from
~ +0.05 for YT psnr up to ~ +0.25 for Std-hd SSIM.
Change-Id: Iae27d70bdd3944e3f0cceaf225bad2e8802833de
vpx_codec_vp9x_cx is not used internally. Experimental flag from
vp9_extracfg is also not really used. YUV 4:4:4 just works after these
changes (you have to specify --profile=1 for the encoder).
Change-Id: Ib1c8461d0d19d159827e005efe868f891eea0140
This commit takes a preliminary attempt to refine the motion search
control. It detects the SAD associated with mv predictor per reference
frame, and based on which to determine whether the encoder wants to
reduce the motion search range (if the predicted mv provides fairly
small SAD), or to skip the current reference frame (if there exists
another ref frame that gives much smaller SAD cost).
This feature is turned on in the settings of speed 1 and above.
In speed 1, compression performance changed
derf -0.018%
yt -0.043%
hd -0.045%
stdhd -0.281%
speed-up
pedestrian_area_1080p at 4000 kbps 100 frames
199651ms -> 188846ms (5.5% speed-up)
blue_sky_1080p at 6000 kbps
443531ms -> 415239ms (6.3% speed-up)
In speed 2, compression performance changed
derf -0.026%
yt -0.090%
hd -0.055%
stdhd -0.210%
speed-up
pedstrian 113949ms -> 108855ms (4.5% speed-up)
blue_sky 271057ms -> 257322ms (5% speed-up)
Change-Id: I1b74ea28278c94fea329d971d706d573983d810d
Buffer the SSE of prediction residuals in the rate-distortion
optimization loop of a given block. This information would be used
for later encoding control.
Change-Id: If4e63f3462490513c48be9407d3327c8dd438367
Moving back to scale_factors struct. We don't need anymore x_offset_q4 and
y_offset_q4 because both values are calculated locally inside vp9_scale_mv
function.
Change-Id: I78a2122ba253c428a14558bda0e78ece738d2b5b
Before mv scaling it is required to calculate x_offset_q4/y_offset_q4
by calling set_scaled_offsets(). Now offset configuration can not be
missed because it happens just before scale_mv().
Change-Id: I7dd1a85b85811a6cc67c46c9b01e6ccbbb06ce3a
Various cleanups and streamlining of interfaces as precursor
to further advancements in rate control.
Pre-encode parameter setting for different use cases:
One-pass, first of 2-pass, second of 2-pass, and Svc
are separated out.
There is no change in output with this change.
Change-Id: Ied8ca7d84d610993776aa30ef263fe20452e0e3e
Take account of the fact that the overlay frame is usually
very cheap so distribute target bits among the other frames.
Change-Id: I120685122e8cbbe75da8d07d02932f7877059867
This will hurt metrics in some cases (particularly for static
clips at low data rates where there is extra overhead, but it
helps smooth transitions around forced key frames between
stitched kf sections.
Change-Id: I7e1026ae0de6c77bba863061e115136d7f283cc0
Slightly reduces the mean tendency to undershoot target
rate in vbr, especially when using the memory less mode
and when recodes are disabled.
The effect is primarily at low q.
Change-Id: I59a593b99522cc7da31b4134d1c8a65f5b7b7c53
This commit reworks the prediction filter rate-distortion cost update
process consistent for all block sizes.
Change-Id: I5874349ab38df380240f96c2d4ef924072bab68d
Guard against incorrect size values moving *data past data_end.
Check read length against the difference of the buffers.
Change-Id: Ie0b54e2db517fd41a0f3ceb23402ee44839a4739