Add a full range motion search for regular block sizes. This runs
exhaustive search within the given reference area. This commit further
optimizes the search process by combining 4 points test into one
pipeline, which gives 30% speed-up as compared to run each individual
point at a time.
This full range search serves as a best possible motion search reference.
When replacing the diamond search with full range search, the speed 0
runtime of bus CIF at 2000 kbps goes from 153872ms to 623051ms. The
compression performance compared to speed 0 setting gains 0.585% for
derf set.
Change-Id: Ieef1225216b0b86b4ac4872fa7fb9e18bf2eabb3
Removed an adaptive rate correction factor that was having
a negative impact on quality in many clips. This factor
was influencing the Q range available to each frame
independently of the bits allocated to each.
Average results with DISABLE_RC_LONG_TERM_MEM.
derf +0.199, -0.059.
yt +3.957, +3.798
std hd +1.577, +2.140
yt hd +4.127, +4.513
Average results without DISABLE_RC_LONG_TERM_MEM
derf -0.628, -0.665
yt +3.432, +3.015
std hd -0.105, +0.153
yt hd +3.432, +3.015
Change-Id: I45bab6b606f49a442e7b27a6d631f3ffd843bbce
Includes various cleanups.
Streamlines the interfaces so that all rate control state
updates happen in the vp9_rc_postencode_update() function.
This will hopefully make it easier to support multiple
rate control schemes.
Removes some unnecessary code, which in rare cases can casue
a difference in the constrained quality mode output, but
other than that there is no bitstream change yet.
Change-Id: I3198cc37249932feea1e3691c0b2650e7b0c22fc
This commit makes the coefficient tree initialized prior to token
initialization, where the coefficient costs are filled out according
to the probabilities associated with coefficient value categories.
Change-Id: If4e89c3923058376f8382c683fe4a225a4a38af3
This commit fixes the intra prediction reference source selection
in the settings of skip_encode. Use original boundary pixels as
prediction reference, when the inverse transform and reconstruction
are skipped in the per block size rate-distortion optimization loop.
Change-Id: I36081aa30aa46e203e0e6f4e8a420fd08269469a
This commit fixes the use of uv_intra_estimate by properly restoring
the mode_info struct required by rd_pick_intra_sbuv_mode.
Change-Id: I6a156d79533c4e2e60dfd3b8c5bb0a42a8eca280
The idea here is to allow "in frame" adjustment of the final Q
value used to encode each SB64, using segmentation.
There is also adjustment of the rd mult in regions of overspend.
Activated using aq_mode=2
Change-Id: I2f140cd898c9f877c32cd6d2e667f5e11ada4b1c
When calling check_initial_width through vp9_set_size_literal
the function was defaulting to using non-subsampled chroma.
This patch changes the default to assume sampled chroma as
an interim solution until complete support for other
color formats is added.
Change-Id: Id8e7e919b350e3473dfdf7551af6fd0716478b04
This commit takes out vp9_extend_frame_borders from
vp9_setup_scale_factors.
The refactoring is for the preparation of the use of lazy border
extension at decoder. This makes it necessary to handle border
extension separately at encoder/decoder. The use of
vp9_extend_frame_borders will be removed, when lazy border extension
is ready.
Change-Id: Ia3baba3d179d5f11eee1634f19b3b319d2a59186
Moves all rate control variables to a separate structure,
removes some currently unused variables,
moves some rate control functions to vp9_ratectrl.c,
and splits the encode_frame_to_data_rate function.
Change-Id: I4ed54c24764b3b6de2dd676484f01473724ab52b
Simplifies the code by implementing band mapping with static arrays.
A lot of the code complexity introduced in a previous patch
disappears.
Change-Id: Ia3fac36e594fb5ad2d55ae141c58bba4c55c2d28
The switch to the rate-correction damping factor
in https://gerrit.chromium.org/gerrit/#/c/67536/ was not conditioned on CBR mode.
Change-Id: I2326704e8ac030a4f7b592dd3fedb94c7dd0644d
Overall change (using dual buffer scheme for superblocks of both inter
and intra modes) reduces speed 2 runtime:
bluesky_1080p at 6000kbps: 263553ms -> 257441ms
riverbed_1080p at 8000kbps: 233230ms -> 225308ms.
Change-Id: Idf8d70f768a4b0d97b2a8506372c57b7b4022119
Implements scan order to band map with arrays in both the encoder
and decoder to remove conditional statements.
Encoding seems to be about 1% faster at speed 0, tested on football.
Decoding seems to be about 0.5-1% faster on a set of 25 videos.
Change-Id: Idb233ca0b9e0efd790e30880642e8717e1c5c8dd
This commit enables the dual buffer rate-distortion optimization
and encoding scheme. It stacks the original transform coefficients,
quantized levels, and reconstructed coefficients, in the rate-
distortion optimization search process, hence eliminates the need
to re-run residual generation, forward transform, and quantization
in the encoding stage.
Change-Id: I011bfad3a59a380a869ee552e91dae0394ec492e
Allocate memory space of dual buffer sets that store the coeff, qcoeff,
dqcoeff, and eobs. Connect the pointers of macroblock_plane and
macroblockd_plane to the actual buffer in use accordingly.
Change-Id: I2f0b5f482ca879fae39095013eaf8901db20a5a4
Make the macroblockd_plane contain dynamic buffer pointers instead
static pointers to the memory space allocated therein. The decoder
uses the buffer allocated in pbi, while encoder will use a dual
buffer approach for rate-distortion optimization search.
Change-Id: Ie6f24be2dcda35df7c15b4014e5ccf236fb3f76c
SVC multiple layer per frame encoding is invoked with vpx_svc_init and
vpx_svc_encode. These interfaces are designed to be invoked from ffmpeg.
Additional improvements:
- make dummy frame handling a bit more explicit
- fixed bug with single layer encodes
- track individual frame sizes and psnrs instead of averages
- parameterized quantizer, 16th scalefactors, more logging,
- enabled single layer encodes to generate baseline
- include new mode for 3 layer I frame with 5 total layers
Change-Id: I46cfa600d102e208c6af8acd6132e0cc25cda8d4
Removed:
goldfreq, avg_encode_time, avg_pick_mode_time,
cpu_freq, interquantizer
member variables from VP9_COMP since they are no longer
used in the code.
Change-Id: I010a82c217d0da03c3f53d1858d3462190c12dcf
Removed three members from the VP9_COMP data structure:
inter_zz_count, gf_bad_count, gf_update_recommended.
These were part of the VP8 real-time mode implementation
that was removed from the initial VP9 codecbase.
Change-Id: I866b083b88ef02c74837277d50ce532ca88492f3
-Don't reduce maxQ for gold/alt in CBR mode.
-Fix to min/maxQ for first/initial key frame.
-Add more speeds to datarate test and reduce the starting bitrate for test.
Change-Id: Id2a333d76dd3f6a51b322ca984588e2a22159c58
For consistency with idct function names. Renames:
vp9_short_fdct4x4 -> vp9_fdct4x4
vp9_short_walsh4x4 -> vp9_fwht4x4
Change-Id: Id15497cc1270acca626447d846f0ce9199770f58
The pointer was asigned only once with vp9_regular_quantize_b_4x4, calling
this function directly now. Also removing unused declarations:
prototype_quantize_block
prototype_quantize_block_pair
prototype_quantize_mb
vp9_regular_quantize_b_4x4_pair
vp9_regular_quantize_b_8x8
Change-Id: I14325bc2f082336820671eafbc06126651b79f73
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: I0ba3c52513a5fdd194f1e7e2901092671398985b
This 2-pass rate control setting allocates bits based
on first pass stats to each kf group, gf group and individual
frame but does not correct the bits left and allocation after
each frame.
In other words it recommends a bit allocation for each frame
but does not try and correct any over or under spend on a
frame over the remainder of the clip. This reduces the accuracy
of rate control in terms of hitting an average bitrate but prevents
problems that may arise because early frames either use to many
or too few bits. This mode is currently more inclined to undershoot
than overshoot (particularly at higher data rates).
Also minor changes to rate of adaption when recode loop is not
enabled.
This mode is currently enabled by default for VBR.
It gives the following % performance gains.
derf +0.467, +1.072
yt 2.962, 2.645
stdhd 1.682, 1.595,
yt-hd 2.3, 2.174
Change-Id: I3c84a9bf8884e5b345698ff0e19187f792c2f3a0
Delta reduced because of concern about popping on some
very hard clips.
Also allow some frame recode at speed 2 for kf/gf/arf.
Change-Id: Ib47dff42da41aa6eec83b7285fcaaca24abb851e
This commit makes the buffer allocation of zcoeff_blk array in
pick_mode_context block size aware. It calculates the number of
4x4 blocks in the partition and assigns the memory space accordingly.
This process (and the uninitialization) is done once for each encoding
pass. It allows memory copy of smaller buffer when possible.
For football at 600kbps, the runtimes improve by about 1%:
speed 1, 45961ms -> 45472ms
speed 2, 23863ms -> 23598ms
Change-Id: Id2ca24906fa89f46fa5fe742ec4b8efc2a61f877
The only case where they were intentionally pointing to different
structures was in mbgraph, and this didn't have the expected behavior
because both of these pointers are used interchangeably through the code
Change-Id: I979251782f90885fe962305bcc845bc05907f80c
This should be similar to what x264 does with --aq-mode 1.
It works well with clips like parkjoy and touhou
(http://x264.nl/developers/Dark_Shikari/LosslessTouhou.mkv).
At low bitrates, the segmentation signaling overhead may negate the
benefits of this feature.
(PGW) Default changed to feature OFF to allow provisional merge.
Change-Id: I938abf9bb487e1d4ad3b0264ea03d9826275c70b
Updated the encoder to handle frames that are coded
intra-only. Intra-only frames must be non-showable,
that is, the "show frame" flag must be set to 0 in
the frame header.
Tested by forcing the ARF frames to be coded intra-
only.
Note: The rate control code will need to be modified
to account for intra-only frames better than they
are currently handled.
Change-Id: I6a9dd5337deddcecc599d3a44a7431909ed21079
Remove the semicolon in the definition of vp9_zero macro. Make all
the use cases of vp9_zero of consistent format.
Change-Id: Ibaf9751e8595872b12766381a93d185a4d90df8f
The commit changes to mask available intra prediction modes for test
based on prediction block size.
With this patch, encoding time of CpuUsed 2 reduces from 10% to 20% for
HD clips with a compression drop of 0.2%
Change-Id: I65f320f1237c0f5ae3a355bf7caf447f55625455
When the codec in VBR (or cq) mode hits its max q limits and is
struggling to hit a target bandwidth, the bit target per frame collapses.
In the first instance normal frames cap out at the maximum allowed
Q and then the ARF and GFs do the same. This latter behavior is not
generally desirable as GFs and ARFs are only effective from a quality
and data rate perspective if they have at lease some level of -Q delta
compared to the surrounding frames.
In this patch I define a separate max Q for GFs and ARFs that is
derived from but somewhat lower than that defined for normal frames.
In effect there is a minimum Q delta that will always be available for
GFs and ARFs regardless of the target rate and MAXQ setting.
This may of course mean that the absolute lowest rate obtainable for
a given clip is somewhat higher.
Change-Id: I268868b28401900d0cd87e51e609cd3b784ab54a
Use b_mode_info to store the inter prediction mode of sub8x8 block,
in replacement of the use of partition_info. Remove redundant buffer
update for partition_info. For bus_cif at 2000 kbps, this seem to make
speed 0 about 1% faster.
Change-Id: Id1b3be45e75a24fb4b42335ac480c23e440978f6
We already have itxm_add member in MACROBLOCKD structure. Both
inv_txm4x4_1_add and inv_txm4x4_add are just its special cases for
different eob values. But eob logic is already implemented in
vp9_iwht4x4_add and vp9_idct4x4_add (that's why also removing
inverse_transform_b_4x4_add).
Change-Id: I80bec9b6f7d40c5e5033c613faca5c819c3e6326
For CpuUsed 1 & 2, this commit allow to skip retangular partition check
when NONE is better than SPLIT. It also changed to allow such logic
on alt ref frame coding rather than use square partition all them. The
change has gain compressio about .3% on yt and ythd for both 1&2, It
helped .6% compression on cif and stdhd for both CpuUsed 1&2.
Change-Id: I814b653baf89f59acd20e042629a12938a1bd4e5
This commit allows sub8x8 intra modes test in the rate-distortion
loop for hd sequences in speed 1 and 2.
For sequence y90n of hd set at 8000 kbps, speed 2 runtime goes
from 207s to 210s. For ped_1080p at 3000 kbps, speed 2 runtim goes
from 336s to 337s. Both are running with 300 frames.
This improves compression performance by 0.24% for stdhd and 0.32%
for hd.
Change-Id: I173ca38a6411565ae6cfadd184c42b2070c5de1f
The idea is to have the following names for each transform size:
vp9_idct4x4_add
vp9_idct4x4_1_add
vp9_idct4x4_10_add
vp9_idct4x4_16_add
vp9_idct8x8_add
vp9_idct8x8_1_add
vp9_idct8x8_10_add
vp9_idct8x8_64_add
etc for 16x16, 32x32
The actual list of renames in this patch:
vp9_idct_add_lossless -> vp9_iwht4x4_add
vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add
vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
vp9_idct_add -> vp9_idct4x4_add
vp9_short_idct4x4_add -> vp9_idct4x4_16_add
vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
Speed 4 still does not give a big gain over speed 3.
This just cleans it up a little from the last patch and comments
out features that do not seem to be giving much benefit.
Change-Id: I5f366e6160e1dbe5dc45cf5eb90cc02712baa1b6
Allow selective masking of individual split modes rather than
just a single on / off flag.
For speed 2 recovers the large speed loss seen for some derf
clips in change Ie6bdfa0a370148dd60bd800961077f7e97e67dd4
and a small quality gain.
For speed 1 10 % speed increase observed locally on some derf clips
for minimal quality change.
Change-Id: If86191087b93cbc05351c26c60c7933e2149e485