- Inherited base class TransformTestBase to derived class VP10Trans8x8HT.
- Employed RunCoeffCheck() to test vp10_fht8x8_sse2() against C reference
function vp10_fht8x8_c().
- fdst8_sse2() related seven hybrid transform cases are covered in this
test.
- Test passed (4 test cases w/o EXT_TX; 16 test cases with EXT_TX).
Change-Id: Id9a9b308c707164a120d9ceb2c30e572026fb1d0
Inherited class TransformTestBase to derived class VP10Trans4x4HT.
Employed RunCoeffCheck() to test vp10_fht4x4_sse2() against
C reference vp10_fht4x4_c().
fdst4_sse2() related seven hybrid transform cases are covered
in this test.
Wrote a header file for test base class. Some modification to
make sure the base class can be used for 8x8, 16x16, 32x32 cases.
All related tests passed.
Change-Id: I6b19a39d3ea30b657847781e78e73b829998a57a
Make the RANS implementation operate on cumulative distribution
functions rather than individual probability distribution functions.
CDFs have shown themselves more flexible to work with.
Reduces decoding memory usage from scaling O(num_distributions *
symbol_resolution) to O(num_distributions).
No bitstream change. This is an purely implementation change.
Change-Id: I4e18d3a0a3d37a36a61487c3d778f9d088b0b374
Use the superframe counter to set the key frame, and force
it to the key frame on base spatial layer only.
Also, update svc frame counters under frame dropping.
Update unittest: add specific tests with short key frame period.
https://bugs.chromium.org/p/webm/issues/detail?id=1150
Change-Id: I5b1c9a09253e6e5fbfce51b4cf603ae22d422b01
For 1 pass cbr mode: allow for two-stage 1:2 scaling
(which will use the 1:2 optimized scaler) if the spatial
layer is 1/4x1/4 of souce.
Without this change, the base layer for 3 spatial layers would
be using the non-normative scaler which is un-optimized/C code.
Change-Id: I9d73f92a4a96927d0f1d6bf75315c1e60513226a
restore the value for VP9 to 9999 to satisfy the current test
expectations; without this
VP9/DatarateTestVP9Large.ChangingDropFrameThresh/8 will overshoot.
Change-Id: I88dad574ae4ab10f923579824c7347ff468c7045
Includes various cosmetic changes and refactoring including
naming the sharp filters differently (since they are no longer
8-tap).
Change-Id: Ida5a19ca0daa9f6a64a6734394c685b2a4a2564a
Reset the scale factors before build_inter_predictors.
Add datarate tests for 3 spatial layers, which exposed this issue.
Change-Id: I7f81efbe44345ecea9fdd5f639a4cca76aed3874
some configurations may fail if AltRefTest is undefined though
VP8_INSTANTIATE_TEST_CASE is defined away.
Change-Id: I7272775a506718336bd6cee2225cf83bd72fede5
the same as vp8, with the same reasoning from:
2a0d7b1 Reduce the default kf_max_dist to 128.
see also:
https://trac.ffmpeg.org/ticket/4904https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815673
+ restore vpxenc behavior of taking the library default rather than
forcing 5s
This change also exposes an issue with one-pass svc in cbr mode, keep
the old default in datarate_test.cc for now.
Change-Id: Id6d1244f42490b06fefc1a7b4e12a423a1f83e88
This commits adds a shift stage for FASTSSIM computaton when source
bit depth is different from working bit depth, to make sure metric
results are calculated in bit_depth consistent with source.
Change-Id: I997799634076ef7b00fd051710544681ed536185
This commit extends the HBDMetricTests to handle testing for metric
computation where input source depth is different from working bit
depth.
Change-Id: I5d11101cc9603a3fd09e8439816bb982a0f1b654
Priviously, we do 12-tap interpolation even there is no sub pixel,
This could cause a bug becuase decoder doesn't extend border when there
is no sub pixel. In this situation, if we still do interpolation, we
will access the border extension which doesn't exist and cause a
memory error
Change-Id: I55b879722f0a10c5d13261bd9617a75c826a2418
replace with vpx_highbd_lpf_horizontal_edge_16 and
vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter
Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d
replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to
avoid passing a count parameter
Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271
Using this we can eliminate large numbers of calls to predict intra,
and is also faster than most of the variance functions it replaces.
This is an equivalence transform so coding performance is unaffected.
Encoder speedup is approx 7% when var_tx, super_tx and ext_tx are all
enabled.
Change-Id: I0d4c83afc4a97a1826f3abd864bd68e41bb504fb
External dynamic resize with swapping width and height was
not handled properly.
Fix is to re-init loop-filter under certain condtions.
Modify unittest to test this case.
Without this change test will fail.
Relates to: https://bugs.chromium.org/p/webm/issues/detail?id=1140
Change-Id: I7d81ca7fe0783b3bc103a52a7b7cf073a96be26e
This commit adds computation of PSNRHVS for highbitdepth build, it
also adds tests to make sure the calculation of psnrhvs metric for
10 and 12 bit correct.
Change-Id: Iac8a8073d2b3e3ba5d368829d770793212fa63b6
An issue exists with reference_masking in non-rd pickmode for spatial
scaling. It was kept off for internal dynamic resizing and svc, this
change is to keep it off also for external dynamic resizing.
Update to external resize test, and update TODO to re-enable this
at frame level when references have same scale as source.
Change-Id: If880a643572127def703ee5b2d16fd41bdbf256c
This commit adds the computation of fastSSIM for highbitdepth build,
it also modifies the hbdmetric test to be more generic and applicable
for fastSSIM.
The 255 used for calculating ssim constants c1 and c2 is not exactly
scaled by 4x and 16x to 1023 and 4095, therefore requries the metric
test to have a thresold more tolerant than 0, currently at 0.03dB.
Change-Id: I631829da7773de400e77fc36004156e5e126c7e0
For dynamic resizing (whether the new codec size is determined internally
or externally set by user), we should for now keep rc.resize_allowed enabled.
This prevent the use of referene_masking for real-time mode
(in set_rt_speed_feature()).
Change-Id: Ibb7c3ff35be88afdf1a3c6db6693521766f177a3
When the codec frame size is the same as the reference frame size,
release the scaled reference before assigning it a new buf_idx.
Only affects 1 pass non-svc mode, where the scaled references are
release only under certain conditions (to prevent un-needed scaling
of the references every frame).
Modified a unittest that can trigger this bug without this change.
https://code.google.com/p/chromium/issues/detail?id=582598
Change-Id: I9a884e36ebd7608b1641ec2a469e20a4f829cf43
If the application changes frame size (external size changes),
and aq-mode=3 is on, reset the cyclic refresh.
Modify the TestExternalResize unittest (longer run with more resize
actions). Without this change an assert would be triggered on this
longer test.
Change-Id: I0eefd2cd7ffa0c557cca96ae30d607034a2599ce
the lookahead buffer allocation is deferred to receipt of the first
frame to allow profile changes. if the encoder was flushed before
supplying any frames the encoder would crash trying to dereference the
NULL buffer. vp8 is unaffected.
fixes mozilla bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1237848
Change-Id: Icee4b64de760476eee0d33b568f0a1010335ff13
Remove comment(s) and enable frame-dropper for tests.
Frame dropper for 1 pass svc was fixed a while ago:
https://chromium-review.googlesource.com/#/c/309230/
Change-Id: I5fd3192825b22e562db9210d3dc7b246a1799d8d
This reverts commit ea48370a50, reversing
changes made to 15939cb2d7.
The commit was insufficiently tested and causes failures.
Change-Id: I623d6fc2cd3ae6fd42d0abab1f8eada465ae57a7
Reallocate the xmm register usage so that no ARCH_X86_64 required.
Reduce memory access to the left neighbor by half.
Speed up by single digit on big core machine.
Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
Relocate the function from SSSE3 to SSE2, Unroll loop from 16 to 8,
and reduce mem access to left.
Speed up by single digit in ./test_intra_pred_speed on big core
machines.
Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Segfault in change 315561 when decoding vp8 is taken care of.
Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
use CONFIG_VP[89] to protect white-box tests and drop redundant
uses of CONFIG_VP9 in variable assignments within that block
Change-Id: Id3c6cf5c7822aa161b19768b295f58829a1c6447
Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
Always round sum error and sum square error toward zero in variance
calculations. This prevents variance from becoming negative.
Avoiding rounding variance at all might be better but would be far
more invasive.
Change-Id: Icf24e0e75ff94952fc026ba6a4d26adf8d373f1c
the final sum may use up to 26 bits
+ add a unit test
+ disable the sse2 as the result will rollover; this will be fixed in a
future commit
Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
tm_predictor_4x4 is implemented with SSE2 using XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
For 1 pass CBR mode: increase waiting time after key frame
before we start sampling rate control behavior for determining
resize. This change need to disable one internal resize(DownUp)
temporally since it requires a longer clip to do so.
Change-Id: If21beda1be23f169ee541ab4dd642f718347887a
Jenkins per-commit test need to be expedited as more experiments are
added into the nextgenv2 branch. This patch does the following:
thread test: change the length of test clip from 5 frames to 3 frames;
only test speed 1.
ArfFreq test: marked as "large".
The tests marked as "large" will be removed from per-commit test
(to nightly test).
Change-Id: I62b373c52b481dcd281e741ebf5098408a97ff4d
this avoids redefining vpx_codec_vp9_dx, vpx_codec_vp9_dx_algo in
vp9_encoder_parms_get_to_decoder.cc
Change-Id: I3b89e7a62497227ee32419f1a7d30e4c10a13c05
EXT_TX introduces some new symbols to be decoded.
The encoder counts how many times these are used.
In multithreaded mode, the counts from the worker threads
need to be accumulated into the main thread.
This change means that VP10/VPxEncoderThreadTest now works
with more choices of cpu-used and number of passes.
Change-Id: Ibe7e6a3c58145265f4ead155ff98fb4cb37c3513
A new version of vp9_highbd_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive 3 operand format encoding of the 128bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.
Further 2 optimizations are applied:
1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case however is extremely rare in real use, so we can achieve a
large net gain here.
The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.
Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.
Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
The serial decode check is too strict for tile-threaded decoding as
there is no guarantee on the decode order nor which specific error
will take precedence. Currently a tile-level error is not forwarded so
the frame will simply be marked corrupt.
Change-Id: I51cf1e39e44bedeac93746154b36a4ccb2f059b1
define NOMINMAX to allow the std:: versions to be used; min/max will be
defined transitively via windows.h otherwise
Change-Id: I692b03fa3e70b7a53962d3fd209498f70f712fed
Resolved Conflicts in the following files:
configure
vp10/common/idct.c
vp10/encoder/dct.c
vp10/encoder/encodemb.c
vp10/encoder/rdopt.c
Change-Id: I4cb3986b0b80de65c722ca29d53a0a57f5a94316
In the decoder, map this to the output variable vpx_image_t.r_w/h.
This is intended as an improved version of VP9D_GET_DISPLAY_SIZE,
which doesn't work with parallel frame decoding. In the encoder,
map this to a codec control func (VP9E_SET_RENDER_SIZE) that takes
a w/h pair argument in a int[2] (identical to VP9D_GET_DISPLAY_SIZE).
Also add render_size to the encoder_param_get_to_decoder unit test.
See issue 1030.
Change-Id: I12124c13602d832bf4c44090db08c1009c94c7e8