The arm assembly files are named .s after conversion, to reuse
as much of the existing makefile infrastructure for conversion to
gas format as possible. Within the generated visual studio project,
only the converted assembly sources are available, which might not
be optimal for actually developing it, but is acceptable for
just building the library.
Multithreading is disabled since the traditional win32 threading
functions aren't available on WinRT/Windows Phone 8.
Building of vpx itself and the examples succeed, while building the
tests fail due to them using functions not available in the
windows store/windows phone API subsets - therefore the unit tests
are disabled.
This works for building in Visual Studio Express 2012 for Windows
Phone, while Visual Studio Express 2012 for Windows 8 (for
"Windows Store" apps) seems to reject the vcxproj files due to
not supporting "classic style native application or managed
projects". The built static library should be compatible with that
platform though.
Change-Id: Idcd7eca60bfaaaeb09392a9cd7a02e4a670e3b30
This is a mostly-working implementation of an extra channel in the
bitstream. Configure with --enable-alpha to test. Notable TODOs:
- Add extra channel to all mismatch tests, PSNR, SSIM, etc
- Configurable subsampling
- Variable number of planes (currently always uses all 4)
- Loop filtering
- Per-plane lossless quantizer
- ARNR support
This implementation just uses the same contents as the Y channel
for the A channel, due to lack of content and general pain in
playing back 4 channel content. A later patch will use the actual
alpha channel passed in from outside the codec.
Change-Id: Ibf81f023b1c570bd84b3064e9b4b8ae52e087592
In current code, motion vectors got from single prediction mode are used
in compound prediction mode directly. These motion vectors may not give
accurate prediction since they are searched independently. In this patch,
we took Pascal's suggestion, and did joint motion search in compound
prediction mode to find better motion vectors in this situation.
Test results:
Overall PSNR: 0.570%(derf), 0.918%(stdhd);
SSIM: 0.572%(derf), 1.009%(stdhd);
The encoder is a little slower. This can be improved since some c
code is used in motion search.
Change-Id: Ib30c9240f6c56c9b070867b4ca89412a76d9f3c6
Adds a new experiment CONFIG_NON420 that allows other chroma subsamplings
to be passed to the codec. This commit allows the data to be passed from
a y4m input file through vpxenc to the codec, where they're currently
rejected. Later commits will finish support for this inside the codec.
Change-Id: Ib3aac604d8cad9e24cef395fa1067f16ba7e8e43
Adds an experiment that codes an end-of-orientation symbol
for every eligible zero encountered in scan order.
This cleans out various other sub-experiments that were part
of the origiinal patch, which will be later included if found
useful.
Results are slightly positive on all sets (0.1 - 0.2% range).
Change-Id: I57765c605fefc7fb9d1b57f1b356843602abefaf
This is work-in-progress, it implements multiple ARF
encoding behind an experimental flag.
It adds the ability to insert multiple ARF frames into a
single ARF group. This patch implements the reordering
of the coded frames, and implements a fixed-length coding
pattern. It applies a fixed quantizer strategy based on
where the frame is in the coding sequence.
Further work to modify the rate control strategy is
ongoing and will be submitted via a set of future patches.
In this first step, each ARF group is recursively
bisected and an ARF frame added at that position in the
sequence. The recursion continues until ARF frames are
within MIN_GF_INTERVAL frames.
The code sits behind the "multiple-arf" experimental
flag ("CONFIG_MULTIPLE_ARF"). The experimental flag
"oneshotq" ("CONFIG_ONESHOTQ") also needs to be enabled
for this patch to work correctly.
Change-Id: Ie473b05ebb43ac473c0cfb659b2b8042823085e2
Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code
gives identical encoder results before and after. There are a few
macros for rectangular block sizes under the sbsegment experiment; this
experiment is not yet functional and should not yet be used.
Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728
Pick up VP8 encryption, quantization changes, and some fixes to vpxenc
Conflicts:
test/decode_test_driver.cc
test/decode_test_driver.h
test/encode_test_driver.cc
vp8/vp8cx.mk
vpxdec.c
vpxenc.c
Change-Id: I9fbcc64808ead47e22f1f22501965cc7f0c4791c
This gains about 0.2% on derf, 0.1% on hd and 0.4% on stdhd. I can put
this under an experimental flag if wanted, just trying to get my patch
queue in shape.
Change-Id: Ibe1a30fe0e0b07bec4802e0f3ff0ba22e505f576
Adds an experiment to use a weighted prediction of two INTER
predictors, where the weight is one of (1/4, 3/4), (3/8, 5/8),
(1/2, 1/2), (5/8, 3/8) or (3/4, 1/4), and is chosen implicitly
based on consistency of the predictors to the already
reconstructed pixels to the top and left of the current macroblock
or superblock.
Currently the weighting is not applied to SPLITMV modes, which
default to the usual (1/2, 1/2) weighting. However the code is in
place controlled by a macro. The same weighting is used for Y and
UV components, where the weight is derived from analyzing the Y
component only.
Results (over compound inter-intra experiment)
derf: +0.18%
yt: +0.34%
hd: +0.49%
stdhd: +0.23%
The experiment suggests bigger benefit for explicitly signaled weights.
Change-Id: I5438539ff4485c5752874cd1eb078ff14bf5235a
Adds a per-frame, strength adjustable, in loop deringing filter. Uses
the existing vp9_post_proc_down_and_across 5 tap thresholded blur
code, with a brute force search for the threshold.
Results almost strictly positive on the YT HD set, either having no
effect or helping PSNR in the range of 1-3% (overall average 0.8%).
Results more mixed for the CIF set, (-0.5 min, 1.4 max, 0.1 avg).
This has an almost strictly negative impact to SSIM, so examining a
different filter or a more balanced search heuristic is in order.
Other test set results pending.
Change-Id: I5ca6ee8fe292dfa3f2eab7f65332423fa1710b58
Replaces the default tables for single coefficient magnitudes with
those obtained from an appropriate distribution. The EOB node
is left unchanged. The model is represeted as a 256-size codebook
where the index corresponds to the probability of the Zero or the
One node. Two variations are implemented corresponding to whether
the Zero node or the One-node is used as the peg. The main advantage
is that the default prob tables will become considerably smaller and
manageable. Besides there is substantially less risk of over-fitting
for a training set.
Various distributions are tried and the one that gives the best
results is the family of Generalized Gaussian distributions with
shape parameter 0.75. The results are within about 0.2% of fully
trained tables for the Zero peg variant, and within 0.1% of the
One peg variant.
The forward updates are optionally (controlled by a macro)
model-based, i.e. restricted to only convey probabilities from the
codebook. Backward updates can also be optionally (controlled by
another macro) model-based, but is turned off by default. Currently
model-based forward updates work about the same as unconstrained
updates, but there is a drop in performance with backward-updates
being model based.
The model based approach also allows the probabilities for the key
frames to be adjusted from the defaults based on the base_qindex of
the frame. Currently the adjustment function is a placeholder that
adjusts the prob of EOB and Zero node from the nominal one at higher
quality (lower qindex) or lower quality (higher qindex) ends of the
range. The rest of the probabilities are then derived based on the
model from the adjusted prob of zero.
Change-Id: Iae050f3cbcc6d8b3f204e8dc395ae47b3b2192c9
This patch revamps the entropy coding of coefficients to code first
a non-zero count per coded block and correspondingly remove the EOB
token from the token set.
STATUS:
Main encode/decode code achieving encode/decode sync - done.
Forward and backward probability updates to the nzcs - done.
Rd costing updates for nzcs - done.
Note: The dynamic progrmaming apporach used in trellis quantization
is not exactly compatible with nzcs. A suboptimal approach has been
used instead where branch costs are updated to account for changes
in the nzcs.
TODO:
Training the default probs/counts for nzcs
Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
Rebased.
Remove the old matrix multiplication transform computation. The 16x16
ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16
300/0 in vp9/common/vp9_blockd.h.
Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f
rebased.
This patch includes 16x16 butterfly inverse ADST/DCT hybrid
transform. It uses the variant ADST of kernel
sin((2k+1)*(2n+1)/4N),
which allows a butterfly implementation.
The coding gains as compared to DCT 16x16 are about 0.1% for
both derf and std-hd. It is noteworthy that for std-hd sets
many sequences gains about 0.5%, some 0.2%. There are also few
points that provides -1% to -3% performance. Hence the average
goes to about 0.1%.
Change-Id: Ie80ac84cf403390f6e5d282caa58723739e5ec17
Removal of the NEWCOEFCONTEXT experiment to
reduce code clutter and make it easier to experiment with
some other changes to the coefficient coding context.
Change-Id: Icd17b421384c354df6117cc714747647c5eb7e98
fixed format issues.
Implement the inverse 4x4 ADST using 9 multiplications. For this
particular dimension, the original ADST transform can be
factorized into simpler operations, hence is retained.
Change-Id: Ie5d9749942468df299ab74e90d92cd899569e960
Refactor the 8x8 inverse hybrid transform. It is now consistent
with the new inverse DCT. Overall performance loss (due to the
use of this variant ADST, and the rounding errors in the butterfly
implementation) for std-hd is -0.02.
Fixed BUILD warning.
Devise a variant of the original ADST, which allows butterfly
computation structure. This new transform has kernel of the
form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures
using floating-point multiplications was reported in Z. Wang,
"Fast algorithms for the discrete W transform and for the discrete
Fourier transform", IEEE Trans. on ASSP, 1984.
This patch includes the butterfly implementation of the inverse
ADST/DCT hybrid transform of dimension 8x8.
Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d
Linking when we don't use it but it is available is probably harmless.
Gtest requires pthreads. Don't automatically enable unit tests if we
don't have it.
Change-Id: I5e6c3b609f840c7b6dbb36fc65809f0ef84685f8
This experiment gives little gains and adds relatively much code
complexity (and it hinders other experiments), so let's get rid of
it.
Change-Id: Id25e79a137a1b8a01138aa27a1fa0ba4a2df274a
This patch removes the old pred-filter experiment and replaces it
with one that is implemented using the switchable filter framework.
If the pred-filter experiment is enabled, three interopolation
filters are tested during mode selection; the standard 8-tap
interpolation filter, a sharp 8-tap filter and a (new) 8-tap
smoothing filter.
The 6-tap filter code has been preserved for now and if the
enable-6tap experiment is enabled (in addition to the pred-filter
experiment) the original 6-tap filter replaces the new 8-tap smooth
filter in the switchable mode.
The new experiment applies the prediction filter in cases of a
fractional-pel motion vector. Future patches will apply the filter
where the mv is pel-aligned and also to intra predicted blocks.
Change-Id: I08e8cba978f2bbf3019f8413f376b8e2cd85eba4
This is to add to the 64x64 transform experiment as an alternative to
a 64x64 DCT.
Two levels of wavelet decomposition is used on a 64x64 block, followed
by 16x16 DCT on the four lowest subbands. The highest three subbands
are left untransformed after the first level DWT.
Change-Id: I3d48d5800468d655191933894df6b46e15adca56
Adds an experiment to derive the previous context of a coefficient
not just from the previous coefficient in the scan order but from a
combination of several neighboring coefficients previously encountered
in scan order. A precomputed table of neighbors for each location
for each scan type and block size is used. Currently 5 neighbors are
used.
Results are about 0.2% positive using a strategy where the max coef
magnitude from the 5 neigbors is used to derive the context.
Change-Id: Ie708b54d8e1898af742846ce2d1e2b0d89fd4ad5
First attempt at avoiding all the compile-time environment detection for
cases where you can generate the environments statically, as when the
real build is being performed by another build system.
Change-Id: Ie3cf95d71d6c5169900f31e263b84bc123cdf73f
This commit changed the ENTROPY_CONTEXT conversion between MBs that
have different transform sizes.
In additioin, this commit also did a number of cleanup/bug fix:
1. removed duplicate function vp9_fix_contexts() and changed to use
vp8_reset_mb_token_contexts() for both encoder and decoder
2. fixed a bug in stuff_mb_16x16 where wrong context was used for
the UV.
3. changed reset all context to 0 if a MB is skipped to simplify the
logic.
Change-Id: I7bc57a5fb6dbf1f85eac1543daaeb3a61633275c
This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
code all over the place to wrap that in the bitstream/encoder/decoder/RD.
Some implementation notes (these probably need careful review):
- token range is extended by 1 bit, since the value range out of this
transform is [-16384,16383].
- the coefficients coming out of the FDCT are manually scaled back by
1 bit, or else they won't fit in int16_t (they are 17 bits). Because
of this, the RD error scoring does not right-shift the MSE score by
two (unlike for 4x4/8x8/16x16).
- to compensate for this loss in precision, the quantizer is halved
also. This is currently a little hacky.
- FDCT and IDCT is double-only right now. Needs a fixed-point impl.
- There are no default probabilities for the 32x32 transform yet; I'm
simply using the 16x16 luma ones. A future commit will add newly
generated probabilities for all transforms.
- No ADST version. I don't think we'll add one for this level; if an
ADST is desired, transform-size selection can scale back to 16x16
or lower, and use an ADST at that level.
Additional notes specific to Debargha's DWT/DCT hybrid:
- coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
block than for the rest (DWT pixel differences) of the block. Therefore,
RD error scoring isn't easily scalable between coefficient and pixel
domain. Thus, unfortunately, we need to compute the RD distortion in
the pixel domain until we figure out how to scale these appropriately.
Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
A patch on compound inter-intra prediction.
In compound inter-intra prediction, a new predictor for
16x16 inter coded MBs are obtained by combining a single
inter predictor with a 16x16 intra predictor, in a manner
that the weight varies with distance from the top/left
boundary. The current search strategy is to combine the best
inter mode with the best intra mode obtained independently.
Results so far:
derf +0.31%
yt +0.32%
std-hd +0.35%
hd +0.42%
It is conceivable that the results would improve somewhat
with a more thorough search strategy where all intra modes
are searched given the best mv, or even a joint search for
the best mv and the best intra mode.
Change-Id: I7951f1ed0d6eb31ca32ac24d120f1585bcd8d79b
Preliminary patch on a new 4x4 intra mode B_CONTEXT_PRED where the
dominant direction from the context is used to encode. Various decoder
changes are needed to support decoding of B_CONTEXT_PRED in conjunction
with hybrid transforms since the scan order and tokenization depends on
the actual direction of prediction obtained from the context. Currently
the traditional directional modes are used in conjunction with the
B_CONTEXT_PRED, which also seems to provide the best results.
The gains are small - in the 0.1% range.
Change-Id: I5a7ea80b5218f42a9c0dfb42d3f79a68c7f0cdc2
Creates a merge between the master and experimental branches. Fixes a
number of conflicts in the build system to allow *either* VP8 or VP9
to be built. Specifically either:
$ configure --disable-vp9 $ configure --disable-vp8
--disable-unit-tests
VP9 still exports its symbols and files as VP8, so that will be
resolved in the next commit.
Unit tests are broken in VP9, but this isn't a new issue. They are
fixed upstream on origin/experimental as of this writing, but rebasing
this merge proved difficult, so will tackle that in a second merge
commit.
Change-Id: I2b7d852c18efd58d1ebc621b8041fe0260442c21
Documentation is typically auto-detected by checking for php and
doxygen. Add an option to explicitly disable it.
Remove toggle keywords from libraries, examples, documentation and
unit tests. They were not consistent with the default status.
Change-Id: I21049675ccfd8e58ac612cd058641b197db5c0eb
As suggested by Paul, this commit separate the subpel refmv selection
into a separate experiment. It also changed a couple variable names
to better reflect the nature of the variables.
Change-Id: Id951c3cadc61a982dd15afe641000f60213b8995
Documentation is typically auto-detected by checking for php and
doxygen. Add an option to explicitly disable it.
Remove toggle keywords from libraries, examples, documentation and
unit tests. They were not consistent with the default status.
Change-Id: I21049675ccfd8e58ac612cd058641b197db5c0eb
Results: derf (vanilla or +hybridtx) +0.2% and (+hybrid16x16
or +tx16x16) +0.7%-0.8%; HD (vanilla or +hybridtx) +0.1-0.2%
and (+hybrid16x16 or +tx16x16) +1.4%, STD/HD (vanilla or +hybridtx)
about even, and (+hybrid16x16 or +tx16x16) +0.8-1.0%.
Change-Id: I03899e2f7a64e725a863f32e55366035ba77aa62
Adds a new experiment with redesigned/refactored motion vector entropy
coding. The patch also takes a first step towards separating the
integer and fractional pel components of a MV. However the fractional
pel encoding still depends on the integer pel part and so they are
not fully independent. Further experiments are in progress to see
how much they can be decoupled without affecting performance.
All components including entropy coding/decoding, costing for MV
search, forward updates and backward updates to probability tables,
have been implemented.
Results so far:
derf: +0.19%
std-hd: +0.28%
yt: +0.80%
hd: +1.15%
Patch: Simplifies the fractional pel models:
derf: +0.284%
std-hd: +0.289%
yt: +0.849%
hd: +1.254%
Patch: Some changes in the models, rebased.
derf: +0.330%
std-hd: +0.306%
yt: +0.816%
hd: +1.225%
Change-Id: I646b3c48f3587f4cc909639b78c3798da6402678
Enable ADST/DCT of dimension 16x16 for I16X16 modes. This change provides
benefits mostly for hd sequences.
Set up the framework for selectable transform dimension.
Also allowing quantization parameter threshold to control the use
of hybrid transform (This is currently disabled by setting threshold
always above the quantization parameter. Adaptive thresholding can
be built upon this, which will further improve the coding performance.)
The coding performance gains (with respect to the codec that has all
other configuration settings turned on) are
derf: 0.013
yt: 0.086
hd: 0.198
std-hd: 0.501
Change-Id: Ibb4263a61fc74e0b3c345f54d73e8c73552bf926
Alternative strategy for finding a list of candidate motion
vectors to use as reference values in mv coding and as
nearest and near.
Sort by sad in vp8_find_best_ref_mvs() rather than just
pick the best. Allow 0,0 as a best ref option but not a
nearest or near unless there are no alternatives.
Encode/Decode verified on at least some clips.
Some commented out experimental and stats code still in place.
Gain over existing code averages about 1% on derf (alll metrics)
with improvement on all clips. Other test results pending.
The entropy coding of the mode (nearest/near etc) still
depends upon and requires the old "findnear" code so
this needs looking at and may provide room for further gains.
Change-Id: I871d7cba1d1c379c4bad9bcccce1fb19c46b8247
If you build with --enabled-shared on a Linux arch not explicitly
listed, the configure script will abort because it didn't detect
"linux" in the fallback generic-gnu tuple.
Since this is the fallback tuple and people are passing
--enable-shared, assume the user knows what they're in for.
Change-Id: Ia35b657e7247c8855e3a94fca424c9884d4241e3
Merges this experiment in to make it easier to run tests on
filter precision, vectorized implementation etc.
Also removes an experimental filter.
Change-Id: I1e8706bb6d4fc469815123939e9c6e0b5ae945cd
Using surrounding reconstructed pixels from left and above to select
best matching mv to use as reference motion vector for mv encoding.
Test results:
AVGPSNR GLBPSNR VPXSSIM
Derf: 1.107% 1.062% 0.992%
Std-hd:1.209% 1.176% 1.029%
Change-Id: I8f10e09ee6538c05df2fb9f069abcaf1edb3fca6
This allows building on MountainLion as the 10.6 SDK has been
removed from the latest Xcode version (4.4 4F250). Also fix
all warnings for that build.
Change-Id: Ib70bca4a25295f13595f0d10ea9f0229631de5a4
Merged in the high_precision_mv experiment to make it easier
to work on new mv encoding strategies. Also removed
coef_update_probs3().
Change-Id: I82d3b0bb642419fe05dba82528bc9ba010e90924
Fixed the code review comments.
Under the htrans8x8 experiment the 8X8 DCT in the
I8X8 mode is replaced with a combination of 8X8 ADST and
DCT.
Overall coding gains with the htrans8x8 experiment are:
derf: 0.486
std-hd: 1.040
hd: 1.063
yt: 0.506
Note that part of the gain comes from bigger transforms
(8x8 instead of 4x4) and part comes from replacing the DCT
wth the ADST.
Change-Id: I92ca6bbfce11b4165d612b81d9adfad4d010c775
Set on all 16x16 intra/inter modes
Features:
- Butterfly fDCT/iDCT
- Loop filter does not filter internal edges with 16x16
- Optimize coefficient function
- Update coefficient probability function
- RD
- Entropy stats
- 16x16 is a config option
Have not tested with experiments.
hd: 2.60%
std-hd: 2.43%
yt: 1.32%
derf: 0.60%
Change-Id: I96fb090517c30c5da84bad4fae602c3ec0c58b1c