Граф коммитов

463 Коммитов

Автор SHA1 Сообщение Дата
John Koleszar bdd35c13cc avoid resetting framerate during vpx_codec_enc_config_set()
The calculated frame_rate is a state variable in the codec, and
shouldn't be maintained in the configuration struct. Move it to the
main part of cpi so that it isn't clobbered when the configuration
struct is updated. The initial framerate estimate is moved from the
vp8_cx_iface.c wrapper into the body of init_config() in onyx_if.c, so
that it is only called once and not reset on every call to
vp8_change_config().

Change-Id: I8d9a3d1283330d1ee297d07e9d78d1f2875f2465
2011-11-11 14:45:58 -08:00
Scott LaVarnway df49c7c58d SSE2 optimizations for vp8_build_intra_predictors_mby{,_s}()
Ronald recently sent me this patch that he did in April.
> From: Ronald S. Bultje <rbultje@google.com>
> Date: Thu, 28 Apr 2011 17:30:15 -0700
> Subject: [PATCH] SSE2 optimizations for
> vp8_build_intra_predictors_mby{,_s}().
HD decode tests have shown a performance boost up to 1.5%,
depending on material.
Patch set 3: Fixed encoder crash.

Change-Id: Ie1fd1fa3dc750eec1a7a20bfa2decc079dcf48c8
2011-11-09 15:30:35 -05:00
Scott LaVarnway 9532bda0fb Merge "Relocated idct/add calls for encoder" 2011-11-09 10:17:43 -08:00
Johann ea2229bab6 Merge "ARMv6 optimized Intra4x4 prediction" 2011-11-09 09:36:33 -08:00
John Koleszar 82e8884ad8 Merge "Remove unused file recon.c" 2011-11-09 09:31:23 -08:00
Scott LaVarnway 861ed6a5c1 Relocated idct/add calls for encoder
Call the idct/add after the tokenize.  This is WIP with
the goal of creating a common idct/add for the encoder and
decoder. This move is necessary because the decoder's version
of the idct clobbers qcoeff, which is used by the tokenize.

Change-Id: I6b08d8e8397cd873647fa4fb9469884e3c876756
2011-11-09 10:41:05 -05:00
Tero Rintaluoma 5a2fd63a2a ARMv6 optimized Intra4x4 prediction
Added ARM optimized intra 4x4 prediction
 - 2x faster on Profiler compared to C-code compiled with -O3
 - Function interface changed a little to improve BLOCKD structure
   access

Change-Id: I9bc2b723155943fe0cf03dd9ca5f1760f7a81f54
2011-11-09 09:13:51 +02:00
James Zern 9d60506130 threading: avoid defining _WIN32_WINNT
The referenced function (SignalObjectAndWait) isn't used. Reduces the
warnings with mingw32-w64 which defines this.

Change-Id: I4ce592879ec9372bf196dac640204c4d370bd210
2011-11-08 18:50:45 -08:00
John Koleszar f89e109f56 Remove unused file recon.c
File not referenced from anywhere and no longer compiles.

Change-Id: I38b11bd60db615c2c2c9d7ad35caba3a1adf1750
2011-11-08 15:54:56 -08:00
James Zern f89ea3432f fix file permissions
all of googletest import (0ab00a22) was marked executable

Change-Id: Id7b7ee03efc21ab998bb03349bd91644e8af25da
2011-11-04 18:50:35 -07:00
John Koleszar 7ca6c91732 Merge "Changing decoder input partition API to input fragments." 2011-11-04 09:36:27 -07:00
Scott LaVarnway 46639567a0 Merge "Change use of eob in the encoder" 2011-11-03 08:06:06 -07:00
Tero Rintaluoma e4f2ec7a52 Change use of eob in the encoder
Changed 'int eob' to 'char *eob' in BLOCKD so that both encoder and
decoder will use eobs[25] array from MACROBLOCKD structure. In future,
this will enable use of the decoder side IDCT in the encoder.

Change-Id: I6e1c011628cb8864fd4a0b80f0279ce16a5ca978
2011-11-03 16:08:09 +02:00
Stefan Holmer 1427205215 Changing decoder input partition API to input fragments.
Adding support for several partitions within one input fragment.
This is necessary to fully support all possible packetization
combinations in the VP8 RTP profile. Several partitions can
be transmitted in the same packet, and they can only be split
by reading the partition lengths from the bitstream.

Change-Id: If7d7ea331cc78cb7efd74c4a976b720c9a655463
2011-11-01 14:44:37 -07:00
Scott LaVarnway e0309e1509 Merge "Improved decode_split_mv()" 2011-10-28 09:27:17 -07:00
Scott LaVarnway 6064384d59 Improved decode_split_mv()
Tests showed ~1.2% performance boost on the HD clip used.
Performance will vary based on material.

Change-Id: Icbcf1a828750d5b4ae5252bf596b3ef594042e8a
2011-10-27 11:26:30 -04:00
Scott LaVarnway 0db5599957 Merge "Improved mv_bias" 2011-10-27 06:14:00 -07:00
Scott LaVarnway 21970d1dc2 Improved mv_bias
Small performance gains.

Change-Id: I709b9390a8a27a70f5f23574313b8db85ac7f23d
2011-10-26 11:46:10 -04:00
Attila Nagy de82809444 Reduce partial frame copy in encoder's pick_filter_level_fast
The partial frame copy function used to copy an extra 8 lines above
and  below. The partial frame filtering can only modify 3 pixel rows
above the partial frame. Reduce copy to bare minimum needed, which is
4 lines, so that partial filtering on copied frame is possible.

Define the "magic" fraction number for partial filtering in
loopfilter.h .

Change-Id: I4791ffc541b6884b12759a0d0714a8faf16147ec
2011-10-26 15:25:07 +03:00
James Berry bc7151131d Fix: check cx_data buffer prior to write
check to make sure that cx_data buffer has enough room before
writting to it, prior behavior did not which could result in a crash.

Change-Id: I3fab6f2bc4a96d7c675ea81acd39ece121738b28
2011-10-20 15:55:00 -04:00
Scott LaVarnway ed9c66f584 Remove usage of predict buffer for decode
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer.  For blocks with eob=0,
unnecessary idcts can be eliminated.  This gave a performance
boost of ~1.8% for the HD clips used.

Tero: Added needed changes to ARM side and scheduled some
      assembly code to prevent interlocks.

Patch Set 6:  Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.

Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
2011-10-18 12:06:50 -04:00
Adrian Grange 04182a121a Merge "Added rate-targeted temporal scalability" 2011-10-11 12:54:52 -07:00
Adrian Grange 217591fde5 Added rate-targeted temporal scalability
Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.

The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing upto 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)

Change-Id: I156250a3fe930be57c069d508c41b6a7a4ea8d6a
2011-10-11 12:49:12 -07:00
James Berry 05bde9d4a4 bug fix - starting/optimal/max and buffer_level changed from int to int64_t
buffer_level in VP8_COMP and starting_buffer_level, optimal_buffer_level
and maximum_buffer_size in VP8_CONFIG changed from int to int64_t
to avoid potential crash issues for larger target bit rates.

Change-Id: I0d5ab6c8a44c2fef51f30cd8df4bb4b739c5df26
2011-10-10 12:16:55 -04:00
Scott LaVarnway af12c23e8e Merge "Improved tokenize" 2011-10-04 09:57:42 -07:00
Johann 2aa408524c Merge "Reduce computational complexity of generic C loop filter." 2011-09-30 16:17:56 -07:00
Scott LaVarnway ab00d209bc Improved tokenize
For a realtime HD encodings, up to 1.6% gains seen.



Change-Id: If45028e23db95124da63f9d38ffe06e05596cc6e
2011-09-30 12:49:46 -04:00
Johann 3556deaca3 combine loopfilter data access
The data processed by the loopfilter overlaps. At the block level, this
results in some redundant transforms. Grouping the filtering allows for
a single 16x16 transpose (and inversion) instead of three 16x8 transposes
(and three more inversions).

This implementation is x86_64 only. We retain the previous
implementation for x86.

Improvements are obviously material dependant, but it seems to be ~%1 in
tests here.

Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007
2011-09-30 07:38:35 -07:00
Aaron Watry 69aa303d96 Reduce computational complexity of generic C loop filter.
Change-Id: I1e7f9ed3cd907844a495b9e0073bc140b87e5c06
2011-09-29 17:25:48 -05:00
John Koleszar 6f9457ec12 Merge "clamp_mvs() using the wrong motion vector information" 2011-09-22 11:54:15 -07:00
Attila Nagy 1a7d25a484 Replace vpx_ports/config.h with vpx_config.h
Just a clean-up.

Change-Id: Iea5b6dc925dcfa7db548bc1ab1a13d26ed5a2c9a
2011-09-22 13:33:54 +03:00
Scott LaVarnway c0ee870b0a clamp_mvs() using the wrong motion vector information
In the "Removed bmi copy to/from BLOCKD" commit, the copy
to the bmi in BLOCKD was eliminated.  The clamp_mvs() used
the bmi in BLOCKD, which now contains incorrect values.  This
patch fixes this problem.

Change-Id: I8eca1eaf4015052b0b63e90876f7ad321aba7cff
2011-09-16 11:03:53 -04:00
Scott LaVarnway 222c72e50f Merge "Removed bmi copy to/from BLOCKD" 2011-08-31 06:57:20 -07:00
Scott LaVarnway b870947d42 Removed bmi copy to/from BLOCKD
for SPLITMV and B_PRED modes.  Modified code to use the bmi
found in mode_info_context instead of BLOCKD.  On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock.  This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)

Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7
2011-08-24 14:42:26 -04:00
Fritz Koenig 112bd4e2b4 Fix naming of sse2 idct functions.
Prepend idct function names with vp8_
so that under profiling they show up
associated with libvpx.

Change-Id: I4fe357b50236cb7730a4cc00164c0a3487a1d8b4
2011-08-24 10:25:32 -07:00
Scott LaVarnway 1de5da80c9 Merge "Faster vp8_default_coef_probs" 2011-08-24 07:52:10 -07:00
Johann 85358d04cd Fix data accesses for simple loopfilters
The data that the simple horizontal loopfilter reads is aligned, treat
it accordingly.

For the vertical, we only use the bottom 4 bytes, so don't read in 16
(and incur the penalty for unaligned access).

This shows a small improvement on older processors which have a
significant penalty for unaligned reads.

postproc_mmx.c is unused

Change-Id: I87b29bbc0c3b19ee1ca1de3c4f47332a53087b3d
2011-08-23 20:42:45 -04:00
Fritz Koenig c5f890af2c Use local labels for jumps/loops in x86 assembly.
Prepend . to local labels in assembly code.  This
allows non unique labels within a file.  Also
makes profiling information more informative
by keeping the function name with the loop name.

Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
2011-08-23 09:05:29 -07:00
John Koleszar edec5eb5e7 Merge "Copy less when active map is in use" 2011-08-19 07:31:00 -07:00
Alpha Lam 4e8d35a461 Copy less when active map is in use
When active map is specified and the current frame is not a key frame,
golden frame nor a altref frame then copy only those active regions.

This significantly reduces encoding time by as much as 19% on the test
system where realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there's only a few
action macroblocks.

Change-Id: If394a813ec2df5a0201745d1348dbde4278f7ad4
2011-08-19 10:29:41 -04:00
Scott LaVarnway 19987dcbfa Faster vp8_default_coef_probs
Copies from a generated table instead of building the
default coeff probabilities during runtime.

Change-Id: I4d9551ea3a2d7d4a4f7ce9eda006495221a8de50
2011-08-16 16:21:21 -04:00
John Koleszar e96131705a Revert "Improved 1-pass CBR rate control"
This reverts commit b5ea2fbc2c. Further
testing showed noticable keyframe popping in some cases, reverting this
for now to give time for a proper fix.

Conflicts:

	vp8/encoder/onyx_if.c
	vp8/encoder/ratectrl.c

Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af
2011-08-12 14:51:36 -04:00
Johann 30e5deae5d update extend frame borders
the neon code made several assumptions which were broken by a recent
change: https://review.webmproject.org/2676

update the code with new assumptions and guard them with a compile time
assert

Change-Id: I32a8378030759966068f34618d7b4b1b02e101a0
2011-08-02 19:26:46 -04:00
John Koleszar 06c3d5bb9a Fix building with --disable-postproc
Change-Id: I7e6bc28e7974a376da747300744e0dd5dc1d21e9
2011-08-01 17:50:23 -04:00
James Zern b45065d38b cosmetics: consistently use [u]int64_t
Removes mixed usage of (unsigned) long long and INT64.
Fixes Issue #208.

Change-Id: I220d3ed5ce4bb1280cd38bb3715f208ce23cf83a
2011-07-26 11:34:36 -07:00
Yunqing Wang 65dfcf4696 Use CONFIG_FAST_UNALIGNED consistently in codec
CONFIG_FAST_UNALIGNED is enabled by default. Disable it if it is
not supported by hardware.

Change-Id: I7d6905ed79fed918bca074bd62820b0c929d81ab
2011-07-25 10:11:24 -04:00
Johann 773bcc300d Merge "fix sharpness bug and clean up" 2011-07-22 09:34:55 -07:00
Johann a04ed0e8f3 fix sharpness bug and clean up
sharpness was not recalculated in vp8cx_pick_filter_level_fast

remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.

always use cm->sharpness_level. the extra indirection was annoying.

don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init

move function declarations to their proper header

Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
2011-07-22 12:33:57 -04:00
Yunqing Wang 829179e888 Merge "Preload reference area to an intermediate buffer in sub-pixel motion search" 2011-07-22 06:56:15 -07:00
Yunqing Wang 20bd1446c0 Preload reference area to an intermediate buffer in sub-pixel motion search
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
  3.4%   at --rt --cpu-used =-4
  2.8%   at --rt --cpu-used =-3
  2.3%   at --rt --cpu-used =-2
  2.2%   at --rt --cpu-used =-1

Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.

Next, I will apply similar idea to other 2 sub-pixel search functions
for encoding speed > 4.

Make this change exclusively for x86 platforms.

Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
2011-07-22 09:28:06 -04:00
Scott LaVarnway a25f6a9c88 Moved vp8_encode_bool into boolhuff.h
allowing the compiler to inline this function.  For real-time
encodes, this gave a boost of 1% to 2.5%, depending on the
speed setting.

Change-Id: I3929d176cca086b4261267b848419d5bcff21c02
2011-07-19 09:17:25 -04:00
John Koleszar b5ea2fbc2c Improved 1-pass CBR rate control
This patch attempts to improve the handling of CBR streams with
respect to the short term buffering requirements. The "buffer level"
is changed to be an average over the rc buffer, rather than a long
running average. Overshoot is also tracked over the same interval
and the golden frame targets suppressed accordingly to correct for
overly aggressive boosting.

Testing shows that this is fairly consistently positive in one
metric or another -- some clips that show significant decreases
in quality have better buffering characteristics, others show
improvenents in both.

Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920
2011-07-18 11:48:05 -04:00
James Berry 6b6f367c3d bug fix vpx_copy_and_extend_frame size issue
vpx_copy_and_extend_frame could incorrectly
resize uv frames which could result in a crash.

Change-Id: Ie96f7078b1e328b3907a06eebeee44ca39a2e898
2011-07-14 15:58:15 -04:00
Johann 211694f67e Merge "update x86 asm for loopfilter" 2011-07-13 04:10:03 -07:00
Johann 8f910594bd Merge "Update armv6 loopfilter to new interface" 2011-07-13 04:09:55 -07:00
Johann 1a219c22b1 Merge "Update armv7 loopfilter to new interface" 2011-07-13 04:09:42 -07:00
Johann d9b825cff2 Merge "New loop filter interface" 2011-07-13 04:09:26 -07:00
Attila Nagy c231b0175d Update armv6 loopfilter to new interface
Change-Id: I5fe581d797571a7a9432fbd17fc557591d0c1afa
2011-07-12 12:14:51 +03:00
Attila Nagy 283b0e25ac Update armv7 loopfilter to new interface
Change-Id: I65105a9c63832669237e6a6a7fcb4ea3ea683346
2011-07-12 12:12:25 +03:00
Johann 01433c5043 update x86 asm for loopfilter
Change-Id: I1ed739522db7c00c189851c7095c1b64ef6412ce
2011-07-08 09:23:38 -04:00
Johann 6ae12c415e Merge "clean up warnings when building arm with rtcd" 2011-07-08 05:16:09 -07:00
Attila Nagy 622958449b New loop filter interface
Separate simple filter with reduced no. of parameters.
MB filter level picking based on precalculated table. Level table updated for
each frame. Inside and edge limits precalculated and updated just when
sharpness changes. HEV threshhold is constant.
ARM targets use scalars and others vectors.

Change works only with --target=generic-gnu
All other targets have to be updated!

Change-Id: I6b73aca6b525075b20129a371699b2561bd4d51c
2011-07-08 09:31:41 +03:00
John Koleszar b4f70084cc Merge "Properly use GET_GOT/RESTORE_GOT when using GLOBAL()." 2011-07-01 07:14:34 -07:00
Ronald S. Bultje c8a23ad3f4 Properly use GET_GOT/RESTORE_GOT when using GLOBAL().
This should fix binaries using PIC on x86-32. Also should
fix issue 343.

Change-Id: I591de3ad68c8a8bb16054bd8f987a75b4e2bad02
2011-06-30 14:04:27 -07:00
Johann 6611f66978 clean up warnings when building arm with rtcd
Change-Id: I3683cb87e9cb7c36fc22c1d70f0799c7c46a21df
2011-06-29 10:51:41 -04:00
John Koleszar f3a13cb236 Merge "Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently" 2011-06-29 07:29:59 -07:00
Johann dc004e8c17 Merge "Avoid text relocations in ARM vp8 decoder" 2011-06-28 16:34:10 -07:00
Johann 02c30cdeef Merge "utilize preload in ARMv6 MC/LPF/Copy routines" 2011-06-28 16:33:45 -07:00
John Koleszar b32da7c3da Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently
There were many instances in the code of vp8_coef_tokens and
vp8_coef_tokens-1, which was a preprocessor macro despite the naming
convention. Replace these with MAX_ENTROPY_TOKENS and ENTROPY_NODES,
respectively.

Change-Id: I72c4f6c7634c94e1fa066cd511471e5592c748da
2011-06-28 17:03:55 -04:00
Stefan Holmer 7296b3f922 New ways of passing encoded data between encoder and decoder.
With this commit frames can be received partition-by-partition
from the encoder and passed partition-by-partition to the
decoder.

At the encoder-side this makes it easier to split encoded
frames at partition boundaries, useful when packetizing
frames. When VPX_CODEC_USE_OUTPUT_PARTITION is enabled,
several VPX_CODEC_CX_FRAME_PKT packets will be returned
from vpx_codec_get_cx_data(), containing one partition
each. The partition_id (starting at 0) specifies the decoding
order of the partitions. All partitions but the last has
the VPX_FRAME_IS_FRAGMENT flag set.

At the decoder this opens up the possibility of decoding partition
N even though partition N-1 was lost (given that independent
partitioning has been enabled in the encoder) if more info
about the missing parts of the stream is available through
external signaling.

Each partition is passed to the decoder through the
vpx_codec_decode() function, with the data pointer pointing
to the start of the partition, and with data_sz equal to the
size of the partition. Missing partitions can be signaled to
the decoder by setting data != NULL and data_sz = 0. When
all partitions have been given to the decoder "end of data"
should be signaled by calling vpx_codec_decode() with
data = NULL and data_sz = 0.

The first partition is the first partition according to the
VP8 bitstream + the uncompressed data chunk + DCT address
offsets if multiple residual partitions are used.

Change-Id: I5bc0682b9e4112e0db77904755c694c3c7ac6e74
2011-06-28 11:10:17 -04:00
Stefan Holmer 4cb0ebe5b2 Adding support for independent partitions
Adding support in the encoder for generating
independent residual partitions by forcing
equal probabilities over the prev coef entropy
contexts.

Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21
2011-06-28 11:10:17 -04:00
Mike Hommey e3f850ee05 Avoid text relocations in ARM vp8 decoder
The current code stores pointers to coefficient tables and loads them to
access the tables contents. As these pointers are stored in the code
sections, it means we end up with text relocations. eu-findtextrel will
thus complain about code not compiled with -fpic/-fPIC.

Since the pointers are stored in the code sections, we can actually cheat
and let the assembler generate relative addressing when accessing the
coefficient tables, and just load their location with adr.

Change-Id: Ib74ae2d3f2bab80b29991355f2dbe6955f38f6ae
2011-06-28 09:11:40 +02:00
Fritz Koenig be99868bd1 Fix after removal of B_MODE_INFO
Change Ieb746989: Removed B_MODE_INFO missed this.

Change-Id: I32202555581cc2a5d45e729c6650ada4d2df55d3
2011-06-27 09:43:21 -07:00
Adrian Grange deca8cfc44 Fixed initialization of frame buffer ref counters
Only the first frame buffer ref counter was being initialized
because the index was fixed at 0 rather than using i.

Change-Id: Ib842298be4a5e3607f9e21c2cd4bfbee4054ffc4
2011-06-24 08:43:40 -07:00
James Berry 2bd90c13a0 get/set reference buffer dimension check added
vp8_yv12_copy_frame_ptr() expects same size
buffers which was not previously gaurenteed.
Using an improperly allocated buffer would
cause a crash before.

Change-Id: I904982313ce9352474f80de842013dcd89f48685
2011-06-22 13:36:24 -04:00
Taekhyun Kim 458fb8f491 utilize preload in ARMv6 MC/LPF/Copy routines
About 9~10% decoding perf improvement on non-Neon ARM cpus

Change-Id: I7dc2a026764e84e9c2faf282b4ae113090326837
2011-06-17 14:04:53 -07:00
Scott LaVarnway 223d1b54cf Populate bmi for B_PRED only
Small decode performance gain (~1%) on keyframes.  No
noticeable gains on encode.  Also changed pick_intra4x4mby_modes()
to read the above and left block modes for keyframes only.

Change-Id: I1f4885252f5b3e9caf04d4e01e643960f910aba5
2011-06-13 17:14:11 -04:00
Scott LaVarnway e71a010646 Calc ref_frame_cost once per frame
instead of every macro block.

Change-Id: I2604e94c6b89e3a8457777e21c8c38406d55b165
2011-06-13 09:58:03 -04:00
Johann 79327be6c7 use GCC inline magic
Better fix for #326. ICC happens to support the inline magic

Change-Id: Ic367eea608c88d89475cb7b05d73500d2a1bc42b
2011-06-08 16:19:37 -04:00
Scott LaVarnway 773768ae27 Removed B_MODE_INFO
Declared the bmi in BLOCKD as a union instead of B_MODE_INFO.
Then removed B_MODE_INFO completely.

Change-Id: Ieb7469899e265892c66f7aeac87b7f2bf38e7a67
2011-06-02 13:46:41 -04:00
Scott LaVarnway 4f586f7bd0 Broken EC after MODE_INFO size reduction
This patch fixes the compiler errors and the seg fault
when running decode_with_partial_drops.

Change-Id: I7c75369e2fef81d53b790d5dabc327218216838b
2011-05-26 15:13:00 -04:00
John Koleszar 1fe5070b76 Merge "Do not copy data between encoder reference buffers." 2011-05-26 09:58:26 -07:00
Scott LaVarnway a39321f37e Use int_mv instead of MV in vp8_mv_cont
Less operations.

Change-Id: Ibb9cd5ae66b8c7c681c9a654d551c8729c31c3ae
2011-05-24 16:01:12 -04:00
Scott LaVarnway e11f21af9a MODE_INFO size reduction
Declared the bmi in MODE_INFO as a union instead of B_MODE_INFO.
This reduced the memory footprint by 518,400 bytes for 1080
resolutions.  The decoder performance improved by ~4% for the
clip used and the encoder showed very small improvements. (0.5%)
This reduction was first mentioned to me by John K. and in a
later discussion by Yaowu.
This is WIP.

Change-Id: I8e175fdbc46d28c35277302a04bee4540efc8d29
2011-05-24 13:24:52 -04:00
Johann 6d82d2d22e Merge "Fixed iwalsh_neon build problems with RVDS4.1" 2011-05-20 07:51:11 -07:00
Scott LaVarnway 914f7c36d7 Merge "Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3." 2011-05-19 11:22:01 -07:00
John Koleszar c684d5e5f2 Merge "changed configure option name to reduce confusion" 2011-05-19 11:17:08 -07:00
John Koleszar 7def902261 Fix segv without --enable-error-concealment
Missed wrapping one function call in #if CONFIG_ERROR_CONCEALMENT.

Change-Id: I5746b1e6e4531670dbed1130467331fe309bdcae
2011-05-19 13:57:45 -04:00
Stefan Holmer d04f852368 Adding error-concealment to the decoder.
The error-concealer is plugged in after any motion vectors have been
decoded. It tries to estimate any missing motion vectors from the
motion vectors of the previous frame. Intra blocks with missing
residual are replaced with inter blocks with estimated motion vectors.

This feature was developed in a separate sandbox
(sandbox/holmer/error-concealment).

Change-Id: I5c8917b031078d79dbafd90f6006680e84a23412
2011-05-19 13:46:33 -04:00
Attila Nagy f96d56c4aa Fixed iwalsh_neon build problems with RVDS4.1
rvct 4.1 was complaining about vstmia.16, store multiple expects 64 data type.
optimized the implementation.

Change-Id: I0701052cabd685c375637bbc3796ff6d88f5972c
2011-05-19 10:27:26 +03:00
Scott LaVarnway 6b25501bf1 Using int_mv instead of MV
The compiler produces better assembly when using int_mv
for assignments.  The compiler shifts and ors the two 16bit
values when assigning MV.

Change-Id: I52ce4bc2bfbfaf3f1151204b2f21e1e0654f960f
2011-05-12 11:08:16 -04:00
Johann df2023a6cb set up Global Offset Table in recon
global values were being referenced, but the GOT was not being set up.
as the GOT is only required for PIC, this issue wasn't caught in the
default configuration.

Change-Id: I8006e53776139362a76f2c80cf9d0f8458602b2f
http://code.google.com/p/webm/issues/detail?id=328
2011-05-10 15:58:56 -04:00
Johann a7d4d3c550 clean up unused variable warnings
Change-Id: I9467d7a50eac32d8e8f3a2f26db818e47c93c94b
2011-05-09 12:56:20 -04:00
Aron Rosenberg eeb8117303 Fix semaphore emulation on Windows
The existing emulation of posix semaphores on Windows uses SetEvent()
and WaitForSingleObject(), which implements a binary semaphore, not a
counting semaphore as implemented by posix. This causes deadlock when
used with the expected posix semantics. Instead, this patch uses the
CreateSemaphore() and ReleaseSemaphore() calls (introduced in Windows
2000) which have the expected behavior.

This patch also reverts commit eb16f00, which split a semaphore that
was being used with counting semantics into two binary semaphores.
That commit is unnecessary with corrected emulation.

Change-Id: If400771536a27af4b0c3a31aa4c4e9ced89ce6a0
2011-05-06 00:13:59 -04:00
Johann ca5c1b17a2 Merge "Loopfilter NEON: Use VMOV for constant vectors instead of VLD." 2011-05-05 06:16:21 -07:00
Yunqing Wang aeb86d615c Merge "Runtime detection of available processor cores." 2011-05-05 04:59:54 -07:00
Attila Nagy a6aa389d2f Loopfilter NEON: Use VMOV for constant vectors instead of VLD.
Change-Id: I562b6e01c32bb51d00f3b95faf757fc7dc29a3a3
2011-05-04 11:29:23 +03:00
John Koleszar c09d8c1419 Merge "Fix documentation typos" 2011-05-02 06:50:22 -07:00
John Koleszar a66d8d33dd Fix compile error with --enable-postproc-visualizer
Typo.

Change-Id: I9cc6a4587c3d93c9f0da5e101d376741fc9622a4
2011-05-02 09:28:37 -04:00
Thijs Vermeir 8942f70cdf Fix documentation typos
Change-Id: I97124670926433bf1593c91660d8b8f8482ea9ce
2011-04-30 09:34:59 +02:00
Ronald S. Bultje 5a23352c03 Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3.
Change-Id: I658a1df7d825f820573cb2d11ad402f9d2791035
2011-04-29 11:52:09 -07:00
Yaowu Xu 57ad189129 changed configure option name to reduce confusion
Renamed configure option "enable-psnr" to "enable-internal-stats" to
better reflect the purpose of the option and eliminate the confusion
reported in http://code.google.com/p/webm/issues/detail?id=35

Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
2011-04-29 09:39:05 -07:00
Scott LaVarnway 1b2abc5f49 Merge "Consolidated build inter predictors" 2011-04-29 07:13:49 -07:00
James Berry f10732554b bug fix removed inline from recon_wrapper_sse2.c
removed inline from recon_wrapper_sse2.c to build
for visual stuido

Change-Id: I74a3482950448e2cdb30e9cd7087145b440d8a22
2011-04-28 15:12:00 -04:00
Scott LaVarnway 219ba87a93 Merge "Use psadbw to get the sum of bytes in a line." 2011-04-28 07:58:20 -07:00
Scott LaVarnway ccd6f7ed77 Consolidated build inter predictors
Code cleanup.

Change-Id: Ic8b0167851116c64ddf08e8a3d302fb09ab61146
2011-04-28 10:53:59 -04:00
Ronald S. Bultje 1e7ded69cf Use psadbw to get the sum of bytes in a line.
Thanks Jason for pointing that out on #vp8. ;-).

Change-Id: I5330a753e752a8704b78a409597472628e0b26a5
2011-04-27 13:49:21 -07:00
Scott LaVarnway 2e102855f4 Removed unused code in reconinter
The skip flag is never set by the encoder for SPLITMV.

Change-Id: I5ae6457edb3a1193cb5b05a6d61772c13b1dc506
2011-04-27 15:25:32 -04:00
Ronald S. Bultje 1083fe4999 SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}().
decoding

before
10.425
10.432
10.423
=10.426

after:
10.405
10.416
10.398
=10.406, 0.2% faster

encoding

before
14.252
14.331
14.250
14.223
14.241
14.220
14.221
=14.248

after
14.095
14.090
14.085
14.095
14.064
14.081
14.089
=14.086, 1.1% faster

Change-Id: I483d3d8f0deda8ad434cea76e16028380722aee2
2011-04-27 11:31:27 -07:00
Johann d5c46bdfc0 Merge "remove simpler_lpf" 2011-04-25 14:51:07 -07:00
Johann 01527e743f remove simpler_lpf
the decision to run the regular or simple loopfilter is made outside the
function and managed with pointers

stop tracking the option in two places. use filter_type exclusively

Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
2011-04-25 17:37:41 -04:00
John Koleszar cfbfd39de8 Merge "Change rc undershoot/overshoot semantics" 2011-04-25 10:49:32 -07:00
John Koleszar aa926fbd27 Add rc_max_intra_bitrate_pct control
Adds a control to limit the maximum size of a keyframe, as a function of
the per-frame bitrate. See this thread[1] for more detailed discussion:

[1]: http://groups.google.com/a/webmproject.org/group/codec-devel/browse_thread/thread/271b944a5e47ca38

Change-Id: I7337707642eb8041d1e593efc2edfdf66db02a94
2011-04-25 13:47:14 -04:00
Scott LaVarnway 3698c1f620 Removed dc_diff from MB_MODE_INFO
The dc_diff flag is used to skip loopfiltering.  Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.

Change-Id: Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931
2011-04-21 14:38:36 -04:00
Scott LaVarnway 7a49accd0b Removed force_no_skip
force_no_skip is always set to zero.

Change-Id: I89b61c5e0bee34627a9c07c05f3517e1db76af77
2011-04-20 15:45:12 -04:00
Scott LaVarnway 09c933ea80 Removed redundant checks of the mode_info_context flags
Code cleanup.  The build inter predictor functions are
redundantly checking the mode_info_context for either
INTRA_FRAME or SPLITMV.

Change-Id: I4d58c3a5192a4c2cec5c24ab1caf608bf13aebfb
2011-04-20 14:06:40 -04:00
Attila Nagy 43464e94ed Do not copy data between encoder reference buffers.
Golden and ALT reference buffers were refreshed by copying from
the new buffer. Replaced this by index manipulation.
Also moved all the reference frame updates to one function for
easier tracking.

Change-Id: Icd3e534e7e2c8c5567168d222e6a64a96aae24a1
2011-04-20 15:26:55 +03:00
Johann 4a2b684ef4 modify SAVE_XMM for potential 64bit use
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.

Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
2011-04-19 10:42:45 -04:00
Johann a9b465c5c9 Merge "Add save/restore xmm registers in x86 assembly code" 2011-04-19 06:32:10 -07:00
Johann c7cfde42a9 Add save/restore xmm registers in x86 assembly code
Went through the code and fixed it. Verified on Windows.

Where possible, remove dependencies on xmm[67]

Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.

Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
2011-04-18 16:30:38 -04:00
Yunqing Wang d5069b5af0 Merge "Handle long delay between video frames in multi-thread decoder(issue 312)" 2011-04-18 10:11:41 -07:00
Yunqing Wang 8ba58951e9 Handle long delay between video frames in multi-thread decoder(issue 312)
This is reported by m...@hesotech.de (see issue 312):
"The decoder causes an access violation
when you decode the first frame, then make a pause of about
60 seconds and then decode further frames. But only if
vpx_codec_dec_cfg_t.threads> 1.

This is caused by a timeout of WaitForSingleObject.
When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF),
the problem is solved."

Reproduced the crash and verified the changes on Windows platform.
This brings the behavior inline with the other platforms using sem_wait().

Change-Id: I27b32f90bce05846ef2684b50f7a88f292299da1
2011-04-15 17:27:26 -04:00
Johann 487c0299c9 remove dead code, add missing RESTORE_XMM
vp8_filter_block1d16_h4_ssse3 was never called

because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was
masked. however, if/when win64 used those registers for persistant data,
issues could/will arise.

Change-Id: I56d6effca0aeba1f86082689771cb10145d39651
2011-04-15 10:11:53 -04:00
John Koleszar a3399291ad Fix off-by-one in copy_and_extend_plane
Should only copy h lines, not h+1.

Change-Id: I802a85686635900459c6dc79596189033e5298d8
2011-04-15 08:44:39 -04:00
John Koleszar 88841f1059 Refactor lookahead ring buffer
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.

Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9
2011-04-13 14:26:45 -04:00
John Koleszar c99f9d7abf Change rc undershoot/overshoot semantics
This patch changes the rc_undershoot_pct and rc_overshoot_pct controls
to set the "aggressiveness" of rate adaptation, by limiting the
amount of difference between the target buffer level and the actual
buffer level which is applied to the target frame rate for this frame.

This patch was initially provided by arosenberg at logitech.com as
an attachment to issue #270. It was modified to separate these controls
from the other unrelated modifications in that patch, as well as to
use the pre-existing variables rather than introducing new ones.

Change-Id: Id542e3f5667dd92d857d5eabf29878f2fd730a62
2011-04-12 20:49:33 -04:00
John Koleszar a9ce3e3834 Remove unused files
Change-Id: I36ca3f2f4620358033da34daf764f0b388dacd08
2011-04-11 10:34:40 -04:00
Yunqing Wang 3d6815817c Use full-pixel MV in mvsadcost calculation
MV sad cost error is only used in full-pixel motion search,
which only need full-pixel resolution instead of quarter-pixel
resolution. This change reduced mvsadcost table size, and
removed unneccessary pamameter passing since this table is
constant once it is generated.

Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
2011-04-01 16:41:58 -04:00
Attila Nagy 297b27655e Runtime detection of available processor cores.
Detect the number of available cores and limit the thread allocation
accordingly. On decoder side limit the number of threads to the max
number of token partition.

Core detetction works on Windows and
Posix platforms, which define _SC_NPROCESSORS_ONLN or _SC_NPROC_ONLN.

Change-Id: I76cbe37c18d3b8035e508b7a1795577674efc078
2011-03-31 10:23:01 +03:00
John Koleszar 769c74c0ac Merge "Increase static linkage, remove unused functions" 2011-03-21 04:51:51 -07:00
John Koleszar 429dc676b1 Increase static linkage, remove unused functions
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.

These symbols were identified by:

  $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
    | sort | grep '^ *1 '

Change-Id: I59609f58ab65312012c047036ae1e0634f795779
2011-03-17 20:53:47 -04:00
John Koleszar de50520a8c apple: include proper mach primatives
Fixes implicit declaration warning for 'mach_task_self'. This change
is an update to Change I9991dedd1ccfddc092eca86705ecbc3b764b799d,
which fixed this issue for the decoder but not the encoder.

Change-Id: I9df033e81f9520c4f975b7a7cf6c643d12e87c96
2011-03-16 13:59:32 -04:00
John Koleszar 27972d2c1d Move build_intra_predictors_mby to RTCD framework
The vp8_build_intra_predictors_mby and vp8_build_intra_predictors_mby_s
functions had global function pointers rather than using the RTCD
framework. This can show up as a potential data race with tools such as
helgrind. See https://bugzilla.mozilla.org/show_bug.cgi?id=640935
for an example.

Change-Id: I29c407f828ac2bddfc039f852f138de5de888534
2011-03-11 13:04:50 -05:00
Scott LaVarnway 861175ef00 Removed vp8_block2type
and used defines instead.

Change-Id: Idb56e0295d004793f406dfd2d8d8c546aad62e03
2011-02-24 14:35:18 -05:00
John Koleszar c764c2a20f Merge "clean up unused files" 2011-02-18 06:33:05 -08:00
John Koleszar 3ed8fe8778 remove unused vp8_predict_dc function
Change-Id: I64fa47889c54cfed094a674c49ef0996d49bdd42
2011-02-18 09:12:20 -05:00
John Koleszar cbf923b12c clean up unused files
Removed a number of files that were unused or little-used.

Change-Id: If9ae5e5b11390077581a9a879e8a0defe709f5da
2011-02-18 09:09:49 -05:00
John Koleszar ac10665ad8 Merge "Removed unused vp8_recon_intra4x4mb function" 2011-02-17 11:30:13 -08:00
Scott LaVarnway 07f7b66fae Removed unused vp8_recon_intra4x4mb function
Change-Id: I4a328ce152d9dbe6b0d1606d1b523e8e7bfb468e
2011-02-17 13:34:38 -05:00
John Koleszar 02321de0f2 Fix relative include paths
Allow compiling without adding vp8/{common,encoder,decoder} to the
include paths.

Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-02-10 15:09:44 -05:00
Johann 7d8199f0c3 Merge "Adds armv6 optimized variance calculation" 2011-02-10 06:06:46 -08:00
John Koleszar a39b5af10b Merge "Put more code under #if CONFIG_MULTITHREAD." 2011-02-09 08:31:36 -08:00
Gaute Strokkenes 315e3c2518 Put more code under #if CONFIG_MULTITHREAD.
Change-Id: Icf4b692099d7d249fe3553852b1022b027b28e4b
2011-02-09 11:21:18 -05:00
Tero Rintaluoma cb14764fab Adds armv6 optimized variance calculation
Adds vp8_sub_pixel_variance16x16_armv6 function to encoder. Integrates
ARMv6 optimized bilinear interpolations from vp8/common/arm/armv6
and adds new assembly file for variance16x16 calculation.
 - vp8_filter_block2d_bil_first_pass_armv6   (integrated)
 - vp8_filter_block2d_bil_second_pass_armv6  (integrated)
 - vp8_variance16x16_armv6 (new)
 - bilinearfilter_arm.h (new)
Change-Id: I18a8331ce7d031ceedd6cd415ecacb0c8f3392db
2011-02-09 10:23:43 -05:00
Johann e5aaac24bb clean up bilinear filter
make reference version of bilinear_filters short.
use reference versions of bilinear_filters and sub_pel_filters when
possible.

recognize that Width was being passed into
filter_block2d_bil_first_pass multiple times. ARM version had already
fixed this. propegate to C.

change references to src_pixels_per_line to src_pitch and standardize on
src/dst (instead of input/output).

recognize that first_pass is only run in the verticle and second_pass
only horizontal. ARM version had already fixed this. propegate to C

Change-Id: I292d376d239a9a7ca37ec2bf03cc0720606983e2
2011-02-08 17:42:54 -05:00
Johann 40dcae9c2e clarify *_offsets.asm differences
it's difficult to mux the *_offsets.c files because of header conflicts.
make three instead, name them consistently and partititon the contents
to allow building them as required.

Change-Id: I8f9768c09279f934f44b6c5b0ec363f7943bb796
2011-02-08 16:35:43 -05:00
Johann 3273c7b679 move one of the offset files
common/arm/vpx_asm_offsets moves up a level. prepare for muxing with
encoder/arm/vpx_vp8_enc_asm_offsets

Change-Id: I89a04a5235447e66571995c9d9b4b6edcb038e24
2011-02-07 11:35:30 -05:00
Johann bb9c95ea53 remove unused dboolhuff code
we were holding on to this "just in case." purge it instead

Change-Id: I77a367b36d0821d731019f2566ecfffdae1d4b8a
2011-02-04 16:00:00 -05:00
Yunqing Wang 350ffe8dae Merge "Improve MV prediction in vp8_pick_inter_mode() for speed>3" 2011-02-04 10:10:15 -08:00
Gaute Strokkenes ffc6aeef14 Remove duplicate loopfilter parameters.
Change-Id: I0d41415e3961c2c9492d342290c1999f9d02e6d8
2011-02-04 14:55:02 +00:00
Gaute Strokkenes bf5f585b0d Make vp8_adjust_mb_lf_value return the updated value rather than
manipulating it in situ via a pointer.

Change-Id: If4a87a4eccd84f39577c0e91e171245f4954c5cf
2011-02-03 19:24:16 +00:00
Johann f3cb9ae459 Merge "Adds "armvX-none-rvct" targets" 2011-01-28 09:03:58 -08:00
Yunqing Wang 7cbe684ef5 Improve MV prediction in vp8_pick_inter_mode() for speed>3
Applied same method used in vp8_rd_pick_inter_mode() to improve
the accuracy of MV prediction.

Change-Id: Ia50ae26208b18482695601f32febd99fe89fbc17
2011-01-28 10:00:20 -05:00
Tero Rintaluoma 11a222f5d9 Adds "armvX-none-rvct" targets
Adds following targets to configure script to support RVCT compilation
without operating system support (for Profiler or bare metal images).
 - armv5te-none-rvct
 - armv6-none-rvct
 - armv7-none-rvct

To strip OS specific parts from the code "os_support"-config was added
to script and CONFIG_OS_SUPPORT flag is used in the code to exclude OS
specific parts such as OS specific includes and function calls for
timers and threads etc. This was done to enable RVCT compilation for
profiling purposes or running the image on bare metal target with
Lauterbach.

Removed separate AREA directives for READONLY data in armv6 and neon
assembly files to fix the RVCT compilation. Otherwise
"ldr <reg>, =label" syntax would have been needed to prevent linker
errors. This syntax is not supported by older gnu assemblers.

Change-Id: I14f4c68529e8c27397502fbc3010a54e505ddb43
2011-01-28 12:47:39 +02:00
Yunqing Wang cac54404b9 Remove copies of same functions
Reduce the code size.

Change-Id: I2e1998557a3c8776e262c442fd758c25e17aff7a
2011-01-26 15:37:00 -05:00
John Koleszar 2f0331c90c Merge "Implement error tracking in the decoder" 2011-01-19 05:51:00 -08:00
Henrik Lundin 67fb3a5155 Implement error tracking in the decoder
A new vpx_codec_control called VP8D_GET_FRAME_CORRUPTED. The output
from the function is non-zero if the last decoded frame contains
corruption due to packet losses.

The decoder is also modified to accept encoded frames of zero length.
A zero length frame indicates to the decoder that one or more frames
have been completely lost. This will mark the last decoded reference
buffer as corrupted. The data pointer can be NULL if the length is
zero.

Change-Id: Ic5902c785a281c6e05329deea958554b7a6c75ce
2011-01-19 09:53:21 +01:00
Henrik Lundin 48c28fc42c Remove unused local variables
Removing unused local variables causing compiler warnings in
Visual Studio.

Change-Id: I0e2096303be1fdbc01428a6e57cca9796bb32c8a
2011-01-11 15:22:19 +01:00
Paul Wilkins e0846c9c8c CQ Mode
The merge includes hooks to for CQ mode and other code
changes merged from the test branch.

CQ mode attempts to maintain a more stable quantizer within a clip
whilst also trying to adhere to a guidline maximum bitrate.

The existing target data rate parameter is used to specify the
guideline maximum bitrate.

A new parameter allows the user to specify a target CQ level.

For normal (non kf/gf/arf) frames, the quantizer will not drop BELOW the
user specified value (0-63). However, in some cases the encoder may
choose to impose a target CQ that is above that specified by the user,
if it estimates that consistent use of the target value is not compatible
with guideline maximum bitrate.

Change-Id: I2221f9eecae8cc3c431d36caf83503941b25e4c1
2011-01-07 18:46:29 +00:00
Yunqing Wang a864678cdb Always update last_frame_type
Scott pointed out that last_frame_type only gets updated while
loopfilter exists. Since last_frame_type is also needed in
motion search now, it needs to be updated every frame.

Change-Id: I9203532fd67361588d4024628d9ddb8e391ad912
2010-12-29 10:28:35 -05:00
John Koleszar b0da9b399d Add psnr/ssim tuning option
Add a new encoder control, VP8E_SET_TUNING, to allow the application
to inform the encoder that the material will benefit from certain
tuning. Expose this control as the --tune option to vpxenc. The args
helper is expanded to support enumerated arguments by name or value.

Two tunings are provided by this patch, PSNR (default) and SSIM.
Activity masking is made dependent on setting --tune=ssim, as the
current implementation hurts speed (10%) and PSNR (2.7% avg,
10% peak) too much for it to be a default yet.

Change-Id: I110d969381c4805347ff5a0ffaf1a14ca1965257
2010-12-17 10:01:05 -05:00
Johann 825adc464f shrink TOKENEXTRA and vp8_extra_bit_struct
Per John's previous change, shrink TOKENEXTRA from 20 to 8 bytes
original: b7b1e6fb
reverted: 41f4458a

Also drop unused field from vp8_extra_bit_struct

Update ARM ASM to deal with this change. In particular, Extra is signed
and needs to be sign-extended when loaded.

Change-Id: Ibd0ddc058432bc7bb09222d6ce4ef77e93a30b41
2010-12-14 10:32:50 -05:00
John Koleszar b1aa54ab26 remove unused temporal preproc code
This code is unused, as the current preproc implementation uses the
same spatial filter that postproc uses.

Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
2010-12-13 16:47:59 -05:00
Fritz Koenig e0cf330cde vp8 fast quantizer sse2 optimizations for eob.
Changed the end of block computation to use pmaxw.  Removed
additional pushing and popping of registers that was not needed.

Change-Id: I08cb9b424513cd8a2c7ad8cea53b4e2adc66ef98
2010-12-09 15:00:30 -08:00
Yaowu Xu d49da085c0 correct errors in token alphabet descriptions
There were a few errors in the comment section that describe VP8 token
alphabet table.

Change-Id: Ie6728a0e08bc3798893221b60408d5b201064bdc
2010-11-16 10:51:43 -08:00
Fritz Koenig 647df00f30 postproc : Re-work posproc calling to allow more flags.
Debugging in postproc needs more flags to allow for specific
block types to be turned on or off in the visualizations.

Must be enabled with --enable-postproc-visualizer during
configuration time.

Change-Id: Ia74f357ddc3ad4fb8082afd3a64f62384e4fcb2d
2010-11-10 14:14:46 -08:00
Fritz Koenig 0e7b60617f postproc : Update visualizations.
Change color reference frame to blend the macro block edge.
This helps with layering of visualizations.

Add block coloring for intra prediction modes.

Change-Id: Icefe0e189e26719cd6937cebd6727efac0b4d278
2010-11-04 10:35:02 -07:00
Fritz Koenig 0a29bd9793 postproc : Fix display of motion vectors.
Split motion vectors were all being treated as 4x4
blocks.  Now correctly handle 16x8, 8x16, 8x8, 4x4
blocks.

Change-Id: Icf345c5e69b5e374e12456877ed7c41213ad88cc
2010-11-02 13:29:13 -07:00
Fritz Koenig 9f61a83bf9 postproc : Added SPLITMV visualization, fix line constrain.
Now draw 16 vectors for SPLITMV mode.

Fixed constrain line to block divide by zero issues.

Blend block was not centering the shaded area correctly.

Change-Id: I1edabd8b4e553aac8d980f7b45c80159e9202434
2010-11-01 13:27:13 -07:00
Timothy B. Terriberry c4d7e5e67e Eliminate more warnings.
This eliminates a large set of warnings exposed by the Mozilla build
 system (Use of C++ comments in ISO C90 source, commas at the end of
 enum lists, a couple incomplete initializers, and signed/unsigned
 comparisons).
It also eliminates many (but not all) of the warnings expose by newer
 GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite
 without checking the return values).
There are a few spurious warnings left on my system:

../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used
 uninitialized in this function
gcc seems to be unable to figure out that the value shortcut doesn't
 change between the two if blocks that test it here.

../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned
 expression >= 0 is always true
../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned
 expression >= 0 is always true
This is true, so far as it goes, but it's comparing against an enum, and the C
 standard does not mandate that enums be unsigned, so the checks can't be
 removed.

Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395
2010-10-27 18:08:04 -07:00
Fritz Koenig a097e18964 postproc: Tweaks to line drawing and blending.
Turned down the blending level to make colored blocks obscure
the video less.
Not blending the entire block to give distinction to macro
block edges.
Added configuration so that macro block blending function can
be optimized.
Change to constrain line as to when dx and dy are computed.
Now draw two lines to form an arrow.

Change-Id: Id3ef0fdeeab2949a6664b2c63e2a3e1a89503f6c
2010-10-27 13:20:03 -07:00
Johann 787733d855 Merge "RTCD build is bringing old errors to light" 2010-10-27 09:59:01 -07:00
Fritz Koenig cf127474d8 vpxdec : Change --pp-debug-info to be a bit field.
This allows multiple post processor debug levels to be overlayed.
i.e. can show colored reference blocks and visual motion vectors.

Change-Id: Ic4a1df438445b9f5780fe73adb3126e803472e53
2010-10-27 09:53:37 -07:00
Fritz Koenig 36ff6a6743 Merge "postproc: Add mode and refrence frame visualizers." 2010-10-27 09:04:39 -07:00
Johann abcf36c758 RTCD build is bringing old errors to light
needs to be _recon_ not _recon_recon_

Change-Id: I7a8b9ddcb4fb72c2b723c563932c9ea52ff15982
2010-10-27 10:47:48 -04:00
Fritz Koenig a0ccc97d8a postproc: Add mode and refrence frame visualizers.
Post process option to color the block for either the mode
of the macro block, or the frame that the macro block references.

Change-Id: Ie498175497f2d20e3319924d352dc4ddc16f4134
2010-10-26 16:00:14 -07:00
John Koleszar d6c67f02c9 make vp8_recon16x16mb{,y} RTCD functions
ARM NEON has a platform specific version of vp8_recon16x16mb, though
it's just a stub to extract the various parameters from the
MACROBLOCKD struct and pass them to vp8_recon16x16mb_neon(). Using
that function's prototype directly will be a better long term solution,
but it's quite an invasive change.

Change-Id: I04273149e2ade34749e2d09e7edb0c396e1dd620
2010-10-26 13:23:36 -04:00
John Koleszar 19638c2309 arm: move unrolled loops back to generic code
Some of the ARM functions differed from their generic counterparts
only by unrolling their loops. Since this change may be useful
on other platforms, or might even supercede the looped version
in the generic case, move it back to the generic file.

This code is left under #if ARCH_ARM for now, but it may be worth
considering a different (possibly new) conditional for these. If
it turns out that this should be runtime selectable, these
functions will have to move to the RTCD infrastructure. Don't want
to take that step at this time without more profile data.

Change-Id: I4612fdbc606fbebba4971a690fb743ad184ff15f
2010-10-26 09:51:35 -04:00
John Koleszar d330a5876b arm: remove duplicate functions
These functions were true duplicates of functions present in the
generic code. This fixes some of the link errors when building
with --enable-shared --enable-pic.

Change-Id: Idff26599d510d954e439207883607ad6b74df20c
2010-10-26 09:37:44 -04:00
Fritz Koenig 1d70aaf08b Merge "Debug option for drawing motion vectors." 2010-10-25 15:40:22 -07:00
Fritz Koenig d1a4cce809 Debug option for drawing motion vectors.
Postproc level that uses Bresenham's line algorithm
to draw motion vectors onto the postproc buffer.

Change-Id: I34c7daa324f2bdfee71e84fcb1c50b90fa06f6fb
2010-10-25 15:39:04 -07:00
Johann 1376f061da reuse common loopfilter code
there were four versions for the regular and
macroblock loopfilters:
horizontal [y|uv]
vertical [y|uv]

this moves all the common code into 2 functions:
vp8_loop_filter_neon
vp8_mbloop_filter_neon

this provides no gain in performance. there's a bit
of jitter, but it trends down ~0.25-0.5%. however,
this is a huge gain maintenance. also, there is the
potential to drop some stack usage in the macroblock
loopfilter.

Change-Id: I91506f07d2f449631ff67ad6f1b3f3be63b81a92
2010-10-25 09:48:50 -04:00
Timothy B. Terriberry b71962fdc9 Add runtime CPU detection support for ARM.
The primary goal is to allow a binary to be built which supports
 NEON, but can fall back to non-NEON routines, since some Android
 devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
 Tegra).
The configure-generated flags HAVE_ARMV7, etc., are used to decide
 which versions of each function to build, and when
 CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
 at run time.
In order for this to work, the CFLAGS must be set to something
 appropriate (e.g., without -mfpu=neon for ARMv7, and with
 appropriate -march and -mcpu for even earlier configurations), or
 the native C code will not be able to run.
The ASFLAGS must remain set for the most advanced instruction set
 required at build time, since the ARM assembler will refuse to emit
 them otherwise.
I have not attempted to make any changes to configure to do this
 automatically.
Doing so will probably require the addition of new configure options.

Many of the hooks for RTCD on ARM were already there, but a lot of
 the code had bit-rotted, and a good deal of the ARM-specific code
 is not integrated into the RTCD structs at all.
I did not try to resolve the latter, merely to add the minimal amount
 of protection around them to allow RTCD to work.
Those functions that were called based on an ifdef at the calling
 site were expanded to check the RTCD flags at that site, but they
 should be added to an RTCD struct somewhere in the future.
The functions invoked with global function pointers still are, but
 these should be moved into an RTCD struct for thread safety (I
 believe every platform currently supported has atomic pointer
 stores, but this is not guaranteed).

The encoder's boolhuff functions did not even have _c and armv7
 suffixes, and the correct version was resolved at link time.
The token packing functions did have appropriate suffixes, but the
 version was selected with a define, with no associated RTCD struct.
However, for both of these, the only armv7 instruction they actually
 used was rbit, and this was completely superfluous, so I reworked
 them to avoid it.
The only non-ARMv4 instruction remaining in them is clz, which is
 ARMv5 (not even ARMv5TE is required).
Considering that there are no ARM-specific configs which are not at
 least ARMv5TE, I did not try to detect these at runtime, and simply
 enable them for ARMv5 and above.

Finally, the NEON register saving code was completely non-reentrant,
 since it saved the registers to a global, static variable.
I moved the storage for this onto the stack.
A single binary built with this code was tested on an ARM11 (ARMv6)
 and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
 and produced identical output, while using the correct accelerated
 functions on each.
I did not test on any earlier processors.

Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
2010-10-25 09:23:29 -04:00
Timothy B. Terriberry 8f75ea6b5c Convert [4][4] matrices to [16] arrays.
Most of the code that actually uses these matrices indexes them as
 if they were a single contiguous array, and coverity produces
 reports about the resulting accesses that overflow the static
 bounds of the first row.
This is perfectly legal in C, but converting them to actual [16]
 arrays should eliminate the report, and removes a good deal of
 extraneous indexing and address operators from the code.

Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
2010-10-21 17:04:30 -07:00
Yaowu Xu fc2f8dafaf Merge "fixed a typo that mis-used Y plane stride for UV blocks." 2010-10-19 16:23:31 -07:00
Yunqing Wang 7804befb55 Fix one gcc compiler warning
../libvpx/vp8/encoder/bitstream.c: In function ‘pack_inter_mode_mvs’:
../libvpx/vp8/encoder/bitstream.c:1026: warning: array subscript has type ‘char’

Change-Id: Ic77491e0a172fa1821e5b3e914d0dc41fe87c00f
2010-10-14 15:15:35 -04:00
Jan Kratochvil 1fc294116a nasm: movhps compatibility QWORD->MMWORD
Filed for nasm as:
https://sourceforge.net/tracker/?func=detail&atid=106208&aid=3081103&group_id=6208

nasm just does not accept any size parameter for movhps:
1.asm:2: error: mismatch in operand sizes

Some parts of libvpx already use MMWORD for movhps and MMWORD is
defined-out so it is compatible both with yasm and nasm.

Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu.

Change-Id: I4008a317ca87ec07c9ada958fcdc10a0cb589bbc
2010-10-04 20:47:19 -04:00
Jan Kratochvil 5cdc3a4c29 nasm: address labels 'rel label' vice 'wrt rip'
nasm does not support `label wrt rip', it requires `rel label'. It is
still fully compatible with yasm.

Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
2010-10-04 19:47:54 -04:00
Jan Kratochvil e114f699f6 nasm: match instruction length (movd/movq) to parameters
nasm requires the instruction length (movd/movq) to match to its
parameters. I find it more clear to really use 64bit instructions when
we use 64bit registers in the assembly.

Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
2010-10-04 23:36:29 +02:00
Yaowu Xu 49fdb7c41e fixed a typo that mis-used Y plane stride for UV blocks.
Raised by Lei Yang, the Y plane stride was used for UV blocks.
This is clearly a typo. But as the comments in the code suggested
that this port of code has not been used yet, so the typo should
not have created any damage yet.

Change-Id: Iea895edc17469a51c803a8cc6d0fce65a1a7fc2f
2010-10-04 11:31:14 -07:00
Johann f143a81191 Merge "Fix valgrind errors in the NEON loop filters." 2010-10-01 06:18:53 -07:00
Timothy B. Terriberry a465076e02 Fix valgrind errors in the NEON loop filters.
Like the ARMv6 code, these functions were accessing values below
 the stack pointer, which can be corrupted by signal delivery at
 any time.
2010-09-30 20:40:45 -07:00
John Koleszar a047fee606 Merge "Fix loopfilter delta zero transitions" 2010-09-30 10:26:10 -07:00
Fritz Koenig 439b2ecd74 Merge "Optimizations on the loopfilters." 2010-09-29 10:47:01 -07:00
John Koleszar b9be7a464f Fix loopfilter delta zero transitions
Loopfilter deltas are initialized to zero on keyframes in the decoder.
The values then persist from the previous frame unless an update bit
is set in the bitstream. This data is not included in the entropy
data saved by the 'refresh entropy' bit in the bitstream, so it is
effectively an additional contextual element beyond the 3 ref-frames
and the entropy data.

The encoder was treating this delta update bit as update-if-nonzero,
meaning that the value would be refreshed even if it hadn't changed,
and more significantly, if the correct value for the delta changed
to zero, the update wouldn't be sent, and the decoder would preserve
the last (presumably non-zero) value.

This patch updates the encoder to send an update only if the value
has changed from the previously transmitted value. It also forces the
value to be transmitted in error resilient mode, to account for lost
context in the event of lost frames.

Change-Id: I56671d5b42965d0166ac226765dbfce3e5301868
2010-09-29 13:04:04 -04:00
Fritz Koenig 0964ef0e71 Optimizations on the loopfilters.
- Scheduling for Atom processors
- Combining of macros to allow for better interleaving
- Change from multiplies to adds for main filter
- Use of movhps/movlps to fill xmm registers without
  shifting and orring

Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b
2010-09-28 12:01:34 -07:00
Timothy B. Terriberry 18dc92fd66 Add 4-tap version of 2nd-pass ARMv6 MC filter.
The existing code applied a 6-tap filter with 0's on either end.
We're already paying the branch penalty to avoid computing the two
 extra columns needed as input to this filter.
We might as well save time computing the filter as well.
This reduces the inner loop from 21 instructions to 16, the number
 of loads per iteration from 4 to 1, and the number of multiplies
 from 7 to 4.
The gain in overall decoding performance, however, is small (less
 than 1%).

This change also means we now valgrind clean on ARMv6, which is
 its real purpose.
The errors reported here were valgrind's fault (it does not detect
 that 0 times an uninitialized value is initialized), but Julian
 Seward says it would slow down valgrind considerably to make such
 checks.
Speeding up libvpx rather, even by a small amount, seems a much
 better idea if only to enable proper valgrind checking of the
 rest of the codec.

Change-Id: Ifb376ea195e086b60f61daf1097d8910c4d8ff16
2010-09-27 18:25:45 -07:00
John Koleszar 2b521ab551 move reconintra_mt to decoder (fixup)
Missed the .h file in the move.

Change-Id: Ib408183fbb4d019fd46394b362f89ca6ea9d10bc
2010-09-27 12:48:31 -04:00
Johann 063be9b82a Merge "combine max values and compare once" 2010-09-27 06:39:20 -07:00
Timothy B. Terriberry e2795e9978 Fix valgrind errors in vp8_sixtap_predict8x4_armv6().
This function was accessing values below the stack pointer, which
 can be corrupted by signal delivery at any time.

Change-Id: I92945b30817562eb0340f289e74c108da72aeaca
2010-09-24 14:34:18 -07:00
Johann f30e8dd7bd combine max values and compare once
previous implementation compared each set of values to limit and then
&'d them together, requiring a compare and & for each value.

this does the accumulation first, requiring only one compare

Change-Id: Ia5e3a1a50e47699c88470b8c41964f92a0dc1323
2010-09-24 15:42:50 -04:00
John Koleszar 48e76ff4fd move reconintra_mt to decoder (for now)
reconintra_mt.c is only required for building the decoder right now.
It could definitely be used for the encoder in the future, but it
currently depends on decoder only data structures. (onyxd_int.h,
VP8D_COMP, etc). Move it from common/ to decoder/ until the
necessary changes to the common multithread code are complete.

This patch is needed to build with --disable-vp8-decoder.

Change-Id: I568c52221a2b309234d269675cba97131ce35c86
2010-09-24 11:23:06 -04:00
Johann 7fed3832e7 Remove dead code
The new loopfilter was originally introduced as an experimental change.
It's permanent now.

Change-Id: I25dbedb6ceff3e9f9c04e18bb29f84c3ecb7e546
2010-09-22 11:07:34 -04:00
Yunqing Wang a23ccf8f8c Merge "Restructure multi-threaded decoder" 2010-09-21 05:00:30 -07:00
Fritz Koenig b7dc9398f2 Use movq instead of movdqu.
Movdqu is more expensive (throughput, uops) than movq.  Minimal
impact for newer big cores, but ~2.25% gain on Atom.

Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
2010-09-20 11:34:26 -07:00
Fritz Koenig 8eae7fe7e8 Better choice of instruction filter mask comparision.
Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.

Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
2010-09-20 10:20:38 -07:00
Yunqing Wang f857a85088 Restructure multi-threaded decoder
On each MB, loopfiltering is done right after MB decoding. This
combines two loops in multi-threaded code into one, which reduces
number of synchronizations to half.

The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.

Tests on 4-core gLucid machine showed 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
isn't done yet.

Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
2010-09-17 09:56:05 -04:00
Fritz Koenig 769f2424cc Removed unnecessary pxor.
There is no need to make sure that the lower byte of the
register is 0 because the downshift by 11 overwrites that byte.

Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
2010-09-13 18:34:34 -07:00
Fritz Koenig a65cd3def0 Make block access to frame buffer sequential
Sequentially accessing memory from a low address to a high
address should make it easier for the processor to predict
the cache.

Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
2010-09-10 16:27:28 -07:00
Fritz Koenig 6d90f867e4 Merge branch 'master' of git://review.webmproject.org/libvpx 2010-09-09 08:54:21 -07:00
John Koleszar c2140b8af1 Use WebM in copyright notice for consistency
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.

Fixes issue #97.

Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-09-09 10:01:21 -04:00
Fritz Koenig 3fb37162a8 Bilinear subpixel optimizations for ssse3.
Used pmaddubsw for multiply and add of two filter taps
at once for 16x16 and 8x8 blocks.

Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea
2010-09-07 17:19:40 -07:00
Scott LaVarnway 0de458f6b9 Reduced the size of MB_MODE_INFO
Moved partition_bmi and partition_count out of MB_MODE_INFO and
placed into MACROBLOCK.  Also reduced the size of other members
of the MB_MODE_INFO struct.  For 1080p, the memory was reduced
by 1,209,516 bytes.  The decoder performance appeared to improve
by 3% for the clip used.
Note:  The main goal for this change is to improve the decoder
performance.  The encoder will be revisited at a later date for
further structure cleanup.

Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
2010-09-03 16:43:23 -04:00
James Zern 76640f85da encoder: remove postproc dependency
Remove the dependency on postproc.c for the encoder in general, the only
unchecked need for it is when CONFIG_PSNR is enabled. All other cases
are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file
will still be included.

Additionally, when VP8_SET_POSTPROC is used with the encoder when post
processing has been disabled an error will be returned.

This addresses issue #153.

Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089
2010-09-02 11:52:37 -04:00
Yunqing Wang 0e78efad0b Replace sleep(0) calls in multi-threaded decoder
This is a workaround for gLucid problem.

Change-Id: I188a016a07e4c2ea212444c5a6284ff3c48a5caa
2010-08-31 20:37:11 -04:00
Johann 0b94f5d6e8 followup arm patch
make the arm asm detokenizer work with the new structures

Change-Id: I7cd92c2a018ec24032bb1cfd1bb9739bc84b444a
2010-08-31 11:41:10 -04:00
Scott LaVarnway e85e631504 Changed above and left context data layout
The main reason for the change was to reduce cycles in the token
decoder. (~1.5% gain for 32 bit)  This layout should be more
cache friendly.

As a result of this change, the encoder had to be updated.

Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837
Note: dixie uses a similar layout
2010-08-31 11:24:30 -04:00
Timothy B. Terriberry 7a8e0a2935 Fix harmless off-by-1 error.
The memory being zeroed in vp8_update_mode_info_border() was just
 allocated with calloc, and so the entire function is actually
 redundant, but it should be made correct in case someone expects
 it to actually work in the future.

Change-Id: If7a84e489157ab34ab77ec6e2fe034fb71cf8c79
2010-08-27 16:07:54 -07:00
Fritz Koenig 93c32a55c2 Rework idct calling structure.
Moving the eob structure allows for a non-struct based
function to handle decoding an entire mb of
idct/dequant/recon data.  This allows for SIMD functions
to idct/dequant/recon multiple blocks at once.

SSE2 implementation gives 3% gain on Atom.

Change-Id: I8a8f3efd546ea4e0535f517d94f347cfb737c9c2
2010-08-23 08:58:54 -07:00
Jim Bankoski b0660457fe Revert "Removed ssse3 sixtap code"
This reverts commit 6ea5bb85cd.
2010-08-19 15:58:27 -04:00
Johann 52852da7c9 cleanup simple loop filter
move some things around, reorder some instructions

constant 0 is used several times. load it once per call in horiz,
once per loop in vert.

separate saturating instructions to avoid stalls.

just use one usub8 call to set GE flags, rather than uqsub8 followed by
usub8 w/ 0

document some stalls for further consideration

Change-Id: Ic3877e0ddbe314bb8a17fd5db73501a7d64570ec
2010-08-19 13:37:40 -04:00
Johann a522be2941 Merge "fix armv6 simpleloop filter" 2010-08-19 08:31:57 -07:00
Johann 467a0b99ab fix armv6 simpleloop filter
test cases were causing a crash because the count was being read
incorrectly. after fixing that, noticed that the output was not
matching. fixed that.

Change-Id: Idb0edb887736bd566a3cf6d4aa1a03ea8d20eb27
2010-08-19 11:29:21 -04:00
Scott LaVarnway 6ea5bb85cd Removed ssse3 sixtap code
Change-Id: I0f20fbb898ee31eb94a143471aa6f1ca17a229a4
2010-08-18 15:34:09 -04:00
Johann c75f3993c0 store more vars than we removed
only saved r4-11+lr, but were storing r4-r12+lr

Change-Id: If77df1998af50e9badee7d99ef53543046434675
2010-08-16 10:32:15 -04:00
John Koleszar 80d3923a78 move segmentation_common to encoder
vp8_update_gf_useage_maps() is only used by the encoder. This patch
fixes the ability to build in decode-only or encode-only
configurations.

Change-Id: I3a5211428e539886ba998e09e8abd747ac55c9aa
2010-08-13 14:54:24 -04:00
Johann 633646b73b update structure
mode_info_context->mbmi no longer gets copied up a level

Change-Id: Icd2d27d381909721326c34594a1ccdc26d48a995
2010-08-12 16:37:55 -04:00
Johann 1ec7981c34 remove unused definition
asm_offsets contains some definitions which are no longer used. this
was one of them. v6 build works now

Change-Id: If370cfa8acd145de4fead2d9a11b048fccc090df
2010-08-12 16:37:55 -04:00
Scott LaVarnway 9c7a0090e0 Removed unnecessary MB_MODE_INFO copies
These copies occurred for each macroblock in the encoder and decoder.
Thetemp MB_MODE_INFO mbmi was removed from MACROBLOCKD.  As a result,
a large number compile errors had to be fixed.

Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3
2010-08-12 16:25:43 -04:00
Scott LaVarnway f5615b6149 Merge "Finished vp8_sixtap_predict4x4_ssse3 function" 2010-08-11 12:23:24 -07:00
John Koleszar 392a958274 avoid negative array subscript warnings
The mv_ref and sub_mv_ref token encodings are indexed from NEARESTMV
and LEFT4X4, respectively, rather than being zero-based like the
other token encodings.

Change-Id: I3699c3f84111209ecfb91097c4b900773e9a3ad5
2010-08-11 13:49:12 -04:00
Scott LaVarnway b07e5b6fa1 Finished vp8_sixtap_predict4x4_ssse3 function
Added vp8_filter_block1d4_h6_ssse3 and vp8_filter_block1d4_v6_ssse3
assembly routines.  Also removed unused assembly.

Change-Id: I01c1021835f2edda9da706822345f217087ca0d0
2010-08-11 13:49:00 -04:00
Johann c0ba42d3c0 rename DETOK_[AL]
everything else uses lowercase detok

Change-Id: I9671e2e90eb2961208dfa81c00b3accb5749ec04
2010-08-11 13:36:35 -04:00
Scott LaVarnway 99f46d62d9 Moved gf_active code to encoder only
The gf_active code is only used by the encoder, so it was moved from
common and decoder.

Change-Id: Iada15acd5b2b33ff70c34668ca87d4cfd0d05025
2010-08-11 11:54:25 -04:00
Scott LaVarnway e4fe866949 Added ssse3 version of sixtap filters
Improved decoder performance by 9% for the clip used.

Change-Id: I8fc5609213b7bef10248372595dc85b29f9895b9
2010-08-10 17:33:49 -04:00
John Koleszar 618c7d27a0 Mark loopfilter C functions as static
Clang defaults to C99 mode, and inline works differently in C99.
(gcc, on the other hand, defaults to a special gnu-style inlining,
which uses different syntax.)   Making the functions static makes sure
clang doesn't decide to discard a function because it's too large to
inline.

Thanks to eli.friedman for the patch.

Fixes http://code.google.com/p/webm/issues/detail?id=114

Change-Id: If3c1c3c176eb855a584a60007237283b0cc631a4
2010-08-09 09:36:44 -04:00
John Koleszar cfb204eaf7 Merge "Issue 150: Fixing linker warning in extend.c." 2010-08-02 09:35:05 -07:00
Jan Kratochvil 0e8f108fb0 nasm: avoid space before the :data symbol type.
global label:data
           ^^

Provide nasm compatibility.  No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu.  Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id:	I10f17eb1e4d4a718d4ebd1d0ccddc807c365e021
2010-08-02 09:20:42 -04:00
Frank Galligan 062e6c1886 Removed two unused global variables.
Removed the global variables vp8_an and vp8_cd. vp8_an was causing problems
because it was increasing the .bss by 1572864 bytes.

Change-Id: I6c12e294133c7fb6e770c0e4536d8287a5720a87
2010-07-28 17:25:09 -04:00
Johann 56f5a9a060 update arm idct functions
Jeff Muizelaar posted some changes to the idct/reconstruction c code.
This is the equivalent update for the arm assembly.

This shows a good boost on v6, and a minor boost on neon.
Here are some numbers for highway in qcif, 2641 frames:
HEAD neon: ~161 fps
new neon:  ~162 fps
HEAD v6:   ~102 fps
new v6:    ~106 fps

The following functions have been updated for armv6 and neon:
vp8_dc_only_idct_add
vp8_dequant_idct_add
vp8_dequant_dc_idct_add

Conflicts:

	vp8/decoder/arm/armv6/dequantdcidct_v6.asm
	vp8/decoder/arm/armv6/dequantidct_v6.asm

Resolved by removing these files. When I rewrote the functions, I also
moved the files to dequant_dc_idct_v6.asm/dequant_idct_v6.asm

Change-Id: Ie3300df824d52474eca1a5134cf22d8b7809a5d4
2010-07-26 08:55:19 -04:00
Justin Lebar 1d8277f8e8 Issue 150: Fixing linker warning in extend.c. 2010-07-23 16:42:25 -07:00
Jeff Muizelaar 98fcccfe97 Change the x86 idct functions to do reconstruction at the same time
Change-Id: I896fe6f9664e6849c7cee2cc6bb4e045eb42540f
2010-07-23 15:21:36 -04:00
Jeff Muizelaar b2fa74ac18 Combine idct and reconstruction steps
This moves the prediction step before the idct and combines the idct and
reconstruction steps into a single step. Combining them seems to give an
overall decoder performance improvement of about 1%.

Change-Id: I90d8b167ec70d79c7ba2ee484106a78b3d16e318
2010-07-23 15:21:36 -04:00
Fritz Koenig 0ce3901282 Swap alt/gold/new/last frame buffer ptrs instead of copying.
At the end of the decode, frame buffers were being copied.
The frames are not updated after the copy, they are just
for reference on later frames.  This change allows multiple
references to the same frame buffer instead of copying it.

Changes needed to be made to the encoder to handle this.  The
encoder is still doing frame buffer copies in similar places
where pointer reference could be done.

Change-Id: I7c38be4d23979cc49b5f17241ca3a78703803e66
2010-07-23 14:53:59 -04:00
Michael Kohler 1e23f45119 Fix misspelled "skiped" in onyxc_int.h to "skipped".
Signed-off-by: Michael Kohler <michaelkohler@live.com>
2010-07-07 20:06:04 +02:00
Yunqing Wang 29d586b462 Add loopfilter initialization fix in multithreading code
Modified loopfilter initialization to avoid unnecessary operations.

Change-Id: I9fd1a5a49edc1cb8116c2a72a6908b1e437459ec
2010-06-30 09:42:39 -04:00
Yunqing Wang bead039d4d Improve SSE2 loopfilter functions
Restructured and rewrote SSE2 loopfilter functions. Combined u and
v into one function to take advantage of SSE2 128-bit registers.
Tests on test clips showed a 4% decoder performance improvement on
Linux desktop.

Change-Id: Iccc6669f09e17f2224da715f7547d6f93b0a4987
2010-06-29 15:23:14 -04:00
Timothy B. Terriberry 9f81463454 Fix a linker error on x86-64 Linux when not using a version script.
If the version script produced by the libvpx build system is not
 used when linking a shared library on x86-64 Linux, the constant
 data in the subpel filters produces R_X86_64_32 relocation errors
 due to the use of wrt rip addressing instead of
 wrt rip wrt ..gotpcrel.
Instead of adding a new macro for this addressing mode, this patch
 sets the ELF visibility of these symbols to "hidden", which
 allows wrt rip addressing to work without a text relocation.
This allows building a shared library without using the provided
 build system or a separate version script.
Fixes http://code.google.com/p/webm/issues/detail?id=46

Change-Id: Ie108f9d9a4352e5af46938bf4750d2302c1b2dc2
2010-06-21 08:19:12 -04:00
John Koleszar 94c52e4da8 cosmetics: trim trailing whitespace
When the license headers were updated, they accidentally contained
trailing whitespace, so unfortunately we have to touch all the files
again.

Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
2010-06-18 13:06:11 -04:00
John Koleszar c65e8e8e46 Merge "Change bitreader to use a larger window." 2010-06-17 18:08:36 -07:00