Граф коммитов

29 Коммитов

Автор SHA1 Сообщение Дата
Johann 4a2b684ef4 modify SAVE_XMM for potential 64bit use
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.

Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
2011-04-19 10:42:45 -04:00
Johann a9b465c5c9 Merge "Add save/restore xmm registers in x86 assembly code" 2011-04-19 06:32:10 -07:00
Johann c7cfde42a9 Add save/restore xmm registers in x86 assembly code
Went through the code and fixed it. Verified on Windows.

Where possible, remove dependencies on xmm[67]

Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.

Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
2011-04-18 16:30:38 -04:00
Yaowu Xu 05d9421e8b Merge "Add spin-wait pause intrinsic for Windows x64 platform." 2011-04-18 09:53:26 -07:00
Rafael Ávila de Espíndola 52f6e28e9e Fix build with xcode4 and simplify GLOBAL.
Without this change I get link errors in firefox's libxul. It looks
like the linker expect a particular pattern for getting the GOT. This
patch changes webm to use the same pattern used by the compiler.

Change-Id: Iea8c2e134ad45c1dc7d221ff885a8429bfa4e057
2011-03-12 10:45:22 -05:00
Aron Rosenberg 8e87d58712 Add spin-wait pause intrinsic for Windows x64 platform.
Change-Id: I7504370c67a3c551627c6bb7e67c65f83d88b78e
2011-03-04 14:49:50 -08:00
James Zern f42d52e6bd documentation: minor cosmetics
- correct spelling
- remove explicit file name w/\file (unnecessary when contained in the
  same file and prone to desync)

Change-Id: I68a3960ac5ab84d0f2e5c9b2e29799f26dfccf23
2011-02-16 17:59:33 -08:00
Tero Rintaluoma 11a222f5d9 Adds "armvX-none-rvct" targets
Adds following targets to configure script to support RVCT compilation
without operating system support (for Profiler or bare metal images).
 - armv5te-none-rvct
 - armv6-none-rvct
 - armv7-none-rvct

To strip OS specific parts from the code "os_support"-config was added
to script and CONFIG_OS_SUPPORT flag is used in the code to exclude OS
specific parts such as OS specific includes and function calls for
timers and threads etc. This was done to enable RVCT compilation for
profiling purposes or running the image on bare metal target with
Lauterbach.

Removed separate AREA directives for READONLY data in armv6 and neon
assembly files to fix the RVCT compilation. Otherwise
"ldr <reg>, =label" syntax would have been needed to prevent linker
errors. This syntax is not supported by older gnu assemblers.

Change-Id: I14f4c68529e8c27397502fbc3010a54e505ddb43
2011-01-28 12:47:39 +02:00
Yunqing Wang 71ecb5d7d9 Full search SAD function optimization in SSE4.1
Use mpsadbw, and calculate 8 sad at once. Function list:
vp8_sad16x16x8_sse4
vp8_sad16x8x8_sse4
vp8_sad8x16x8_sse4
vp8_sad8x8x8_sse4
vp8_sad4x4x8_sse4

(test clip: tulip)
For best quality mode, this gave encoder a 5% performance boost.
For good quality mode with speed=1, this gave encoder a 3%
performance boost.

Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
2010-10-27 13:36:31 -04:00
Timothy B. Terriberry b71962fdc9 Add runtime CPU detection support for ARM.
The primary goal is to allow a binary to be built which supports
 NEON, but can fall back to non-NEON routines, since some Android
 devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
 Tegra).
The configure-generated flags HAVE_ARMV7, etc., are used to decide
 which versions of each function to build, and when
 CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
 at run time.
In order for this to work, the CFLAGS must be set to something
 appropriate (e.g., without -mfpu=neon for ARMv7, and with
 appropriate -march and -mcpu for even earlier configurations), or
 the native C code will not be able to run.
The ASFLAGS must remain set for the most advanced instruction set
 required at build time, since the ARM assembler will refuse to emit
 them otherwise.
I have not attempted to make any changes to configure to do this
 automatically.
Doing so will probably require the addition of new configure options.

Many of the hooks for RTCD on ARM were already there, but a lot of
 the code had bit-rotted, and a good deal of the ARM-specific code
 is not integrated into the RTCD structs at all.
I did not try to resolve the latter, merely to add the minimal amount
 of protection around them to allow RTCD to work.
Those functions that were called based on an ifdef at the calling
 site were expanded to check the RTCD flags at that site, but they
 should be added to an RTCD struct somewhere in the future.
The functions invoked with global function pointers still are, but
 these should be moved into an RTCD struct for thread safety (I
 believe every platform currently supported has atomic pointer
 stores, but this is not guaranteed).

The encoder's boolhuff functions did not even have _c and armv7
 suffixes, and the correct version was resolved at link time.
The token packing functions did have appropriate suffixes, but the
 version was selected with a define, with no associated RTCD struct.
However, for both of these, the only armv7 instruction they actually
 used was rbit, and this was completely superfluous, so I reworked
 them to avoid it.
The only non-ARMv4 instruction remaining in them is clz, which is
 ARMv5 (not even ARMv5TE is required).
Considering that there are no ARM-specific configs which are not at
 least ARMv5TE, I did not try to detect these at runtime, and simply
 enable them for ARMv5 and above.

Finally, the NEON register saving code was completely non-reentrant,
 since it saved the registers to a global, static variable.
I moved the storage for this onto the stack.
A single binary built with this code was tested on an ARM11 (ARMv6)
 and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
 and produced identical output, while using the correct accelerated
 functions on each.
I did not test on any earlier processors.

Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
2010-10-25 09:23:29 -04:00
Fritz Koenig 0f5c63e4f6 Add processor dectection for x86.
Use cpuid to check the vendor string against known
architectures.

Change-Id: I3fbd7f73638d71857a0c4a44a6275eb295fb4cef
2010-10-13 09:58:15 -07:00
Fritz Koenig e50f5d4037 GCC inline restrictions were not adequate.
=r was not restrictive enough and the compiler was not returning
ebx correctly.

Change-Id: I7606e384067bd5fb69189802f1ff64ccc5aa02d6
2010-10-12 10:12:23 -07:00
Jan Kratochvil fc2b06c625 nasm: avoid relative include paths
nasm does not automatically assume the source's directory also for its
include files.

Provide nasm compatibility.  No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu.  Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id:	I386efa0cca5d401193416c11bd7363a283541645
2010-10-04 19:50:08 -04:00
Jan Kratochvil 5cdc3a4c29 nasm: address labels 'rel label' vice 'wrt rip'
nasm does not support `label wrt rip', it requires `rel label'. It is
still fully compatible with yasm.

Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
2010-10-04 19:47:54 -04:00
Jan Kratochvil e114f699f6 nasm: match instruction length (movd/movq) to parameters
nasm requires the instruction length (movd/movq) to match to its
parameters. I find it more clear to really use 64bit instructions when
we use 64bit registers in the assembly.

Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
2010-10-04 23:36:29 +02:00
Fritz Koenig 746439ef6c Modify GET_GOT macro for performance.
GET_GOT was producing a zero length call.  This resulted in
pipeline flushes occuring when returing from the assembly
functions.  Masked on out of order cores, but evident on
Atom cores.

Change-Id: I8c375af313e8a169c77adbaf956693c0cfeb5ccd
2010-09-15 12:41:15 -07:00
John Koleszar c2140b8af1 Use WebM in copyright notice for consistency
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.

Fixes issue #97.

Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-09-09 10:01:21 -04:00
John Koleszar daab4bcba6 Use native win32 timers on mingw
Changed to use QueryPerformanceCounter on Windows rather than only
when building with MSVC, so that MSVC can link libs built with
MinGW.

Fixes issue #149.

Change-Id: Ie2dc7edc8f4d096cf95ec5ffb1ab00f2d67b3e7d
2010-09-02 12:03:51 -04:00
Jan Kratochvil 0e8f108fb0 nasm: avoid space before the :data symbol type.
global label:data
           ^^

Provide nasm compatibility.  No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu.  Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id:	I10f17eb1e4d4a718d4ebd1d0ccddc807c365e021
2010-08-02 09:20:42 -04:00
John Koleszar 5e34461448 Remove INLINE/FORCEINLINE
These are mostly vestigial, it's up to the compiler to decide what
should be inlined, and this collided with certain Windows platform SDKs.

Change-Id: I80dd35de25eda7773156e355b5aef8f7e44e179b
2010-06-24 09:24:33 -04:00
Timothy B. Terriberry 9f81463454 Fix a linker error on x86-64 Linux when not using a version script.
If the version script produced by the libvpx build system is not
 used when linking a shared library on x86-64 Linux, the constant
 data in the subpel filters produces R_X86_64_32 relocation errors
 due to the use of wrt rip addressing instead of
 wrt rip wrt ..gotpcrel.
Instead of adding a new macro for this addressing mode, this patch
 sets the ELF visibility of these symbols to "hidden", which
 allows wrt rip addressing to work without a text relocation.
This allows building a shared library without using the provided
 build system or a separate version script.
Fixes http://code.google.com/p/webm/issues/detail?id=46

Change-Id: Ie108f9d9a4352e5af46938bf4750d2302c1b2dc2
2010-06-21 08:19:12 -04:00
John Koleszar 94c52e4da8 cosmetics: trim trailing whitespace
When the license headers were updated, they accidentally contained
trailing whitespace, so unfortunately we have to touch all the files
again.

Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
2010-06-18 13:06:11 -04:00
Scott LaVarnway 48c84d138f sse2 version of vp8_regular_quantize_b
Added sse2 version of vp8_regular_quantize_b which improved encode
performance(for the clip used) by ~10% for 32 bit builds and ~3% for
64 bit builds.

Also updated SHADOW_ARGS_TO_STACK to allow for more than 9 arguments.

Change-Id: I62f78eabc8040b39f3ffdf21be175811e96b39af
2010-06-14 14:07:56 -04:00
Makoto Kato 63ea8705eb some XMM registers are non-volatile on windows x64 ABI
XMM6 to XMM15 are non-volatile on Windows x64 ABI.  We have to save
these registers.

Change-Id: I4676309f1350af25c8a35f0c81b1f0499ab99076
2010-06-11 12:11:15 -04:00
John Koleszar 09202d8071 LICENSE: update with latest text
Change-Id: Ieebea089095d9073b3a94932791099f614ce120c
2010-06-04 16:19:40 -04:00
John Koleszar c3c870ed74 x86: tell gnu ld that we don't require an executable stack
Fixes #2

Change-Id: Ie15c57ccf2f9721cb35102765d759817f2607cd7
2010-05-27 08:56:34 -04:00
John Koleszar b7492341ac install includes in DIST_DIR/include/vpx, move vpx_codec/ to vpx/
This renames the vpx_codec/ directory to vpx/, to allow applications
to more consistently reference these includes with the vpx/ prefix.
This allows the includes to be installed in /usr/local/include/vpx
rather than polluting the system includes directory with an
excessive number of includes.

Change-Id: I7b0652a20543d93f38f421c60b0bbccde4d61b4f
2010-05-24 20:27:42 -04:00
John Koleszar 1df0314e7b configure: remove HAVE_CONFIG_H
This doesn't play well with autotools, and the preprocessor magic is
confusing and unhelpful in the vp8-only context.

Change-Id: I2fcb57e6eb7876ecb58509da608dc21f26077ff1
2010-05-21 05:53:48 -04:00
John Koleszar 0ea50ce9cb Initial WebM release 2010-05-18 11:58:33 -04:00