Currently, the build scripts enable AltiVec unconditionally on all ppc*
targets. However, some ppc* targets do not support the AltiVec
instruction set extension. These are often embedded systems such as the
PowerPC e500, which have their own instruction set extensions like SPE.
Trying to enable AltiVec support on these targets results in a compiler
error, so we need to add an autoconf test for AltiVec support before
trying to enable it on ppc* targets.
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
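A minimal sketch of how the AltiVec code itself can stay guarded once the configure test exists. It assumes the test defines a result macro named HAVE_ALTIVEC (a hypothetical name) when the compiler accepts -maltivec. GCC predefines __ALTIVEC__ only when AltiVec code generation is enabled, so an e500/SPE build compiles the AltiVec path out entirely. QCMS_USE_ALTIVEC and altivec_path_compiled() are illustrative names.

```c
/* HAVE_ALTIVEC: hypothetical macro set by the configure test when the
 * compiler accepts -maltivec. __ALTIVEC__: predefined by GCC (and Clang)
 * only when AltiVec code generation is actually enabled. */
#if defined(HAVE_ALTIVEC) && defined(__ALTIVEC__)
#include <altivec.h>
#define QCMS_USE_ALTIVEC 1
#else
#define QCMS_USE_ALTIVEC 0
#endif

/* Reports whether the AltiVec code path was compiled into this build. */
static int altivec_path_compiled(void)
{
    return QCMS_USE_ALTIVEC;
}
```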
This patch fixes various warnings from MSVC.
- Several "truncation from 'double' to 'float'" warnings, easily fixed by
appending 'f' to literals.
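- Some "signed/unsigned mismatch" warnings. In read_tag_lutType(), the
multiplication of a uint8_t and a uint16_t produces an int32_t, because the
C integer promotions convert both operands to int before multiplying, and
MSVC then warns when that signed result is mixed with an unsigned value.
A uint32_t cast fixes the warning.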
- |offset| was unused in qcms_data_create_rgb_with_gamma().
- A couple of "overflow in floating-point constant arithmetic" warnings
involving INFINITY in transform_util.c. There is some type confusion here:
in C99, HUGE_VAL is a double and INFINITY is a float, so the HUGE_VAL here
should actually be HUGE_VALF. Strangely enough, that change alone does not
avoid the warning, and I don't know why. However, it turns out that any
non-positive value for |interval| has the same effect, so I removed all
the INFINITY/HUGE_VAL code and used -1 instead.
It also fixes an ARM-only GCC warning:
- "'__force_align_arg_pointer__' attribute directive ignored". This is an
x86-only attribute. Instead of disabling it on x86-64, enable it only on
i386 (which avoids enabling it uselessly on ARM).
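A sketch of that conditional, assuming a GCC-compatible compiler. The macro name FORCE_ALIGN and the function sum_four() are illustrative. The attribute is emitted only for 32-bit x86 (__i386__), where it gives the function a prologue that realigns the stack for SSE use when called from code that keeps only 4-byte alignment. Everywhere else the macro expands to nothing.

```c
/* Only 32-bit x86 needs the realigning prologue: the x86-64 ABI already
 * guarantees a 16-byte aligned stack, and ARM does not know the attribute. */
#if defined(__GNUC__) && defined(__i386__)
#define FORCE_ALIGN __attribute__((__force_align_arg_pointer__))
#else
#define FORCE_ALIGN
#endif

/* Hypothetical entry point that may be called from code which keeps only
 * a 4-byte aligned stack. */
FORCE_ALIGN float sum_four(const float v[4])
{
    return v[0] + v[1] + v[2] + v[3];
}
```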
We currently use a larger output lookup table than we probably need. Switch to
a common define for the table size and lower it. This should also give a small
improvement to startup time because there are fewer lookup table entries to
compute.
Switch from pow() to powf() because it's faster and we don't need the
additional precision. Also avoid unnecessary conversion to and from doubles by
using float constants instead of doubles.
This patch greatly improves the performance of QCMS transformations on x86
and x86_64 systems. Some notes:
0. On 32-bit x86 systems, it selects at runtime between the non-SIMD, SSE,
and SSE2 code paths.
1. On x86_64 systems, the SSE2 code path is always taken. The non-SIMD and
SSE code paths are left intact, but contemporary versions of the GCC and
MSVC compilers can see that they are unreachable and optimize them away.
2. The execution time of the SSE2 code path is reduced by 67% relative to
the original Intel/Microsoft-format ASM code, as measured on a Pentium 4
(Northwood) 2.4 GHz CPU with DDR1 RAM.
3. The SSE code path reduces execution time by 80% relative to the
non-SIMD code path, as measured on a Pentium III (Coppermine) 1.26 GHz CPU
with SDRAM.
4. The code has been split out into separate files so that it can be built
with different cflags (-msse and -msse2) when using GCC.
5. Try to land again, this time with
__attribute__((__force_align_arg_pointer__)) to avoid crashes on Linux.
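The path selection described in items 0 and 1 could be sketched as follows, assuming GCC or Clang, whose __builtin_cpu_supports() builtin queries CPUID at runtime. The enum and function names are hypothetical. A 32-bit MSVC build would need __cpuid instead; this sketch does not cover that case and simply falls back to the scalar path.

```c
/* Hypothetical identifiers for the three transform implementations. */
enum simd_path { PATH_SCALAR, PATH_SSE, PATH_SSE2 };

static enum simd_path select_simd_path(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    /* Every x86-64 CPU implements SSE2, so no runtime check is needed and
     * the other paths become unreachable. */
    return PATH_SSE2;
#elif defined(__i386__) && defined(__GNUC__)
    /* 32-bit x86: ask the CPU which extensions it supports. */
    if (__builtin_cpu_supports("sse2"))
        return PATH_SSE2;
    if (__builtin_cpu_supports("sse"))
        return PATH_SSE;
    return PATH_SCALAR;
#else
    /* Other targets, or compilers this sketch does not cover. */
    return PATH_SCALAR;
#endif
}
```

A caller would typically run this once and store the matching transform function pointer.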
Makes the number of output entries produced by invert_lut() a parameter and
changes all callers to use a minimum of 256 entries when computing the inverse.
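Inverting a curve stored with only a few entries into an equally short table loses resolution, which is why the output length matters. A sketch of an inverse with a caller-chosen output length, assuming a monotonically non-decreasing 16-bit table with at least two entries. invert_lut_sketch() is hypothetical and uses plain linear interpolation, not necessarily the method used by invert_lut().

```c
#include <stdint.h>
#include <stdlib.h>

/* Invert a non-decreasing LUT (length >= 2) into out_length entries
 * (out_length >= 2). Returns a malloc'd table, or NULL on failure. */
static uint16_t *invert_lut_sketch(const uint16_t *table, int length,
                                   int out_length)
{
    uint16_t *out = malloc(sizeof(uint16_t) * (size_t)out_length);
    if (!out)
        return NULL;
    int j = 0;
    for (int i = 0; i < out_length; i++) {
        /* Value we want to find in the forward table. */
        double target = 65535.0 * i / (out_length - 1);
        /* Targets increase, so the segment index only moves forward. */
        while (j < length - 2 && table[j + 1] < target)
            j++;
        double lo = table[j], hi = table[j + 1];
        double frac = (hi > lo) ? (target - lo) / (hi - lo) : 0.0;
        if (frac < 0.0)
            frac = 0.0;
        if (frac > 1.0)
            frac = 1.0;
        /* Position of the target along the forward table's input axis. */
        double x = (j + frac) / (length - 1);
        out[i] = (uint16_t)(x * 65535.0 + 0.5);
    }
    return out;
}
```

Following the patch, a caller would pass an out_length of at least 256, for example `length > 256 ? length : 256`.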