According to MSVC manual (*1), cl.exe can skip including a header file
when that:
- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.
GCC has a similar trick (*2), but it acts more stricter (e. g. there
must be _no tokens_ outside of #ifndef...#endif).
Sun C lacked #pragma once for a looong time. Oracle Developer Studio
12.5 finally implemented it, but we cannot assume such recent version.
This changeset modifies header files so that each of them include
strictly one #ifndef...#endif. I believe this is the most portable way
to trigger compiler optimizations. [Bug #16770]
*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
ISO/IEC 9899:1999 section 6.5.7 states that "If the value of the right
operand is negative or is greater than or equal to the width of the
promoted left operand, the behavior is undefined". So we have to take
care of such situations.
This has not been a problem because contemporary C compilers are
extraordinary smart to compile the series of shifts into a single
ROTLQ/ROTRQ machine instruction. In contrast to what C says those
instructions have fully defined behaviour for all possible inputs.
Hence it has been quite difficult to observe the undefined-ness of such
situations. But undefined is undefined. We should not rely on such
target-specific assumptions.
We are fixing the situation by carefully avoiding shifts with out-of-
range values. At least GCC since 4.6.3 and Clang since 8.0 can issue
the exact same instructions like before the changeset.
Also in case of Intel processors, there supposedly be intrinsics named
_rotr/_rotl that do exactly what we need. They, in practice, are absent
on Clang before 9.x so we cannot blindly use. But we can at least save
MSVC.
See also:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157https://bugs.llvm.org/show_bug.cgi?id=17332
Replaces `#ifdef _MSC_VER` with more accurate version checks. Also,
`defined(_WIN64) && defined(__AVX2__)` is redundant because there is no
such tihng like a 32bit AVX2 machine.
Improving readability by converting some macros into inline functions.
Also improved support for recent x86_64 processors, which have better
instructions for the purposes.
One day, I could not resist the way it was written. I finally started
to make the code clean. This changeset is the beginning of a series of
housekeeping commits. It is a simple refactoring; split internal.h into
files, so that we can divide and concur in the upcoming commits. No
lines of codes are either added or removed, except the obvious file
headers/footers. The generated binary is identical to the one before.