Based on patch from David Rientjes <rientjes@google.com>, but
changed by AK.
Optimizes the 64-bit hamming weight for x86_64 processors assuming they
have fast multiplication. Uses five fewer bitops than the generic
hweight64. Benchmark on one EMT64 showed ~25% speedup with 2^24
consecutive calls.
Define a new ARCH_HAS_FAST_MULTIPLIER that can be set by other
architectures that can also multiply fast.
Signed-off-by: Andi Kleen <ak@suse.de>