Bug 1352889 - Ensure that PLDHashTable's second hash doesn't have padding with 0 bits for tables with capacity larger than 2^16. r=njn

PLDHashTable takes the result of the hash function and multiplies it by kGoldenRatio to ensure that it has a good distribution of bits across the 32-bit hash value, and then zeroes out the low bit so that it can be used for the collision flag. This result is called hash0.

From hash0 it computes two different numbers used to find entries in the table storage: hash1 is used to find an initial position in the table to begin searching for an entry; hash2 is then used to repeatedly offset that position (mod the size of the table) to build a chain of positions to search.

In a table with capacity 2^c entries, hash1 is simply the upper c bits of hash0. This patch does not change this.

Prior to this patch, hash2 was the c bits below hash1, padded at the low end with zeroes when c > 16. (Note that bug 927705, changeset 1a02bec165e16f370cace3da21bb2b377a0a7242, increased the maximum capacity from 2^23 to 2^26 since 2^23 was sometimes insufficient!) This manner of computing hash2 is problematic because it increases the risk of long chains for very large tables, since there is less variation in the hash2 result due to the zero padding.

So this patch changes the hash2 computation by using the low bits of hash0 instead of shifting it around, thus avoiding 0 bits in parts of the hash2 value that are significant.

Note that this changes what hash2 is in all cases except when the table capacity is exactly 2^16, so it does change our hashing characteristics. For tables with capacity less than 2^16, it should be using a different second hash, but with the same amount of random-ish data. For tables with capacity greater than 2^16, it should be using more random-ish data.

MozReview-Commit-ID: JvnxAMBY711

--HG--
extra : transplant_source : %8A%25%FB%E3H%B8_%F1G%F6%3E%0B%29%DF%20%FF%D8%E1%AEw
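
For illustration only, the old and new hash2 computations described above can be compared side by side. This is a minimal sketch, not the code in the tree: the free functions Hash1, OldHash2 and NewHash2 and their sizeLog2 parameter (the c in "capacity 2^c") are hypothetical stand-ins for the member functions, which read mHashShift instead. It assumes 3 <= sizeLog2 <= 26 so that every shift is well defined.

```cpp
#include <cstdint>

// Assumption: hash values are 32 bits wide, as in the commit message.
constexpr uint32_t kHashBits = 32;

// hash1: the upper sizeLog2 bits of hash0 (unchanged by this patch).
inline uint32_t Hash1(uint32_t aHash0, uint32_t aSizeLog2) {
  return aHash0 >> (kHashBits - aSizeLog2);
}

// Old hash2: the bits just below hash1, shifted into place. When
// aSizeLog2 > 16 fewer than aSizeLog2 bits remain, so the low end of the
// result is zero padding (apart from the forced low bit).
inline uint32_t OldHash2(uint32_t aHash0, uint32_t aSizeLog2) {
  uint32_t hashShift = kHashBits - aSizeLog2;
  return ((aHash0 << aSizeLog2) >> hashShift) | 1;
}

// New hash2: the low aSizeLog2 bits of hash0, made odd so the value is
// relatively prime to the power-of-two table size.
inline uint32_t NewHash2(uint32_t aHash0, uint32_t aSizeLog2) {
  uint32_t sizeMask = (uint32_t(1) << aSizeLog2) - 1;
  return (aHash0 & sizeMask) | 1;
}
```

With sizeLog2 = 26, OldHash2 can vary only in bits 20 through 25, which gives 64 possible step values, while NewHash2 can vary in bits 1 through 25. With sizeLog2 = 16 the two functions return the same value for every input, which matches the note that only the 2^16 case is unaffected.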
2017-04-03 20:43:30 -07:00 · 2017-04-03 20:43:30 -07:00 · 4d700b54f1
--- a/xpcom/ds/PLDHashTable.cpp
+++ b/xpcom/ds/PLDHashTable.cpp
@@ -253,15 +253,27 @@ PLDHashTable::Hash1(PLDHashNumber aHash0)
  return aHash0 >> mHashShift;
 }

-// Double hashing needs the second hash code to be relatively prime to table
-// size, so we simply make hash2 odd.
 void
-PLDHashTable::Hash2(PLDHashNumber aHash,
+PLDHashTable::Hash2(PLDHashNumber aHash0,
                    uint32_t& aHash2Out, uint32_t& aSizeMaskOut)
 {
  uint32_t sizeLog2 = kHashBits - mHashShift;
-  aHash2Out = ((aHash << sizeLog2) >> mHashShift) | 1;
-  aSizeMaskOut = (PLDHashNumber(1) << sizeLog2) - 1;
+  uint32_t sizeMask = (PLDHashNumber(1) << sizeLog2) - 1;
+  aSizeMaskOut = sizeMask;
+
+  // The incoming aHash0 always has the low bit unset (since we leave it
+  // free for the collision flag), and should have reasonably random
+  // data in the other 31 bits.  We used the high bits of aHash0 for
+  // Hash1, so we use the low bits here.  If the table size is large,
+  // the bits we use may overlap, but that's still more random than
+  // filling with 0s.
+  //
+  // Double hashing needs the second hash code to be relatively prime to table
+  // size, so we simply make hash2 odd.
+  //
+  // This also conveniently covers up the fact that we have the low bit
+  // unset since aHash0 has the low bit unset.
+  aHash2Out = (aHash0 & sizeMask) | 1;
 }

 // Reserve mKeyHash 0 for free entries and 1 for removed-entry sentinels. Note