Bug 1352889 - Ensure that PLDHashTable's second hash doesn't have padding with 0 bits for tables with capacity larger than 2^16. r=njn

PLDHashTable takes the result of the hash function and multiplies it by
kGoldenRatio to ensure that it has a good distribution of bits across
the 32-bit hash value, and then zeroes out the low bit so that the bit
can be used for the collision flag.  This result is called hash0.  From hash0
it computes two different numbers used to find entries in the table
storage:  hash1 is used to find an initial position in the table to
begin searching for an entry; hash2 is then used to repeatedly offset
that position (mod the size of the table) to build a chain of positions
to search.
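
The scheme described above can be sketched as follows.  This is a
minimal illustration, not the PLDHashTable source: the helper names
ComputeHash0 and ProbeChain, the value given for kGoldenRatio, and the
choice to step forward (rather than backward) by hash2 are assumptions
made for the example.

```cpp
#include <cstdint>
#include <vector>

// Constants assumed for illustration; the names follow the commit message.
constexpr uint32_t kHashBits = 32;
constexpr uint32_t kGoldenRatio = 0x9E3779B9U;  // assumed value
constexpr uint32_t kCollisionFlag = 1;

// Scramble the raw hash, then clear the low bit so that the bit stays
// free for the collision flag.  Unsigned multiplication wraps mod 2^32.
constexpr uint32_t ComputeHash0(uint32_t aRawHash) {
  return (aRawHash * kGoldenRatio) & ~kCollisionFlag;
}

// hash1: the upper aSizeLog2 bits of hash0 (valid for 1 <= aSizeLog2 <= 31).
constexpr uint32_t Hash1(uint32_t aHash0, uint32_t aSizeLog2) {
  return aHash0 >> (kHashBits - aSizeLog2);
}

// Build the first aSteps positions of a probe chain: start at hash1 and
// repeatedly offset by hash2, modulo the table size 2^aSizeLog2.
std::vector<uint32_t> ProbeChain(uint32_t aHash1, uint32_t aHash2,
                                 uint32_t aSizeLog2, uint32_t aSteps) {
  const uint32_t sizeMask = (uint32_t(1) << aSizeLog2) - 1;
  std::vector<uint32_t> chain;
  uint32_t pos = aHash1 & sizeMask;
  for (uint32_t i = 0; i < aSteps; ++i) {
    chain.push_back(pos);
    pos = (pos + aHash2) & sizeMask;  // step direction is an assumption
  }
  return chain;
}
```

With an odd step and a power-of-two table size, the chain visits every
slot once before repeating: ProbeChain(3, 5, 3, 8) yields
3, 0, 5, 2, 7, 4, 1, 6.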

In a table with capacity 2^c entries, hash1 is simply the upper c bits
of hash0.  This patch does not change this.

Prior to this patch, hash2 was the c bits below hash1, padded at the low
end with zeroes when c > 16.  (Note that bug 927705, changeset
1a02bec165e16f370cace3da21bb2b377a0a7242, increased the maximum capacity
from 2^23 to 2^26 since 2^23 was sometimes insufficient!)  This manner
of computing hash2 is problematic because it increases the risk of long
chains for very large tables, since there is less variation in the hash2
result due to the zero padding.
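
To make the zero padding concrete, here is a small sketch of the
pre-patch expression as a free function.  OldHash2 and PaddedLowBits
are hypothetical helper names, and the table's mHashShift is replaced
by a sizeLog2 parameter (mHashShift = 32 - sizeLog2) so the example is
self-contained.

```cpp
#include <cstdint>

constexpr uint32_t kHashBits = 32;

// Pre-patch computation: ((hash0 << sizeLog2) >> hashShift) | 1,
// with hashShift = kHashBits - sizeLog2.  Valid for 1 <= aSizeLog2 <= 31.
constexpr uint32_t OldHash2(uint32_t aHash0, uint32_t aSizeLog2) {
  return ((aHash0 << aSizeLog2) >> (kHashBits - aSizeLog2)) | 1;
}

// Number of low bits of the result that are zero padding (before the
// final "| 1"): only 32 - c bits remain below hash1, so for c > 16 the
// remaining 2c - 32 positions cannot be filled from hash0.
constexpr uint32_t PaddedLowBits(uint32_t aSizeLog2) {
  return aSizeLog2 > 16 ? 2 * aSizeLog2 - kHashBits : 0;
}

// Capacity 2^20: the low 12 bits of hash0 land in bits 8..19, and
// bits 1..7 of the result are always zero.
static_assert(OldHash2(0xFFFFFFFEu, 20) == 0xFFE01u, "c = 20");
static_assert(PaddedLowBits(20) == 8, "c = 20");

// Capacity 2^26: only the low 6 bits of hash0 contribute, so hash2 can
// take at most 64 distinct values in a table of 2^26 entries.
static_assert(OldHash2(0xFFFFFFFEu, 26) == 0x03E00001u, "c = 26");
static_assert(PaddedLowBits(26) == 20, "c = 26");
```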

So this patch changes the hash2 computation by using the low bits of
hash0 instead of shifting it around, thus avoiding 0 bits in parts of
the hash2 value that are significant.

Note that this changes the value of hash2 in all cases except when the
table capacity is exactly 2^16, so it does change our hashing
characteristics.  For tables with capacity less than 2^16, hash2 should
be a different value, but with the same amount of random-ish data.  For
tables with capacity greater than 2^16, hash2 should contain more
random-ish data.
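
The three cases can be checked with a short sketch that puts the old
and new expressions side by side.  OldHash2 and NewHash2 are
hypothetical free-function versions of the two computations, taking
sizeLog2 as a parameter instead of reading mHashShift; the sample hash0
value is arbitrary (with the low bit unset).

```cpp
#include <cstdint>

constexpr uint32_t kHashBits = 32;

// Pre-patch: the bits just below hash1, zero-padded when sizeLog2 > 16.
constexpr uint32_t OldHash2(uint32_t aHash0, uint32_t aSizeLog2) {
  return ((aHash0 << aSizeLog2) >> (kHashBits - aSizeLog2)) | 1;
}

// Post-patch: the low sizeLog2 bits of hash0, forced odd.
constexpr uint32_t NewHash2(uint32_t aHash0, uint32_t aSizeLog2) {
  return (aHash0 & ((uint32_t(1) << aSizeLog2) - 1)) | 1;
}

constexpr uint32_t kSampleHash0 = 0x12345678u;  // arbitrary, low bit unset

// Capacity exactly 2^16: both expressions select the low 16 bits.
static_assert(OldHash2(kSampleHash0, 16) == 0x5679u, "old, c = 16");
static_assert(NewHash2(kSampleHash0, 16) == 0x5679u, "new, c = 16");

// Capacity 2^8: same number of bits, taken from a different place.
static_assert(OldHash2(kSampleHash0, 8) == 0x35u, "old, c = 8");
static_assert(NewHash2(kSampleHash0, 8) == 0x79u, "new, c = 8");

// Capacity 2^20: the old result has zeroes in bits 1..7; the new one
// fills all 20 bits from hash0 (overlapping the bits used by hash1).
static_assert(OldHash2(kSampleHash0, 20) == 0x67801u, "old, c = 20");
static_assert(NewHash2(kSampleHash0, 20) == 0x45679u, "new, c = 20");
```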

MozReview-Commit-ID: JvnxAMBY711

--HG--
extra : transplant_source : %8A%25%FB%E3H%B8_%F1G%F6%3E%0B%29%DF%20%FF%D8%E1%AEw
L. David Baron 2017-04-03 20:43:30 -07:00
Parent db2f1da78f
Commit 4d700b54f1
1 changed file: 17 additions and 5 deletions

@@ -253,15 +253,27 @@ PLDHashTable::Hash1(PLDHashNumber aHash0)
   return aHash0 >> mHashShift;
 }
-// Double hashing needs the second hash code to be relatively prime to table
-// size, so we simply make hash2 odd.
 void
-PLDHashTable::Hash2(PLDHashNumber aHash,
+PLDHashTable::Hash2(PLDHashNumber aHash0,
                     uint32_t& aHash2Out, uint32_t& aSizeMaskOut)
 {
   uint32_t sizeLog2 = kHashBits - mHashShift;
-  aHash2Out = ((aHash << sizeLog2) >> mHashShift) | 1;
-  aSizeMaskOut = (PLDHashNumber(1) << sizeLog2) - 1;
+  uint32_t sizeMask = (PLDHashNumber(1) << sizeLog2) - 1;
+  aSizeMaskOut = sizeMask;
+  // The incoming aHash0 always has the low bit unset (since we leave it
+  // free for the collision flag), and should have reasonably random
+  // data in the other 31 bits.  We used the high bits of aHash0 for
+  // Hash1, so we use the low bits here.  If the table size is large,
+  // the bits we use may overlap, but that's still more random than
+  // filling with 0s.
+  //
+  // Double hashing needs the second hash code to be relatively prime to table
+  // size, so we simply make hash2 odd.
+  //
+  // This also conveniently covers up the fact that we have the low bit
+  // unset since aHash0 has the low bit unset.
+  aHash2Out = (aHash0 & sizeMask) | 1;
 }
 // Reserve mKeyHash 0 for free entries and 1 for removed-entry sentinels. Note