This patch reduces sizeof(PLDHashTable) as follows.
- 64-bit: from 40 bytes to 32
- 32-bit: from 28 bytes to 20
It does so as follows.
- It moves mGeneration from EntryStore to PLDHashTable, to avoid unnecessary
padding on 64-bit. This requires tweaking EntryStore::Set() as explained in a
comment.
- It also shrinks mGeneration from uint32_t to uint16_t, saving 2 bytes of
data.
- It shrinks mEntrySize from uint32_t to uint8_t, to cut 3 bytes of data.
- It shrinks mHashShift from int16_t to uint8_t, trimming another byte of data,
and moves it, saving another 2 bytes of padding.
It also reorders the fields so that the word-sized ones come first, which
makes the memory layout easier to reason about.
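For illustration, here is a rough sketch of the resulting 64-bit layout. This
is not the actual Gecko declaration: the mOps, mEntryCount and mRemovedCount
members and the exact ordering are assumptions made for the sketch; only the
type changes described above are taken from the patch.

    #include <cstdint>

    // "Before": EntryStore carried its own uint32_t generation next to a
    // pointer, so it padded out to 16 bytes on 64-bit.
    struct OldEntryStoreSketch {
      char*    mEntryStore;  // 8 bytes
      uint32_t mGeneration;  // 4 bytes + 4 bytes of tail padding -> 16 total
    };

    // "After": word-sized members first, then the shrunken narrow members
    // packed together at the end.
    struct NewTableSketch {
      const void* mOps;          // 8 bytes (pointer-sized)
      char*       mEntryStore;   // 8 bytes (EntryStore now holds just the pointer)
      uint32_t    mEntryCount;   // 4 bytes
      uint32_t    mRemovedCount; // 4 bytes
      uint16_t    mGeneration;   // 2 bytes (was a uint32_t inside EntryStore)
      uint8_t     mEntrySize;    // 1 byte  (was uint32_t)
      uint8_t     mHashShift;    // 1 byte  (was int16_t)
    };  // 28 bytes of data, padded to 32 on 64-bit; 20 bytes on 32-bit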
The patch also adds a test, and fixes some misordered function arguments in
existing tests.
--HG--
extra : rebase_source : 6ed6f7be68477fd4a82f07dd2f51c1f1d9b92dcc
The current code replaces one PLDHashTable's mGeneration with another. If the
two generation values happen to be equal, it will look as though the storage
hasn't changed when really it has.
The fix is to use EntryStore::Set() to update mEntryStore, which increments
mGeneration appropriately.
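A minimal sketch of the invariant being restored (not the real Gecko code; the
Set() signature here is simplified for illustration): every change to the
entry storage goes through Set(), which bumps the generation, so equal
generation values can only mean unchanged storage.

    #include <cstdint>

    class EntryStoreSketch {
     public:
      char* Get() const { return mStore; }

      void Set(char* aNewStore) {
        mStore = aNewStore;
        ++mGeneration;  // always bump, even if the old and new values look equal
      }

      uint32_t Generation() const { return mGeneration; }

     private:
      char*    mStore = nullptr;
      uint32_t mGeneration = 0;
    };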
--HG--
extra : rebase_source : d1779a143746c8d1a3e67bc3685e703496206b0f
The simplistic shift-based hashing function creates a lot of collisions for
pointers that point to arrays, because it doesn't do a good job of
distributing the input bytes randomly across the hash value.
PLDHashTable takes the result of the hash function and multiplies it by
kGoldenRatio to ensure that it has a good distribution of bits across
the 32-bit hash value, and then zeroes out the low bit so that it can be
used for the collision flag. This result is called hash0. From hash0
it computes two different numbers used to find entries in the table
storage: hash1 is used to find an initial position in the table to
begin searching for an entry; hash2 is then used to repeatedly offset
that position (mod the size of the table) to build a chain of positions
to search.
In a table with capacity 2^c entries, hash1 is simply the upper c bits
of hash0. This patch does not change this.
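As a sketch, assuming a power-of-two capacity with sizeMask = capacity - 1
(names invented; the real search loop is more involved and also takes care
that the stride can reach every slot, e.g. by making it odd):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // hash1: the upper c bits of hash0, used as the starting slot.
    uint32_t Hash1Sketch(uint32_t aHash0, uint32_t aC) {
      return aHash0 >> (32 - aC);
    }

    // The chain of positions probed, stepping by hash2 each time (whether the
    // stride is added or subtracted does not matter for the sketch).
    std::vector<uint32_t> ProbeChainSketch(uint32_t aHash1, uint32_t aHash2,
                                           uint32_t aSizeMask, size_t aLength) {
      std::vector<uint32_t> positions;
      uint32_t pos = aHash1 & aSizeMask;
      for (size_t i = 0; i < aLength; ++i) {
        positions.push_back(pos);
        pos = (pos + aHash2) & aSizeMask;  // next slot in the chain
      }
      return positions;
    }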
Prior to this patch, hash2 was the c bits immediately below the bits used for
hash1, padded at the low
end with zeroes when c > 16. (Note that bug 927705, changeset
1a02bec165e16f370cace3da21bb2b377a0a7242, increased the maximum capacity
from 2^23 to 2^26 since 2^23 was sometimes insufficient!) This manner
of computing hash2 is problematic because it increases the risk of long
chains for very large tables, since there is less variation in the hash2
result due to the zero padding.
So this patch changes the hash2 computation to use the low bits of hash0
directly, instead of shifting hash0 around, thus avoiding zero bits in
significant parts of the hash2 value.
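Concretely, for a table of capacity 2^c, the two computations look roughly
like this (illustrative formulas matching the description above, not
necessarily the exact expressions in the patch; the real code also reserves
the low bit of hash0 for the collision flag):

    #include <cstdint>

    // Old scheme: the c bits immediately below hash1's bits. When c > 16 there
    // are fewer than c such bits available, so the low end of the result is
    // padded with 2c - 32 zero bits.
    uint32_t OldHash2Sketch(uint32_t aHash0, uint32_t aC) {
      return (aHash0 << aC) >> (32 - aC);
    }

    // New scheme: the low c bits of hash0, which are never zero-padded no
    // matter how large the table grows.
    uint32_t NewHash2Sketch(uint32_t aHash0, uint32_t aC) {
      const uint32_t sizeMask = (1u << aC) - 1;
      return aHash0 & sizeMask;
    }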
Note that this changes what hash2 is in all cases except when the table
capacity is exactly 2^16, so it does change our hashing characteristics.
For tables with capacity less than 2^16, it should be using a different
second hash, but with the same amount of random-ish data. For tables
with capacity greater than 2^16, it should be using more random-ish
data.
Note that this patch depends on the patch for bug 1353458 in order to
avoid causing test failures.
MozReview-Commit-ID: JvnxAMBY711
--HG--
extra : transplant_source : 2%D2%C2%CE%E1%92%C8%F8H%D7%15%A4%86%5B%3Ac%0B%08%3DA
PLDHashTable's entry store has two types of unoccupied entries: free
entries and removed entries. When we search a chain of entries in the entry
store (the chain is determined by the hash value), the search can stop at a
free entry, but it must continue across removed entries, because a removed
entry may have been skipped over when the value we're now searching for was
added to the table and has since been removed. For live entries, we also
maintain this distinction by using one bit of storage for a collision flag,
which notes that if the hashtable entry is later removed, its place in the
entry store must become a removed entry rather than a free entry.
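A sketch of that distinction in a lookup (names and state encoding invented;
the real table stores these states in each entry's key hash and bounds the
walk by the table's capacity):

    #include <cstdint>

    enum class SlotState { Free, Removed, Live };

    struct SlotSketch {
      SlotState mState;
      uint32_t  mKeyHash;
    };

    // A free entry ends the search, because the key could never have been
    // placed beyond it; a removed entry must be stepped over, because the key
    // may have been inserted past it before it was removed.
    SlotSketch* SearchChainSketch(SlotSketch* aSlots, uint32_t aSizeMask,
                                  uint32_t aHash1, uint32_t aHash2,
                                  uint32_t aKeyHash) {
      uint32_t pos = aHash1 & aSizeMask;
      while (true) {
        SlotSketch* entry = &aSlots[pos];
        if (entry->mState == SlotState::Free) {
          return nullptr;  // chain ended without finding the key
        }
        if (entry->mState == SlotState::Live && entry->mKeyHash == aKeyHash) {
          return entry;    // found a matching live entry
        }
        pos = (pos + aHash2) & aSizeMask;  // removed or non-matching: keep going
      }
    }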
When we add a new entry to the table, Add's semantics require that we return
an existing entry if there is one, and only create a new entry if none
exists. (Bug 1352198 suggests the possibility of a faster alternative Add API
where the caller guarantees that the key is not already in the hashtable.)
When we search for the existing entry, we must therefore continue the search
across removed entries, although we record the first removed entry we find so
that we can return it if the search for an existing entry fails.
The existing code sets the collision flag on every entry visited during the
entire table search in an Add. This patch changes that behavior so that we
only set the collision flag on entries visited before the first removed entry
is found. Setting it after we find the first removed entry is unnecessary,
since we are not making those entries part of a path to a new entry. If such
an entry is part of a path to an existing entry, it already has the collision
flag set.
This patch effectively puts an if (!firstRemoved) around the else branch
of the if (MOZ_UNLIKELY(EntryIsRemoved(entry))), and then refactors that
condition outwards since it is now around the contents of both the if
and else branches.
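In sketch form, the per-entry logic inside Add's search loop changes roughly
as follows (stand-in types and helper names; only the control-flow shape is
taken from the patch):

    #include <cstdint>

    struct EntryHdrSketch { uint32_t mKeyHash; };
    constexpr uint32_t kCollisionFlagSketch = 1;
    static bool EntryIsRemovedSketch(const EntryHdrSketch* aEntry) {
      return aEntry->mKeyHash == 0xFFFFFFFF;  // placeholder encoding
    }

    // Before: the collision flag is set on every live entry visited, even
    // after a removed entry has already been recorded.
    void VisitEntryBefore(EntryHdrSketch* aEntry, EntryHdrSketch*& aFirstRemoved) {
      if (EntryIsRemovedSketch(aEntry)) {
        if (!aFirstRemoved) {
          aFirstRemoved = aEntry;
        }
      } else {
        aEntry->mKeyHash |= kCollisionFlagSketch;
      }
    }

    // After: once aFirstRemoved is set, later entries in the chain are not on
    // the path to the slot we will hand out, so the whole test is skipped.
    void VisitEntryAfter(EntryHdrSketch* aEntry, EntryHdrSketch*& aFirstRemoved) {
      if (!aFirstRemoved) {
        if (EntryIsRemovedSketch(aEntry)) {
          aFirstRemoved = aEntry;
        } else {
          aEntry->mKeyHash |= kCollisionFlagSketch;
        }
      }
    }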
MozReview-Commit-ID: CsXnMYttHVy
--HG--
extra : transplant_source : %80%9E%83%EC%CCY%B4%B0%86%86%18%99%B6U%21o%5D%29%AD%04