зеркало из https://github.com/mozilla/gecko-dev.git
Bug 1058335 (part 2) - Remove unneeded comments and always-ignored warnings about chaining. r=roc.
--HG-- extra : rebase_source : d96d6beabd48da66ae991274b66e93f5d193c61e
This commit is contained in:
Родитель
00a572368f
Коммит
98d51d576f
|
@ -243,16 +243,6 @@ MOZ_ALWAYS_INLINE bool
|
|||
PLDHashTable::Init(const PLDHashTableOps* aOps, void* aData,
|
||||
uint32_t aEntrySize, const fallible_t&, uint32_t aLength)
|
||||
{
|
||||
#ifdef DEBUG
|
||||
if (aEntrySize > 16 * sizeof(void*)) {
|
||||
printf_stderr(
|
||||
"pldhash: for the table at address %p, the given aEntrySize"
|
||||
" of %lu definitely favors chaining over double hashing.\n",
|
||||
(void*)this,
|
||||
(unsigned long) aEntrySize);
|
||||
}
|
||||
#endif
|
||||
|
||||
if (aLength > PL_DHASH_MAX_INITIAL_LENGTH) {
|
||||
return false;
|
||||
}
|
||||
|
|
|
@ -167,72 +167,14 @@ typedef size_t (*PLDHashSizeOfEntryExcludingThisFun)(
|
|||
* on most architectures, and may be allocated on the stack or within another
|
||||
* structure or class (see below for the Init and Finish functions to use).
|
||||
*
|
||||
* To decide whether to use double hashing vs. chaining, we need to develop a
|
||||
* trade-off relation, as follows:
|
||||
*
|
||||
* Let alpha be the load factor, esize the entry size in words, count the
|
||||
* entry count, and pow2 the power-of-two table size in entries.
|
||||
*
|
||||
* (PLDHashTable overhead) > (PLHashTable overhead)
|
||||
* (unused table entry space) > (malloc and .next overhead per entry) +
|
||||
* (buckets overhead)
|
||||
* (1 - alpha) * esize * pow2 > 2 * count + pow2
|
||||
*
|
||||
* Notice that alpha is by definition (count / pow2):
|
||||
*
|
||||
* (1 - alpha) * esize * pow2 > 2 * alpha * pow2 + pow2
|
||||
* (1 - alpha) * esize > 2 * alpha + 1
|
||||
*
|
||||
* esize > (1 + 2 * alpha) / (1 - alpha)
|
||||
*
|
||||
* This assumes both tables must keep keyHash, key, and value for each entry,
|
||||
* where key and value point to separately allocated strings or structures.
|
||||
* If key and value can be combined into one pointer, then the trade-off is:
|
||||
*
|
||||
* esize > (1 + 3 * alpha) / (1 - alpha)
|
||||
*
|
||||
* If the entry value can be a subtype of PLDHashEntryHdr, rather than a type
|
||||
* that must be allocated separately and referenced by an entry.value pointer
|
||||
* member, and provided key's allocation can be fused with its entry's, then
|
||||
* k (the words wasted per entry with chaining) is 4.
|
||||
*
|
||||
* To see these curves, feed gnuplot input like so:
|
||||
*
|
||||
* gnuplot> f(x,k) = (1 + k * x) / (1 - x)
|
||||
* gnuplot> plot [0:.75] f(x,2), f(x,3), f(x,4)
|
||||
*
|
||||
* For k of 2 and a well-loaded table (alpha > .5), esize must be more than 4
|
||||
* words for chaining to be more space-efficient than double hashing.
|
||||
*
|
||||
* Solving for alpha helps us decide when to shrink an underloaded table:
|
||||
*
|
||||
* esize > (1 + k * alpha) / (1 - alpha)
|
||||
* esize - alpha * esize > 1 + k * alpha
|
||||
* esize - 1 > (k + esize) * alpha
|
||||
* (esize - 1) / (k + esize) > alpha
|
||||
*
|
||||
* alpha < (esize - 1) / (esize + k)
|
||||
*
|
||||
* Therefore double hashing should keep alpha >= (esize - 1) / (esize + k),
|
||||
* assuming esize is not too large (in which case, chaining should probably be
|
||||
* used for any alpha). For esize=2 and k=3, we want alpha >= .2; for esize=3
|
||||
* and k=2, we want alpha >= .4. For k=4, esize could be 6, and alpha >= .5
|
||||
* would still obtain.
|
||||
*
|
||||
* The current implementation uses a lower bound of 0.25 for alpha when
|
||||
* deciding whether to shrink the table (while still respecting
|
||||
* PL_DHASH_MIN_CAPACITY).
|
||||
*
|
||||
* Note a qualitative difference between chaining and double hashing: under
|
||||
* chaining, entry addresses are stable across table shrinks and grows. With
|
||||
* double hashing, you can't safely hold an entry pointer and use it after an
|
||||
* ADD or REMOVE operation, unless you sample aTable->mGeneration before adding
|
||||
* or removing, and compare the sample after, dereferencing the entry pointer
|
||||
* only if aTable->mGeneration has not changed.
|
||||
*
|
||||
* The moral of this story: there is no one-size-fits-all hash table scheme,
|
||||
* but for small table entry size, and assuming entry address stability is not
|
||||
* required, double hashing wins.
|
||||
* There used to be a long, math-heavy comment here about the merits of
|
||||
* double hashing vs. chaining; it was removed in bug 1058335. In short, double
|
||||
* hashing is more space-efficient unless the element size gets large (in which
|
||||
* case you should keep using double hashing but switch to using pointer
|
||||
* elements). Also, with double hashing, you can't safely hold an entry pointer
|
||||
* and use it after an ADD or REMOVE operation, unless you sample
|
||||
* aTable->mGeneration before adding or removing, and compare the sample after,
|
||||
* dereferencing the entry pointer only if aTable->mGeneration has not changed.
|
||||
*/
|
||||
struct PLDHashTable
|
||||
{
|
||||
|
|
Загрузка…
Ссылка в новой задаче