Bug 1058335 (part 2) - Remove unneeded comments and always-ignored warnings about chaining. r=roc.

--HG--
extra : rebase_source : d96d6beabd48da66ae991274b66e93f5d193c61e
Nicholas Nethercote 2014-08-25 17:43:57 -07:00
Parent 00a572368f
Commit 98d51d576f
2 changed files with 8 additions and 76 deletions

View file

@@ -243,16 +243,6 @@ MOZ_ALWAYS_INLINE bool
PLDHashTable::Init(const PLDHashTableOps* aOps, void* aData,
uint32_t aEntrySize, const fallible_t&, uint32_t aLength)
{
#ifdef DEBUG
if (aEntrySize > 16 * sizeof(void*)) {
printf_stderr(
"pldhash: for the table at address %p, the given aEntrySize"
" of %lu definitely favors chaining over double hashing.\n",
(void*)this,
(unsigned long) aEntrySize);
}
#endif
if (aLength > PL_DHASH_MAX_INITIAL_LENGTH) {
return false;
}
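
The warning removed in the hunk above fired for entries larger than 16 words. As a rough illustration of the space trade-off behind it, the two formulas from the header comment that this commit also removes, esize > (1 + k * alpha) / (1 - alpha) and alpha < (esize - 1) / (esize + k), can be sketched as small helpers. The function names below are hypothetical and are not part of pldhash.

```cpp
#include <cassert>

// Hypothetical helper, not part of pldhash.
// Entry size (in words) above which chaining wastes less space than double
// hashing at load factor aAlpha. aK is the number of words chaining wastes
// per entry (2, 3, or 4 in the removed comment).
constexpr double ChainingBreakEvenWords(double aAlpha, double aK)
{
  return (1.0 + aK * aAlpha) / (1.0 - aAlpha);
}

// Hypothetical helper, not part of pldhash.
// Load factor below which a double-hashed table with entries of aEsize words
// wastes more space than chaining would: (esize - 1) / (esize + k).
constexpr double MinLoadFactor(double aEsize, double aK)
{
  return (aEsize - 1.0) / (aEsize + aK);
}
```

Under these definitions, ChainingBreakEvenWords(0.5, 2) evaluates to 4, which matches the removed comment's "more than 4 words" figure, and MinLoadFactor(2, 3) and MinLoadFactor(3, 2) come out at roughly .2 and .4. ChainingBreakEvenWords(0.75, 4) evaluates to 16, which happens to equal the threshold in the removed warning; the source does not say whether that is where the constant came from.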

View file

@@ -167,72 +167,14 @@ typedef size_t (*PLDHashSizeOfEntryExcludingThisFun)(
* on most architectures, and may be allocated on the stack or within another
* structure or class (see below for the Init and Finish functions to use).
*
* To decide whether to use double hashing vs. chaining, we need to develop a
* trade-off relation, as follows:
*
* Let alpha be the load factor, esize the entry size in words, count the
* entry count, and pow2 the power-of-two table size in entries.
*
* (PLDHashTable overhead) > (PLHashTable overhead)
* (unused table entry space) > (malloc and .next overhead per entry) +
* (buckets overhead)
* (1 - alpha) * esize * pow2 > 2 * count + pow2
*
* Notice that alpha is by definition (count / pow2):
*
* (1 - alpha) * esize * pow2 > 2 * alpha * pow2 + pow2
* (1 - alpha) * esize > 2 * alpha + 1
*
* esize > (1 + 2 * alpha) / (1 - alpha)
*
* This assumes both tables must keep keyHash, key, and value for each entry,
* where key and value point to separately allocated strings or structures.
* If key and value can be combined into one pointer, then the trade-off is:
*
* esize > (1 + 3 * alpha) / (1 - alpha)
*
* If the entry value can be a subtype of PLDHashEntryHdr, rather than a type
* that must be allocated separately and referenced by an entry.value pointer
* member, and provided key's allocation can be fused with its entry's, then
* k (the words wasted per entry with chaining) is 4.
*
* To see these curves, feed gnuplot input like so:
*
* gnuplot> f(x,k) = (1 + k * x) / (1 - x)
* gnuplot> plot [0:.75] f(x,2), f(x,3), f(x,4)
*
* For k of 2 and a well-loaded table (alpha > .5), esize must be more than 4
* words for chaining to be more space-efficient than double hashing.
*
* Solving for alpha helps us decide when to shrink an underloaded table:
*
* esize > (1 + k * alpha) / (1 - alpha)
* esize - alpha * esize > 1 + k * alpha
* esize - 1 > (k + esize) * alpha
* (esize - 1) / (k + esize) > alpha
*
* alpha < (esize - 1) / (esize + k)
*
* Therefore double hashing should keep alpha >= (esize - 1) / (esize + k),
* assuming esize is not too large (in which case, chaining should probably be
* used for any alpha). For esize=2 and k=3, we want alpha >= .2; for esize=3
* and k=2, we want alpha >= .4. For k=4, esize could be 6, and alpha >= .5
* would still obtain.
*
* The current implementation uses a lower bound of 0.25 for alpha when
* deciding whether to shrink the table (while still respecting
* PL_DHASH_MIN_CAPACITY).
*
* Note a qualitative difference between chaining and double hashing: under
* chaining, entry addresses are stable across table shrinks and grows. With
* double hashing, you can't safely hold an entry pointer and use it after an
* ADD or REMOVE operation, unless you sample aTable->mGeneration before adding
* or removing, and compare the sample after, dereferencing the entry pointer
* only if aTable->mGeneration has not changed.
*
* The moral of this story: there is no one-size-fits-all hash table scheme,
* but for small table entry size, and assuming entry address stability is not
* required, double hashing wins.
* There used to be a long, math-heavy comment here about the merits of
* double hashing vs. chaining; it was removed in bug 1058335. In short, double
* hashing is more space-efficient unless the element size gets large (in which
* case you should keep using double hashing but switch to using pointer
* elements). Also, with double hashing, you can't safely hold an entry pointer
* and use it after an ADD or REMOVE operation, unless you sample
* aTable->mGeneration before adding or removing, and compare the sample after,
* dereferencing the entry pointer only if aTable->mGeneration has not changed.
*/
struct PLDHashTable
{
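
The mGeneration rule in the replacement comment can be illustrated with a toy model. This is a minimal sketch, assuming a table whose storage moves when it grows: ToyTable and ReadAfterAdd are hypothetical names, and nothing here is the real PLDHashTable API. Only the pattern is the point: sample the generation before the ADD, compare it afterwards, and dereference the held pointer only if it has not changed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in, NOT the real PLDHashTable. It models one property
// only: growing the entry store may move entries, and bumps mGeneration.
struct ToyTable
{
  std::vector<int> mEntries;
  size_t mCapacity;
  uint32_t mGeneration = 0;

  explicit ToyTable(size_t aCapacity) : mCapacity(aCapacity)
  {
    mEntries.reserve(mCapacity);
  }

  // Append an entry. When the store is full it is regrown first, which can
  // relocate every existing entry, so the generation is incremented.
  int* Add(int aValue)
  {
    if (mEntries.size() == mCapacity) {
      mCapacity = mCapacity ? mCapacity * 2 : 1;
      mEntries.reserve(mCapacity);
      ++mGeneration;
    }
    mEntries.push_back(aValue);
    return &mEntries.back();
  }
};

// Performs an ADD while holding aEntry, then reads through aEntry only if
// the generation is unchanged. Returns aFallback when the entry may have moved.
int ReadAfterAdd(ToyTable& aTable, const int* aEntry, int aNewValue,
                 int aFallback)
{
  const uint32_t sampled = aTable.mGeneration;  // sample before the ADD
  aTable.Add(aNewValue);
  if (aTable.mGeneration != sampled) {          // compare after the ADD
    return aFallback;                           // do not dereference aEntry
  }
  return *aEntry;
}
```

In this toy, a table created with capacity 2 keeps a held entry pointer usable across the second Add, but the third Add regrows the store and changes the generation, so ReadAfterAdd returns the fallback instead of touching the stale pointer. A real caller would typically look the entry up again at that point.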