Bug 1058335 (part 2) - Remove unneeded comments and always-ignored warnings about chaining. r=roc.

--HG--
extra : rebase_source : d96d6beabd48da66ae991274b66e93f5d193c61e
2014-08-25 17:43:57 -07:00 · 98d51d576f
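For reference, the long pldhash.h comment this patch deletes derived a break-even rule between double hashing and chaining. The sketch below restates that rule as two small C++ helpers. The function names are hypothetical and are not part of pldhash. Here `aAlpha` is the load factor (count / pow2), `aK` is the number of words wasted per entry under chaining, and `aEsize` is the entry size in words. The `#ifdef DEBUG` warning removed from `PLDHashTable::Init` used a much looser cutoff of 16 pointer-sized words.

```cpp
#include <cassert>

// Hypothetical helpers (not part of pldhash) that encode the rule of thumb
// from the comment deleted by this patch.

// Chaining only becomes more space-efficient than double hashing once the
// entry size, in words, exceeds this value:
//   esize > (1 + k * alpha) / (1 - alpha)
double ChainingBreakEvenEsize(double aAlpha, double aK) {
  assert(aAlpha >= 0.0 && aAlpha < 1.0);  // formula diverges as alpha -> 1
  return (1.0 + aK * aAlpha) / (1.0 - aAlpha);
}

// The same inequality solved for alpha: double hashing stays ahead as long
// as the load factor is at least this value.
//   alpha >= (esize - 1) / (esize + k)
double DoubleHashingMinAlpha(double aEsize, double aK) {
  assert(aEsize >= 1.0 && aK > 0.0);
  return (aEsize - 1.0) / (aEsize + aK);
}
```

Plugging in the figures quoted in the deleted comment reproduces them: `ChainingBreakEvenEsize(0.5, 2.0)` is 4 words, and `DoubleHashingMinAlpha` gives 0.2 for esize=2, k=3, 0.4 for esize=3, k=2, and 0.5 for esize=6, k=4. The shrink threshold pldhash actually used was a fixed 0.25, so these helpers describe the analysis, not the table's behavior.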
--- a/xpcom/glue/pldhash.cpp
+++ b/xpcom/glue/pldhash.cpp
@@ -243,16 +243,6 @@ MOZ_ALWAYS_INLINE bool
 PLDHashTable::Init(const PLDHashTableOps* aOps, void* aData,
                   uint32_t aEntrySize, const fallible_t&, uint32_t aLength)
 {
-#ifdef DEBUG
-  if (aEntrySize > 16 * sizeof(void*)) {
-    printf_stderr(
-      "pldhash: for the table at address %p, the given aEntrySize"
-      " of %lu definitely favors chaining over double hashing.\n",
-      (void*)this,
-      (unsigned long) aEntrySize);
-  }
-#endif
-
  if (aLength > PL_DHASH_MAX_INITIAL_LENGTH) {
    return false;
  }
--- a/xpcom/glue/pldhash.h
+++ b/xpcom/glue/pldhash.h
@@ -167,72 +167,14 @@ typedef size_t (*PLDHashSizeOfEntryExcludingThisFun)(
 * on most architectures, and may be allocated on the stack or within another
 * structure or class (see below for the Init and Finish functions to use).
 *
- * To decide whether to use double hashing vs. chaining, we need to develop a
- * trade-off relation, as follows:
- *
- * Let alpha be the load factor, esize the entry size in words, count the
- * entry count, and pow2 the power-of-two table size in entries.
- *
- *   (PLDHashTable overhead)    > (PLHashTable overhead)
- *   (unused table entry space) > (malloc and .next overhead per entry) +
- *                                (buckets overhead)
- *   (1 - alpha) * esize * pow2 > 2 * count + pow2
- *
- * Notice that alpha is by definition (count / pow2):
- *
- *   (1 - alpha) * esize * pow2 > 2 * alpha * pow2 + pow2
- *   (1 - alpha) * esize        > 2 * alpha + 1
- *
- *   esize > (1 + 2 * alpha) / (1 - alpha)
- *
- * This assumes both tables must keep keyHash, key, and value for each entry,
- * where key and value point to separately allocated strings or structures.
- * If key and value can be combined into one pointer, then the trade-off is:
- *
- *   esize > (1 + 3 * alpha) / (1 - alpha)
- *
- * If the entry value can be a subtype of PLDHashEntryHdr, rather than a type
- * that must be allocated separately and referenced by an entry.value pointer
- * member, and provided key's allocation can be fused with its entry's, then
- * k (the words wasted per entry with chaining) is 4.
- *
- * To see these curves, feed gnuplot input like so:
- *
- *   gnuplot> f(x,k) = (1 + k * x) / (1 - x)
- *   gnuplot> plot [0:.75] f(x,2), f(x,3), f(x,4)
- *
- * For k of 2 and a well-loaded table (alpha > .5), esize must be more than 4
- * words for chaining to be more space-efficient than double hashing.
- *
- * Solving for alpha helps us decide when to shrink an underloaded table:
- *
- *   esize                     > (1 + k * alpha) / (1 - alpha)
- *   esize - alpha * esize     > 1 + k * alpha
- *   esize - 1                 > (k + esize) * alpha
- *   (esize - 1) / (k + esize) > alpha
- *
- *   alpha < (esize - 1) / (esize + k)
- *
- * Therefore double hashing should keep alpha >= (esize - 1) / (esize + k),
- * assuming esize is not too large (in which case, chaining should probably be
- * used for any alpha).  For esize=2 and k=3, we want alpha >= .2; for esize=3
- * and k=2, we want alpha >= .4.  For k=4, esize could be 6, and alpha >= .5
- * would still obtain.
- *
- * The current implementation uses a lower bound of 0.25 for alpha when
- * deciding whether to shrink the table (while still respecting
- * PL_DHASH_MIN_CAPACITY).
- *
- * Note a qualitative difference between chaining and double hashing: under
- * chaining, entry addresses are stable across table shrinks and grows.  With
- * double hashing, you can't safely hold an entry pointer and use it after an
- * ADD or REMOVE operation, unless you sample aTable->mGeneration before adding
- * or removing, and compare the sample after, dereferencing the entry pointer
- * only if aTable->mGeneration has not changed.
- *
- * The moral of this story: there is no one-size-fits-all hash table scheme,
- * but for small table entry size, and assuming entry address stability is not
- * required, double hashing wins.
+ * There used to be a long, math-heavy comment here about the merits of
+ * double hashing vs. chaining; it was removed in bug 1058335. In short, double
+ * hashing is more space-efficient unless the element size gets large (in which
+ * case you should keep using double hashing but switch to using pointer
+ * elements). Also, with double hashing, you can't safely hold an entry pointer
+ * and use it after an ADD or REMOVE operation, unless you sample
+ * aTable->mGeneration before adding or removing, and compare the sample after,
+ * dereferencing the entry pointer only if aTable->mGeneration has not changed.
 */
 struct PLDHashTable
 {
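The replacement comment keeps the rule about entry pointers: sample `mGeneration` before an ADD or REMOVE, and dereference a previously obtained entry pointer only if the generation is unchanged afterwards. The sketch below illustrates that pattern on a deliberately tiny, hypothetical container (`ToyEntryStore` and `AddAndCheckStable` are invented for illustration and are not the PLDHashTable API). Like a double-hashing table's entry store, its entries live in one array that is reallocated on growth, and a generation counter is bumped each time that happens.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical toy container, NOT the real PLDHashTable. Entries live in a
// single array that is reallocated when full; mGeneration counts those
// reallocations, mirroring the role of PLDHashTable::mGeneration.
class ToyEntryStore {
 public:
  ToyEntryStore()
      : mCapacity(2), mLength(0), mGeneration(0), mEntries(new int[2]) {}

  // Stores a value and returns a pointer to its slot. May reallocate, which
  // invalidates every entry pointer handed out earlier.
  int* Add(int aValue) {
    if (mLength == mCapacity) {
      Grow();
    }
    mEntries[mLength] = aValue;
    return &mEntries[mLength++];
  }

  uint32_t Generation() const { return mGeneration; }

 private:
  void Grow() {
    size_t newCapacity = mCapacity * 2;
    std::unique_ptr<int[]> newEntries(new int[newCapacity]);
    for (size_t i = 0; i < mLength; ++i) {
      newEntries[i] = mEntries[i];
    }
    mEntries = std::move(newEntries);  // old entry pointers now dangle
    mCapacity = newCapacity;
    ++mGeneration;
  }

  size_t mCapacity;
  size_t mLength;
  uint32_t mGeneration;
  std::unique_ptr<int[]> mEntries;
};

// Sample-and-compare: returns true only if |aEntry| may still be
// dereferenced after adding |aNewValue|.
bool AddAndCheckStable(ToyEntryStore& aStore, int* aEntry, int aNewValue) {
  (void)aEntry;                          // not touched until the check passes
  uint32_t before = aStore.Generation(); // sample before the ADD
  aStore.Add(aNewValue);
  return aStore.Generation() == before;  // unchanged => no relocation
}
```

With a capacity of 2, a pointer from the first `Add` survives the second `Add` (generation unchanged) but not the third, which triggers a reallocation and bumps the generation. Note that an unchanged generation only rules out relocation; a pointer to an entry that was itself removed is still invalid.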