Correctness improvements:
* UTF errors are handled safely per spec instead of dangerously truncating
strings.
* There are fewer converter implementations.
Performance improvements:
* The old code did exact buffer length math, which meant doing UTF math twice
on each input string (once for length calculation and another time for
conversion). Exact length math is more complicated when handling errors
properly, which the old code didn't do. The new code does UTF math on the
string content only once (when converting) but risks allocating more than
once. There are heuristics in place to lower the probability of
reallocation in cases where the double math avoidance isn't enough of a
saving to absorb an allocation and memcpy.
* Previously, in UTF-16 <-> UTF-8 conversions, an ASCII prefix was optimized
but a single non-ASCII code point pessimized the rest of the string. The
new code tries to get back on the fast ASCII path.
* UTF-16 to Latin1 conversion guarantees less about handling of out-of-range
input to eliminate an operation from the inner loop on x86/x86_64.
* When assigning to a pre-existing string, the new code tries to reuse the
old buffer instead of first releasing the old buffer and then allocating a
new one.
* When reallocating from the new code, the memcpy covers only the data that
is part of the logical length of the old string instead of memcpying the
whole capacity. (For old callers old excess memcpy behavior is preserved
due to bogus callers. See bug 1472113.)
* UTF-8 strings in XPConnect that are in the Latin1 range are passed to
SpiderMonkey as Latin1.
New features:
* Conversion between UTF-8 and Latin1 is added in order to enable faster
future interop between Rust code (or otherwise UTF-8-using code) and text
node and SpiderMonkey code that uses Latin1.
MozReview-Commit-ID: JaJuExfILM9
Summary:
This is a streamlined version of perfecthash.py from xpcom/reflect/xptinfo.
There are a few major changes here:
Instead of providing '(key, value)' pairs to the constructor, callers provide a
list of 'entries'. An optional 'key' parameter allows overriding the function
used to get an entry's key. The default behaviour is as before, destructuring a
(key, value) tuple.
Keys can now be anything which supports the 'memoryview' protocol with 1-byte
elements, rather than being forced to be a 'bytearray'. This allows passing in
types such as bytestrings.
A new 'cxx_codegen' method is now exposed which generates a series of
fully-contained C++ declarations to perform entry lookups. It is possible to
override many parts of this codegen to make it the best fit for your component.
PerfectHash remembers the 'key' function, meaning lookup methods automatically
verify a valid key was passed in, and return 'None' otherwise.
Depends On D2615
Reviewers: froydnj!
Tags: #secure-revision
Bug #: 1479484
Differential Revision: https://phabricator.services.mozilla.com/D2616
Summary:
The plan is to also expose perfecthash.py from this module on the python path.
This also allows us to stop using explicit module loading to load make_dafsa.py.
make_dafsa.py was moved into tools/ to avoid any extra python files from
accidentally ending up on the python path.
Reviewers: froydnj!
Tags: #secure-revision
Bug #: 1479484
Differential Revision: https://phabricator.services.mozilla.com/D2614
--HG--
rename : xpcom/ds/make_dafsa.py => xpcom/ds/tools/make_dafsa.py
Summary:
This is done so we can use Array as the name for the new nsTArray-based
type, rather than having to come up with a new name.
LegacyArray was chosen as the [array] attribute is now effectively deprecated,
and we'd like to remove it ASAP.
Depends On D2334
Reviewers: mccr8!
Tags: #secure-revision
Bug #: 1474369
Differential Revision: https://phabricator.services.mozilla.com/D2335
Currently we have three ways of representing hash values.
- uint32_t: used in HashFunctions.h.
- PLDHashNumber: defined in PLDHashTable.{h,cpp}.
- js::HashNumber: defined in js/public/Utility.h.
Functions that create hash values with functions from HashFunctions.h use a mix
of these three types. It's a bit of a mess.
This patch introduces mozilla::HashNumber, and redefines PLDHashNumber and
js::HashNumber as synonyms. It also changes HashFunctions.h to use
mozilla::HashNumber throughout instead of uint32_t.
This leaves plenty of places that still use uint32_t that should use
mozilla::HashNumber or one of its synonyms, but I didn't want to tackle that
now.
The patch also:
- Does similar things for the constants defining the number of bits in each
hash number type.
- Moves js::HashNumber from Utility.h to HashTable.h, which is a better spot
for it. (This required changing the signature of ScrambleHashCode(); that's
ok, it'll get moved by the next patch anyway.)
MozReview-Commit-ID: EdoWlCm7OUC
--HG--
extra : rebase_source : 5b92c0c3560eb56850cd8832f8ee514d25e3c16f
Everything that goes in a PLDHashtable (and its derivatives, like
nsTHashtable) needs to inherit from PLDHashEntryHdr. But through a lack
of enforcement, copy constructors for these derived classes didn't
explicitly invoke the copy constructor for PLDHashEntryHdr (and the
compiler didn't invoke the copy constructor for us). Instead,
PLDHashTable explicitly copied around the bits that the copy constructor
would have.
The current setup has two problems:
1) Derived classes should be using move construction, not copy
construction, since anything that's shuffling hash table keys/entries
around will be using move construction.
2) Derived classes should take responsibility for transferring bits of
superclass state around, and not rely on something else to handle
that.
The second point is not a huge problem for PLDHashTable (PLDHashTable
only has to copy PLDHashEntryHdr's bits in a single place), but future
hash table implementations that might move entries around more
aggressively would have to insert compensation code all over the place.
Additionally, if moving entries is implemented via memcpy (which is
quite common), PLDHashTable copying around bits *again* is inefficient.
Let's fix all these problems in one go, by:
1) Explicitly declaring the set of constructors that PLDHashEntryHdr
implements (and does not implement). In particular, the copy
constructor is deleted, so any derived classes that attempt to make
themselves copyable will be detected at compile time: the compiler
will complain that the superclass type is not copyable.
This change on its own will result in many compiler errors, so...
2) Change any derived classes to implement move constructors instead
of copy constructors. Note that some of these move constructors are,
strictly speaking, unnecessary, since the relevant classes are moved
via memcpy in nsTHashtable and its derivatives.
Summary:
This is done so we can use Array as the name for the new nsTArray-based
type, rather than having to come up with a new name.
LegacyArray was chosen as the [array] attribute is now effectively deprecated,
and we'd like to remove it ASAP.
Depends On D2334
Reviewers: mccr8!
Tags: #secure-revision
Bug #: 1474369
Differential Revision: https://phabricator.services.mozilla.com/D2335
When removing an element that exists in the array, there's no need to
perform extra bounds checks for actually removing the element; the
presence of the element guarantees that we have a valid index and a
valid range to remove. So split RemoveElementsAt into a safe interface
and an unsafe interface, and let RemoveElement call the latter as an
internal optimization. The same logic applies to RemoveElementSorted.
We call RemoveElement depressingly often, so this is a nice little win.
Summary:
This is done so we can use Array as the name for the new nsTArray-based
type, rather than having to come up with a new name.
LegacyArray was chosen as the [array] attribute is now effectively deprecated,
and we'd like to remove it ASAP.
Depends On D2334
Reviewers: mccr8!
Tags: #secure-revision
Bug #: 1474369
Differential Revision: https://phabricator.services.mozilla.com/D2335
This speeds up BenchCollections.PLDHash as follows:
> succ_lookups fail_lookups insert_remove iterate
> 42.8 ms 51.6 ms 21.0 ms 34.9 ms
> 41.8 ms 51.9 ms 20.0 ms 34.6 ms
> 41.6 ms 51.3 ms 19.3 ms 34.5 ms
> 41.6 ms 51.3 ms 19.8 ms 35.0 ms
> 41.7 ms 50.8 ms 20.5 ms 35.2 ms
After:
> succ_lookups fail_lookups insert_remove iterate
> 37.7 ms 33.1 ms 19.7 ms 35.0 ms
> 37.0 ms 32.5 ms 19.1 ms 35.4 ms
> 37.6 ms 33.6 ms 19.2 ms 36.2 ms
> 36.7 ms 33.3 ms 19.1 ms 35.3 ms
> 37.1 ms 33.1 ms 19.1 ms 35.0 ms
Successful lookups are about 1.13x faster, failing lookups are about 1.54x
faster, and insertions/removals are about 1.05x faster.
On Linux64, this increases the size of libxul (as measured by `size`) by a mere
16 bytes.
--HG--
extra : rebase_source : 4bfdee68ba40a0cdd5396964d1647a70fb10736a
Our factory registrations already require that we store nsID pointers, which
we generally handle by using pointers to static data, or arena allocating a
copy of a dynamic ID.
Since we already have viable pointers to these IDs, there's no reason to store
an entire second copy for our hash key. We can use the pointer, instead, which
saves 16 bytes per entry.
MozReview-Commit-ID: 6MgKrXRSHv4
--HG--
extra : source : 5fb0b7746a5d56563b471e3061ccca124ea45485
extra : absorb_source : 275f5d4dc2c02e3d0391ed16e8690dac1e601758