Some profiling of tcampbell's React benchmark [1] shows 6.1 million calls to
js::frontend::GeneralTokenStreamChars<..>::getFullAsciiCodePoint, using 110.57
million instructions (x86_64). That comes out to only 18 insns per call,
which suggests the method is a good candidate for inlining, but it isn't
inlined.
Merely parking an inline annotation on it doesn't help much, because it gets
inlined into js::frontend::TokenStreamSpecific<..>::getCodePoint, but that
isn't inlined into *its* caller(s), so the 6.1 million calls move to
::getCodePoint instead.
This patch requests inlining for both ::getFullAsciiCodePoint and
::getCodePoint and adds some MOZ_NEVER_INLINE annotations to ensure that cold
paths *don't* get inlined into these two, to reduce code bloat and avoid
excessive register pressure.
IsAscii functions in mfbt/TextUtils.h have been marked inline as a precaution;
this probably isn't necessary.
Run time with config [2] is reduced from 0.390 seconds to 0.379 seconds
(2.8% speedup, best of 50 runs), and from 0.402 to 0.396 seconds
(median of 50 runs).
Instruction count falls from 3511.8 million to 3395.8 million, and the number
of data accesses from 1563.7 million to 1487.4 million -- a 4.8% reduction
that is probably caused by avoidance of save/restore sequences in the inlined
fns.
[1] https://github.com/mozilla-spidermonkey/matrix-react-bench
[2] Fedora 35, x86_64, Intel Core i5 1135G7 at 4 ish GHz
configure: --disable-debug --enable-optimize="-g -O2"
run: --no-threads
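
A minimal sketch of the hot/cold split described above, not the actual patch.
The macros stand in for MOZ_ALWAYS_INLINE and MOZ_NEVER_INLINE, and
`CharReader` with its two methods is a hypothetical reduction of the token
stream classes. The ASCII fast path is forced inline; the rare non-ASCII path
is kept out of line so it does not bloat every caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for MOZ_ALWAYS_INLINE / MOZ_NEVER_INLINE.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#  define SKETCH_NEVER_INLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#  define SKETCH_ALWAYS_INLINE __forceinline
#  define SKETCH_NEVER_INLINE __declspec(noinline)
#else
#  define SKETCH_ALWAYS_INLINE inline
#  define SKETCH_NEVER_INLINE
#endif

// Hypothetical reader over a byte range [cur, end).
struct CharReader {
  const unsigned char* cur;
  const unsigned char* end;

  // Cold path: kept out of line. For brevity it only decodes two-byte UTF-8
  // sequences and returns U+FFFD for everything else.
  SKETCH_NEVER_INLINE std::int32_t getNonAsciiCodePoint(unsigned char lead) {
    if ((lead & 0xE0) == 0xC0 && cur != end && (*cur & 0xC0) == 0x80) {
      return ((lead & 0x1F) << 6) | (*cur++ & 0x3F);
    }
    return 0xFFFD;
  }

  // Hot path: a bounds check, a load and a compare, small enough to inline.
  // Returns -1 at end of input.
  SKETCH_ALWAYS_INLINE std::int32_t getCodePoint() {
    if (cur == end) {
      return -1;
    }
    unsigned char c = *cur++;
    if (c < 0x80) {
      return c;
    }
    return getNonAsciiCodePoint(c);
  }
};
```

Whether the compiler honours the request for a given call site still depends
on the toolchain, so the effect should be confirmed by profiling, as above.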
Differential Revision: https://phabricator.services.mozilla.com/D159500
The only uses of this method were removed in Part 1, meaning that it can
now be removed. Support for this method adds a significant amount of
complexity to `BufferList` and IPC serialization.
Differential Revision: https://phabricator.services.mozilla.com/D154439
Deletion of mutation observers from a list resulted in O(n^2) behavior and could lead to massive freezes.
This is resolved by using a LinkedList instead, reducing complexity to O(n).
A safely iterable doubly linked list was implemented based on `mozilla::DoublyLinkedList`,
allowing elements to be inserted and removed while the list is being iterated.
Due to the nature of `mozilla::DoublyLinkedList`, every Mutation Observer now inherits `mozilla::DoublyLinkedListElement<T>`.
This implies that a Mutation Observer can only be part of one DoublyLinkedList.
This conflicts with some Mutation Observers, which are being added to multiple `nsINode`s.
To continue supporting this, new MutationObserver base classes `nsMultiMutationObserver` and `nsStubMultiMutationObserver` are introduced,
which create `MutationObserverWrapper` objects each time they are added to a `nsINode`.
The wrapper objects forward every call to the actual observer.
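
The wrapper idea can be sketched as follows. This is an illustration only:
`Node`, `ObserverBase`, `MultiObserver` and `RecordingObserver` are
hypothetical, simplified stand-ins and do not reproduce the real `nsINode`
or observer interfaces. The links live inside each list element, so removal
is O(1) without a search, but one element can belong to only one list; the
multi-node observer therefore creates one forwarding wrapper per node.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Intrusive element: prev/next are stored in the object itself, so an
// object can be linked into at most one list at a time.
struct ObserverBase {
  ObserverBase* prev = nullptr;
  ObserverBase* next = nullptr;
  virtual ~ObserverBase() = default;
  virtual void ContentChanged(int aNodeId) = 0;
};

// Hypothetical node owning the head of an intrusive observer list.
struct Node {
  int id;
  ObserverBase* head = nullptr;
  explicit Node(int aId) : id(aId) {}

  void AddObserver(ObserverBase* aObs) {  // O(1) push to front
    aObs->prev = nullptr;
    aObs->next = head;
    if (head) head->prev = aObs;
    head = aObs;
  }
  void RemoveObserver(ObserverBase* aObs) {  // O(1) unlink, no search
    if (aObs->prev) aObs->prev->next = aObs->next; else head = aObs->next;
    if (aObs->next) aObs->next->prev = aObs->prev;
    aObs->prev = aObs->next = nullptr;
  }
  void Notify() {
    for (ObserverBase* obs = head; obs;) {
      // Read the successor first so the current observer may unlink itself.
      // The real list supports more general mutation during iteration.
      ObserverBase* next = obs->next;
      obs->ContentChanged(id);
      obs = next;
    }
  }
};

// Observer that may watch several nodes: one wrapper per node, each of
// which forwards to the real observer. A real implementation would also
// unlink the wrappers before destroying them.
class MultiObserver {
 public:
  virtual ~MultiObserver() = default;
  virtual void ContentChanged(int aNodeId) = 0;
  void ObserveNode(Node& aNode) {
    mWrappers.push_back(std::make_unique<Wrapper>(*this));
    aNode.AddObserver(mWrappers.back().get());
  }

 private:
  struct Wrapper : ObserverBase {
    MultiObserver& mOwner;
    explicit Wrapper(MultiObserver& aOwner) : mOwner(aOwner) {}
    void ContentChanged(int aNodeId) override { mOwner.ContentChanged(aNodeId); }
  };
  std::vector<std::unique_ptr<Wrapper>> mWrappers;
};

// Example observer that records which nodes reported a change.
struct RecordingObserver : MultiObserver {
  std::vector<int> seen;
  void ContentChanged(int aNodeId) override { seen.push_back(aNodeId); }
};
```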
Differential Revision: https://phabricator.services.mozilla.com/D157031
The next patch creates a cache capable of looking up different kinds of string
types. However, each string type needs some contextual information to be
compared against the others, which adds complexity to the lookup type. In
addition, the keys are of only one string type, so we avoid storing this
context as part of each key and instead supply it through the Lookup type.
Therefore, when we use `put` to insert a key that might already be present, we
have to provide an `aLookup` argument which knows how to compare keys.
This also makes the interface similar to `putNew`, which already distinguishes
between the `Lookup` argument and the `KeyInput` argument.
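
As an illustration of why `put` needs a separate lookup argument, here is a
small sketch under stated assumptions. `TinyMap`, `Key`, `Lookup` and `Match`
are hypothetical and use a linear scan instead of hashing; they are not the
mfbt hash table API. Stored keys are offsets into a shared buffer and cannot
be compared on their own, so the lookup carries the buffer as context.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// A stored key: meaningless without the buffer it points into.
struct Key {
  std::size_t offset;
  std::size_t length;
};

// The lookup carries the context (the buffer) plus the text searched for.
struct Lookup {
  const std::string& buffer;
  std::string_view text;
};

// Compares a stored key with a lookup, using the lookup's context.
inline bool Match(const Key& aKey, const Lookup& aLookup) {
  return std::string_view(aLookup.buffer).substr(aKey.offset, aKey.length) ==
         aLookup.text;
}

class TinyMap {
 public:
  // Insert or overwrite: aLookup finds an existing entry, aKey is what gets
  // stored when no entry matches.
  void put(const Lookup& aLookup, Key aKey, int aValue) {
    for (auto& entry : mEntries) {
      if (Match(entry.first, aLookup)) {
        entry.second = aValue;
        return;
      }
    }
    mEntries.emplace_back(aKey, aValue);
  }

  // Returns a pointer to the stored value, or nullptr when absent.
  const int* lookup(const Lookup& aLookup) const {
    for (const auto& entry : mEntries) {
      if (Match(entry.first, aLookup)) {
        return &entry.second;
      }
    }
    return nullptr;
  }

  std::size_t count() const { return mEntries.size(); }

 private:
  std::vector<std::pair<Key, int>> mEntries;
};
```

Keeping the context in the lookup means each stored key stays two words wide
instead of also carrying a pointer to the buffer.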
Differential Revision: https://phabricator.services.mozilla.com/D154512
mWriter is never null (and lots of calls just dereference it without checking),
so we may as well enforce it:
- The constructor MOZ_RELEASE_ASSERTs that it's not null.
- The accessor WriteFunc() returns a reference instead of a scary raw pointer.
(Note that we can't make mWriter a NotNull<...>, because the next patch will
give the option to keep that owning pointer null.)
Differential Revision: https://phabricator.services.mozilla.com/D154616
All JSONWriteFuncs are effectively final. This patch enforces that, which
should help the compiler de-virtualize some calls.
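
A small sketch of the pattern, using hypothetical types rather than the real
JSONWriteFunc hierarchy: once the concrete class is `final`, a call made
through a reference of that static type cannot be overridden further, so the
compiler is allowed to call the function directly or inline it.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the abstract write-function interface.
struct WriteFuncBase {
  virtual ~WriteFuncBase() = default;
  virtual void Write(const std::string& aStr) = 0;
};

// `final` promises that no class derives from this one.
struct StringWriteFunc final : WriteFuncBase {
  std::string mBuffer;
  void Write(const std::string& aStr) override { mBuffer += aStr; }
};

// The static type here is the final class, so these calls need no vtable
// lookup. Whether the compiler actually devirtualizes is up to the optimizer.
inline void WriteTwice(StringWriteFunc& aFunc, const std::string& aStr) {
  aFunc.Write(aStr);
  aFunc.Write(aStr);
}
```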
Depends on D154618
Differential Revision: https://phabricator.services.mozilla.com/D154619
mWriter is now a reference, and the ownership is optional through a separate
member variable that could stay null.
Users can now choose to keep the JSONWriteFunc on their stack, which saves a
heap allocation, and makes it easier to access the concrete JSONWriteFunc
implementation directly (instead of through WriteFunc()).
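
The ownership arrangement can be sketched like this. `SketchJSONWriter`,
`WriteFuncBase`, `StringWriteFunc`, `Value` and `mMaybeOwnedWriter` are
hypothetical names for illustration, and plain `assert` stands in for
MOZ_RELEASE_ASSERT; this is not the real JSONWriter interface.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the abstract write-function interface.
struct WriteFuncBase {
  virtual ~WriteFuncBase() = default;
  virtual void Write(const std::string& aStr) = 0;
};

struct StringWriteFunc final : WriteFuncBase {
  std::string mBuffer;
  void Write(const std::string& aStr) override { mBuffer += aStr; }
};

class SketchJSONWriter {
 public:
  // Non-owning: the caller keeps the write function alive, e.g. on the stack.
  explicit SketchJSONWriter(WriteFuncBase& aWriter) : mWriter(aWriter) {}

  // Owning: the writer takes over a heap-allocated write function.
  explicit SketchJSONWriter(std::unique_ptr<WriteFuncBase> aWriter)
      : mWriter(Deref(aWriter)), mMaybeOwnedWriter(std::move(aWriter)) {}

  // Always valid, so a reference is returned rather than a raw pointer.
  WriteFuncBase& WriteFunc() { return mWriter; }

  void Value(const std::string& aStr) { mWriter.Write(aStr); }

 private:
  static WriteFuncBase& Deref(const std::unique_ptr<WriteFuncBase>& aWriter) {
    assert(aWriter);  // stands in for MOZ_RELEASE_ASSERT
    return *aWriter;
  }

  // mWriter is declared first, so it is bound before the pointer is moved.
  WriteFuncBase& mWriter;
  std::unique_ptr<WriteFuncBase> mMaybeOwnedWriter;  // null when not owning
};
```

With the non-owning constructor the caller can read the concrete object
directly afterwards, without going through the accessor.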
Depends on D154616
Differential Revision: https://phabricator.services.mozilla.com/D154617
This patch moves EqualsIgnoreCase to ns[T]StringObsolete, and removes
the aCount argument, instead migrating callers to use `StringBeginsWith`
with a case-insensitive comparator.
In addition, nsTStringRepr::Compare was removed and replaced with either
calls to methods like `StringBeginsWith` or the global `Compare` method.
These changes required some modifications at call-sites but should make
the behaviour less surprising and more consistent.
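
To illustrate the replacement pattern, here is a sketch in standard C++.
`BeginsWith` and `CaseInsensitiveAsciiCompare` are hypothetical helpers and
do not reproduce the signatures of `StringBeginsWith` or the XPCOM
comparators. Instead of comparing the first aCount characters, the caller
states the intent ("does this begin with that prefix?") and passes the
comparison policy explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical ASCII case-insensitive character comparison.
struct CaseInsensitiveAsciiCompare {
  bool operator()(char aLhs, char aRhs) const {
    return std::tolower(static_cast<unsigned char>(aLhs)) ==
           std::tolower(static_cast<unsigned char>(aRhs));
  }
};

// True when aSource starts with aPrefix under the given character equality.
template <class Eq>
bool BeginsWith(std::string_view aSource, std::string_view aPrefix, Eq aEq) {
  if (aPrefix.size() > aSource.size()) {
    return false;
  }
  return std::equal(aPrefix.begin(), aPrefix.end(), aSource.begin(), aEq);
}
```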
Differential Revision: https://phabricator.services.mozilla.com/D148299
I split this out into its own commit because it's a bit awkward to go back and
shuffle the old code around. If you'd like me to apply it to the history
though, just let me know.
This patch just moves all of the AVX2 code out from SIMD.cpp into SIMD_avx2.cpp
and removes the -mavx2 flag when compiling SIMD.cpp. On try this removes the
failure on M1 hardware when running the x64 binary.
Differential Revision: https://phabricator.services.mozilla.com/D152920
This only makes sense for AVX2, because widening it from a 64-bit comparison
to a 128-bit comparison is hardly worth it, and there are gaps in the SSE2
instruction set (missing _mm_cmpeq_epi64, which is introduced in SSE4.1) that
would require us to compensate and probably take a sizeable perf hit.
Differential Revision: https://phabricator.services.mozilla.com/D152297
This showed a modest improvement in the geomean of my benchmarking, but
importantly it showed a consistent and relatively strong improvement across
all of the cases which I would guess are more realistic. Notably this change
makes it perform better at iteratively searching for the next occurrence of X
in the HTML of a large web page.
Differential Revision: https://phabricator.services.mozilla.com/D152296
These were the last remaining JSON whitespace characters, so our regression
tests can now check that there are none of these left.
Differential Revision: https://phabricator.services.mozilla.com/D152607
A custom definition wrapping fu2::function_base is used to customize the
inline buffer's size and alignment to make it compatible with nsTArray.
Without the custom wrapper, `alignof(max_align_t)` is used, which is
larger than nsTArray's max alignment on some platforms.
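
The alignment concern can be shown with a short sketch. `InlineBuffer` and
the limit of 8 bytes are assumptions made for illustration; the real value
depends on nsTArray and the platform, and this is not the function2
configuration API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical inline storage with caller-chosen size and alignment.
template <std::size_t Size, std::size_t Align>
struct InlineBuffer {
  alignas(Align) unsigned char mBytes[Size];
};

// Assumed maximum alignment the container can provide for its elements.
constexpr std::size_t kAssumedMaxContainerAlign = 8;

// Default choice: aligned like max_align_t, which can exceed the limit
// above on some platforms (for example 16 on x86_64 Linux).
using DefaultBuffer = InlineBuffer<32, alignof(std::max_align_t)>;

// Customized choice: alignment capped so the type fits in the container.
using SmallBuffer = InlineBuffer<32, kAssumedMaxContainerAlign>;

static_assert(alignof(SmallBuffer) <= kAssumedMaxContainerAlign,
              "buffer must not be over-aligned for the container");
```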
Differential Revision: https://phabricator.services.mozilla.com/D145691