Various bits depend on RefPtr.h to provide RefCounted<T> and RefPtr<T>.
It will be easier to manage an automatic conversion from RefPtr<T> to
nsRefPtr<T> if we split out the dependency on RefCounted<T> first.
Jemalloc 4 purges dirty pages regularly during free() when the ratio of active
pages to dirty pages drops below 1 << lg_dirty_mult. We set lg_dirty_mult in
jemalloc_config to limit RSS usage, but it also has an impact on performance.
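
As a rough illustration of the threshold described above, here is a minimal
Python sketch. It assumes jemalloc 4's documented meaning of lg_dirty_mult (the
log2 of the minimum active:dirty page ratio, default 3, with a negative value
disabling purging). The function name should_purge is hypothetical, and the
sketch ignores the allowance jemalloc makes for at least one chunk's worth of
dirty pages.

```python
def should_purge(active_pages: int, dirty_pages: int, lg_dirty_mult: int = 3) -> bool:
    """Simplified model of the purge check made during free().

    Purging kicks in once dirty pages exceed active_pages / 2**lg_dirty_mult,
    i.e. once the active:dirty ratio drops below 1 << lg_dirty_mult.
    """
    if lg_dirty_mult < 0:
        # Assumption: a negative value means purging is disabled.
        return False
    return dirty_pages > (active_pages >> lg_dirty_mult)


# With the default ratio of 8 (lg_dirty_mult = 3), 8000 active pages
# tolerate up to 1000 dirty pages before a purge is triggered.
print(should_purge(8000, 1000))
print(should_purge(8000, 1001))
```

A larger lg_dirty_mult lowers the number of dirty pages tolerated, which is
why it limits RSS at the cost of more frequent purging.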
So instead of enforcing a high ratio to force more pages to be purged, we keep
jemalloc's default ratio of 8 and force a regular purge of all dirty pages
after cycle collection.
Keeping jemalloc's default ratio means the cycle-collection-triggered purge
never has to go through a very large number of dirty pages: when there are that
many, the normal jemalloc purge during free() will already have kicked in. It
also ensures that everything that doesn't run the cycle collector, like plugins
in the plugin-container, still gets some level of purging.
At the same time, since jemalloc_purge_freed_pages does nothing with jemalloc 4,
repurpose the MEMORY_FREE_PURGED_PAGES_MS telemetry probe to track the time
spent in this cycle-collector-triggered purge.
The patch removes 455 occurrences of FAIL_ON_WARNINGS from moz.build files, and
adds 78 instances of ALLOW_COMPILER_WARNINGS. About half of those 78 are in
code we control, and should be removable with a little effort.
Now that we have moz.build, we can be guaranteed that any flags we add
in moz.build will be added after everything else has been set up. So any
uses of MODULE_OPTIMIZE_FLAGS can be moved to moz.build's
CFLAGS/CXXFLAGS without any unusual repercussions. We do have to verify
that MOZ_OPTIMIZE is in effect, though.
The bulk of this commit was generated by running:
    run-clang-tidy.py \
        -checks='-*,llvm-namespace-comment' \
        -header-filter=^/.../mozilla-central/.* \
        -fix
This adds a new option --clamp-contents to dmd.py. It replaces every value in
the memory contents recorded in the log with a pointer to the start of a live
block, if the value is a pointer into the middle of that block. All other
values are replaced with 0. This conservative analysis makes it easier to
determine which blocks point to other blocks.
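
The clamping rule can be sketched roughly as follows. This is not the code in
dmd.py; the function name clamp_contents and the representation of live blocks
as a dict from start address to size are assumptions made for illustration.

```python
from bisect import bisect_right


def clamp_contents(blocks, contents):
    """Clamp each word in `contents` to the start of the live block it points into.

    blocks:   dict mapping block start address -> block size in bytes
              (hypothetical representation of the live blocks).
    contents: list of integer values read from a block's memory.
    Values that do not point into any live block become 0.
    """
    starts = sorted(blocks)
    clamped = []
    for value in contents:
        # Find the last block whose start address is <= value.
        i = bisect_right(starts, value) - 1
        if i >= 0 and value < starts[i] + blocks[starts[i]]:
            clamped.append(starts[i])
        else:
            clamped.append(0)
    return clamped


live = {0x1000: 0x20, 0x2000: 0x10}
words = [0x1008, 0x1000, 0x1020, 0x200F, 0x5]
print([hex(v) for v in clamp_contents(live, words)])
```

After clamping, every non-zero word is exactly the start address of a live
block, so edges between blocks can be found by direct lookup.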
This implements a new "scan" mode for DMD that records the address
and contents of every live unsampled block in the DMD log. This
enables the low-level analysis of references from one block to
another, which can help leak investigations.
Bug 1168719 added a generic replace malloc library whose name happened to be
the same as the existing dummy library used to link replace malloc on OSX.
Change the name of that dummy library.
They are kept around for the sake of the standalone glue, which is used
for e.g. webapprt, which doesn't have direct access to jemalloc, and thus
still needs a wrapper to go through the xpcom function list and get to
jemalloc from there.
Sometimes, at least on Linux, DMDFuncs::sSingleton's static initializer
(in libxul) was being called before sDMDBridge's (in libdmd).
Thus sDMDBridge wasn't constructed yet in the path where its
address is taken, passed down through {replace_,}get_bridge to
ReplaceMallocBridge::Get, and its mVersion field is read.
This patch uses dynamic allocation, following what's done for other
globals in the same situation in this file.
Also, naming convention fix: leading "s" is for C++ class statics;
C-style static globals should be "g".
We need to use _impl variants within mozalloc.h when they are defined because
of how mozglue.dll is linked on Windows, where using malloc/free would use
the symbols from the MSVCRT instead of ours.
TlsGetValue differs semantically from pthread_getspecific in that it can
return a non-error NULL value, so it always sets the LastError.
But allocator callers may not expect a call to e.g. free() to change
the value of the last error, so preserve it.
The new "num" property lets identical blocks be aggregated in the output. This
patch only uses the "num" property for dead blocks, because that's where the
greatest potential benefit lies, but it could be used for live blocks as well.
On one test case (a complex PDF file) running with --mode=cumulative
--sample-below=1 this patch had the following effects.
- Change in running speed was negligible.
- Compressed output file size dropped from 8.8 to 5.0 MB.
- Uncompressed output file size dropped from 297 to 50 MB.
- dmd.py runtime (without stack fixing) dropped from 30 to 8 seconds.
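
The aggregation enabled by the "num" property can be sketched like this. The
sketch is illustrative only: the record fields "size" and "stack" and the
function name aggregate_blocks are hypothetical, and only "num" comes from the
description above.

```python
def aggregate_blocks(blocks):
    """Merge identical block records into one record carrying a "num" count.

    blocks: list of dicts with hypothetical "size" and "stack" fields and an
            optional "num" field (treated as 1 when absent).
    """
    counts = {}
    for block in blocks:
        key = (block["size"], block["stack"])
        counts[key] = counts.get(key, 0) + block.get("num", 1)
    # Dicts preserve insertion order, so output follows first occurrence.
    return [
        {"size": size, "stack": stack, "num": num}
        for (size, stack), num in counts.items()
    ]


records = [
    {"size": 16, "stack": "A"},
    {"size": 16, "stack": "A"},
    {"size": 32, "stack": "B"},
    {"size": 16, "stack": "A", "num": 2},
]
print(aggregate_blocks(records))
```

Collapsing repeated records this way is what shrinks the output when many
blocks share the same size and allocation stack.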
The DEFINES and XPCOM_API changes are needed to get rid of "inconsistent dll
linkage" warnings on Windows builds.
The infinite loop happens if chunk_alloc_arena needs to be called when a0malloc
is called. It in turn calls chunk_alloc_default, which uses tsd, which calls
a0malloc the first time the tsd is fetched on the current thread.
tsd only uses a0malloc on platforms where there is no native thread local storage
support, which, for Mozilla, essentially means anything that is not Linux.
But the tsd is only needed to get the dss precedence setting of the given arena.
That setting has no effect when dss is disabled, which it is on Windows and Mac.
Moreover, the default setting for dss precedence is "secondary", which means
jemalloc only tries dss after it failed to get memory with mmap/VirtualAlloc.
Since a failure of mmap/VirtualAlloc essentially means there is a shortage
of address space, sbrk() is not going to have much more success, so we
might as well disable dss support on all platforms, which also avoids
the infinite loop problem on Android and B2G.