We create a new helper class that rounds up allocation sizes and
categorizes them. Compilers are smart enough to elide what they don't
need: in malloc_good_size, for example, they elide the code related to
the class type enum.
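As a rough sketch of the idea (not the actual mozjemalloc class), a
helper of this kind could look like the following. The names SizeClass
and GoodSize, the three categories and the thresholds (16, 512, 2048,
4096) are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: names and thresholds are assumptions for
// illustration, not the values mozjemalloc uses.
class SizeClass
{
public:
  enum ClassType { Quantum, SubPage, Large };

  explicit SizeClass(size_t aSize)
  {
    assert(aSize > 0);
    if (aSize <= kMaxQuantumClass) {
      mType = Quantum;
      mSize = RoundUp(aSize, kQuantum);
    } else if (aSize <= kMaxSubPageClass) {
      mType = SubPage;
      mSize = RoundUpPow2(aSize);
    } else {
      mType = Large;
      mSize = RoundUp(aSize, kPageSize);
    }
  }

  ClassType Type() const { return mType; }
  size_t Size() const { return mSize; }

private:
  static constexpr size_t kQuantum = 16;
  static constexpr size_t kMaxQuantumClass = 512;
  static constexpr size_t kMaxSubPageClass = 2048;
  static constexpr size_t kPageSize = 4096;

  // Round aSize up to a multiple of aAlign (aAlign must be a power of 2).
  static size_t RoundUp(size_t aSize, size_t aAlign)
  {
    return (aSize + aAlign - 1) & ~(aAlign - 1);
  }

  // Round aSize up to the next power of 2.
  static size_t RoundUpPow2(size_t aSize)
  {
    size_t result = 1;
    while (result < aSize) {
      result <<= 1;
    }
    return result;
  }

  ClassType mType;
  size_t mSize;
};

// Only Size() is used here, so an optimizing compiler is free to drop
// the stores to mType once the constructor is inlined.
inline size_t GoodSize(size_t aSize)
{
  return SizeClass(aSize).Size();
}
```

A caller that needs the category uses Type(); a caller like GoodSize
that only needs the rounded size never touches it.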
--HG--
extra : rebase_source : 61381e600587b045e720a85a7b46673edeb691b9
Because of alignment issues due to the system glibc when running the
SSE2 gcov code generated during the PGO profile gen phase, Firefox
crashes when running the PGO profile. We work around the issue by
disabling SSE2 when building mozjemalloc during that phase. That
shouldn't affect the coverage data anyways, which is bound to the
original C++ code, and the profile-use code generation will still emit
SSE2 based on the coverage data if it needs to.
--HG--
extra : rebase_source : 3596fdc795cdef0789f3a2dd8f10b42cde00430f
We introduce the notion of private arenas, separate from other arenas
(main and thread-local). They are kept in a separate arena tree, and
arena lookups from moz_arena_* functions only access the tree of
private arenas. Iteration still goes through all arenas, private and
non-private.
--HG--
extra : rebase_source : 86c43c7c920b01eb6fa1fa214d612fd9220eac3e
We create the ArenaCollection class to handle operations on the
arena tree. Ideally, iter() would trigger locking, but the
prefork/postfork code complicates things, so we leave this for later.
--HG--
extra : rebase_source : bd7021098baf0ec01c14063294098edea4473d36
Note we use a local variable for fallible allocator because using plain
`new (fallible)` would require some figuring out for non-Firefox builds
(e.g. standalone js).
--HG--
extra : rebase_source : 2132f98ebc7e37a139b673f80631e672bcf8ed15
RedBlackTree::{Insert,Remove} allocate an object on the stack for its
RedBlackTreeNode, and that shouldn't have side effects if the type
happens to have a constructor. This will allow adding constructors to
some of the mozjemalloc types.
--HG--
extra : rebase_source : 14dbb7d73c86921701d83156186df5d645530dda
Note we use the deprecated `new (fallible_t())` form because using `new
(fallible)` would require some figuring out for non-Firefox builds (e.g.
standalone js).
--HG--
extra : rebase_source : 0159d8476a1b5509330517c00af6c387d522722d
- In the cases where it's used on powers of 2, replace it with
FloorLog2() + 1.
- In the cases where it's used on any kind of number, replace it with
CountTrailingZeroes, which is `ffs(x) - 1`.
- In the case of tiny allocations in arena_t::MallocSmall, we rearrange
the code so that the intent is clearer, which also simplifies the
expression for the mBins offset: mBins[0] is the first tiny bucket,
for allocations of sizes 1 << TINY_MIN_2POW, mBins[1] for allocations
of size 1 << (TINY_MIN_2POW + 1), etc. up to small_min. So the offset
is really the log2 of the normalized size.
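The equivalences behind the first two replacements can be checked with
a small sketch. The functions below are portable loop-based stand-ins
written for illustration; they only share their names with the helpers
mentioned above and are not the real implementations.

```cpp
#include <cassert>
#include <cstdint>

// 1-based index of the lowest set bit, 0 when no bit is set (the ffs
// contract).
inline unsigned Ffs(uint32_t aValue)
{
  if (!aValue) {
    return 0;
  }
  unsigned index = 1;
  while (!(aValue & 1)) {
    aValue >>= 1;
    index++;
  }
  return index;
}

// Position of the highest set bit, 0-based (stand-in implementation).
inline unsigned FloorLog2(uint32_t aValue)
{
  unsigned log = 0;
  while (aValue >>= 1) {
    log++;
  }
  return log;
}

// Number of zero bits below the lowest set bit (stand-in implementation).
inline unsigned CountTrailingZeroes(uint32_t aValue)
{
  assert(aValue != 0);
  unsigned count = 0;
  while (!(aValue & 1)) {
    aValue >>= 1;
    count++;
  }
  return count;
}

// Checks ffs(x) == FloorLog2(x) + 1 on every 32-bit power of 2, and
// ffs(x) - 1 == CountTrailingZeroes(x) on a range of arbitrary values.
inline bool ReplacementsAgree()
{
  for (unsigned shift = 0; shift < 32; shift++) {
    uint32_t pow2 = uint32_t(1) << shift;
    if (Ffs(pow2) != FloorLog2(pow2) + 1) {
      return false;
    }
  }
  for (uint32_t value = 1; value < 100000; value++) {
    if (Ffs(value) - 1 != CountTrailingZeroes(value)) {
      return false;
    }
  }
  return true;
}
```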
--HG--
extra : rebase_source : 954a655dcaa93857dc976078e133704bb141de0d
Comparing ffs(x) == ffs(y), when x and y are guaranteed to be powers of
2 (or 0, or 1), is the same as x == y.
--HG--
extra : rebase_source : d6cc3399d85fa9fda2559435e99adbfb82ac8da0
The sentinel was taking as much space as one element of the tree, while
only really used for its RedBlackTreeNode, wasting space.
This results in some decrease in struct sizes, for example on 64-bit
Linux:
- arena_bin_t: 80 -> 56
- arena_t (excluding mBins): 224 -> 144
- arena_t + dynamic size of mBins: 3024 -> 2104
It also decreases the size of several globals:
- gChunksBySize, gChunksByAddress, huge: 64 -> 8
- gArenaTree: 312 -> 8
--HG--
extra : rebase_source : d5bb52f93e064ab4cca3fb07b2c5a77ce57fb7db
Interestingly, this turns single-instruction checks into
two-instruction checks (at least with GCC, from one cmpb to a movl
followed by a testl), but this is due to Atomic<bool> being actually
backed by a uint32_t, not by the use of atomics.
--HG--
extra : rebase_source : cfc0bec2113b44635120216b4abbbbbe9028b286
- First, MOZ_DIAGNOSTIC_ASSERT_ENABLED is always true when MOZ_DEBUG is
set, so don't check for MOZ_DEBUG.
- Second, all the magic number assertions should be
MOZ_DIAGNOSTIC_ASSERTs instead of MOZ_ASSERTs.
--HG--
extra : rebase_source : 5601cd13604e21c46a9f0ad8b0b4d6fc399b853e
Some need initialization to happen, some can be skipped when the
allocator was not initialized, and others should crash.
--HG--
extra : rebase_source : d6c2697ca27f6110fe52a067440a0583e0ed0ccd
Also rearrange some code accordingly, but don't fix indentation issues
just yet.
Also apply changes from the google-readability-braces-around-statements
check.
But don't apply the modernize-use-nullptr recommendation about
strerror_r because it's wrong (bug #1412214).
--HG--
extra : histedit_source : 2d61af7074fbdc5429902d9c095c69ea30261769
Backed out changeset f75f427fde1d
Backed out changeset 6278aa5fec1d
Backed out changeset eefc284bbf13
Backed out changeset e2b391ae4688
Backed out changeset 58070c2511c6
--HG--
extra : histedit_source : d14fa171a5cf4d9400cae7f94d5cc64a1e58b98d%2C856ad5b650074a1dcff2edb0b95adc20aaf38db3
Also rearrange some code accordingly, but don't fix indentation issues
just yet.
Also apply changes from the google-readability-braces-around-statements
check.
But don't apply the modernize-use-nullptr recommendation about
strerror_r because it's wrong (bug #1412214).
--HG--
extra : rebase_source : 28b07aca01fd0275127341233755852cccda22fe
This will make allocation operations return nullptr in the face of OOM,
allowing callers to either handle the allocation error or for the normal
OOM machinery, which also records the requested size, to kick in.
--HG--
extra : rebase_source : 723048645cb3f0db269c91f9d023bb06825a817b
On its own, this change is unnecessary. But it prepares for pages_commit
becoming fallible in next commit, allowing a possibly early return while
leaving metadata in a consistent state: runs are still available for
future use even if they couldn't be committed immediately.
--HG--
extra : rebase_source : 971e5c49882409c00ac61ec641d469dcc94e5cc2
Bug 1403843 made more things constant, but missed a few that don't
depend on the page size.
--HG--
extra : rebase_source : 036722744ff7054de9d081bde1f4c7b035fd9501
- Move variable declarations to their initialization.
- Remove gotos and use RAII.
--HG--
extra : rebase_source : 9d983452681edf63593d033727ba6faebe418afe
The chunk_recycle and chunk_record functions are never called with
different red-black trees than the globals, so just use them directly
instead of passing them as argument. The functions were already using
the associated global mutex anyways.
At the same time, rename them.
--HG--
extra : rebase_source : c45bb2e584c61b458eab4343562eb3a5a64543a3
Instead of calling it with a boolean indicating whether the call was for
base allocations or not, and returning immediately if it was, avoid the
call altogether.
--HG--
extra : rebase_source : abb2a3d0eaefc16efd2e828f09a330ab2a3b8b1f
jemalloc_ptr_info takes an outparam, which makes it harder to use in a
debugger: you'd need to find some memory to use as outparam and pass
that in.
So for convenience, we add a non-exported symbol for use in debuggers,
which just returns a pointer to a static buffer for the result.
lldb:
(lldb) print *Debug::jemalloc_ptr_info($0)
(jemalloc_ptr_info_t) $1 = (tag = TagLiveSmall, addr=0x000000011841dd80, size = 160)
gdb:
(gdb) print *Debug::jemalloc_ptr_info($0)
$1 = {tag = TagLiveSmall, addr = 0x7f8e7ebd0dc0, size = 96}
windbg:
0:040> .call Debug::jemalloc_ptr_info(0x6187880)
Thread is set up for call, 'g' will execute.
WARNING: This can have serious side-effects,
including deadlocks and corruption of the debuggee.
0:040> g
.call returns:
struct jemalloc_ptr_info_t * 0x7501f3f4
+0x000 tag : 1 ( TagLiveSmall )
+0x004 addr : 0x06187880 Void
+0x008 size : 0x20
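The shape of such a helper can be sketched as follows. The types and
the body of the outparam function are simplified placeholders written
for this example (only the field names and TagLiveSmall come from the
debugger output above), so treat it as an illustration of the pattern
rather than the actual code.

```cpp
#include <cassert>
#include <cstddef>

// Simplified placeholder types; TagUnknown is an assumption.
enum PtrInfoTag { TagUnknown, TagLiveSmall };

struct jemalloc_ptr_info_t
{
  PtrInfoTag tag;
  void* addr;
  size_t size;
};

// Placeholder for the existing outparam API. The body is a stub that
// just echoes the pointer; the real function inspects the allocator.
void jemalloc_ptr_info(const void* aPtr, jemalloc_ptr_info_t* aInfo)
{
  aInfo->tag = TagUnknown;
  aInfo->addr = const_cast<void*>(aPtr);
  aInfo->size = 0;
}

namespace Debug {

// Debugger-friendly wrapper: no outparam to set up, the result lives in
// a static buffer. Each call overwrites the previous result, and it is
// not thread-safe, which is acceptable for interactive debugging.
jemalloc_ptr_info_t* jemalloc_ptr_info(const void* aPtr)
{
  static jemalloc_ptr_info_t info;
  ::jemalloc_ptr_info(aPtr, &info);
  return &info;
}

} // namespace Debug

inline void DebugHelperExample()
{
  int local = 0;
  jemalloc_ptr_info_t* info = Debug::jemalloc_ptr_info(&local);
  assert(info->addr == &local);
}
```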
--HG--
extra : rebase_source : 09aedd48aabee3e273a17000a61b1d09cdd619b9
Bug 1378258 removed malloc_print_stats and bug 1379890 further removed
the subsequently unused arena stats. It turns out there are also some
huge stats that have been unused since bug 1378258, and that are still
there, so remove them.
--HG--
extra : rebase_source : ae71c7507143503dff8d2e517352a97eb53e4676
The way inlining is disabled in mozjemalloc is via a #define of "inline"
to nothing, which is a dubious way to do that. This makes the compiler
trigger warnings we -Werror on for some static functions. While there
are such functions in mozjemalloc.cpp that could be fixed by wrapping
them in the right #ifdefs, there are also others coming from headers,
and it's not something that can be fixed in a satisfactory way.
The right way to disable inlining is to pass the right compiler flags
for that. But inlining is the least of the problems when debugging
optimized C++ code, so if debugging requires some optimization tweaking,
it should be done manually with compile flags when needed, instead of
fiddling with #defines to remove keywords.
--HG--
extra : rebase_source : 962c3409f86060c4d5ddf966778b58b64f89c31d
Bug 1403444 massively refactored the red-black tree code, with the
result of removing the warnings the old code was triggering. We can thus
remove the exceptions for those warnings now.
--HG--
extra : rebase_source : 76c7ce7a7282471399c7592601f6986bfb33b256
Ideally, we'd be reusing some Mutex class we have in Gecko, the base one
in mozglue/misc being the best candidate. However, the constraints in
mozjemalloc make that inconvenient:
- Can't have a constructor because malloc_init() would likely run before
it, and that would mean the mutexes would be re-initialized.
- Can't have a destructor because code will run after static
destructors, and some of that code likely will invoke the allocator,
and we can't have destructed mutexes by then.
- Can't use pthread_mutex on OSX because that loops back into the
allocator.
Accommodating the use of Gecko mutexes around those constraints would
mean much more code than just implementing a new mutex class, so the
latter is preferred.
--HG--
extra : rebase_source : d2e180a5007390c620aa6d7921340b9784c7699f
The malloc_spin_* functions have ended up being strictly identical to
the malloc_mutex_* functions, so use the latter instead of the former.
--HG--
extra : rebase_source : 746bdf57cb4a33fd65335174a748cb567630e05b
Now that the radix tree structure has a fixed size, we can just allocate
the chunk radix tree object statically.
--HG--
extra : rebase_source : 6a5f022d46da1b24401b197751e594903987b7f6
All the parameters of the radix tree (bits per level, height) are
derived from the aBits argument to ::Create in a straightforward way.
aBits itself is a constant at the call point, making them all constants,
so we can turn all of them into compile-time constants instead of
storing them as data.
--HG--
extra : rebase_source : aa1be8e97ed4133d7fc106fb3ea678a759476bef
All levels except the first are using the same size, and in some cases,
even the first uses the same size. Storing only those two different
sizes allows fixing the class size, while not making the code
significantly more complex.
--HG--
extra : rebase_source : 8028c18de2fa84060c5baff7c95cd0a70e7a3c6b
The tree height was defined as:
height = aBits / bits_per_level;
if (height * bits_per_level != aBits) {
height++;
}
What's wanted here is a height that covers all the bits, where the first
level might cover less than bits_per_level.
So aBits / bits_per_level gets us the height covered by levels with
exactly bits_per_level bits. The tree height is one more when there
are remaining bits.
Put differently, we can write aBits as:
aBits = bits_per_level * x + y
with y < bits_per_level.
We have:
aBits / bits_per_level = x.
height = x when y = 0, and x + 1 when y > 0.
We're looking for a number z such that
height = (aBits + z) / bits_per_level.
Or:
height = (bits_per_level * x + y + z) / bits_per_level.
= x + (y + z) / bits_per_level.
So we're looking for a z such that
(y + z) / bits_per_level = 0 when y = 0
= 1 when y > 0
The properties of the integer division are such that the above means:
0 <= y + z < bits_per_level when y = 0
bits_per_level <= y + z < 2 * bits_per_level when y > 0
Which gives us:
0 <= z < bits_per_level when y = 0
bits_per_level - y <= z < 2 * bits_per_level - y when y > 0
y being < bits_per_level per the constraint further above,
2 * bits_per_level - y > bits_per_level.
So all in all, we want a z such that
bits_per_level - y <= z < bits_per_level with 0 < y < bits_per_level
The only value where this is true for every such y is
z = bits_per_level - 1.
In summary,
height = (aBits + bits_per_level - 1) / bits_per_level
is the same as the height as originally defined.
With that formula, it's self evident that height * bits_per_level is
always >= aBits, so we remove the assertion.
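The equivalence can also be checked by brute force. The sketch below
uses hypothetical function names and simply compares the two
definitions over a range of inputs.

```cpp
#include <cassert>

// The height as originally defined.
inline unsigned HeightOriginal(unsigned aBits, unsigned aBitsPerLevel)
{
  assert(aBitsPerLevel > 0);
  unsigned height = aBits / aBitsPerLevel;
  if (height * aBitsPerLevel != aBits) {
    height++;
  }
  return height;
}

// The simplified formula derived above (a ceiling division).
inline unsigned HeightSimplified(unsigned aBits, unsigned aBitsPerLevel)
{
  assert(aBitsPerLevel > 0);
  return (aBits + aBitsPerLevel - 1) / aBitsPerLevel;
}

// Returns true when both definitions agree, and height * bits_per_level
// covers aBits, for every combination up to the given bounds.
inline bool HeightsAgree(unsigned aMaxBits, unsigned aMaxBitsPerLevel)
{
  for (unsigned perLevel = 1; perLevel <= aMaxBitsPerLevel; perLevel++) {
    for (unsigned bits = 1; bits <= aMaxBits; bits++) {
      unsigned height = HeightSimplified(bits, perLevel);
      if (height != HeightOriginal(bits, perLevel) ||
          height * perLevel < bits) {
        return false;
      }
    }
  }
  return true;
}
```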
--HG--
extra : rebase_source : 8ca2e5fbad7d4ad537f26508af5aa250483f1f08
bits_per_level was defined as:
ffs(pow2_ceil((kNodeSize / sizeof(void*)))) - 1
kNodeSize is (1U << 14) when SIZEOF_PTR is 4 (sizeof(void*) being the
same). Otherwise, it's CACHELINE, which is (1U << 6).
The most important part, though, is that it's always a power of 2.
And it's divided by sizeof(void*) which is always a power of 2.
The result of that division is thus always a power of 2, as long as
kNodeSize is larger than the size of a pointer, which it is.
The argument to pow2_ceil being a power of 2, pow2_ceil is a noop,
so it can go away. And the argument to ffs being a power of 2, it
returns one more than n that matches 1 << n == value. So overall
the expression returns the number of shifts for
kNodeSize / SIZEOF_PTR.
Transforming kNodeSize to a number of shifts/power of 2, the expression
can then be simplified as kNodeSize2Pow - SIZEOF_PTR_2POW.
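For the two configurations mentioned above, the simplification can be
checked with a short sketch. Pow2Ceil and Ffs are loop-based stand-ins
written for this example, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Smallest power of 2 that is >= aValue (stand-in for pow2_ceil).
inline size_t Pow2Ceil(size_t aValue)
{
  size_t result = 1;
  while (result < aValue) {
    result <<= 1;
  }
  return result;
}

// 1-based index of the lowest set bit, 0 when no bit is set.
inline unsigned Ffs(size_t aValue)
{
  if (!aValue) {
    return 0;
  }
  unsigned index = 1;
  while (!(aValue & 1)) {
    aValue >>= 1;
    index++;
  }
  return index;
}

// The original expression, with both sizes given as powers of 2.
inline unsigned BitsPerLevelOriginal(unsigned aNodeSize2Pow,
                                     unsigned aPtrSize2Pow)
{
  assert(aNodeSize2Pow >= aPtrSize2Pow);
  size_t nodeSize = size_t(1) << aNodeSize2Pow;
  size_t ptrSize = size_t(1) << aPtrSize2Pow;
  return Ffs(Pow2Ceil(nodeSize / ptrSize)) - 1;
}

// The simplified expression.
inline unsigned BitsPerLevelSimplified(unsigned aNodeSize2Pow,
                                       unsigned aPtrSize2Pow)
{
  assert(aNodeSize2Pow >= aPtrSize2Pow);
  return aNodeSize2Pow - aPtrSize2Pow;
}
```

With 4-byte pointers and a node size of 1 << 14, both give 12. With
8-byte pointers and a node size of 1 << 6, both give 3.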
--HG--
extra : rebase_source : a22a378ba6622e2a4fbcf28811c7042cea9da24a
The only semantic change is in the value returned by Set, which now
returns whether the value could be set or not.
--HG--
extra : rebase_source : a80f5d6fdb3672715887e69215f55df0cedb231e
There is a lot of redundancy between malloc_rtree_get and
malloc_rtree_set. Essentially, they both look up a slot, and either get
a value or set a value in that slot. malloc_rtree_get doesn't create a
tree path for the slot when it doesn't exist. And the
MALLOC_RTREE_GET_GENERATE macro machinery makes malloc_rtree_get retry
with a lock and validate both results agree in debug builds.
By introducing a malloc_rtree_get_slot function that returns a slot,
optionally creating a tree path to it, we remove the redundancy between
_get and _set, and we can avoid the macro machinery as well.
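The pattern can be illustrated with a toy two-level tree over 16-bit
keys. ToyRadixTree and its layout are assumptions made for this sketch;
locking and the debug-build validation are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Toy two-level radix tree, for illustration only.
class ToyRadixTree
{
public:
  ToyRadixTree() = default;
  ToyRadixTree(const ToyRadixTree&) = delete;
  ToyRadixTree& operator=(const ToyRadixTree&) = delete;

  ~ToyRadixTree()
  {
    for (void** node : mRoot) {
      free(node);
    }
  }

  // Lookup never creates a path: a missing node means "no value".
  void* Get(uint16_t aKey)
  {
    void** slot = GetSlot(aKey, /* aCreate = */ false);
    return slot ? *slot : nullptr;
  }

  // Returns whether the value could be set (false on allocation failure).
  bool Set(uint16_t aKey, void* aValue)
  {
    void** slot = GetSlot(aKey, /* aCreate = */ true);
    if (!slot) {
      return false;
    }
    *slot = aValue;
    return true;
  }

private:
  // The single place that walks the tree, optionally creating the path.
  void** GetSlot(uint16_t aKey, bool aCreate)
  {
    unsigned high = aKey >> 8;
    unsigned low = aKey & 0xff;
    if (!mRoot[high]) {
      if (!aCreate) {
        return nullptr;
      }
      mRoot[high] = static_cast<void**>(calloc(256, sizeof(void*)));
      if (!mRoot[high]) {
        return nullptr;
      }
    }
    return &mRoot[high][low];
  }

  void** mRoot[256] = {};
};

inline void RadixTreeExample()
{
  ToyRadixTree tree;
  int value = 0;
  assert(tree.Get(0x1234) == nullptr);
  assert(tree.Set(0x1234, &value));
  assert(tree.Get(0x1234) == &value);
}
```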
--HG--
extra : rebase_source : bbbdd33e81e8bfdc11c028f882ab877bba26f7f3
Bug 571209 made many different kinds of sizes static at build time, as
opposed to configurable at run-time. While the dynamic sizes can be
useful to quickly test tweaks to e.g. quantum sizes, a
replace-malloc-built allocator could just as well do the same. This
need, however, is very rare, and doesn't justify keeping the sizes
dynamic on platforms where static sizes can't be used for the page size
because page size may vary depending on kernel options.
So we make every size that doesn't depend on the page size static,
whether MALLOC_STATIC_SIZES is enabled or not.
This makes no practical difference on tier-1 platforms, except Android
aarch64, which will benefit from more static sizes.
--HG--
extra : rebase_source : 28243a67e4fe41154c23dc39b45405479854d31d
This was done in bug 1104634 because back then the Android NDK had a
broken combination of compiler and libc, where the compiler would emit
calls to the ffs function, but the libc wouldn't contain them, but only
when building without optimization.
Things have changed in the meantime, and recent NDKs don't have this
problem. So we can remove the hack.
--HG--
extra : rebase_source : 22d6c279a60d0d23161ca1addd5b5e9a3411d8ab
Bug 1402174 made all arenas registered in a Red-Black tree, which means
they are iterable through that tree, making the arenas list redundant.
The list is also inconvenient, since it needs to be constantly
reallocated, and the allocator in charge of the list doesn't know how to
free things.
Iteration of arenas is not on any hot path anyways, so even though
iterating the RB tree is slower, it doesn't matter.
So we remove the arenas list, and keep a direct pointer to the main
arena for convenience (instead of calling First() on the RB tree every
time).
--HG--
extra : rebase_source : 31f12b2de18a886eb4f8f078e11040aad3fdc800
As we're going to enable stylo on Android at some point, we'll have to
have thread local arenas there, which means Android needs to be using
thread local storage. Since Android is the last use of NO_TLS in the
allocator code base, remove it.
--HG--
extra : rebase_source : 658cbc94b4478950f683bd104b7e5da27cd08a2e
The trivial expansion of macros ended up creating cases like
expr.IsRed() ? NodeColor::Red : NodeColor::Black
which practically speaking, is the same as
expr.Color()
so we replace those.
There are also a bunch of expr.IsRed() == false, which are replaced with
expr.IsBlack() (adding that method at the same time).
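A minimal sketch of why the two forms are interchangeable, using a
simplified node type that is an assumption of this example (the real
node stores its color differently):

```cpp
#include <cassert>

enum class NodeColor { Black, Red };

// Simplified node: only the color accessors matter here.
struct Node
{
  NodeColor mColor;

  NodeColor Color() const { return mColor; }
  bool IsRed() const { return mColor == NodeColor::Red; }
  bool IsBlack() const { return mColor == NodeColor::Black; }
};

// What the mechanical macro expansion produced.
inline NodeColor ColorExpanded(const Node& aNode)
{
  return aNode.IsRed() ? NodeColor::Red : NodeColor::Black;
}

// The replacement.
inline NodeColor ColorDirect(const Node& aNode)
{
  return aNode.Color();
}

inline void ColorExample()
{
  Node red = { NodeColor::Red };
  Node black = { NodeColor::Black };
  assert(ColorExpanded(red) == ColorDirect(red));
  assert(ColorExpanded(black) == ColorDirect(black));
  assert(black.IsBlack() == (black.IsRed() == false));
}
```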
--HG--
extra : rebase_source : ab50212ff80f0c0151e7df329d8933ccd45f9781
While we're going in the opposite direction, moving away from macros,
upcoming intermediate steps are going to "manually" expand macros, but
later steps will require changing how the link field reference is done,
and having it in a single location then will be more convenient.
--HG--
extra : rebase_source : 6dde414ce392924081a41b7e3f66ae848cb14be5
That stack space would matter if recursion was involved, but there
isn't any, and a max of 1440 bytes temporarily allocated on the stack
is not really a problem.
--HG--
extra : rebase_source : 2968fafe9d604d9e6c03ac93c21d8a3a087043a4
All uses of rb_wrap have "static" as first argument to rb_wrap, move that
in the macro itself.
--HG--
extra : rebase_source : cbfe87d0539452c044b415c725cb7ce6ebb5628c
Things left for followups:
- Full cleanup of disposed arenas: bug 1364359.
- Random arena Ids: bug 1402282.
- Enforcing the arena to match on moz_arena_{realloc,free}: bug 1402283.
- Make it impossible to use arenas not created with moz_create_arena
with moz_arena_* functions: bug 1402284.
Until it's proven to require a different data structure, arena lookup by
Id is done through the use of the same RB tree used everywhere else in
the allocator.
At this stage, the implementation of the API doesn't ride the trains,
but the API can be used safely and will fall back to normal allocations
transparently for the caller.
--HG--
extra : rebase_source : aaa9bdab5b4e0c534da0c9c7a299028fc8d66dc8
The implementation is not doing anything just yet. This will be done in
a followup bug.
--HG--
extra : rebase_source : e301eac77c6bd8247c09d369074ecb8d7b5a1a2f
malloc, free, calloc, realloc and memalign constitute some sort of
minimal interface to the allocator. posix_memalign, aligned_alloc and
valloc are already defined in terms of memalign. The remaining functions
are not related to active allocation.
--HG--
extra : rebase_source : ee27ca70e271f3abef76c7782724d607b52f58b1
This effectively means malloc_hook_table_t is now C++ only, which is not
a big problem.
This also makes some functions use a return construct with functions
that don't return a value (such as free). While that is not allowed in
ISO C, it's allowed in C++, so the simplification is welcome (although,
retrospectively, it turns out C compilers don't complain about it
without -pedantic).
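The construct in question can be sketched like this. Twice, Bump and
Forward are hypothetical names used only for this illustration; the
point is that one forwarding definition serves functions with and
without a return value.

```cpp
#include <cassert>

// Hypothetical stand-ins for entry points with and without a result.
inline int Twice(int aValue) { return 2 * aValue; }
inline void Bump(int* aCounter) { ++*aCounter; }

// One forwarding template covers both cases: when R is void, the body
// is `return <expression of type void>;`, which C++ accepts.
template<typename R, typename... Args>
R Forward(R (*aFunc)(Args...), Args... aArgs)
{
  return aFunc(aArgs...);
}

inline void ForwardExample()
{
  int counter = 0;
  Forward(Bump, &counter);
  assert(counter == 1);
  assert(Forward(Twice, 21) == 42);
}
```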
--HG--
extra : rebase_source : defd88ca3f6d478e61a4b970393dba60fb6ca81d
This was done in bug 736564 for the xulrunner SDK, which later became
the firefox SDK, which is now gone. So we don't actually need to keep it
separate anymore (except for logalloc/replay, which still needs to link
it directly, so we keep the library definition intact so it can be
referenced; we just don't DIST_INSTALL it anymore, and always make it
linked into mozglue).
--HG--
extra : rebase_source : e4d0627ec907fe0139df5c0b2b9f7d04b43c7c78
In bug 1361258, we unified the initialization sequence on mac, and
chose to make the zone registration happen after jemalloc
initialization.
The order between jemalloc init and zone registration shouldn't actually
matter, because jemalloc initializes the first time the allocator is
actually used.
On the other hand, in some build setups (e.g. with light optimization),
the initialization of the thread_arena thread local variable can happen
after the forced jemalloc initialization because of the order the
corresponding static initializers run. In some levels of optimization,
the thread_arena initializer resets the value the jemalloc
initialization has set, which subsequently makes choose_arena() return
a bogus value (or hit an assertion in ThreadLocal.h on debug builds).
So instead of initializing jemalloc from a static initializer, which
then registers the zone, we instead register the zone and let jemalloc
initialize itself when used, which increases the chances of the
thread_arena initializer running first.
--HG--
extra : rebase_source : 4d9a5340d097ac8528dc4aaaf0c05bbef40b59bb
isalloc_validate is the function behind malloc_usable_size. If for some
reason malloc_usable_size is called before mozjemalloc is initialized,
this can lead to an unexpected crash.
The chance of this actually happening is rather slim on Linux
and Windows (although still possible), and impossible on Mac, due to the
fact the earliest something can end up calling it is after the
mozjemalloc zone is registered, which happens after initialization.
... except with bug 1399921, which reorders that initialization, and
puts the zone registration first. There's then a slim chance for the
zone allocator to call into zone_size, which calls malloc_usable_size,
to determine whether a pointer allocated by some other zone belongs to
mozjemalloc's.
And it turns out that does happen, during the startup of the
plugin-container process on OSX 10.10 (but not more recent versions).
--HG--
extra : rebase_source : 331d093b03add7b2c2ce440593f5aeccaaf4dd1f
Now that this is a C++ file, and that the function names are not
mangled, we can just use the actual C++ names.
We do however need to replace MOZ_MEMORY_API, which implies extern "C",
with MFBT_API.
Also use the correct type for the size given to operator new. It
happened to work before because the generated code would just jump to
malloc without touching any register, but on aarch64, unsigned int was
the wrong type.
--HG--
extra : rebase_source : 8045f30e9c609dd7d922c77d85ac017638df6961
And since the build system doesn't handle transitions from foo.c to
foo.cpp properly without a clobber, we work around the issue by
switching to unified sources.
--HG--
rename : memory/build/mozmemory_wrap.c => memory/build/mozmemory_wrap.cpp
extra : rebase_source : 3f074b4ccab255bb0eb16841f79582060fafbc86
It happens to work because of mozglue.def, but we might as well have the
right annotations (which will also make things correct when building this
file as C++).
--HG--
extra : rebase_source : 61056dc21c9c29bab62ad5d648e94dd56dc53b14
This used to be necessary because those functions might be prefixed with
__wrap_, and linked against with -Wl,--wrap, but that's not been the case
since bug 1077366. Furthermore, mozmem_malloc_impl nowadays only does
something on Windows, and those operators are only exposed on Android.
--HG--
extra : rebase_source : ca34442bfbc5fc8be20ffcfacb9afa0f2f818b82
There is a lot of churn involved in adding new API surface to
mozjemalloc, and mozmemory.h is one. Instead of declaring
everything manually in there, "generate" the declarations through
malloc_decls.h.
--HG--
extra : rebase_source : 1416fa972319c419112c4a8b16759d90692db5b2
The bin-unused count in memory reports indicates how much memory is
used by runs of small and sub-page allocations that is not actually
allocated. This is generally thought of as an indicator of fragmentation.
While this is generally true, with the use of thread local arenas by
stylo, combined with how stylo allocates memory, it ends up also being
an indicator of wasted memory.
For instance, over the lifetime of an AWSY iteration, there are only a
few allocations that end up in the bucket for 2048 allocated bytes. In
the "worst" case, there's only one. But the run size for such
allocations is 132KiB. Which means just because we're allocating one
buffer of size between 1024 and 2048 bytes, we end up wasting 130+KiB.
Per thread.
Something similar happens with size classes of 512 and 1024, where the
run size is respectively 32KiB and 64KiB, and where there's at most a
handful of allocations of each class ever happening per thread.
Overall, an allocation log from a full AWSY iteration reveals that there
are only 448 of 860700 allocations happening on the stylo arenas that
involve sizes above (and excluding) 512 bytes, so 0.05%.
While there are improvements that can be done to mozjemalloc so that it
doesn't waste more than one page per sub-page size class, they are
changes that are too low-level to land at this time of the release
cycle. However, considering the numbers above and the fact that the
stylo arenas are only really meant to avoid lock contention during the
heavy parallel work involved, a short term, low risk, strategy is to
just delegate all sub-page (> 512, < 4096) and large (>= 4096)
allocations to the main arena. Technically speaking, only sub-page
allocations are causing this waste, but it's more consistent to just
delegate everything above 512 bytes.
This should save 132KiB + 64KiB = 196KiB per stylo thread.
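The delegation rule can be sketched as follows. The Arena type, the
gMainArena variable and the ChooseArena signature are simplified
assumptions for this example; only the 512-byte threshold comes from
the description above.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for an arena.
struct Arena
{
  const char* mName;
};

static Arena gMainArena = { "main" };

// Largest request still served by a thread-local arena.
static const size_t kMaxThreadLocalSize = 512;

// Requests above the threshold, or made when there is no thread-local
// arena, go to the main arena.
inline Arena* ChooseArena(size_t aSize, Arena* aThreadLocalArena)
{
  if (aThreadLocalArena && aSize <= kMaxThreadLocalSize) {
    return aThreadLocalArena;
  }
  return &gMainArena;
}

inline void ChooseArenaExample()
{
  Arena threadLocal = { "stylo thread" };
  assert(ChooseArena(512, &threadLocal) == &threadLocal);
  assert(ChooseArena(513, &threadLocal) == &gMainArena);
}
```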
--HG--
extra : rebase_source : c7233d60305365e76aa124045b1c9492068d9415
Until bug 1361258, there was only ever one mozjemalloc arena, and the
number of dirty pages we allow to be kept dirty, fixed at 1MB per arena,
was, in fact, 1MB for an entire process.
With stylo using thread local arenas, we now can have multiple arenas
per process, multiplying that number of dirty pages.
While those dirty pages may be reused when other allocations end up
filling them later on, the fact that a relatively large number of
them is kept around for each stylo thread (in proportion to the amount of
memory ever allocated by stylo), combined with the fact that the memory
use from stylo depends on the workload generated by the pages being
visited, those dirty pages may very well not be used for a rather long
time. This is less of a problem with the main arena, used for most
everything else.
So, for each arena except the main one, we decrease the number of dirty
pages we allow to be kept around to 1/8 of the current value. We do this
by introducing a per-arena configuration of that maximum number.
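A minimal sketch of such a per-arena setting, assuming 4KiB pages and
hypothetical names (Arena, mMaxDirty, InitArena):

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for an arena with its own dirty page limit.
struct Arena
{
  size_t mMaxDirty; // Maximum number of dirty pages kept around.
};

static const size_t kPageSize = 4096; // Assumption for this example.
static const size_t kDefaultMaxDirty = (size_t(1) << 20) / kPageSize;

// The main arena keeps the existing limit; every other arena gets 1/8
// of it.
inline void InitArena(Arena& aArena, bool aIsMainArena)
{
  aArena.mMaxDirty = aIsMainArena ? kDefaultMaxDirty : kDefaultMaxDirty / 8;
}

inline void MaxDirtyExample()
{
  Arena mainArena;
  Arena threadArena;
  InitArena(mainArena, true);
  InitArena(threadArena, false);
  assert(mainArena.mMaxDirty == 8 * threadArena.mMaxDirty);
}
```

Under the 4KiB page assumption, that is 256 pages (1MB) for the main
arena and 32 pages (128KiB) for each of the others.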
--HG--
extra : rebase_source : 75eebb175b3746d5ca1c371606cface50ec70f2f
Those macros are one more thing that needs to be added when the
mozjemalloc API surface is increased, but after bug 1399350, nothing
actually needs them, so remove them.
--HG--
extra : rebase_source : 2bf62cc6c179540482722a72b0d0c134d2ac2a19
The files relevant to the memory allocator are currently spread between
memory/mozjemalloc and memory/build, and the distinction was
historically from sharing some Mozilla-specific things between
mozjemalloc and jemalloc3. That distinction is not useful anymore, so
we fold everything together.
As we will likely rename the allocator at some point in the future, it
is preferable to move away from the mozjemalloc directory rather than in
its direction.
--HG--
rename : memory/mozjemalloc/Makefile.in => memory/build/Makefile.in
rename : memory/mozjemalloc/mozjemalloc.cpp => memory/build/mozjemalloc.cpp
rename : memory/mozjemalloc/mozjemalloc.h => memory/build/mozjemalloc.h
rename : memory/mozjemalloc/mozjemalloc_types.h => memory/build/mozjemalloc_types.h
rename : memory/mozjemalloc/rb.h => memory/build/rb.h
clang warns about this code in mozmemory_wrap.c in the reimplementation
of vasprintf, complaining that `fmt` cannot be null:
if (str == NULL || fmt == NULL) {
^~~ ~~~~
clang is apparently exploiting knowledge about the requirements of
vasprintf here, but defensive programming on the part of our
reimplementation seems like the wiser course. Let's turn off the
warning.
Some system libraries call malloc_zone_free directly instead of free,
and sometimes they do that with the wrong zone. When that happens, we
circle back, trying to find the right zone, and call malloc_zone_free
with the right one, but when we can't find one, we crash, which matches
what the system free() would do. Except in one case where the pointer
we're being passed is NULL, in which case we can't trace it back to any
zone, but shouldn't crash (system free() explicitly doesn't crash in
that case).
--HG--
extra : rebase_source : 17efdcd80f1a53be7ab6b7293bfb6060a9aa4a48
Because malloc_decls.h is meant to be included multiple times, it
shouldn't actually declare things itself.
--HG--
extra : rebase_source : 9d6f9b2c61407265377845963a19ace2614160f4
jemalloc_ptr_info() gives info about any pointer, such as whether it's within a
live or free allocation, and if so, info about that allocation. It's useful for
debugging.
moz_malloc_enclosing_size_of() uses jemalloc_ptr_info() to measure the size of
an allocation from an interior pointer. It's useful for memory reporting,
especially for Rust code.
--HG--
extra : rebase_source : caa19cccf8c2d1f79cf004fe6a408775de5a7b22
Back when it was added (for Windows CE, in bug 488608), mozjemalloc was
C and not all the supported compilers supported C99 bools. Now
mozjemalloc is C++, and all the supported compilers support C99 bools
for the cases where the type is used from C.
--HG--
extra : rebase_source : b9c710a0c48dc36cb473af59e3119131d13523ce
Back when mozjemalloc was considered third-party, and before bug
1365194, mozjemalloc was calling abort() and that was redirected to
MOZ_CRASH through a moz_abort() function. Bug 1365194 changed that so
that moz_abort is called directly instead of abort, but the indirection
is actually not necessary anymore.
So we just kill moz_abort, which is unused anywhere else, and use
MOZ_CRASH directly.
Note this will (obviously) change crash signatures involving moz_abort.
--HG--
extra : rebase_source : 67698ffd8c5e52e62b9a0b7f28efb0352c8fe8ce
Bug 1186064 removed most of it when we started requiring VS 2015u2, but
the "frex" function exported through mozglue.def.in was only used
through the MSVCRT being patched by fixcrt.py, which is not done anymore.
So the "frex" export is not used anymore, and neither is the
"dumb_free_thunk" function.
--HG--
extra : rebase_source : 879c469c317c8b6749410a4a476d6c951c9a1d0f
Going through the system zone allocator for every call to realloc/free
on OSX is costly, because the zone allocator needs to first verify that
the allocations do belong to the allocator it invokes (which ends up
calling jemalloc's malloc_usable_size), which is unnecessary when we
expect the allocations to belong to jemalloc.
So, we export the malloc/realloc/free/etc. symbols from
libmozglue.dylib, such that libraries and programs linked against it
call directly into jemalloc instead of going through the system zone
allocator, effectively shortcutting the allocator verification.
The risk is that some things in Gecko try to realloc/free pointers it
got from system libraries, if those were allocated with a system zone
that is not jemalloc.
--HG--
extra : rebase_source : ee0b29e1275176f52e64f4648dfa7ce25d61292e
Going through the system zone allocator for every call to realloc/free
on OSX is costly, because the zone allocator needs to first verify that
the allocations do belong to the allocator it invokes (which ends up
calling jemalloc's malloc_usable_size), which is unnecessary when we
expect the allocations to belong to jemalloc.
So, we export the malloc/realloc/free/etc. symbols from
libmozglue.dylib, such that libraries and programs linked against it
call directly into jemalloc instead of going through the system zone
allocator, effectively shortcutting the allocator verification.
The risk is that some things in Gecko try to realloc/free pointers it
got from system libraries, if those were allocated with a system zone
that is not jemalloc.
--HG--
extra : rebase_source : 45b9b98499760a7f946878d41d2fdaadb6dff4d6
Replace-malloc libraries, such as DMD, don't really need to care about
the details of implementing all the variants of aligned memory
allocation functions. Currently, by defining MOZ_REPLACE_ONLY_MEMALIGN
before including replace_malloc.h, they get predefined functions.
Instead of making that an opt-in at build time, we make the
replace-malloc initialization just fill the replace-malloc
malloc_table_t with implementations that rely on the replace_memalign
the library provides.
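A minimal sketch of how the aligned-allocation variants can be layered on a
single memalign-style primitive. MemalignFn, ToyPosixMemalign and
ToyAlignedAlloc are hypothetical names for illustration, not the functions this
change adds.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>

// Type of the one primitive the library is assumed to provide.
using MemalignFn = void* (*)(size_t aAlignment, size_t aSize);

// posix_memalign-like wrapper expressed purely in terms of memalign.
int ToyPosixMemalign(MemalignFn aMemalign, void** aPtr, size_t aAlignment,
                     size_t aSize) {
  // The alignment must be a non-zero power of two and a multiple of
  // sizeof(void*).
  if (aAlignment == 0 || (aAlignment & (aAlignment - 1)) != 0 ||
      aAlignment % sizeof(void*) != 0) {
    return EINVAL;
  }
  void* result = aMemalign(aAlignment, aSize);
  if (!result) {
    return ENOMEM;
  }
  *aPtr = result;  // Only written on success.
  return 0;
}

// aligned_alloc-like wrapper: same primitive, different calling convention.
void* ToyAlignedAlloc(MemalignFn aMemalign, size_t aAlignment, size_t aSize) {
  return aMemalign(aAlignment, aSize);
}
```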
--HG--
extra : rebase_source : 0842a67d9bc27a9a86c33d14d98b9c25f39982fb
Until now, the malloc implementation functions would call the
replace-malloc functions if they exist, and fall back to the real
allocator if no such function exists. Instead of doing this, we now
fill the empty slots in the malloc_table_t with the real allocator
functions.
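The idea can be sketched as follows. ToyMallocTable is a hypothetical,
two-entry stand-in for malloc_table_t, and RealMalloc/RealFree are assumed
here to simply forward to the C library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, trimmed-down function table.
struct ToyMallocTable {
  void* (*malloc_fn)(size_t);
  void (*free_fn)(void*);
};

// Stand-ins for the real allocator entry points.
static void* RealMalloc(size_t aSize) { return std::malloc(aSize); }
static void RealFree(void* aPtr) { std::free(aPtr); }

// Fill every slot the replace-malloc library left empty, so callers can
// dispatch through the table without checking for null first.
void FillEmptySlots(ToyMallocTable& aTable) {
  if (!aTable.malloc_fn) {
    aTable.malloc_fn = RealMalloc;
  }
  if (!aTable.free_fn) {
    aTable.free_fn = RealFree;
  }
}
```

With this in place, a call site becomes an unconditional indirect call such as
table.malloc_fn(size).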
--HG--
extra : rebase_source : b54634f23188906939e4dc01fc5a3007de0f3f2c
We make replace_malloc_init_funcs be called on all platforms and fill out a
malloc_table_t for the replace-malloc functions with what comes from
dlsym/GetProcAddress on Android/Windows, and from the dynamically linked
weak symbols replace_* on other platforms.
replace_malloc.h contains definitions of *_impl_t types for each of the
functions in the malloc_table_t, which is redundant with the
replace_*_impl_t types we were creating, so we remove those typedefs,
except for the two functions (init and get_bridge) that don't have such
a typedef. Those functions don't appear in malloc_table_t.
--HG--
extra : rebase_source : 3705a99ee07f63dbaa66973eef19ddab224e0911
We want, in a subsequent patch, to have replace_malloc_init_funcs be
called on all platforms (including those relying on the replace-malloc
library being loaded already) and perform more initialization.
To prepare for that, we move the non-platform-specific pieces out.
--HG--
extra : rebase_source : 239ed363ee168bf4f8a96e0a1ca52981cb941b71
All the _impl functions in replace-malloc.c are largely identical. This
replaces all of them with macro expansions.
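A sketch of the general technique, assuming a list macro that names each entry
point once. TOY_FUNCS, TOY_DEFINE_IMPL and gToyCalls are hypothetical, and each
function is limited to one argument to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical list of entry points: name, return type, argument type.
#define TOY_FUNCS(MACRO)       \
  MACRO(malloc, void*, size_t) \
  MACRO(free, void, void*)

// Counter used only to make the generated wrappers observable.
static int gToyCalls = 0;

// Each expansion defines toy_<name>_impl, which forwards to the C library
// function of the same name.
#define TOY_DEFINE_IMPL(name, ret, arg) \
  ret toy_##name##_impl(arg aArg) {     \
    ++gToyCalls;                        \
    return std::name(aArg);             \
  }

TOY_FUNCS(TOY_DEFINE_IMPL)
#undef TOY_DEFINE_IMPL
```

Adding an entry point then means adding one line to the list rather than
writing another near-identical function.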
--HG--
extra : rebase_source : 67a1809b0b0fc4645ea5041154fa3a6dcb6cce6b
This makes no significant difference in practice in the macro
expansions, but will help down the line.
--HG--
extra : rebase_source : 6d61c1f28c558321478d7e5f26390d27ae8ae3ac
MOZ_REPLACE_JEMALLOC was only defined when building jemalloc4 as a
replace-malloc library.
--HG--
extra : rebase_source : fa5c402da07fa96448c170b6db99629469691efe
This avoids many additions of `extern "C"` in C++ code and will avoid
having to do the same to mozjemalloc once built as C++.
--HG--
extra : rebase_source : af55696262f40a9dd16a19c29edcb9bb307d4957
NO_TLS used to be hardcoded on mac because up to 10.6, __thread was not
supported. Until recently, we still supported 10.6, but that's not the
case anymore, so we could make mac builds use __thread.
Unfortunately, on OSX, __thread circles back calling malloc to allocate
storage on first access, so we have an infinite loop problem here.
Fortunately, pthread_keys don't have this property, so we can use that
instead. It doesn't appear to have significantly more overhead (and TLS
overhead is small anyways compared to the amount of work involved in
allocating memory with mozjemalloc).
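For illustration, a minimal sketch of per-thread storage through a pthread key.
The names gToyArenaKey, ToySetThreadArena and ToyGetThreadArena are
hypothetical, and using pthread_once for key creation is an assumption of this
sketch, not necessarily what mozjemalloc does.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical per-thread arena slot, stored through a pthread key rather
// than a __thread variable.
static pthread_key_t gToyArenaKey;
static pthread_once_t gToyArenaKeyOnce = PTHREAD_ONCE_INIT;

static void ToyCreateKey() {
  // No destructor: the sketch does not own the pointed-to object.
  pthread_key_create(&gToyArenaKey, nullptr);
}

void ToySetThreadArena(void* aArena) {
  pthread_once(&gToyArenaKeyOnce, ToyCreateKey);
  pthread_setspecific(gToyArenaKey, aArena);
}

// Returns the value stored by the calling thread, or null if it never
// stored one.
void* ToyGetThreadArena() {
  pthread_once(&gToyArenaKeyOnce, ToyCreateKey);
  return pthread_getspecific(gToyArenaKey);
}
```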
At the same time, we unify the initialization sequence between
mozjemalloc and mozjemalloc+replace-malloc, such that we have fewer
occasions for surprises when riding the trains (replace-malloc being
nightly only), ensuring the zone registration happens at the end of
mozjemalloc's initialization.
Adapted from
4e2e3dd9cf
and
d9f7b2a430
As per the latter commit, it would seem unlocking, in fork() child
processes, mutexes that were locked in the parent process is not really
well supported on OSX 10.12. The addition of the zone_reinit_lock
function in 10.12 supports this idea.
--HG--
extra : rebase_source : b3b58558cc195d63200078085c7e9b6c9b8d83ff
Some system libraries are using malloc_default_zone() and then using
some of the malloc_zone_* API. Under normal conditions, those functions
check the malloc_zone_t/malloc_introspection_t struct for the values
that are allowed to be NULL, so that a NULL deref doesn't happen.
As of OSX 10.12, malloc_default_zone() doesn't return the actual default
zone anymore, but returns a fake, wrapper zone. The wrapper zone defines
all the possible functions in the malloc_zone_t/malloc_introspection_t
struct (almost), and calls the function from the registered default zone
(jemalloc in our case) on its own, without checking whether the pointers
are NULL.
This means that a system library that calls e.g.
malloc_zone_batch_malloc(malloc_default_zone(), ...) ends up trying to
call jemalloc_zone.batch_malloc, which is NULL, and crash follows.
So as of OSX 10.12, the default zone is required to have all the
functions available (really, the same as the wrapper zone), even if they
do nothing.
This is arguably a bug in libsystem_malloc in OSX 10.12, but jemalloc
still needs to work in that case.
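A sketch of what "all the functions available, even if they do nothing" can
look like. ToyZoneTable is a hypothetical two-slot stand-in for malloc_zone_t,
and the stubs are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, trimmed-down zone with two optional entry points.
struct ToyZoneTable {
  unsigned (*batch_malloc)(ToyZoneTable*, size_t, void**, unsigned);
  void (*batch_free)(ToyZoneTable*, void**, unsigned);
};

// Stub that reports that no allocation was made.
static unsigned ToyBatchMalloc(ToyZoneTable*, size_t, void**, unsigned) {
  return 0;
}

// Stub that intentionally does nothing. (Assumption: a fuller implementation
// would release each pointer instead.)
static void ToyBatchFree(ToyZoneTable*, void**, unsigned) {}

// Build a zone where every slot is callable, so a wrapper that never checks
// for NULL cannot jump through a null pointer.
ToyZoneTable ToyMakeZone() {
  ToyZoneTable zone;
  zone.batch_malloc = ToyBatchMalloc;
  zone.batch_free = ToyBatchFree;
  return zone;
}
```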
[Adapted from
c6943acb3c]
--HG--
extra : rebase_source : 7d7a5b47fa18f56183e99c3655aee003c9be161e
The SDK jemalloc is built against might not be the latest for various
reasons, but the resulting binary ought to work on newer versions of
OSX.
In order to ensure this, we need the fullest definitions possible, so
copy what we need from the latest version of malloc/malloc.h available
on opensource.apple.com.
[Adapted from
c68bb41793]
--HG--
extra : rebase_source : ab19c478b568ea24095a3be62c39fb81efc1920a
We have been using a different zone allocator between mozjemalloc and
replace-malloc for a long time. Jemalloc 4 uses the same as
replace-malloc, albeit as part of the jemalloc upstream code base.
We've been bitten many times in the past with Apple changes breaking the
zone allocator, and each time we've had to make changes to the three
instances, although two of them are similar and the changes there are
straightforward.
It also turns out that the way the mozjemalloc zone allocator is set up,
when a new version of OSX appears with a new version of the system zone
allocator, Firefox ends up using the system allocator, because the zone
allocator version is not supported.
So, we use the same zone allocator for both replace-malloc and
mozjemalloc, making everything on par with jemalloc 4.
--HG--
extra : rebase_source : 9c0e245b5f82bb71294370d607e690c05cc89fbc
The intent here is to reuse the zone allocator for mozjemalloc, to avoid
all the shortcomings of mozjemalloc using a different one. This change
only moves the replace-malloc zone allocator out of replace-malloc.c, to
make changes for mozjemalloc integration clearer.
--HG--
rename : memory/build/replace_malloc.c => memory/build/zone.c
extra : rebase_source : 8b98efaa4a88862f2967c855b511e92beb9c4031
Somehow, we never called those hooks when replace-malloc is enabled. I'd
expect this to cause random deadlocks when forking, and I'm surprised
this hasn't surfaced. Maybe it actually causes some intermittent oranges
on automation, who knows.
This also brings consistency with what is done for jemalloc 4, and with
the mozjemalloc implementation, too, that we're going to replace with
this one in a subsequent changeset.
--HG--
extra : rebase_source : 059567d17f928098db8367e9081b631ced351110
This removes the unnecessary setting of c-basic-offset from all
python-mode files.
This was automatically generated using
perl -pi -e 's/; *c-basic-offset: *[0-9]+//'
... on the affected files.
The bulk of these files are moz.build files but there are a few others as
well.
MozReview-Commit-ID: 2pPf3DEiZqx
--HG--
extra : rebase_source : 0a7dcac80b924174a2c429b093791148ea6ac204
Jemalloc 4 purges dirty pages regularly during free() when the ratio of active
pages to dirty pages drops below 1 << lg_dirty_mult. We set
lg_dirty_mult in jemalloc_config to limit RSS usage, but it also has an impact
on performance.
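As a simplified model of that trigger: the sketch below assumes lg_dirty_mult
is the base-2 log of the minimum active:dirty page ratio, so purging starts
once dirty pages exceed active pages shifted right by lg_dirty_mult.
ToyShouldPurge is a hypothetical helper and ignores any other conditions the
real allocator applies.

```cpp
#include <cassert>
#include <cstddef>

// Returns true once there are more dirty pages than the ratio allows.
bool ToyShouldPurge(size_t aActivePages, size_t aDirtyPages,
                    unsigned aLgDirtyMult) {
  return aDirtyPages > (aActivePages >> aLgDirtyMult);
}
```

With a ratio of 8 (lg_dirty_mult of 3) and 800 active pages, this model allows
up to 100 dirty pages before purging starts.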
So instead of enforcing a high ratio to force more pages being purged, we keep
jemalloc's default ratio of 8, and force a regular purge of all dirty pages,
after cycle collection.
Keeping jemalloc's default ratio means the cycle-collection-triggered purge
doesn't have to go through all dirty pages when there are a lot of them, because
in that case the normal jemalloc purge during free() will already have kicked
in. It also ensures that everything that doesn't run the cycle collector, like
plugins in the plugin-container, still gets some level of purging.
At the same time, since jemalloc_purge_freed_pages does nothing with jemalloc 4,
repurpose the MEMORY_FREE_PURGED_PAGES_MS telemetry probe to track the time
spent in this cycle-collector-triggered purge.
The patch removes 455 occurrences of FAIL_ON_WARNINGS from moz.build files, and
adds 78 instances of ALLOW_COMPILER_WARNINGS. About half of those 78 are in
code we control and which should be removable with a little effort.
--HG--
extra : rebase_source : 82e3387abfbd5f1471e953961d301d3d97ed2973
The bulk of this commit was generated by running:
run-clang-tidy.py \
-checks='-*,llvm-namespace-comment' \
-header-filter=^/.../mozilla-central/.* \
-fix