This also creates a new arena_t::Ralloc function, replacing part of
iralloc, the other part being inlined into its only caller.
--HG--
extra : rebase_source : 76a9ca77e651c99641d8906faea8e469d8eea041
Since the old size given to arena_ralloc is already a size class, when
the size class for the new size is the same as the old size, the new
size can't be larger than the old size, so there's never anything to
zero.
--HG--
extra : rebase_source : dd468de60b2ed87718ec4e26f13d3b0c8932f455
We intend to move some functions to methods of the arena_t class. Moving
the arena selection out of them is the first step towards that.
--HG--
extra : rebase_source : b8380c3a0c90ed817a1dbbe8d60e6107b78ec677
The immediate goal for this is to allow determinism in an arena used for
an upcoming test, by essentially disabling purge on that specific arena.
We do that by allowing arenas to be created with a specific setting for
mMaxDirty.
Incidentally, this allows us to clean up the mMaxDirty initialization for
thread-local arenas.
Longer term, this would allow tweaking arenas with more parameters, on
a per-arena basis.
--HG--
extra : rebase_source : e4b844185d132aca9ee10224fc626f8293be0a34
Some unit tests rely on jemalloc_stats to get information such as chunk
size or page size. They can do so before any allocation happens, when
using gtest filters. So it is preferable for jemalloc_stats to
initialize the allocator.
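To illustrate, here is a rough sketch of such a test (the test itself is hypothetical, assuming the usual jemalloc_stats(jemalloc_stats_t*) entry point and its page_size/chunksize fields):

  #include "gtest/gtest.h"
  #include "mozmemory.h"

  TEST(Jemalloc, PageSize)
  {
    jemalloc_stats_t stats;
    // With gtest filters, this may be the very first thing touching the
    // allocator, so jemalloc_stats needs to initialize it itself.
    jemalloc_stats(&stats);
    ASSERT_NE(stats.page_size, 0u);
    ASSERT_NE(stats.chunksize, 0u);
  }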
--HG--
extra : rebase_source : 6696ec1cdaa3b121a3d12cb7b6049b79c656d271
We currently turn off the C++14 sized-deallocation facility on MSVC, and
we'd like to ensure we do the same thing for clang and gcc. To do so,
we add new functionality to moz.configure for checking and adding
compilation flags, similar to the facility for checking and adding
warning flags. The newly added facility is then used to add
-fno-sized-deallocation to the compilation flags, when the option is
supported.
Once we do this, we can't define the sized deallocation functions in
mozalloc.h; the compiler will complain that we are using
-fno-sized-deallocation, yet defining these special functions that we'll
never use. These functions were added for MinGW, where we needed to
compile with C++14 ahead of other platforms to be compatible with MSVC
headers. But they're no longer necessary, though they would be if we
removed -fno-sized-deallocation; the compiler will complain if we do
that and we'll add them back at that point.
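For reference, the sized-deallocation functions in question are the C++14 forms of operator delete (standard signatures, shown here purely for illustration):

  #include <cstddef>
  #include <new>

  // With sized deallocation enabled, the compiler may call these instead of
  // the unsized forms, passing the size of the allocation being freed.
  // -fno-sized-deallocation prevents it from ever emitting such calls.
  void operator delete(void* aPtr, std::size_t aSize) noexcept;
  void operator delete[](void* aPtr, std::size_t aSize) noexcept;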
SRWLock is more lightweight than CriticalSection, but is only available
on Windows Vista and later. So until we actually dropped support for
Windows XP, we had to use CriticalSection.
Now that all supported Windows versions have SRWLock, this is a switch
we can make, not only because SRWLock is more lightweight, but because
it can be statically initialized like mutexes on other platforms,
allowing the same initialization code to be used on all platforms,
and removing the requirement for a DllMain, which in turn can allow
statically linking mozjemalloc in some cases, instead of requiring a
shared library (DllMain only works in shared libraries) or having to
manually call the initialization function early enough.
There is a downside, though: SRWLock, as opposed to CriticalSection, is
not fair, meaning it can have thread scheduling implications, and can
theoretically increase latency on some threads. However, it is the
default used by Rust Mutex, meaning it's at least good enough there.
Let's see how things go with this.
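A minimal sketch of what the static initialization buys us (real Win32 APIs, illustrative usage):

  #include <windows.h>

  // SRWLOCK can be initialized statically, like mutexes on other platforms,
  // so no DllMain or early initialization call is needed. CRITICAL_SECTION
  // has no equivalent and requires InitializeCriticalSection at runtime.
  static SRWLOCK sLock = SRWLOCK_INIT;

  void WithLock() {
    AcquireSRWLockExclusive(&sLock);
    // ... protected work ...
    ReleaseSRWLockExclusive(&sLock);
  }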
--HG--
extra : rebase_source : 337dc4e245e461fd0ea23a2b6b53981346a545c6
Currently, huge allocations are completely independent from arenas. But
in order to ensure that e.g. moz_arena_realloc can't reallocate huge
allocations from another arena, we need to track which arena was
responsible for the huge allocation. We do that in the corresponding
extent_node_t.
Both functions do essentially the same thing, one having more validation
than the other. We can use a template with a boolean parameter to avoid
the duplication.
Furthermore, we're soon going to require, in some cases, more
information than just the size of the allocation, so we wrap their
result in a helper class that gives information about an active
allocation.
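The pattern looks roughly like this (a sketch; the names and the looked-up details are illustrative, not the actual mozjemalloc ones):

  #include <cassert>
  #include <cstddef>

  // Information about an active allocation; real code carries more than the size.
  struct AllocInfo {
    size_t mSize;
  };

  // One implementation for both entry points; the extra validation is only
  // compiled into the instantiation that asks for it.
  template <bool Validate>
  AllocInfo GetAllocInfo(const void* aPtr) {
    if (Validate) {
      assert(aPtr);  // the stricter variant performs additional checks
    }
    AllocInfo info = { /* size looked up from chunk/arena metadata */ 0 };
    return info;
  }

  // Callers: GetAllocInfo<false>(ptr) for the fast path,
  //          GetAllocInfo<true>(ptr) where validation is wanted.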
FloorLog2 essentially expands to a compiler builtin/intrinsic, which in
turn expands to a single machine instruction on tier 1 and other
platforms. On platforms where that's not the case, we can expect the
compiler to generate fast code anyway. So overall, this is all better
than manually using a log2 lookup table.
Also replace a manual power-of-two check with mozilla::IsPowerOfTwo,
which does the same test.
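For instance (an illustrative use of the mfbt helpers; the actual call sites differ in detail):

  #include <cstddef>
  #include "mozilla/Assertions.h"      // MOZ_ASSERT
  #include "mozilla/MathAlgorithms.h"  // FloorLog2, IsPowerOfTwo

  // Replaces a manual log2 lookup table and a hand-rolled power-of-two check.
  size_t Log2OfPowerOfTwo(size_t aValue) {
    MOZ_ASSERT(mozilla::IsPowerOfTwo(aValue));
    // Compiles down to a single bsr/clz-class instruction on tier 1 platforms.
    return mozilla::FloorLog2(aValue);
  }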
--HG--
extra : rebase_source : e8164c254723c74ef83e798073327ec6afa6f1fb
They are both only used once, are trivial wrappers, and even repeat the
same assertions.
--HG--
extra : rebase_source : b40b26e303cb69a451e63937efd8a666053954e5
There are multiple flaws to the current code:
- The loop calculating the right parameters for a given run size is
repeated.
- The loop trying different run sizes doesn't actually work to fulfil
  the overhead constraint: while it stops when the constraint is
  fulfilled, the values that are kept are those from the previous
  iteration, which may be well over the constraint.
In practice, the latter resulted in a few surprising results:
- most size classes had an overhead slightly over the constraint
(1.562%), which, while not terribly bad, doesn't match the set
expectations.
- some size classes ended up with relatively good overheads only because
of the additional constraint that run sizes had to be larger than the
run size of smaller size classes. Without this constraint, some size
classes would end up with overheads well over 2% just because that
happens to be the last overhead value before reaching below the 1.5%
constraint.
Furthermore, for higher-level fragmentation concerns, smaller run sizes
are better than larger run sizes, and in many cases, smaller run sizes
can yield the same (or even sometimes, better) overhead as larger run
sizes. For example, the current code chooses 8KiB for runs of size 112,
but using 4KiB runs would actually yield the same number of regions, and
the same overhead.
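To make that comparison concrete, here's a simplified sketch of the overhead being constrained (the real calculation iterates, because the run header's bitmap size depends on the number of regions; names and the header size below are illustrative):

  #include <cstddef>

  // Everything in a run that isn't usable regions (header + trailing waste),
  // relative to the run size. This is what the overhead constraint applies to.
  double RunOverhead(size_t aRunSize, size_t aHeaderSize, size_t aSizeClass) {
    size_t regions = (aRunSize - aHeaderSize) / aSizeClass;
    size_t unusable = aRunSize - regions * aSizeClass;
    return double(unusable) / double(aRunSize);
  }

  // e.g. for size class 112 with a ~48-byte header, a 4 KiB run and an
  // 8 KiB run both come out around 1.56% overhead.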
We thus change the calculation to:
- not force runs to be smaller than those of smaller classes.
- avoid the code repetition.
- actually enforce its overhead constraint, but make it 1.6%.
- for especially small size classes, relax the overhead constraint to
2.4%.
This leads to an uneven set of run sizes:
size class before after
4 4 KiB 4 KiB
8 4 KiB 4 KiB
16 4 KiB 4 KiB
32 4 KiB 4 KiB
48 4 KiB 4 KiB
64 4 KiB 4 KiB
80 4 KiB 4 KiB
96 4 KiB 4 KiB
112 8 KiB 4 KiB
128 8 KiB 8 KiB
144 8 KiB 4 KiB
160 8 KiB 8 KiB
176 8 KiB 4 KiB
192 12 KiB 4 KiB
208 12 KiB 8 KiB
224 12 KiB 4 KiB
240 12 KiB 4 KiB
256 16 KiB 16 KiB
272 16 KiB 4 KiB
288 16 KiB 4 KiB
304 16 KiB 12 KiB
320 20 KiB 12 KiB
336 20 KiB 4 KiB
352 20 KiB 8 KiB
368 20 KiB 4 KiB
384 24 KiB 8 KiB
400 24 KiB 20 KiB
416 24 KiB 16 KiB
432 24 KiB 12 KiB
448 28 KiB 4 KiB
464 28 KiB 16 KiB
480 28 KiB 8 KiB
496 28 KiB 20 KiB
512 32 KiB 32 KiB
1024 64 KiB 64 KiB
2048 132 KiB 128 KiB
* Note: before is before this change only, not before the set of changes
from this bug; before that, the run size for 96 could be 8 KiB in some
configurations.
In most cases, the overhead hasn't changed, with a few exceptions:
- Improvements:
size class before after
208 1.823% 0.977%
304 1.660% 1.042%
320 1.562% 1.042%
400 0.716% 0.391%
464 1.283% 0.879%
480 1.228% 0.391%
496 1.395% 0.703%
- Regressions:
352 0.312% 1.172%
416 0.130% 0.977%
2048 1.515% 1.562%
For the regressions, the values are either still well within the
constraint or so close to the previous value that I don't feel it's
worth trying to avoid them, at the risk of making things worse for
other size classes.
--HG--
extra : rebase_source : fdff18df8a0a35c24162313d4adb1a1c24fb6e82
On 64-bit platforms, sizeof(arena_run_t) includes padding at the end
of the struct to align it to 64 bits, since the last field, regs_mask,
is 32-bit, and its offset can be a multiple of 64 bits depending on the
configuration. But we're doing size calculations for a dynamically-sized
regs_mask based on sizeof(arena_run_t), completely ignoring that
padding.
Instead, we use the offset of regs_mask as a base for the calculation.
Practically speaking, this doesn't change much with the current set of
values, but could affect the overheads when we squeeze run sizes more.
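In other words (an illustrative layout, not the real arena_run_t):

  #include <cstddef>  // offsetof, size_t

  struct RunLike {
    void* mBin;                // makes the struct pointer-aligned
    unsigned mRegionsMask[1];  // 32-bit, dynamically sized in practice
  };

  // On 64-bit, sizeof(RunLike) is 16 and includes 4 bytes of tail padding,
  // while the fixed header really ends at offsetof(RunLike, mRegionsMask) == 8.
  // Sizing the dynamic mask from the offset avoids counting that padding:
  size_t RunHeaderSize(size_t aRegions) {
    size_t bitsPerElem = sizeof(unsigned) * 8;
    size_t maskElems = (aRegions + bitsPerElem - 1) / bitsPerElem;
    return offsetof(RunLike, mRegionsMask) + maskElems * sizeof(unsigned);
  }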
--HG--
extra : rebase_source : a3bdf10a507b81aa0b2b437031b884e18499dc8f
This makes the run header larger than necessary, which happens to make
the current arena_bin_run_calc_size pick 8KiB runs for size class 96
when MOZ_DIAGNOSTIC_ASSERT_ENABLED is set. This change makes it pick
4KiB runs, making MOZ_DIAGNOSTIC_ASSERT_ENABLED builds use the same set
of run sizes as non-MOZ_DIAGNOSTIC_ASSERT_ENABLED builds.
--HG--
extra : rebase_source : fd7ef2d58ec601186647799e9dcf8146e723241c
First and foremost, the code and corresponding comment weren't in
agreement on what's going on.
The code checks:
RUN_MAX_OVRHD * (bin->mSizeClass << 3) <= RUN_MAX_OVRHD_RELAX
which is equivalent to:
(bin->mSizeClass << 3) <= RUN_MAX_OVRHD_RELAX / RUN_MAX_OVRHD
replacing constants:
(bin->mSizeClass << 3) <= 0x1800 / 0x3d
The left hand side is just bin->mSizeClass * 8, and the right hand side
is about 100, so this can be roughly summarized as:
bin->mSizeClass <= 12
The comment says the overhead constraint is relaxed for runs with a
per-region overhead greater than RUN_MAX_OVRHD / (mSizeClass << (3+RUN_BFP)).
Which, on its own, doesn't make sense, because it translates to
61 / (mSizeClass * 32768), which, even for a size class of 1, would mean
less than 0.2%, and this value would be even smaller for bigger classes.
The comment would make more sense with RUN_MAX_OVRHD_RELAX, but would
still not match what the code was doing.
So we change how the relaxed rule works, as per the comment in the new
code, and make it happen after the standard run overhead constraint has
been checked.
--HG--
extra : rebase_source : cec35b5bfec416761fbfbcffdc2b39f0098af849
The description above the RUN_* constant definitions talks about binary
fixed point math, which is one way to look at the problem, but a clearer
one is to look at it as comparing ratios in a way that doesn't use
divisions.
So, starting from the current expression:
(try_reg0_offset << RUN_BFP) <= RUN_MAX_OVRHD * try_run_size
This can be rewritten as
try_reg0_offset * (1 << RUN_BFP) <= RUN_MAX_OVRHD * try_run_size
Dividing both sides with ((1 << RUN_BFP) * try_run_size), and
simplifying, gives us:
try_reg0_offset / try_run_size <= RUN_MAX_OVRHD / (1 << RUN_BFP)
Replacing the constants:
try_reg0_offset / try_run_size <= 0x3d / (1 << 12)
or
try_reg0_offset / try_run_size <= 61 / 4096
61 / 4096 is roughly 1.5%.
So what the check really intends to do is check that the overhead is
below 1.5%.
So we introduce a helper class and a user-defined literal that makes the
test more self-descriptive, while producing identical machine code.
This is a lot of code to add, but I think it's one of those cases where
abstraction can help make the code clearer.
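A sketch of the kind of helper meant here (the names and the exact literal suffix are illustrative, not necessarily what landed):

  #include <cstddef>

  // Express "part/whole <= num/denom" as an integer cross-multiplication, so
  // the check can compile to the same multiply-and-compare as the raw
  // shift-based expression.
  class Fraction {
   public:
    constexpr Fraction(size_t aNumerator, size_t aDenominator)
        : mNumerator(aNumerator), mDenominator(aDenominator) {}

    constexpr bool AtMost(size_t aPart, size_t aWhole) const {
      return aPart * mDenominator <= mNumerator * aWhole;
    }

   private:
    size_t mNumerator;
    size_t mDenominator;
  };

  constexpr Fraction operator""_permille(unsigned long long aValue) {
    return Fraction(size_t(aValue), 1000);
  }

  // The run-size loop can then read, roughly:
  //   if (16_permille.AtMost(try_reg0_offset, try_run_size)) { /* <= 1.6% */ }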
--HG--
extra : rebase_source : 3d4a94f524a60e40ba75859c4f761f59d689e81a
This is, practically speaking, a no-op, and will hopefully help make the
following changes clearer.
--HG--
extra : rebase_source : b704bdf2ae46c2408e0061363822b9744ef449cb
QUANTUM_2POW_MIN is exactly 4, and we are unlikely to ever make it
smaller. Also turn a MOZ_ASSERT into a static_assert, because it only
uses constants, and will fail if QUANTUM_2POW_MIN is lowered without
touching size_invs.
--HG--
extra : rebase_source : 7c8ee3c0ea30a88bddba816c41c6f63914f7a03c
There is a set of "constants" that are actually globals that depend on
the page size that we get at runtime, when compiling without
MALLOC_STATIC_PAGESIZE, but that are actual constants when compiling
with it. Their value was unfortunately duplicated.
We set up a set of macros that allow the declarations to be written only once.
--HG--
extra : rebase_source : 56557b7ba01ee60fe85f2cd3c2a0aa910c4c93c6
At the same time, add user-defined literals to make those constants more
legible.
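For instance (the suffix names and the constant shown are illustrative):

  #include <cstddef>

  constexpr size_t operator""_KiB(unsigned long long aNum) {
    return size_t(aNum) * 1024;
  }

  constexpr size_t operator""_MiB(unsigned long long aNum) {
    return size_t(aNum) * 1024 * 1024;
  }

  // e.g. instead of `1U << 20` or `0x100000`:
  constexpr size_t kExampleChunkSize = 1_MiB;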
--HG--
extra : rebase_source : ce143ad9d8a6603179042d8cf432f00c815156c5
At the moment, while they are used before their declaration, it's from a
macro. It is desirable to replace the macros with C++ constants, which
will require the structures being defined first.
--HG--
extra : rebase_source : 7a351dafea04a7d75b6eec50fa52fb49c135e569
We create a new helper class that rounds up allocation sizes and
categorizes them. Compilers are smart enough to elide what they don't
need; in malloc_good_size, for example, they elide the code related to
the class type enum.
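A rough sketch of such a helper (the class name, thresholds, and rounding are illustrative, not the actual values):

  #include <cstddef>
  #include "mozilla/MathAlgorithms.h"  // RoundUpPow2

  class ClassifiedSize {
   public:
    enum Type { Quantum, SubPage, Large };

    explicit ClassifiedSize(size_t aSize) {
      if (aSize <= 512) {
        mType = Quantum;
        mSize = (aSize + 15) & ~size_t(15);  // round up to a 16-byte quantum
      } else if (aSize <= 4096) {
        mType = SubPage;
        mSize = mozilla::RoundUpPow2(aSize);
      } else {
        mType = Large;
        mSize = aSize;  // the real code rounds to pages/chunks here
      }
    }

    size_t Size() const { return mSize; }
    Type ClassType() const { return mType; }

   private:
    Type mType;
    size_t mSize;
  };

  // In a malloc_good_size-like caller that only uses Size(), the compiler
  // can elide everything related to the Type enum.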
--HG--
extra : rebase_source : 61381e600587b045e720a85a7b46673edeb691b9
Because of alignment issues due to the system glibc when running the
SSE2 gcov code generated during the PGO profile-generation phase, Firefox
crashes during the PGO profiling run. We work around the issue by
disabling SSE2 when building mozjemalloc during that phase. That
shouldn't affect the coverage data anyway, which is bound to the
original C++ code, and the profile-use code generation will still emit
SSE2 based on the coverage data if it needs to.
--HG--
extra : rebase_source : 3596fdc795cdef0789f3a2dd8f10b42cde00430f
We introduce the notion of private arenas, separate from other arenas
(main and thread-local). They are kept in a separate arena tree, and
arena lookups from moz_arena_* functions only access the tree of
private arenas. Iteration still goes through all arenas, private and
non-private.
--HG--
extra : rebase_source : 86c43c7c920b01eb6fa1fa214d612fd9220eac3e
We create the ArenaCollection class to handle operations on the
arena tree. Ideally, iter() would trigger locking, but the
prefork/postfork code complicates things, so we leave this for later.
--HG--
extra : rebase_source : bd7021098baf0ec01c14063294098edea4473d36
Note we use a local variable for the fallible allocator because using
plain `new (fallible)` would require some figuring out for non-Firefox
builds (e.g. standalone js).
--HG--
extra : rebase_source : 2132f98ebc7e37a139b673f80631e672bcf8ed15
RedBlackTree::{Insert,Remove} allocate an object on the stack for its
RedBlackTreeNode, and that shouldn't have side effects if the type
happens to have a constructor. This will allow adding constructors to
some of the mozjemalloc types.
--HG--
extra : rebase_source : 14dbb7d73c86921701d83156186df5d645530dda
We introduce the notion of private arenas, separate from other arenas
(main and thread-local). They are kept in a separate arena tree, and
arena lookups from moz_arena_* functions only access the tree of
private arenas. Iteration still goes through all arenas, private and
non-private.
--HG--
extra : rebase_source : ec48631a4a65520892331c1fcd62db37ed35ba1d
We create the ArenaCollection class to handle operations on the
arena tree. Ideally, iter() would trigger locking, but the
prefork/postfork code complicates things, so we leave this for later.
--HG--
extra : rebase_source : 90c96575d65c920f75aa621ba119d354d1ce252a
Note we use the deprecated `new (fallible_t())` form because using `new
(fallible)` would require some figuring out for non-Firefox builds (e.g.
standalone js).
--HG--
extra : rebase_source : 0159d8476a1b5509330517c00af6c387d522722d
RedBlackTree::{Insert,Remove} allocate an object on the stack for its
RedBlackTreeNode, and that shouldn't have side effects if the type
happens to have a constructor. This will allow adding constructors to
some of the mozjemalloc types.
--HG--
extra : rebase_source : 14dbb7d73c86921701d83156186df5d645530dda
- In the cases where it's used on powers of 2, replace it with
FloorLog2() + 1.
- In the cases where it's used on any kind of number, replace it with
CountTrailingZeroes, which is `ffs(x) - 1`.
- In the case of tiny allocations in arena_t::MallocSmall, we rearrange
the code so that the intent is clearer, which also simplifies the
expression for the mBins offset: mBins[0] is the first tiny bucket,
for allocations of sizes 1 << TINY_MIN_2POW, mBins[1] for allocations
of size 1 << (TINY_MIN_2POW + 1), etc. up to small_min. So the offset
is really the log2 of the normalized size.
--HG--
extra : rebase_source : 954a655dcaa93857dc976078e133704bb141de0d
Comparing ffs(x) == ffs(y), when x and y are guaranteed to be powers of
2 (or 0, or 1), is the same as x == y.
--HG--
extra : rebase_source : d6cc3399d85fa9fda2559435e99adbfb82ac8da0
The sentinel was taking as much space as one element of the tree, while
only really used for its RedBlackTreeNode, wasting space.
This results in some decrease in struct sizes, for example on 64-bits
linux:
- arena_bin_t: 80 -> 56
- arena_t (excluding mBins): 224 -> 144
- arena_t + dynamic size of mBins: 3024 -> 2104
It also decreases the size of several globals:
- gChunksBySize, gChunksByAddress, huge: 64 -> 8
- gArenaTree: 312 -> 8
--HG--
extra : rebase_source : d5bb52f93e064ab4cca3fb07b2c5a77ce57fb7db
Interestingly, this turns single-instruction checks into
two-instruction checks (at least with GCC, from one cmpb to a movl
followed by a testl), but this is due to Atomic<bool> actually being
backed by a uint32_t, not to the use of atomics.
--HG--
extra : rebase_source : cfc0bec2113b44635120216b4abbbbbe9028b286
- First, MOZ_DIAGNOSTIC_ASSERT_ENABLED is always true when MOZ_DEBUG is
set, so don't check for MOZ_DEBUG.
- Second, all the magic number assertions should be
MOZ_DIAGNOSTIC_ASSERTs instead of MOZ_ASSERTs.
--HG--
extra : rebase_source : 5601cd13604e21c46a9f0ad8b0b4d6fc399b853e
Some need initialization to happen, some can be skipped when the
allocator was not initialized, and others should crash.
--HG--
extra : rebase_source : d6c2697ca27f6110fe52a067440a0583e0ed0ccd
Also rearrange some code accordingly, but don't fix indentation issues
just yet.
Also apply changes from the google-readability-braces-around-statements
check.
But don't apply the modernize-use-nullptr recommendation about
strerror_r because it's wrong (bug #1412214).
--HG--
extra : histedit_source : 2d61af7074fbdc5429902d9c095c69ea30261769
Backed out changeset f75f427fde1d
Backed out changeset 6278aa5fec1d
Backed out changeset eefc284bbf13
Backed out changeset e2b391ae4688
Backed out changeset 58070c2511c6
--HG--
extra : histedit_source : d14fa171a5cf4d9400cae7f94d5cc64a1e58b98d%2C856ad5b650074a1dcff2edb0b95adc20aaf38db3
Also rearrange some code accordingly, but don't fix indentation issues
just yet.
Also apply changes from the google-readability-braces-around-statements
check.
But don't apply the modernize-use-nullptr recommendation about
strerror_r because it's wrong (bug #1412214).
--HG--
extra : rebase_source : 28b07aca01fd0275127341233755852cccda22fe
This will make allocation operations return nullptr in the face of OOM,
allowing callers to either handle the allocation error themselves or let
the normal OOM machinery, which also records the requested size, kick in.
--HG--
extra : rebase_source : 723048645cb3f0db269c91f9d023bb06825a817b
On its own, this change is unnecessary. But it prepares for pages_commit
becoming fallible in the next commit, allowing a possible early return
while leaving metadata in a consistent state: runs are still available
for future use even if they couldn't be committed immediately.
--HG--
extra : rebase_source : 971e5c49882409c00ac61ec641d469dcc94e5cc2
Bug 1403843 made more things constant, but missed a few that don't
depend on the page size.
--HG--
extra : rebase_source : 036722744ff7054de9d081bde1f4c7b035fd9501
- Move variable declarations to their initialization.
- Remove gotos and use RAII.
--HG--
extra : rebase_source : 9d983452681edf63593d033727ba6faebe418afe
The chunk_recycle and chunk_record functions are never called with
different red-black trees than the globals, so just use them directly
instead of passing them as arguments. The functions were already using
the associated global mutex anyway.
At the same time, rename them.
--HG--
extra : rebase_source : c45bb2e584c61b458eab4343562eb3a5a64543a3
Instead of calling it with a boolean indicating whether the call was for
base allocations or not, and returning immediately if it was, avoid the
call altogether.
--HG--
extra : rebase_source : abb2a3d0eaefc16efd2e828f09a330ab2a3b8b1f
jemalloc_ptr_info takes an outparam, which makes it harder to use in a
debugger: you'd need to find some memory to use as outparam and pass
that in.
So for convenience, we add a non-exported symbol for use in debuggers,
which just returns a pointer to a static buffer for the result.
lldb:
(lldb) print *Debug::jemalloc_ptr_info($0)
(jemalloc_ptr_info_t) $1 = (tag = TagLiveSmall, addr=0x000000011841dd80, size = 160)
gdb:
(gdb) print *Debug::jemalloc_ptr_info($0)
$1 = {tag = TagLiveSmall, addr = 0x7f8e7ebd0dc0, size = 96}
windbg:
0:040> .call Debug::jemalloc_ptr_info(0x6187880)
Thread is set up for call, 'g' will execute.
WARNING: This can have serious side-effects,
including deadlocks and corruption of the debuggee.
0:040> g
.call returns:
struct jemalloc_ptr_info_t * 0x7501f3f4
+0x000 tag : 1 ( TagLiveSmall )
+0x004 addr : 0x06187880 Void
+0x008 size : 0x20
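The helper is essentially a thin wrapper (a sketch, assuming the existing two-argument jemalloc_ptr_info API; details may differ):

  // Not exported; only meant to be called from a debugger.
  namespace Debug {

  static jemalloc_ptr_info_t sInfo;

  jemalloc_ptr_info_t* jemalloc_ptr_info(const void* aPtr) {
    // Reuse the regular outparam API, but hand back a static buffer so the
    // debugger doesn't need to find memory for the result.
    ::jemalloc_ptr_info(aPtr, &sInfo);
    return &sInfo;
  }

  }  // namespace Debug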
--HG--
extra : rebase_source : 09aedd48aabee3e273a17000a61b1d09cdd619b9
Bug 1378258 removed malloc_print_stats and bug 1379890 further removed
the subsequently unused arena stats. It turns out there are also some
huge stats that have been unused since bug 1378258, and that are still
there, so remove them.
--HG--
extra : rebase_source : ae71c7507143503dff8d2e517352a97eb53e4676
The way inlining is disabled in mozjemalloc is via a #define of "inline"
to nothing, which is a dubious way to do that. It makes the compiler
emit warnings, which we turn into errors with -Werror, for some static
functions. While there are such functions in mozjemalloc.cpp that could
be fixed by wrapping them in the right #ifdefs, there are also others
coming from headers, and it's not something that can be fixed in a
satisfactory way.
The right way to disable inlining is to pass the right compiler flags
for that. But inlining is the least of the problems when debugging
optimized C++ code, so if debugging requires some optimization tweaking,
it should be done manually with compiler flags when needed, instead of
fiddling with #defines to remove keywords.
--HG--
extra : rebase_source : 962c3409f86060c4d5ddf966778b58b64f89c31d
Bug 1403444 massively refactored the red-black tree code, with the
result of removing the warnings the old code was triggering. We can thus
remove the exceptions for those warnings now.
--HG--
extra : rebase_source : 76c7ce7a7282471399c7592601f6986bfb33b256
Ideally, we'd be reusing some Mutex class we have in Gecko, the base one
in mozglue/misc being the best candidate. However, the constraints in
mozjemalloc make that inconvenient:
- Can't have a constructor because malloc_init() would likely run before
it, and that would mean the mutexes would be re-initialized.
- Can't have a destructor because code will run after static
destructors, and some of that code likely will invoke the allocator,
and we can't have destructed mutexes by then.
- Can't use pthread_mutex on OSX because that loops back into the
allocator.
Accommodating the use of Gecko mutexes around those constraints would
mean much more code than just implementing a new mutex class, so the
latter is preferred.
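A minimal sketch of a mutex fitting those constraints (illustrative, using Mozilla's XP_WIN/XP_DARWIN platform macros, and not necessarily the exact class that landed):

  #if defined(XP_WIN)
  #include <windows.h>
  #elif defined(XP_DARWIN)
  #include <libkern/OSAtomic.h>
  #else
  #include <pthread.h>
  #endif

  struct Mutex {
  #if defined(XP_WIN)
    CRITICAL_SECTION mMutex;
    bool Init() { InitializeCriticalSection(&mMutex); return true; }
    void Lock() { EnterCriticalSection(&mMutex); }
    void Unlock() { LeaveCriticalSection(&mMutex); }
  #elif defined(XP_DARWIN)
    OSSpinLock mMutex;  // pthread mutexes on OSX loop back into the allocator
    bool Init() { mMutex = OS_SPINLOCK_INIT; return true; }
    void Lock() { OSSpinLockLock(&mMutex); }
    void Unlock() { OSSpinLockUnlock(&mMutex); }
  #else
    pthread_mutex_t mMutex;
    bool Init() { return pthread_mutex_init(&mMutex, nullptr) == 0; }
    void Lock() { pthread_mutex_lock(&mMutex); }
    void Unlock() { pthread_mutex_unlock(&mMutex); }
  #endif
    // No constructor: malloc_init() may run before static constructors would.
    // No destructor: the allocator is still used after static destructors.
  };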
--HG--
extra : rebase_source : d2e180a5007390c620aa6d7921340b9784c7699f
The malloc_spin_* functions have ended up being strictly identical to
the malloc_mutex_* functions, so use the latter instead of the former.
--HG--
extra : rebase_source : 746bdf57cb4a33fd65335174a748cb567630e05b
Now that the radix tree structure has a fixed size, we can just allocate
the chunk radix tree object statically.
--HG--
extra : rebase_source : 6a5f022d46da1b24401b197751e594903987b7f6
All the parameters of the radix tree (bits per level, height) are
derived from the aBits argument to ::Create in a straightforward way.
aBits itself is a constant at the call point, making them all constants
too, so we can compute all of them at compile time instead of storing
them as data.
--HG--
extra : rebase_source : aa1be8e97ed4133d7fc106fb3ea678a759476bef
All levels except the first use the same size, and in some cases even
the first uses the same size. Storing only those two sizes allows the
class size to be fixed, without making the code significantly more
complex.
--HG--
extra : rebase_source : 8028c18de2fa84060c5baff7c95cd0a70e7a3c6b
The tree height was defined as:
height = aBits / bits_per_level;
if (height * bits_per_level != aBits) {
height++;
}
What's wanted here is a height that covers all the bits, where the first
level might cover less than bits_per_level.
So aBits / bits_per_level gets us the height covered by levels with
exactly bits_per_level bits. The tree height is one more when there
are remaining bits.
Put differently, we can write aBits as:
aBits = bits_per_level * x + y
with y < bits_per_level.
We have:
aBits / bits_per_level = x.
height = x when y = 0, and x + 1 when y > 0.
We're looking for a number z such that
height = (aBits + z) / bits_per_level.
Or:
height = (bits_per_level * x + y + z) / bits_per_level.
= x + (y + z) / bits_per_level.
So we're looking for a z such that
(y + z) / bits_per_level = 0 when y = 0
= 1 when y > 0
The properties of the integer division are such that the above means:
0 <= y + z < bits_per_level when y = 0
bits_per_level <= y + z < 2 * bits_per_level when y > 0
Which gives us:
0 <= z < bits_per_level
bits_per_level - y <= z < 2 * bits_per_level - y when y > 0
y being < bits_per_level per the constraint further above,
2 * bits_per_level - y > bits_per_level.
So all in all, we want a z such that
bits_per_level - y <= z < bits_per_level for every 0 < y < bits_per_level.
The only value satisfying this for every such y is z = bits_per_level - 1.
In summary,
height = (aBits + bits_per_level - 1) / bits_per_level
is the same as the height as originally defined.
With that formula, it's self evident that height * bits_per_level is
always >= aBits, so we remove the assertion.
--HG--
extra : rebase_source : 8ca2e5fbad7d4ad537f26508af5aa250483f1f08
bits_per_level was defined as:
ffs(pow2_ceil((kNodeSize / sizeof(void*)))) - 1
kNodeSize is (1U << 14) when SIZEOF_PTR is 4 (sizeof(void*) being the
same). Otherwise, it's CACHELINE, which is (1U << 6).
The most important part, though, is that it's always a power of 2.
And it's divided by sizeof(void*), which is always a power of 2.
The result of that division is thus always a power of 2, as long as
kNodeSize is larger than the size of a pointer, which it is.
Since the argument to pow2_ceil is a power of 2, pow2_ceil is a no-op,
so it can go away. And since the argument to ffs is a power of 2, ffs
returns n + 1, where n is such that 1 << n == value; subtracting 1 gives
n. So overall the expression returns the number of shifts for
kNodeSize / SIZEOF_PTR.
Transforming kNodeSize to a number of shifts/power of 2, the expression
can then be simplified as kNodeSize2Pow - SIZEOF_PTR_2POW.
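In code, with the constants named above (names adapted, values shown for the 32-bit case):

  #include <cstddef>

  static const size_t kNodeSize2Pow = 14;   // kNodeSize == 1U << 14
  static const size_t SIZEOF_PTR_2POW = 2;  // sizeof(void*) == 4

  // Before (conceptually): ffs(pow2_ceil(kNodeSize / sizeof(void*))) - 1
  // Both operands are powers of 2, so the result is just the difference of
  // their exponents:
  static const size_t kBitsPerLevel = kNodeSize2Pow - SIZEOF_PTR_2POW;  // 12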
--HG--
extra : rebase_source : a22a378ba6622e2a4fbcf28811c7042cea9da24a
The only semantic change is in the value returned by Set, which now
returns whether the value could be set or not.
--HG--
extra : rebase_source : a80f5d6fdb3672715887e69215f55df0cedb231e
There is a lot of redundancy between malloc_rtree_get and
malloc_rtree_set. Essentially, they both look up a slot, and either get
a value or set a value in that slot. malloc_rtree_get doesn't create a
tree path for the slot when it doesn't exist. And the
MALLOC_RTREE_GET_GENERATE macro machinery makes malloc_rtree_get retry
with a lock and validate that both results agree in debug builds.
By introducing a malloc_rtree_get_slot function that returns a slot,
optionally creating a tree path to it, we remove the redundancy between
_get and _set, and we can avoid the macro machinery as well.
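The shape of the refactoring, roughly (a sketch with simplified, illustrative signatures, not the exact code):

  #include <cstdint>

  struct malloc_rtree_t;

  // Look up the slot for a key, optionally creating the intermediate tree
  // path; returns nullptr if the path doesn't exist (or couldn't be created).
  void** malloc_rtree_get_slot(malloc_rtree_t* aTree, uintptr_t aKey,
                               bool aCreate = false);

  void* malloc_rtree_get(malloc_rtree_t* aTree, uintptr_t aKey) {
    void** slot = malloc_rtree_get_slot(aTree, aKey);
    return slot ? *slot : nullptr;
  }

  bool malloc_rtree_set(malloc_rtree_t* aTree, uintptr_t aKey, void* aValue) {
    void** slot = malloc_rtree_get_slot(aTree, aKey, /* aCreate */ true);
    if (!slot) {
      return false;  // couldn't allocate the tree path
    }
    *slot = aValue;
    return true;
  }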
--HG--
extra : rebase_source : bbbdd33e81e8bfdc11c028f882ab877bba26f7f3
It seemingly hasn't been needed since Mac OS 10.7. A diagnostic assertion that
has been in place for a while hasn't caught any uses of it.
--HG--
extra : rebase_source : 9834849eec9174267c7df8de7fd22840ffa36d8f
Bug 571209 made many different kinds of sizes static at build time, as
opposed to configurable at run-time. While the dynamic sizes can be
useful to quickly test tweaks to e.g. quantum sizes, a
replace-malloc-built allocator could just as well do the same. This
need, however, is very rare, and doesn't justify keeping the sizes that
don't depend on the page size dynamic on platforms where a static page
size can't be used because the page size may vary depending on kernel
options.
So we make every size that doesn't depend on the page size static,
whether MALLOC_STATIC_SIZES is enabled or not.
This makes no practical difference on tier-1 platforms, except Android
aarch64, which will benefit from more static sizes.
--HG--
extra : rebase_source : 28243a67e4fe41154c23dc39b45405479854d31d
This was done in bug 1104634 because back then the Android NDK had a
broken combination of compiler and libc, where the compiler would emit
calls to the ffs function but the libc wouldn't provide it, though only
when building without optimization.
Things have changed in the meanwhile, and recent NDK doesn't have this
problem. So we can remove the hack.
--HG--
extra : rebase_source : 22d6c279a60d0d23161ca1addd5b5e9a3411d8ab
Bug 1402174 made all arenas be registered in a red-black tree, which
means they are iterable through that tree, making the arenas list
redundant.
The list is also inconvenient, since it needs to be constantly
reallocated, and the allocator in charge of the list doesn't know how to
free things.
Iteration of arenas is not on any hot path anyway, so even though
iterating the RB tree is slower, it doesn't matter.
So we remove the arenas list, and keep a direct pointer to the main
arena for convenience (instead of calling First() on the RB tree every
time).
--HG--
extra : rebase_source : 31f12b2de18a886eb4f8f078e11040aad3fdc800
As we're going to enable stylo on Android at some point, we'll have to
have thread local arenas there, which means Android needs to be using
thread local storage. Since Android is the last use of NO_TLS in the
allocator code base, remove it.
--HG--
extra : rebase_source : 658cbc94b4478950f683bd104b7e5da27cd08a2e