If the operating system allocates private pages on demand for the
pagemap, use the FlatPageMap by default, as it generates better code
for deallocation.
This commit changes the strategy for finding a free list from
a stack to a queue. This tends to avoid the slow path considerably more
often, at the cost of some memory overhead.
TODO: We should move the bump allocation data out of the metaslab and
into the allocator. At the moment, the slab contains the bump allocation
data, but the allocator only ever has one slab that it is bump
allocating from per sizeclass, so the data belongs in the allocator.
This is needed because in some configurations the constructor for the
global placeholder is not called before the first allocation (i.e. when
other globals call the allocator in their constructor) and so we ended
up following a null pointer.
Most compilers silently accept an always-inline annotation on a function
that they cannot actually inline. GCC will complain. Here, we have two
mutually recursive functions that are marked as always inline. In an
optimised build, one is inlined into the
other and then becomes a tail-recursive function that should inline the
tail call. Inlining the tail call can be done by simply jumping to the
start of the function and so everything is fine. In a debug build, the
second transform doesn't happen and so we're left with a call to an
always-inline function.
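A minimal sketch of one way to avoid this, assuming it is acceptable to
force inlining on only one function of the pair. All names here
(SKETCH_ALWAYS_INLINE, Node, step, count_live) are hypothetical and are
not the functions touched by this commit. Because count_live is a plain
inline function, the recursive call left over in a debug build is an
ordinary call rather than a call to an always-inline function.

```cpp
#include <cstddef>

// Hypothetical macro for this sketch only.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ALWAYS_INLINE __attribute__((always_inline)) inline
#else
#  define SKETCH_ALWAYS_INLINE inline
#endif

struct Node
{
  Node* next;
  bool skip;
};

// Forward declaration so the two functions can refer to each other.
inline std::size_t count_live(const Node* n, std::size_t acc);

// Forced inline: its body is always folded into its caller.
SKETCH_ALWAYS_INLINE std::size_t step(const Node* n, std::size_t acc)
{
  return count_live(n->next, n->skip ? acc : acc + 1);
}

// Plain inline: the compiler may turn the tail call into a jump in an
// optimised build, but is free to emit a real call in a debug build.
inline std::size_t count_live(const Node* n, std::size_t acc)
{
  if (n == nullptr)
    return acc;
  return step(n, acc);
}
```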
We were passing an argument less than 4K to the MAP_ALIGNED macro, which
caused an undefined shift. The compiler helpfully propagated the undef
values back to earlier in the code and gave us some exciting nonsense.
By caching the result of the first call to ThreadAlloc::get(), we were
always hitting a code path that should be hit only once per thread in
normal operation.
Copying an idea from mimalloc, initialise the TLS variable to a global
allocator that doesn't own any memory. Then, when we hit a slow path
(which we always do when using the global allocator, because it doesn't
own any memory), lazily check whether we are using the global allocator
and, if so, replace it with a real one.
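A rough sketch of the lazy replacement described above. This is an
illustration under stated assumptions, not the code in this commit:
SketchAlloc, global_placeholder, tls_alloc, sketch_alloc and
slow_path_hits are hypothetical names, the "real" allocator is simply
created with new, and refills use malloc with no alignment handling.
The point is that the fast path never tests for initialisation; only
the slow path does.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator state for this sketch.
struct SketchAlloc
{
  std::size_t available = 0; // bytes left for the fast path
  char* bump = nullptr;      // next free byte
};

// Placeholder that owns no memory, so every request on it goes slow.
SketchAlloc global_placeholder;

// Every thread starts out pointing at the shared placeholder.
thread_local SketchAlloc* tls_alloc = &global_placeholder;

int slow_path_hits = 0; // instrumentation for the sketch only

void* sketch_alloc_slow(std::size_t size)
{
  ++slow_path_hits;
  // Lazy check: only performed here, never on the fast path.
  if (tls_alloc == &global_placeholder)
    tls_alloc = new SketchAlloc; // assumption: stands in for a real one

  SketchAlloc* a = tls_alloc;
  constexpr std::size_t chunk = 4096;
  std::size_t n = size > chunk ? size : chunk;
  // Refill; the previous chunk is simply abandoned in this sketch.
  a->bump = static_cast<char*>(std::malloc(n));
  if (a->bump == nullptr)
  {
    a->available = 0;
    return nullptr;
  }
  a->available = n - size;
  void* p = a->bump;
  a->bump += size;
  return p;
}

void* sketch_alloc(std::size_t size)
{
  if (size == 0)
    size = 1;
  SketchAlloc* a = tls_alloc;
  if (size <= a->available) // fast path: no "am I initialised?" test
  {
    void* p = a->bump;
    a->bump += size;
    a->available -= size;
    return p;
  }
  return sketch_alloc_slow(size);
}
```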
There is a slight complication compared to mimalloc's version of this
idea. Snmalloc collects outgoing messages and it's possible for the
first operation in a thread to be a free of memory allocated by a
different thread. We address this by initialising the queues with a
size value indicating that they are full and then doing the lazy check
when about to insert a message that would make a queue full. This then
triggers lazy creation of an allocator.
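The "start full" trick can be sketched as follows. Again this is only
an illustration: MsgAlloc, MSG_CAPACITY, msg_placeholder, tls_msg_alloc,
flush_messages and remote_dealloc are hypothetical names, and a single
fixed-size array stands in for the real outgoing queues. The
placeholder's count is initialised to the capacity, so the first free on
a new thread fails the ordinary capacity check and lands in the slow
path, which is the only place that tests for the placeholder.

```cpp
#include <cstddef>

constexpr std::size_t MSG_CAPACITY = 4; // hypothetical queue capacity

// Hypothetical allocator holding one outgoing message queue.
struct MsgAlloc
{
  std::size_t pending;          // number of queued messages
  void* messages[MSG_CAPACITY]; // pointers waiting to be sent
};

// The placeholder claims to be full, so nothing is ever queued on it.
MsgAlloc msg_placeholder{MSG_CAPACITY, {}};

thread_local MsgAlloc* tls_msg_alloc = &msg_placeholder;

int lazy_inits = 0; // instrumentation for the sketch only

// Stand-in for sending the queued messages to their owning threads.
void flush_messages(MsgAlloc* a)
{
  a->pending = 0;
}

void remote_dealloc_slow(void* p)
{
  if (tls_msg_alloc == &msg_placeholder)
  {
    // First operation on this thread: create a real, empty allocator.
    tls_msg_alloc = new MsgAlloc{0, {}};
    ++lazy_inits;
  }
  else
  {
    flush_messages(tls_msg_alloc);
  }
  MsgAlloc* a = tls_msg_alloc;
  a->messages[a->pending++] = p;
}

void remote_dealloc(void* p)
{
  MsgAlloc* a = tls_msg_alloc;
  if (a->pending < MSG_CAPACITY) // fast path: capacity check only
  {
    a->messages[a->pending++] = p;
    return;
  }
  remote_dealloc_slow(p);
}
```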
Global initialisation doesn't work for the fake allocator, so skip most
of its constructor.