* Use C++17 inline statics
This leads to better codegen in GCC, and fixes some linking issues in OE.
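As a minimal sketch of what a C++17 inline static looks like (the type and
variable names here are hypothetical and not taken from snmalloc):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical header-only type, for illustration only.
struct Counter
{
  // C++17: the member is defined inside the class, so no separate
  // out-of-line definition is needed and all translation units share it.
  inline static std::size_t live = 0;

  Counter() { ++live; }
  ~Counter() { --live; }
};

// A namespace-scope inline variable behaves the same way.
inline constexpr std::size_t kExampleBatch = 64;

// Returns the number of live counters while two are in scope.
inline std::size_t count_two()
{
  Counter a, b;
  return Counter::live;
}
```

Because the definition is visible at every use, the compiler does not need
an external symbol, which is one plausible reason for the codegen and
linking improvements mentioned above.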
* Detect GCC and OE combination and fall back to lock-based ABA.
* clangformat
* Defensive code for alloc/dealloc during TLS teardown
If an allocation or deallocation occurs during TLS teardown, then it is
possible for a new allocator to be created and then this is leaked. On
the mimalloc-bench mstressN benchmark this was observed leading to a
large memory leak.
This fix detects whether we are in the TLS teardown phase; if so,
calls to alloc or dealloc must return the allocator once they have
performed the specific operation.
Uses a separate variable to represent if a thread_local's destructor has
run already. This is used to detect thread teardown to put the
allocator into a special slow path to avoid leaks.
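A rough sketch of the detection idea, assuming a trivially destructible
thread_local flag set by another thread_local's destructor. All names are
hypothetical; the real code manages allocators rather than a counter:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Trivially destructible, so it stays readable during TLS teardown.
thread_local bool teardown_started = false;

// Counts operations that observed teardown (stand-in for the slow path).
std::atomic<int> slow_path_ops{0};

void dealloc_like_operation()
{
  if (teardown_started)
  {
    // Slow path: perform the operation, then hand back any allocator
    // that had to be created, so nothing is leaked.
    slow_path_ops++;
    return;
  }
  // Fast path: normal operation with the thread's allocator.
}

struct TeardownFlag
{
  TeardownFlag() {}
  ~TeardownFlag() { teardown_started = true; }
};

// Models an object whose destructor frees memory late in thread exit.
struct LateUser
{
  LateUser() {}
  ~LateUser() { dealloc_like_operation(); }
};

int run_thread()
{
  slow_path_ops = 0;
  std::thread t([] {
    thread_local LateUser late;     // constructed first, destroyed last
    thread_local TeardownFlag flag; // constructed second, destroyed first
    (void)late;
    (void)flag;
    dealloc_like_operation(); // before teardown: fast path
  });
  t.join();
  return slow_path_ops.load();
}
```

Here the late destructor runs after the flag has been set, so exactly one
operation takes the teardown slow path.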
* Added some printing to first operation to track progress
* Improve error messages on posix
Flush errors, print assert details, and present stack traces.
* Detect incorrect use of pool.
* Clang format.
* Replace broken LL/SC implementation
The LL/SC implementation was broken; this replaces it with
a locking implementation. Changes the API to support LL/SC
for a future implementation on ARM.
* Improve TLS teardown.
* Make std::function fully inlined.
* Factor out PALLinux stack trace.
* Add checks for leaking allocators.
* Add release build of Windows Clang
* Remote dealloc refactor.
* Improve remote dealloc
Change remote to count down to 0, so fast path does not need a constant.
Use signed value so that branch does not depend on addition.
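One reading of the count-down scheme, as a sketch with hypothetical names.
The branch tests the current signed value against zero, and the
subtraction happens afterwards, so the value may go negative:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; not the actual snmalloc data structure.
struct RemoteBudget
{
  std::int64_t batch;     // only needed on the slow path
  std::int64_t remaining; // counts down towards 0, may go negative
  int flushes = 0;

  explicit RemoteBudget(std::int64_t b) : batch(b), remaining(b) {}

  // Returns true when the fast path was taken.
  bool record(std::int64_t size)
  {
    if (remaining > 0) // compare with zero: no constant, no prior add
    {
      remaining -= size; // signed, so overshooting below 0 is harmless
      return true;
    }
    // Slow path: flush the batch and reset the budget.
    ++flushes;
    remaining = batch - size;
    return false;
  }
};

// Number of fast-path calls before the first flush.
inline int fast_calls_before_flush(std::int64_t batch, std::int64_t size)
{
  RemoteBudget r(batch);
  int n = 0;
  while (r.record(size))
    ++n;
  return n;
}
```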
* Inline remote_dealloc
The fast path of remote_dealloc is sufficiently compact that it can be
inlined.
* Improve fast path in Slab::alloc
Turn the internal structure into tail calls, to improve fast path.
Should be no algorithmic changes.
* Refactor initialisation to help fast path.
Break lazy initialisation into two functions, so it is easier to codegen
fast paths.
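A small sketch of the two-function split, assuming a GCC/Clang noinline
attribute for the slow half. The Cache type and function names are
hypothetical:

```cpp
#include <cassert>

#if defined(__GNUC__)
#  define SKETCH_NOINLINE __attribute__((noinline))
#else
#  define SKETCH_NOINLINE
#endif

// Hypothetical lazily initialised state.
struct Cache
{
  int* table = nullptr;
  int storage[4] = {0, 0, 0, 0};
  int inits = 0;
};

// Slow half: kept out of line so it does not bloat the caller.
SKETCH_NOINLINE int lookup_slow(Cache& c, int i)
{
  for (int k = 0; k < 4; k++)
    c.storage[k] = k * k;
  c.table = c.storage;
  c.inits++;
  return c.table[i];
}

// Fast half: one test and one load, then a tail call if uninitialised.
inline int lookup(Cache& c, int i)
{
  if (c.table != nullptr)
    return c.table[i];
  return lookup_slow(c, i);
}
```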
* Minor tidy to statically sized dealloc.
* Refactor semi-slow path for alloc
Make the backup path a bit faster. Only algorithmic change is to delay
checking for first allocation. Otherwise, should be unchanged.
* Test initial operation of a thread
The first operation a new thread takes is special. It results in
allocating an allocator, and swinging it into the TLS. This makes
it a very special path that is rarely tested. This test generates
a lot of threads to cover the first alloc and dealloc operations.
* Correctly handle reusing get_noncachable
* Fix large alloc stats
Large alloc stats aren't necessarily balanced on a thread; this changes
to tracking individual pushes and pops, rather than the net effect
(with an unsigned value).
* Fix TLS init on large alloc path
* Add Bump ptrs to allocator
Each allocator has a bump ptr for each size class. This is no longer
slab local.
Slabs that haven't been fully allocated no longer need to be in the DLL
for this sizeclass.
* Change to a cyclic non-empty list
This change reduces the branching in the case of finding a new free
list. Using a non-empty cyclic list enables branch free add, and a
single branch in remove to detect the empty case.
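A sketch of the idea, assuming a sentinel node is what keeps the ring
non-empty. The type names are hypothetical:

```cpp
#include <cassert>

struct CNode
{
  CNode* next;
  CNode* prev;
};

// Hypothetical sketch: the sentinel is always in the ring, so the ring
// itself is never empty and insertion needs no null checks.
class CyclicList
{
  CNode head;

public:
  CyclicList()
  {
    head.next = &head;
    head.prev = &head;
  }
  CyclicList(const CyclicList&) = delete;
  CyclicList& operator=(const CyclicList&) = delete;

  // Branch-free add at the front.
  void insert(CNode* n)
  {
    n->next = head.next;
    n->prev = &head;
    head.next->prev = n;
    head.next = n;
  }

  // Single branch: only the "just the sentinel" case is special.
  CNode* pop()
  {
    CNode* n = head.next;
    if (n == &head)
      return nullptr;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    return n;
  }
};
```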
* Update differences
* Rename first allocation
Use "needs initialisation", as it makes more sense for other scenarios.
* Use a ptrdiff to help with zero init.
* Make GlobalPlaceholder zero init
The GlobalPlaceholder allocator is now a zero init block of memory.
This removes various issues for when things are initialised. It is made read-only
so we detect writes to it on some platforms.
* Increase Remote batch size
The remote batch size has not changed since the fast path optimisations.
The optimisations mean we are checking the queue considerably less
often, so the batch should be larger. This has a dramatic improvement
on performance on a few of the mimalloc microbenchmarks.
It is set to 4096 as this should cover the worst case scenario of only
remote deallocation at 16 bytes for the 2^16 slab size.
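One reading of the arithmetic behind 4096, as a sketch (the constant and
function names are hypothetical): a 2^16 byte slab holds 4096 objects of
16 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for the two figures quoted above.
constexpr std::size_t SLAB_SIZE = std::size_t(1) << 16; // 2^16 bytes
constexpr std::size_t MIN_OBJECT = 16;                  // bytes

constexpr std::size_t objects_per_slab(std::size_t slab, std::size_t obj)
{
  return slab / obj;
}

// 65536 / 16 = 4096
static_assert(objects_per_slab(SLAB_SIZE, MIN_OBJECT) == 4096);
```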
* Fixes for Clang-10
Clang-10 outputs a warning for calling alignment intrinsic with an
alignment of 1. Add constexpr to handle this case.
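A sketch of how such a constexpr guard might look, assuming the intrinsic
in question is __builtin_assume_aligned (the log does not name it) and
using a hypothetical wrapper name:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper; skips the intrinsic when alignment is 1.
template<std::size_t alignment>
inline void* assume_aligned_ptr(void* p)
{
  static_assert(
    alignment != 0 && (alignment & (alignment - 1)) == 0,
    "alignment must be a power of two");

  if constexpr (alignment == 1)
  {
    // Every pointer is 1-byte aligned, so there is nothing to assert.
    return p;
  }
  else
  {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_assume_aligned(p, alignment);
#else
    return p;
#endif
  }
}
```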
Improve remote dealloc
- Outline the slow path to improve code gen significantly
- Handle message queue only on slow path for remote dealloc.
- Change remote size to count down to 0, so fast path does not need a constant.
- Use signed value so that branch does not depend on addition.
The bootstrapping allocator needs to perform a memcpy to bypass the
removed move constructors on std::atomic. This is safe as there is no
concurrency at this point, but GCC is unhappy with this.
This commit moves CI to GCC8 and disables this warning for that line.
The readme was considerably out of date, with the introduction
being over a year old. This commit reflects the developments and
improvements in stability of snmalloc.
On platforms that support low-memory notifications, register callbacks
that perform lazy decommit. This allows idle processes to return memory
to the OS without incurring the cost of constantly committing and
decommitting memory.
Code review and CI changes
* Fixed test to use a template to make constexpr magic work
* Factored out basic notification mechanism so can be reused on other
platforms.