* export netbsd's reallocarr proposal.
acts subtly differently from reallocarray, returns an error code
and first argument as receiver.
* not export by default
* ci tests
* apply suggestions
* doc addition
* Apply suggestions from code review
Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>
This adds a way to periodically pool the PAL to see if any timers have
expired. Timers can be used to periodically provide callbacks to the
rest of snmalloc.
The memcpy implementation is not completely stupid but is almost
certainly not as good as a carefully tuned and optimised one.
Building snmalloc with FreeBSD's libc memcpy + jemalloc and with this,
each 10 times, does not show a statistically significant performance
difference at 95% confidence. The snmalloc version has very slightly
lower median and worst-case times. This is in no way a sensible
benchmark, but it serves as a smoke test for significant performance
regressions.
The CI self-host job now uses the checked memcpy.
This also fixes an off-by-one error in the external bounds. This is
triggered by ninja, so we will see breakage in CI if it is reintroduced.
In debug builds, we provide a verbose error containing the address of
the allocation, the base and bounds of the allocation, and a backtrace.
The backtrace was broken by the CI cleanup moving the BACKTRACE_HEADER
macro into the SNMALLOC_ namespace. This is also fixed.
The test involves hijacking `abort`, which doesn't work everywhere. It
also requires `backtrace` to work in configurations where stack traces
are enabled. This is disabled in QEMU because `backtrace` appears to
crash reliably in QEMU user mode.
For now, in the -checks build configurations, we are hitting a slow path
in the pagemap on accesses so that the pages that are `PROT_NONE` don't
cause crashes. These need to be made read-only, but this requires a PAL
change.
Modernise and tidy the CMake a bit:
- Use generator expressions for a lot of conditionals so that things
are more reliable with multi-config generators (and less verbose).
- Remove C as a needed language. None of the code was C but we were
using C to test if headers worked. This was fragile because a build
with `CMAKE_CXX_COMPILER` set might have checked things compiled with
the system C compiler and then failed when the specified C++ compiler
used different headers.
- Rename the `BACKTRACE_HEADER` macro to `SNMALLOC_BACKTRACE_HEADER`.
This is exposed into code that consumes snmalloc and so should be
'namespaced' (to the degree that's possible with C macros).
- Clean up the options and use dependent options to hide options
that are not always relevant.
- Use functions instead of macros for better variable scoping.
- Factor out some duplicated bits into functions.
- Update to the latest way of telling CMake to use C++17 or C++20.
- Migrate everything that's setting global properties to setting only
per-target properties.
- Link with -nostdlib++ if it's available. If it isn't, fall back to
enabling the C language and linking with the C compiler.
- Make the per-test log messages verbose outputs. These kept scrolling
important messages off the top of the screen for me.
- Make building as a header-only library a public option.
- Add install targets that install all of the headers and provide a
config option. This works with the header-only configuration for
integration with things like vcpkg.
- Fix a missing `#endif` in the `malloc_useable_size` check. This was
failing co compile on all platforms because of the missing `#endif`.
- Bump the minimum version to 3.14 so that we have access to
target_link_options. This is necessary to use generator expressions
for linker flags.
- Make the linker error if the shim libraries depend on symbols that
are not defined in the explicitly-provided libraries.
- Make the old-Ubuntu CI jobs use C++17 explicitly (previously CMake
was silently ignoring the fact that the compiler didn't support C++20)
- Fix errors found by the more aggressive linking mode.
With these changes, it's now possible to install snmalloc and then, in
another project, do something like this:
```cmake
find_package(snmalloc CONFIG REQUIRED)
target_link_libraries(t1 snmalloc::snmalloc)
target_link_libraries(t2 snmalloc::snmallocshim-static)
```
In this example, `t1` gets all of the compile flags necessary to include
snmalloc headers for its build configuration. `t2` is additionally
linked to the snmalloc static shim library.
This passes though to an underlying allocator rather than using
snmalloc. This is required for using ASAN in Verona. Verona takes a
close coupling with snmalloc, but to use with ASAN would require a
more work, so we pass to the system allocator in this case.
The code was able to use pthread destructors rather than C++ thread
local destructors. This removes the dependence on a C++ .so on linux.
However, this is not stable on other platforms such as Apple. Where the
C++ thread local state can be cleared before the pthread destructor
runs.
# Pagemap
The Pagemap now stores all the meta-data for the object allocation. The meta-data in the pagemap is effectively a triple of the sizeclass, the remote allocator, and a pointer to a 64 byte block of meta-data for this chunk of memory. By storing the pointer to a block, it allows the pagemap to handle multiple slab sizes without branching on the fast path. There is one entry in the pagemap per 16KiB of address space, but by using the same entry in the pagemap for 4 adjacent entries, then we can treat a 64KiB range can be treated as a single slab of allocations.
This change also means there is almost no capability amplification required by the implementation on CHERI for finding meta-data. The only amplification is required, when we change the way a chunk is used to a size of object allocation.
# Backend
There is a second major aspect of the refactor that there is now a narrow API that abstracts the Pagemap, PAL and address space management. This should better enable the compartmentalisation and makes it easier to produce alternative backends for various research directions. This is a template parameter that can be used to specialised by the front-end in different ways.
# Thread local state
The thread local state has been refactored into two components, one (called 'localalloc') that is stored directly in the TLS and is constant initialised, and one that is allocated in the address space (called 'coreallloc') which is lazily created and pooled.
# Difference
This removes Superslabs/Medium slabs as there meta-data is now part of the pagemap.
The slab allocation pattern is randomised based on the deallocation
pattern. This achieved by using two queues to enqueue free elements
onto. We pick "randomly", which queue to add to, and then when we take
the free_queue to use, we splice the two queues together.
The initial performance monitoring for snmalloc used timing of small
operations to guide the design. This feature has not been maintained or
used for several years.
This commit removes the feature.
Free list pointers can be exploited by attackers. This commit implements
a simple encoding scheme to detect corruption of the pointers. This can
be used to detect UAF and double free.
This does not currently address anything for Medium or Large
allocations. It also does not address cross thread deallocations.
Co-authored-by: Nathaniel Wesley Filardo <nfilardo@microsoft.com>
MSVC has strong opinions on implicit conversions as used in CI, while Clang both
locally and in CI has weaker opinions. In an effort to avoid subsequent
roundtrips through CI, make clang more strict. Adding -Wconversion definitely
increases the strength of clang's opinions, apparently to include frowning on
some that even MSVC considers OK, so go make explicit the current implicit
behavior.
The previous setting applied USE_POSIX_COMMIT_CHECKS to snmalloc if it
was a non-release build. This caused issues in CCF virtual mode, as it
was being built in RelWithDebInfo.
This commit changes the flag to be applied less, but for tests to always
apply the setting independent of build type.
This means that when snmalloc is being used as a library, it will be
off, unless explicitly requested.
Summary of changes:
- Add a new PAL that doesn't allocate memory, which can be used with a
memory provider that is pre-initialised with a range of memory.
- Add a `NoAllocation` PAL property so that the methods on a PAL that
doesn't support dynamically reserving address space will never be
called and therefore don't need to be implemented.
- Slightly refactor the memory provider class so that it has a narrower
interface with LargeAlloc and is easier to proxy.
- Allow the address space manager and the memory provider to be
initialised with a range of memory.
This may eventually also remove the need for (or, at least, simplify)
the Open Enclave PAL.
This commit also ends up with a few other cleanups:
- The `malloc_useable_size` CMake test that checks whether the
parameter is const qualified was failing on FreeBSD where this
function is declared in `malloc_np.h` but where including
`malloc.h` raises an error. This should now be more robust.
- The BSD aligned PAL inherited from the BSD PAL, which does not
expose aligned allocation. This meant that it exposed both the
aligned and non-aligned allocation interfaces and so happily
accepted incorrect `constexpr` if blocks that expected one or
the other but accidentally required both to exist. The unaligned
function is now deleted so the same failures that appear in CI should
appear locally for anyone using this PAL.
- Close to OpenBSD as there is no malloc*size api nor arbritrary
alignment support.
- Like FreeBSD, MAP_NORESERVE never had been implemented even tough
still present in the header but not mentioned in the man page,
FreeBSD has reserved the value for another later usage seems
DragonFly has just out of sync header.
* Add concept of natural alignment to tests.
snmalloc naturally aligns blocks very heavily, so that
the largest power-of-two in the rounded size is the alignment.
This checks that in the test, and provides a method for
finding the natural alignment of a block.
* Improve USE_MALLOC to provide alignment
snmalloc provides a lot of alginment guarantees. This ensures that when
we pass through to the system allocator we still get those alignment
guarantees.
The commit also fixes the tests to work with USE_MALLOC, and builds a
set of unit tests for ctest to check behaviour.
This will not be used unless the C++ standard version is raised to 20. As
concepts and C++20 more generally are quite new, this does not do so.
Nevertheless, the use of concepts can improve the local development experience
as type mismatches are discovered earlier (at template invocation rather than
only during expansion).
This change makes the original 16MiB option not the common option.
It also changes the names of the defines to
SNMALLOC_USE_LARGE_CHUNKS
SNMALLOC_USE_SMALL_CHUNKS
The second should be set for Open Enclave configuration, and results in
256KiB chunk sizes. The first being set builds the original 16MiB chunk
sizes. If neither is set, then we default to 1MiB chunk sizes.
* Make binaries more compatible by default
Turn `-march=native` off by default. This makes binaries more portable,
but may harm performance. However, fast paths look unaltered
* Change setting to on if specified.
* Defensive code for alloc/dealloc during TLS teardown
If an allocation or deallocation occurs during TLS teardown, then it is
possible for a new allocator to be created and then this is leaked. On
the mimalloc-bench mstressN benchmark this was observed leading to a
large memory leak.
This fix, detects if we are in the TLS teardown phase, and if so,
the calls to alloc or dealloc must return the allocator once they have
perform the specific operation.
Uses a separate variable to represent if a thread_local's destructor has
run already. This is used to detect thread teardown to put the
allocator into a special slow path to avoid leaks.
* Added some printing first operation to track progress
* Improve error messages on posix
Flush errors, print assert details, and present stack traces.
* Detect incorrect use of pool.
* Clang format.
* Replace broken LL/SC implementation
LL/SC implementation was broken, this replaces it with
a locking implementation. Changes the API to support LL/SC
for future implementation on ARM.
* Improve TLS teardown.
* Make std::function fully inlined.
* Factor out PALLinux stack trace.
* Add checks for leaking allocators.
* Add release build of Windows Clang