All the checks and mitigations have been placed under feature flags.
These can be controlled by defining
SNMALLOC_CHECK_CLIENT_MITIGATIONS
This can take a term that represents the mitigations that should be enabled.
E.g.
-DSNMALLOC_CHECK_CLIENT_MITIGATIONS=nochecks+random_pagemap
The CMake uses this to build numerous versions of the LD_PRELOAD library and
tests to allow individual features to be benchmarked.
Co-authored-by: Nathaniel Wesley Filardo <nfilardo@microsoft.com>
This uses VirtualBox instead of xhyve. It might be slower, but should
be more reliable.
Tests run on FreeBSD, NetBSD, and OpenBSD. Only the FreeBSD ones are
passing at the moment, the others will keep running but aren't added as
dependencies for the action used to guard commits.
As with CTest, but without the full machinery thereof. This facilitates
package builders to use the usual build targets (all, install) without
needing to build the test programs if they're just going to get dropped
on the floor.
* Fix pal_linux.h for older linux systems
Where MADV_FREE is not defined - replaced with MADV_DONTNEED
Where GRND_NONBLOCK is not defined in <sys/random.h> but in <linux/random.h>
* Check for linux/random.h in CMake
as __has_include seems to not be reliable
* Use CMake module CheckIncludeFilesCXX
as C language isn't enabled by default everywhere
* Move madvise flag ifdefs into constexpr for cleaner code
Build three levels of checking
- None
- Checks memcpy only
- Checks (full)
Currently you can build checks without enabling the memcpy protection.
This PR fixes that.
See src/snmalloc/README.md for an explanation of the layers.
Some other cleanups on the way:
Fine-grained stats support is now gone.
It's been broken for two years, it depends on iostream (which then
causes linker failures with libstdc++) and it's collecting the wrong
stats for the new design. After discussion with @mjp41, it's better to
remove it and introduce new stats support later, rather than keep broken
code in the main branch.
Tracing was controlled with a preprocessor macro, now there's also a
CMake option.
* export netbsd's reallocarr proposal.
acts subtly differently from reallocarray, returns an error code
and first argument as receiver.
* not export by default
* ci tests
* apply suggestions
* doc addition
* Apply suggestions from code review
Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>
This adds a way to periodically pool the PAL to see if any timers have
expired. Timers can be used to periodically provide callbacks to the
rest of snmalloc.
The memcpy implementation is not completely stupid but is almost
certainly not as good as a carefully tuned and optimised one.
Building snmalloc with FreeBSD's libc memcpy + jemalloc and with this,
each 10 times, does not show a statistically significant performance
difference at 95% confidence. The snmalloc version has very slightly
lower median and worst-case times. This is in no way a sensible
benchmark, but it serves as a smoke test for significant performance
regressions.
The CI self-host job now uses the checked memcpy.
This also fixes an off-by-one error in the external bounds. This is
triggered by ninja, so we will see breakage in CI if it is reintroduced.
In debug builds, we provide a verbose error containing the address of
the allocation, the base and bounds of the allocation, and a backtrace.
The backtrace was broken by the CI cleanup moving the BACKTRACE_HEADER
macro into the SNMALLOC_ namespace. This is also fixed.
The test involves hijacking `abort`, which doesn't work everywhere. It
also requires `backtrace` to work in configurations where stack traces
are enabled. This is disabled in QEMU because `backtrace` appears to
crash reliably in QEMU user mode.
For now, in the -checks build configurations, we are hitting a slow path
in the pagemap on accesses so that the pages that are `PROT_NONE` don't
cause crashes. These need to be made read-only, but this requires a PAL
change.
Modernise and tidy the CMake a bit:
- Use generator expressions for a lot of conditionals so that things
are more reliable with multi-config generators (and less verbose).
- Remove C as a needed language. None of the code was C but we were
using C to test if headers worked. This was fragile because a build
with `CMAKE_CXX_COMPILER` set might have checked things compiled with
the system C compiler and then failed when the specified C++ compiler
used different headers.
- Rename the `BACKTRACE_HEADER` macro to `SNMALLOC_BACKTRACE_HEADER`.
This is exposed into code that consumes snmalloc and so should be
'namespaced' (to the degree that's possible with C macros).
- Clean up the options and use dependent options to hide options
that are not always relevant.
- Use functions instead of macros for better variable scoping.
- Factor out some duplicated bits into functions.
- Update to the latest way of telling CMake to use C++17 or C++20.
- Migrate everything that's setting global properties to setting only
per-target properties.
- Link with -nostdlib++ if it's available. If it isn't, fall back to
enabling the C language and linking with the C compiler.
- Make the per-test log messages verbose outputs. These kept scrolling
important messages off the top of the screen for me.
- Make building as a header-only library a public option.
- Add install targets that install all of the headers and provide a
config option. This works with the header-only configuration for
integration with things like vcpkg.
- Fix a missing `#endif` in the `malloc_useable_size` check. This was
failing co compile on all platforms because of the missing `#endif`.
- Bump the minimum version to 3.14 so that we have access to
target_link_options. This is necessary to use generator expressions
for linker flags.
- Make the linker error if the shim libraries depend on symbols that
are not defined in the explicitly-provided libraries.
- Make the old-Ubuntu CI jobs use C++17 explicitly (previously CMake
was silently ignoring the fact that the compiler didn't support C++20)
- Fix errors found by the more aggressive linking mode.
With these changes, it's now possible to install snmalloc and then, in
another project, do something like this:
```cmake
find_package(snmalloc CONFIG REQUIRED)
target_link_libraries(t1 snmalloc::snmalloc)
target_link_libraries(t2 snmalloc::snmallocshim-static)
```
In this example, `t1` gets all of the compile flags necessary to include
snmalloc headers for its build configuration. `t2` is additionally
linked to the snmalloc static shim library.
This passes though to an underlying allocator rather than using
snmalloc. This is required for using ASAN in Verona. Verona takes a
close coupling with snmalloc, but to use with ASAN would require a
more work, so we pass to the system allocator in this case.
The code was able to use pthread destructors rather than C++ thread
local destructors. This removes the dependence on a C++ .so on linux.
However, this is not stable on other platforms such as Apple. Where the
C++ thread local state can be cleared before the pthread destructor
runs.
# Pagemap
The Pagemap now stores all the meta-data for the object allocation. The meta-data in the pagemap is effectively a triple of the sizeclass, the remote allocator, and a pointer to a 64 byte block of meta-data for this chunk of memory. By storing the pointer to a block, it allows the pagemap to handle multiple slab sizes without branching on the fast path. There is one entry in the pagemap per 16KiB of address space, but by using the same entry in the pagemap for 4 adjacent entries, then we can treat a 64KiB range can be treated as a single slab of allocations.
This change also means there is almost no capability amplification required by the implementation on CHERI for finding meta-data. The only amplification is required, when we change the way a chunk is used to a size of object allocation.
# Backend
There is a second major aspect of the refactor that there is now a narrow API that abstracts the Pagemap, PAL and address space management. This should better enable the compartmentalisation and makes it easier to produce alternative backends for various research directions. This is a template parameter that can be used to specialised by the front-end in different ways.
# Thread local state
The thread local state has been refactored into two components, one (called 'localalloc') that is stored directly in the TLS and is constant initialised, and one that is allocated in the address space (called 'coreallloc') which is lazily created and pooled.
# Difference
This removes Superslabs/Medium slabs as there meta-data is now part of the pagemap.
The slab allocation pattern is randomised based on the deallocation
pattern. This achieved by using two queues to enqueue free elements
onto. We pick "randomly", which queue to add to, and then when we take
the free_queue to use, we splice the two queues together.
The initial performance monitoring for snmalloc used timing of small
operations to guide the design. This feature has not been maintained or
used for several years.
This commit removes the feature.
Free list pointers can be exploited by attackers. This commit implements
a simple encoding scheme to detect corruption of the pointers. This can
be used to detect UAF and double free.
This does not currently address anything for Medium or Large
allocations. It also does not address cross thread deallocations.
Co-authored-by: Nathaniel Wesley Filardo <nfilardo@microsoft.com>
MSVC has strong opinions on implicit conversions as used in CI, while Clang both
locally and in CI has weaker opinions. In an effort to avoid subsequent
roundtrips through CI, make clang more strict. Adding -Wconversion definitely
increases the strength of clang's opinions, apparently to include frowning on
some that even MSVC considers OK, so go make explicit the current implicit
behavior.
The previous setting applied USE_POSIX_COMMIT_CHECKS to snmalloc if it
was a non-release build. This caused issues in CCF virtual mode, as it
was being built in RelWithDebInfo.
This commit changes the flag to be applied less, but for tests to always
apply the setting independent of build type.
This means that when snmalloc is being used as a library, it will be
off, unless explicitly requested.
Summary of changes:
- Add a new PAL that doesn't allocate memory, which can be used with a
memory provider that is pre-initialised with a range of memory.
- Add a `NoAllocation` PAL property so that the methods on a PAL that
doesn't support dynamically reserving address space will never be
called and therefore don't need to be implemented.
- Slightly refactor the memory provider class so that it has a narrower
interface with LargeAlloc and is easier to proxy.
- Allow the address space manager and the memory provider to be
initialised with a range of memory.
This may eventually also remove the need for (or, at least, simplify)
the Open Enclave PAL.
This commit also ends up with a few other cleanups:
- The `malloc_useable_size` CMake test that checks whether the
parameter is const qualified was failing on FreeBSD where this
function is declared in `malloc_np.h` but where including
`malloc.h` raises an error. This should now be more robust.
- The BSD aligned PAL inherited from the BSD PAL, which does not
expose aligned allocation. This meant that it exposed both the
aligned and non-aligned allocation interfaces and so happily
accepted incorrect `constexpr` if blocks that expected one or
the other but accidentally required both to exist. The unaligned
function is now deleted so the same failures that appear in CI should
appear locally for anyone using this PAL.
- Close to OpenBSD as there is no malloc*size api nor arbritrary
alignment support.
- Like FreeBSD, MAP_NORESERVE never had been implemented even tough
still present in the header but not mentioned in the man page,
FreeBSD has reserved the value for another later usage seems
DragonFly has just out of sync header.
* Add concept of natural alignment to tests.
snmalloc naturally aligns blocks very heavily, so that
the largest power-of-two in the rounded size is the alignment.
This checks that in the test, and provides a method for
finding the natural alignment of a block.
* Improve USE_MALLOC to provide alignment
snmalloc provides a lot of alginment guarantees. This ensures that when
we pass through to the system allocator we still get those alignment
guarantees.
The commit also fixes the tests to work with USE_MALLOC, and builds a
set of unit tests for ctest to check behaviour.
This will not be used unless the C++ standard version is raised to 20. As
concepts and C++20 more generally are quite new, this does not do so.
Nevertheless, the use of concepts can improve the local development experience
as type mismatches are discovered earlier (at template invocation rather than
only during expansion).