# Pagemap
The Pagemap now stores all the meta-data for the object allocation. The meta-data in the pagemap is effectively a triple of the sizeclass, the remote allocator, and a pointer to a 64-byte block of meta-data for this chunk of memory. Storing a pointer to a block allows the pagemap to handle multiple slab sizes without branching on the fast path. There is one entry in the pagemap per 16KiB of address space, but by storing the same value in 4 adjacent entries, we can treat a 64KiB range as a single slab of allocations.
This change also means there is almost no capability amplification required by the implementation on CHERI for finding meta-data. The only amplification required is when we change the size of object allocation that a chunk is used for.
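The lookup described above can be sketched as follows. This is a minimal illustration, not the actual snmalloc code: `PagemapEntry`, `ChunkMetadata`, `lookup`, `register_slab` and the toy table size are all hypothetical names and values chosen for the example; only the 16KiB granularity, the 64-byte meta-data block and the triple layout come from the text.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// All names below are hypothetical; they only illustrate the lookup.
struct ChunkMetadata
{
  alignas(64) unsigned char bytes[64]; // the 64-byte block of meta-data
};

struct RemoteAllocator; // owner of the memory; opaque in this sketch

struct PagemapEntry
{
  std::uint8_t sizeclass = 0;
  RemoteAllocator* remote = nullptr;
  ChunkMetadata* meta = nullptr;
};

constexpr std::size_t GRANULARITY_BITS = 14; // one entry per 16KiB
constexpr std::size_t ENTRIES = 1024; // toy size: covers 16MiB of addresses

std::array<PagemapEntry, ENTRIES> pagemap{};

// Shift the address and index the table; the slab size never needs to be
// inspected (the assert is only a debug-time bounds check for the toy table).
inline PagemapEntry& lookup(std::uintptr_t addr)
{
  std::size_t index = addr >> GRANULARITY_BITS;
  assert(index < ENTRIES);
  return pagemap[index];
}

// Register a slab spanning `entries` adjacent 16KiB granules by writing the
// same value into every pagemap entry it covers.
inline void register_slab(
  std::uintptr_t base, std::size_t entries, const PagemapEntry& e)
{
  for (std::size_t i = 0; i < entries; i++)
    lookup(base + (i << GRANULARITY_BITS)) = e;
}
```

Registering a slab with `entries == 4` makes any address in the 64KiB range resolve to the same meta-data pointer, which is what lets the fast path ignore the slab size.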
# Backend
The second major aspect of the refactor is that there is now a narrow API that abstracts the Pagemap, PAL and address space management. This should better enable compartmentalisation and make it easier to produce alternative backends for various research directions. The backend is a template parameter that can be specialised by the front-end in different ways.
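As a rough sketch of what "the backend is a template parameter" can look like, the following uses two hypothetical backends behind one hypothetical front-end. None of these names (`MallocBackend`, `FixedBufferBackend`, `FrontEnd`, `alloc_chunk`, `dealloc_chunk`) are taken from snmalloc; they only illustrate the shape of a narrow backend API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical backend: the only thing the front-end knows about memory.
struct MallocBackend
{
  static void* alloc_chunk(std::size_t size) { return std::malloc(size); }
  static void dealloc_chunk(void* p, std::size_t) { std::free(p); }
};

// Hypothetical alternative backend that hands out memory from a fixed
// buffer, e.g. for a compartment that must not call into the OS.
struct FixedBufferBackend
{
  static void* alloc_chunk(std::size_t size)
  {
    static unsigned char buffer[1 << 16];
    static std::size_t used = 0;
    if (size > sizeof(buffer) - used)
      return nullptr;
    void* p = buffer + used;
    used += size;
    return p;
  }

  // Bump allocation only: memory is never reused in this sketch.
  static void dealloc_chunk(void*, std::size_t) {}
};

// The front-end is written once and specialised by its backend.
template<typename Backend>
class FrontEnd
{
public:
  void* alloc(std::size_t size)
  {
    assert(size != 0);
    return Backend::alloc_chunk(size);
  }

  void dealloc(void* p, std::size_t size)
  {
    Backend::dealloc_chunk(p, size);
  }
};
```

Swapping `FrontEnd<MallocBackend>` for `FrontEnd<FixedBufferBackend>` changes where memory comes from without touching the front-end code.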
# Thread local state
The thread local state has been refactored into two components: one (called 'localalloc') that is stored directly in the TLS and is constant initialised, and one (called 'corealloc') that is allocated in the address space, and is lazily created and pooled.
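The split can be illustrated with a small sketch. The types `LocalAlloc` and `CoreAlloc`, and the helpers `acquire_from_pool` and `local_alloc`, are hypothetical stand-ins; in particular the "pool" here simply allocates and never returns anything, which a real pool would do.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the lazily created, pooled component.
struct CoreAlloc
{
  std::size_t allocations = 0;
};

// A toy pool: a real one would recycle allocators from exited threads.
inline CoreAlloc* acquire_from_pool()
{
  return new CoreAlloc();
}

// Hypothetical stand-in for the component stored directly in TLS.
struct LocalAlloc
{
  // A null pointer is a constant initialiser, so no dynamic TLS
  // initialisation is needed before first use.
  CoreAlloc* core = nullptr;

  CoreAlloc& get_core()
  {
    if (core == nullptr) // slow path, taken once per thread
      core = acquire_from_pool();
    assert(core != nullptr);
    return *core;
  }
};

inline LocalAlloc& local_alloc()
{
  static thread_local LocalAlloc alloc;
  return alloc;
}
```

A thread that never allocates only ever pays for the pointer-sized `LocalAlloc`; the larger state is created on the first call to `get_core()`.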
# Difference
This removes Superslabs/Medium slabs as their meta-data is now part of the pagemap.
The slab allocation pattern is randomised based on the deallocation
pattern. This is achieved by using two queues to enqueue free elements
onto. We pick "randomly" which queue to add to, and then when we take
the free_queue to use, we splice the two queues together.
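A minimal sketch of the two-queue idea follows. The class name `FreeListBuilder`, its methods, and the toy xorshift bit source are assumptions made for the example (the bit source is not cryptographically secure and is not what snmalloc uses).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct FreeObject
{
  FreeObject* next = nullptr;
};

// Hypothetical builder: names and the bit source are illustrative only.
class FreeListBuilder
{
  FreeObject* head[2] = {nullptr, nullptr};
  FreeObject** tail[2] = {&head[0], &head[1]};
  std::uint64_t state = 0x9E3779B97F4A7C15ull; // toy xorshift state

  unsigned next_bit()
  {
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    return static_cast<unsigned>(state & 1);
  }

public:
  // Append a freed object to one of the two queues, chosen "randomly".
  void add(FreeObject* o)
  {
    assert(o != nullptr);
    unsigned q = next_bit();
    o->next = nullptr;
    *tail[q] = o;
    tail[q] = &o->next;
  }

  // Splice queue 1 onto the end of queue 0 and hand out the result.
  FreeObject* close()
  {
    *tail[0] = head[1];
    FreeObject* result = head[0];
    head[0] = head[1] = nullptr;
    tail[0] = &head[0];
    tail[1] = &head[1];
    return result;
  }
};
```

Every object added ends up in the spliced list exactly once, but its position no longer follows the deallocation order directly.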
The initial performance monitoring for snmalloc used timing of small
operations to guide the design. This feature has not been maintained or
used for several years.
This commit removes the feature.
Free list pointers can be exploited by attackers. This commit implements
a simple encoding scheme to detect corruption of the pointers. This can
be used to detect UAF and double free.
This does not currently address anything for Medium or Large
allocations. It also does not address cross thread deallocations.
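The commit message does not spell out the encoding, so the following is only one possible shape for such a scheme, with every name and constant (`KEY`, `signature`, `store_next`, `load_next`) invented for the example: each free object stores an obfuscated next pointer plus a check word, and a read fails if the two no longer agree.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed key; a real scheme would use per-process secrets.
constexpr std::uintptr_t KEY = 0x9E3779B9u;

struct FreeObject
{
  std::uintptr_t encoded_next;
  std::uintptr_t check;
};

// Check word tying the link to the object that holds it.
inline std::uintptr_t signature(const FreeObject* curr, const FreeObject* next)
{
  auto c = reinterpret_cast<std::uintptr_t>(curr);
  auto n = reinterpret_cast<std::uintptr_t>(next);
  return (c * KEY) ^ n;
}

inline void store_next(FreeObject* curr, FreeObject* next)
{
  assert(curr != nullptr);
  curr->encoded_next = reinterpret_cast<std::uintptr_t>(next) ^ KEY;
  curr->check = signature(curr, next);
}

// Returns false if the link no longer matches its check word, e.g. because
// the object was written to after being freed; otherwise writes `out`.
inline bool load_next(const FreeObject* curr, FreeObject*& out)
{
  auto* next = reinterpret_cast<FreeObject*>(curr->encoded_next ^ KEY);
  if (curr->check != signature(curr, next))
    return false;
  out = next;
  return true;
}
```

In this sketch a stray write to a freed object that changes only one of the two words is caught the next time the free list is walked.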
Co-authored-by: Nathaniel Wesley Filardo <nfilardo@microsoft.com>
MSVC has strong opinions on implicit conversions as used in CI, while Clang both
locally and in CI has weaker opinions. In an effort to avoid subsequent
roundtrips through CI, make clang more strict. Adding -Wconversion definitely
increases the strength of clang's opinions, apparently to include frowning on
some that even MSVC considers OK, so go make explicit the current implicit
behavior.
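As an illustration of the kind of change this involves, the hypothetical helper below narrows a `size_t` to a `uint8_t`. Written as `return size >> 4;` the narrowing would be implicit and reported by `-Wconversion`; the explicit cast keeps the same behaviour while stating the intent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not taken from snmalloc.
inline std::uint8_t size_to_index(std::size_t size)
{
  // Document the assumption that makes the narrowing safe.
  assert((size >> 4) <= 0xFFu);
  return static_cast<std::uint8_t>(size >> 4);
}
```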
The previous setting applied USE_POSIX_COMMIT_CHECKS to snmalloc if it
was a non-release build. This caused issues in CCF virtual mode, as it
was being built in RelWithDebInfo.
This commit changes the flag to be applied less, but for tests to always
apply the setting independent of build type.
This means that when snmalloc is being used as a library, it will be
off, unless explicitly requested.
Summary of changes:
- Add a new PAL that doesn't allocate memory, which can be used with a
memory provider that is pre-initialised with a range of memory.
- Add a `NoAllocation` PAL property so that the methods on a PAL that
doesn't support dynamically reserving address space will never be
called and therefore don't need to be implemented.
- Slightly refactor the memory provider class so that it has a narrower
interface with LargeAlloc and is easier to proxy.
- Allow the address space manager and the memory provider to be
initialised with a range of memory.
This may eventually also remove the need for (or, at least, simplify)
the Open Enclave PAL.
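The `NoAllocation` idea can be sketched as follows. Only the property name `NoAllocation` comes from the text above; the `pal_features` member, the two PAL types, and the `MemoryProvider` shown here are simplified assumptions, not the real classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical feature flags; only `NoAllocation` is named in the text.
enum PalFeatures : std::uint64_t
{
  NoFeatures = 0,
  NoAllocation = 1 << 0,
};

struct PALWithReserve
{
  static constexpr std::uint64_t pal_features = NoFeatures;
  static void* reserve(std::size_t size) { return std::malloc(size); }
};

// A PAL for a pre-initialised memory range: it has no reserve() at all.
struct PALNoAlloc
{
  static constexpr std::uint64_t pal_features = NoAllocation;
};

template<typename PAL>
class MemoryProvider
{
  unsigned char* bump;
  std::size_t remaining;

public:
  // Initialise the provider with an existing range of memory.
  MemoryProvider(void* base, std::size_t length)
  : bump(static_cast<unsigned char*>(base)), remaining(length)
  {}

  void* alloc(std::size_t size)
  {
    assert(size != 0);
    if (size <= remaining)
    {
      void* p = bump;
      bump += size;
      remaining -= size;
      return p;
    }
    if constexpr ((PAL::pal_features & NoAllocation) == 0)
      return PAL::reserve(size); // only instantiated for PALs that have it
    else
      return nullptr;
  }
};
```

Because the call to `PAL::reserve` sits in a discarded `if constexpr` branch for `PALNoAlloc`, that PAL compiles without providing the method.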
This commit also ends up with a few other cleanups:
- The `malloc_usable_size` CMake test that checks whether the
parameter is const qualified was failing on FreeBSD where this
function is declared in `malloc_np.h` but where including
`malloc.h` raises an error. This should now be more robust.
- The BSD aligned PAL inherited from the BSD PAL, which does not
expose aligned allocation. This meant that it exposed both the
aligned and non-aligned allocation interfaces and so happily
accepted incorrect `constexpr` if blocks that expected one or
the other but accidentally required both to exist. The unaligned
function is now deleted so the same failures that appear in CI should
appear locally for anyone using this PAL.
- Close to OpenBSD, as there is no malloc*size API nor arbitrary
alignment support.
- As on FreeBSD, MAP_NORESERVE has never been implemented, even though
it is still present in the header (but not mentioned in the man page).
FreeBSD has reserved the value for another, later use; it seems
DragonFly just has an out-of-sync header.
* Add concept of natural alignment to tests.
snmalloc naturally aligns blocks very heavily, so that
the largest power-of-two in the rounded size is the alignment.
This checks that in the test, and provides a method for
finding the natural alignment of a block.
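Reading "the largest power-of-two in the rounded size" as the largest power of two that divides the rounded size, the natural alignment can be computed by isolating the lowest set bit. The function name below is an assumption and need not match the helper used in the tests.

```cpp
#include <cassert>
#include <cstddef>

// Largest power of two that divides `rounded_size` (its lowest set bit).
inline std::size_t natural_alignment(std::size_t rounded_size)
{
  assert(rounded_size != 0);
  return rounded_size & (~rounded_size + 1);
}
```

For example, a 48-byte block is 16-byte aligned under this rule, while a 64-byte block is 64-byte aligned.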
* Improve USE_MALLOC to provide alignment
snmalloc provides a lot of alignment guarantees. This ensures that when
we pass through to the system allocator we still get those alignment
guarantees.
The commit also fixes the tests to work with USE_MALLOC, and builds a
set of unit tests for ctest to check behaviour.
This will not be used unless the C++ standard version is raised to 20. As
concepts and C++20 more generally are quite new, this does not do so.
Nevertheless, the use of concepts can improve the local development experience
as type mismatches are discovered earlier (at template invocation rather than
only during expansion).
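One way to keep concepts optional is to hide them behind a macro that degrades to `typename` before C++20. The sketch below assumes that approach; `SKETCH_CONCEPT`, `ConceptPAL`, `MallocPAL` and `reserve_via` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
// Hypothetical concept: a PAL must offer a static reserve(size_t) -> void*.
template<typename T>
concept ConceptPAL = requires(std::size_t size)
{
  requires std::is_same<decltype(T::reserve(size)), void*>::value;
};
#  define SKETCH_CONCEPT(c) c
#else
// Before C++20 the constraint disappears and errors surface only during
// template expansion.
#  define SKETCH_CONCEPT(c) typename
#endif

struct MallocPAL
{
  static void* reserve(std::size_t size) { return std::malloc(size); }
};

template<SKETCH_CONCEPT(ConceptPAL) PAL>
void* reserve_via(std::size_t size)
{
  assert(size != 0);
  return PAL::reserve(size);
}
```

With C++20 enabled, passing a type without a suitable `reserve` is rejected at the call to `reserve_via`; with older standards the same code still builds, just with later diagnostics.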
This change means the original 16MiB chunk size is no longer the default.
It also changes the names of the defines to
SNMALLOC_USE_LARGE_CHUNKS
SNMALLOC_USE_SMALL_CHUNKS
The second should be set for Open Enclave configuration, and results in
256KiB chunk sizes. The first being set builds the original 16MiB chunk
sizes. If neither is set, then we default to 1MiB chunk sizes.
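The selection between the three chunk sizes could be expressed roughly as below. The two `SNMALLOC_USE_*` defines come from the text; `CHUNK_BITS`, `CHUNK_SIZE` and `chunks_needed` are hypothetical names used only in this sketch.

```cpp
#include <cassert>
#include <cstddef>

#if defined(SNMALLOC_USE_LARGE_CHUNKS)
constexpr std::size_t CHUNK_BITS = 24; // original 16MiB chunks
#elif defined(SNMALLOC_USE_SMALL_CHUNKS)
constexpr std::size_t CHUNK_BITS = 18; // 256KiB chunks, e.g. Open Enclave
#else
constexpr std::size_t CHUNK_BITS = 20; // default: 1MiB chunks
#endif

constexpr std::size_t CHUNK_SIZE = std::size_t(1) << CHUNK_BITS;

// Hypothetical helper: number of chunks needed to cover `bytes`.
inline std::size_t chunks_needed(std::size_t bytes)
{
  assert(bytes != 0);
  return (bytes + CHUNK_SIZE - 1) >> CHUNK_BITS;
}
```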
* Make binaries more compatible by default
Turn `-march=native` off by default. This makes binaries more portable,
but may harm performance. However, fast paths look unaltered.
* Change setting to on if specified.
* Defensive code for alloc/dealloc during TLS teardown
If an allocation or deallocation occurs during TLS teardown, then it is
possible for a new allocator to be created and then this is leaked. On
the mimalloc-bench mstressN benchmark this was observed leading to a
large memory leak.
This fix detects if we are in the TLS teardown phase, and if so,
the calls to alloc or dealloc must return the allocator once they have
performed the specific operation.
Uses a separate variable to represent if a thread_local's destructor has
run already. This is used to detect thread teardown to put the
allocator into a special slow path to avoid leaks.
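The teardown detection can be sketched with a trivially destructible `thread_local` flag that is set from the destructor of another `thread_local` object. All names here (`Allocator`, `TeardownSentinel`, `thread_teardown`, `operation`, `live_allocators`) are hypothetical, and the "allocator" is a toy that only counts operations.

```cpp
#include <atomic>
#include <cassert>

struct Allocator
{
  int operations = 0;
};

std::atomic<int> live_allocators{0}; // lets a test observe leaks

inline Allocator* acquire_allocator()
{
  live_allocators++;
  return new Allocator();
}

inline void release_allocator(Allocator* a)
{
  assert(a != nullptr);
  delete a;
  live_allocators--;
}

thread_local Allocator* current = nullptr;
// Separate flag: trivially destructible, so it remains readable after the
// sentinel's destructor has run.
thread_local bool teardown_started = false;

inline void thread_teardown()
{
  teardown_started = true;
  if (current != nullptr)
  {
    release_allocator(current);
    current = nullptr;
  }
}

struct TeardownSentinel
{
  ~TeardownSentinel() { thread_teardown(); }
};

inline void operation()
{
  if (teardown_started)
  {
    // Special slow path: borrow an allocator for this one call and return
    // it immediately, so nothing is leaked.
    Allocator* a = acquire_allocator();
    a->operations++;
    release_allocator(a);
    return;
  }
  static thread_local TeardownSentinel sentinel; // registers the destructor
  (void)sentinel;
  if (current == nullptr)
    current = acquire_allocator();
  current->operations++;
}
```

Calling `thread_teardown()` by hand simulates the sentinel's destructor; any later `operation()` on that thread leaves `live_allocators` unchanged.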
* Added some printing to the first operation to track progress
* Improve error messages on posix
Flush errors, print assert details, and present stack traces.
* Detect incorrect use of pool.
* Clang format.
* Replace broken LL/SC implementation
LL/SC implementation was broken, this replaces it with
a locking implementation. Changes the API to support LL/SC
for future implementation on ARM.
* Improve TLS teardown.
* Make std::function fully inlined.
* Factor out PALLinux stack trace.
* Add checks for leaking allocators.
* Add release build of Windows Clang
clangformat9 introduces the AfterCaseLabel option.
It defaults to false, but our code is implicitly
formatted as if it were set to true.
PRs to Verona and Snmalloc are being formatted with clangformat9,
and this is causing complexity. Let's move forward to clangformat9
in CI.
find_program can do that for us. Additionally that loop was resetting
the CLANG_FORMAT every time ninja was run, making it impossible to
pass a specific -DCLANG_FORMAT=... to cmake.
I've tried version 6 to 8 and formatting remains stable. clang 9 is
when it breaks down because of AfterCaseLabel.
Fixes a few places where Clang complains about Windows specific code,
and also uses macros supported by Clang on Windows. A few places now
separate platform and compiler specific code, as MSVC and WIN32 were
previously used interchangeably.
* add rust support
* move aligned_size to sizeclass.h
* add static qualifier
* adjust CMakeLists.txt, may break CI tests
* fix MSVC's complaints about C++17
* use SNMALLOC_FAST_PATH as the decorator of aligned_size
* adapt new alignment algorithm and add related test
Co-authored-by: mjp41 <mattpark@microsoft.com>
* fix test cases for msvc
* add extra test for size == 0
* treat memory block of same sizeclass as the same
* fix formatting problem
* remove extra declarations
Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>
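For the `aligned_size` helper mentioned in the list above, one plausible formulation is sketched here. It is an assumption rather than a copy of the committed code: it rounds the size up to a multiple of the (power-of-two) alignment, relying on natural alignment of blocks to do the rest.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: the result is at least `size` and is a multiple of
// `alignment`, so a naturally aligned block of that size is aligned enough.
// With size == 0 the unsigned arithmetic wraps and the result is 0.
inline std::size_t aligned_size(std::size_t alignment, std::size_t size)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((alignment - 1) | (size - 1)) + 1;
}
```

Requests that map to the same resulting size can then be served from the same sizeclass regardless of the alignment that was asked for.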