* Removed unneeded headers
This removes some unneeded #include directives from the headers.
* Remove use of std::string
This stack-allocates and copies a C string to replace the calls to std::string.
* Pickup page size from unistd.h
This uses the PAGESIZE constant from unistd.h on POSIX.
This should make the code more resilient to being compiled on platforms with
different page sizes.
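The fallback chain described above can be sketched as follows; the names here (`OS_PAGE_SIZE`, the 4096 default) are illustrative, not snmalloc's actual identifiers. POSIX platforms that define `PAGESIZE` supply it directly, and other platforms get a compile-time default.

```cpp
#include <unistd.h>
#include <cstddef>

// Illustrative sketch: prefer the POSIX PAGESIZE constant when the platform
// provides it, otherwise fall back to a build-time default (here 4096).
#if defined(PAGESIZE)
static constexpr std::size_t OS_PAGE_SIZE = PAGESIZE;
#else
static constexpr std::size_t OS_PAGE_SIZE = 4096;
#endif

static_assert(
  (OS_PAGE_SIZE & (OS_PAGE_SIZE - 1)) == 0,
  "page size must be a power of two");
```

A real build would let CMake override the default, as the next commit does.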
* Allow the page size to come from CMake.
* Update src/snmalloc/pal/pal_posix.h
Co-authored-by: Nathaniel Filardo <105816689+nwf-msr@users.noreply.github.com>
---------
Co-authored-by: Nathaniel Filardo <105816689+nwf-msr@users.noreply.github.com>
* Rename dealloc_local_object_slower to _meta
Unlike its brethren, `dealloc_local_object` and
`dealloc_local_object_slow`, the `dealloc_local_object_slower` method
does not take a pointer to free space. Make this slightly more apparent
by renaming it and adding some commentary to both definition and call
site.
* corealloc: get meta in dealloc_local_object
Make both _fast() and _slow() arms take the meta as an argument; _meta()
already did.
* Introduce RemoteMessage structure
Plumb its use around remoteallocator and remotecache
* NFC: Plumb metadata to remotecache dealloc
* Initial steps in batched remote messages
This prepares the recipient to process a batched message.
* Initial dealloc-side batching machinery
Exercise recipient machinery by having the senders collect adjacent frees to
the same slab into a batch.
* Match free batch keying to slab freelist keying
* freelist: add append_segment
* SlabMetadata: machinery for returning multiple objects
This might involve multiple (I think at most two, at the moment) transitions in
the slab lifecycle state machine. Towards that end, return indicators to the
caller that the slow path must be taken and how many objects of the original
set have not yet been counted as returned.
* corealloc: operate ring-at-a-time on remote queues
* RemoteCache associative cache of rings
* RemoteCache: N-set caching
* Initial CHERI support for free rings
* Matt's fix for slow-path codegen
* Try: remotecache: don't store allocator IDs
We can, as Matt so kindly reminds me, go get them from the pagemap. Since we
need this value only when closing a ring, the read from over there is probably
not very onerous. (We could also get the slab pointer from an object in the
ring, but we need that whenever inserting into the cache, so it's probably more
sensible to store that locally?)
* Make BatchIt optional
Move ring set bits and associativity knobs to allocconfig and expose them via
CMake. If associativity is zero, use non-batched implementations of the
`RemoteMessage` and `RemoteDeallocCacheBatching` classes.
By default, turn BatchIt on when we have enough room in the minimum allocation
size to do it. Exactly how much space is enough is a function of which
mitigations we have enabled and whether or not we are compiling with C++20.
This commit reverts the change to `MIN_ALLOC_SIZE` made in "Introduce
RemoteMessage structure" now that we have multiple types, and sizes, of
remote messages to choose from.
* RemoteDeallocCacheBatching: store metas as address
There's no need for a full pointer here, it'd just make the structure larger on
CHERI.
* NFC: plumb entropy from LocalAlloc to BatchIt
* BatchIt random eviction
In order not to thwart `mitigations(random_preserve)` too much, if it's on in
combination with BatchIt, roll the dice every time we append to a batch to
decide if we should stochastically evict this batch. By increasing the number
of batches, we allow the recipient allocator increased opportunity to randomly
stripe batches across the two `freelist::Builder` segments associated with each
slab.
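The dice roll above can be sketched as below; `should_evict` and `EVICT_BITS` are hypothetical names, and the 1-in-8 probability is illustrative rather than snmalloc's actual tuning.

```cpp
#include <cstdint>

// Hypothetical sketch: on each append to a batch, consume a few random bits
// and evict with probability 1 / 2^EVICT_BITS, so batches close early often
// enough to preserve randomised freelist striping on the recipient.
constexpr unsigned EVICT_BITS = 3; // 1-in-8 chance per append (illustrative)

inline bool should_evict(std::uint64_t random_bits)
{
  return (random_bits & ((std::uint64_t{1} << EVICT_BITS) - 1)) == 0;
}
```

The caller would draw `random_bits` from the thread-local entropy plumbed in by the previous commit.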
---------
Co-authored-by: Nathaniel Wesley Filardo <nfilardo@microsoft.com>
Co-authored-by: Matthew Parkinson <mattpark@microsoft.com>
When building test/perf/singlethread to use the system allocator, gcc
(Debian 14.2.0-3) correctly sees that we were using the value of a
pointer after it had been passed to the privileged free(), which is UB.
Flip the check and dealloc, so that we query the set of pointers we're
tracking first, using the pointer while the allocation is still live.
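The reordering can be sketched as follows; the tracking set and `checked_dealloc` are illustrative stand-ins for the test harness, not its actual code.

```cpp
#include <cassert>
#include <cstdlib>
#include <unordered_set>

// Illustrative sketch of the fix: query the tracking set first, then free,
// so the pointer value is never used after the allocation is dead.
std::unordered_set<void*> tracked;

void checked_dealloc(void* p)
{
  // Before: free(p); then tracked.erase(p) -- UB, uses p's value after free.
  // After: query while the allocation is still live, then free.
  assert(tracked.erase(p) == 1);
  free(p);
}
```

Using even the *value* of a freed pointer is undefined behaviour in C and C++, which is what gcc was flagging.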
In test/perf/startup, gcc (Debian 14.2.0-3) seems to get confused about
the size of the counters vector as the code was written. Rewrite the
code to pass the same value (`std::thread::hardware_concurrency()`, but
in a local) to both `counters.resize()` and the `ParallelTest` ctor.
* msvc: set __cplusplus to the actual value in use
* ds_core/bits: add mask_bits; convert one_at_bit-s
* remotecache: enable reserve_space multiple objects
* nits
* Small changes to tracing
- Trace "Handling remote" once per batch, rather than per element
- Remote queue events also log the associated metaslab; we'll use this
to assess the efficacy of https://github.com/microsoft/snmalloc/issues/634
* freelist builder: allow forcibly tracking length
* Try forward declaring freelist::Builder to appease macos-14
* freelist: tweak intra-slab obfuscation keys by meta address
* NFC: freelist: allow `next` to be arbitrary value
* Switch to a central, tweaked key for all free lists
* allocconfig: introduce some properties of slabs
We'll use these to pack values in message queues.
- Maximum distance between two objects in a single slab
- Maximum number of objects in a slab
* NFC: Templatize LocalCache on Config
* NFC: split dealloc_local_object_slow
We'll use the _slower form when we're just stepping a slab through
multiple rounds of state transition (to come), which can't involve
the actual memory object in question.
* NFC: make freelist::Object::T-s by placement new
* NFC: CoreAlloc: split dealloc_local_object
The pattern of `if (!fast()) { slow() }` occurs in a few places, including in
contexts where we already know the entry and so don't need to look it up.
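The split pattern can be sketched as below; `Entry`, `dealloc_fast`, and `dealloc_slow` are hypothetical names standing in for the real CoreAlloc machinery, which passes the already-looked-up pagemap entry down rather than re-querying it.

```cpp
#include <cassert>

// Hypothetical sketch: the fast arm handles the common case and returns
// false when the slow arm must run; callers that already hold the metadata
// entry pass it down instead of looking it up again.
struct Entry
{
  int remaining; // objects still allocated from this slab (illustrative)
};

inline bool dealloc_fast(Entry& e)
{
  if (e.remaining > 1) // common case: slab stays partially full
  {
    e.remaining--;
    return true;
  }
  return false; // defer to the slow path
}

inline void dealloc_slow(Entry& e)
{
  e.remaining = 0; // e.g. return the whole slab (illustrative)
}

inline void dealloc(Entry& e)
{
  if (!dealloc_fast(e))
    dealloc_slow(e);
}
```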
- Split ubuntu and macos CI actions, even though they use very similar steps
- Remove macos-11, keep -12, and add -14
- Have all macos platforms build with and without C++17
- Remove duplicated dependency lines in ubuntu matrix entries; push this down
to the steps
- Ensure that all added ubuntu matrix tuples have non-empty build-type
- Add all jobs to all-checks' "needs:" to ensure we wait for everything
* backend_helpers: introduce NopRange
* Fix to Buddy MIN == MAX case
This fixes the 0-length arrays discussed (and made into assertion failures) in
the next commit. This works because the Buddy's `MIN_SIZE_BITS` is instantiated
at `MIN_CHUNK_BITS`, and so we ensure that we instantiate the LargeBuddyRange
only with `max_page_chunk_size_bits` above `MIN_CHUNK_BITS`.
* Buddy range: assert that MAX > MIN
Now that the case that led to several 0-sized arrays in Buddy ranges (tripping
gcc's -Warray-bounds) has been removed, add a static assert so that we can
catch this with a better error message next time.
posix_memalign() requires alignment values of at least
sizeof(uintptr_t), but aligned_alloc() does not. memalign() regressed
to require larger alignment in commit
6cbc50fe2c.
Fixes #668
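The difference can be demonstrated directly; the helper below is illustrative. Per POSIX, `posix_memalign` rejects (with `EINVAL`) any alignment that is not a power of two multiple of `sizeof(void*)`, whereas `aligned_alloc` imposes no such floor.

```cpp
#include <cstdlib>

// Illustrative: returns true if posix_memalign accepts this alignment.
inline bool posix_memalign_accepts(std::size_t alignment)
{
  void* p = nullptr;
  int rc = posix_memalign(&p, alignment, 64);
  if (rc == 0)
    free(p);
  return rc == 0;
}
```

So a `memalign(2, n)` implemented on top of `posix_memalign` would fail where one on top of `aligned_alloc` need not, which is the regression described above.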
* Added lambda lock primitive
* Implement MCS Combining lock
This is a hybrid of flat combining and the MCS queue lock. It uses
the queue like the MCS queue lock, but each item additionally
contains a thunk to perform the body of the lock. This enables
threads other than the one that initially issued the request to
perform the work.
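A minimal sketch of the idea is below. This is not snmalloc's implementation: it is a simplified combining lock in which the queue head drains all queued thunks itself (no hand-off, no fast-path flag), using busy-waiting with yields where the real lock would be more careful.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Sketch: waiters enqueue an MCS-style node carrying a thunk; the queue
// head runs thunks on behalf of later arrivals, so a critical section may
// execute on a different thread than the one that requested it.
struct CombiningLock
{
  struct Node
  {
    std::function<void()> work;
    std::atomic<Node*> next{nullptr};
    std::atomic<bool> done{false};
  };

  std::atomic<Node*> tail{nullptr};

  void run(std::function<void()> f)
  {
    Node me;
    me.work = std::move(f);
    Node* prev = tail.exchange(&me);
    if (prev != nullptr)
    {
      prev->next.store(&me);
      while (!me.done.load())
        std::this_thread::yield(); // our thunk runs on the combiner thread
      return;
    }
    // We are the combiner: run our own thunk, then any queued ones.
    Node* cur = &me;
    while (true)
    {
      cur->work();
      Node* expected = cur;
      if (tail.compare_exchange_strong(expected, nullptr))
      {
        cur->done.store(true);
        return; // queue drained; lock released
      }
      Node* nxt;
      while ((nxt = cur->next.load()) == nullptr)
        std::this_thread::yield(); // successor is still linking itself in
      cur->done.store(true);
      cur = nxt;
    }
  }
};
```

Because the combiner executes the thunks of everyone queued behind it, waiters only spin on their own `done` flag rather than contending on shared data.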
* Add a fast path flag
This adds a fast-path flag for the uncontended case, reducing the
number of atomic operations it requires.
* CR feedback
* Refactor buddy allocator
Make the structure of add_block clearer by pulling out remove_buddy.
* Give the buddy a few elements so it doesn't have to touch the pagemap early on.
* Only use do- and don't-dump on the pagemap
The do- and don't-dump calls were costing a lot during start-up of snmalloc. This reduces them so that they are only made for the pagemap.
* NFC: split freelist_queue from remoteallocator
This lets us use freelists as message queues in contexts other than
the remoteallocator. No functional change intended.
* freelist_queue: add and use destroy_and_iterate
* freelist: make backptr obfuscation key "tweakable"
* freelist: tweakable keys in forward direction, too
* test/perf/msgpass: ubench a producer-consumer app
Approximate a message-passing application as a set of producers, a set of
consumers, and a set of proxies that do both. We'll use this for some initial
insight for https://github.com/microsoft/snmalloc/issues/634 but it seems worth
having in general.
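The benchmark's shape can be sketched as below; the queue and `run_pipeline` are illustrative, not the actual harness (which also has proxy threads that both receive and forward). The key property exercised is that messages are freed on a different thread than the one that allocated them.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Illustrative skeleton: producers allocate messages and hand them over a
// queue; consumers free them, so deallocation is always remote.
struct Queue
{
  std::mutex m;
  std::condition_variable cv;
  std::deque<int*> q;

  void push(int* p)
  {
    {
      std::lock_guard<std::mutex> g(m);
      q.push_back(p);
    }
    cv.notify_one();
  }

  int* pop()
  {
    std::unique_lock<std::mutex> g(m);
    cv.wait(g, [&] { return !q.empty(); });
    int* p = q.front();
    q.pop_front();
    return p;
  }
};

inline long run_pipeline(int n_msgs)
{
  Queue q;
  long sum = 0;
  std::thread consumer([&] {
    for (int i = 0; i < n_msgs; i++)
    {
      int* p = q.pop();
      sum += *p;
      delete p; // freed on the consumer thread
    }
  });
  for (int i = 0; i < n_msgs; i++)
    q.push(new int(1)); // allocated on the producer thread
  consumer.join();
  return sum;
}
```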
This provides a way to configure snmalloc with out-of-band per-object metadata. This can be used to build different mitigations on top of snmalloc, such as storing memory tags in a compressed form, or providing a miracle-pointer-like feature.
This also includes a couple of TSAN fixes, as TSAN wasn't fully enabled in CI.
* Move sizeclass debugging code to sizeclass test
The sizeclass test was already covering most of this, so just add the missing bits.
Forgo some tests whose failure would have implied earlier failures.
This moves the last dynamic call of size_to_sizeclass_const into tests
(and so, too, to_exp_mant_const). sizeclasstable.h still contains a static
call to compute NUM_SMALL_SIZECLASSES from MAX_SMALL_SIZECLASS_SIZE.
* Remove unused to_exp_mant
Only its _const sibling is used, and little at that, now that almost everything
to do with sizes and size classes is table-driven.
* test/memcpy: trap, if we can, before exiting
This just means I don't need to remember to set a breakpoint on exit
* test/memcpy: don't assume sizeclass 0 is allocable
* test/memory: don't assume sizeclass 0 is allocable
* test/sizeclass: handle nonzero minimum sizeclasses
* sizeclass: distinguish min alloc and step size
Add support for a minimum allocation size that isn't the minimum step of
the sizeclass table.
* Expose MIN_ALLOC_{,STEP}_SIZE through cmake
* test/sizeclass: report MIN_ALLOC_{STEP_,}SIZE
If we're running with the freelist_backward_edge mitigation turned on, then
we're going to follow the pointer, not just de-obfuscate it (in freelist's
atomic_read_next), so even if the queue heads are tame, we still need to
do this domestication.
* Template construction of Pool elements
The Pool class is used by verona-rt. The recent changes made this
less nice to consume as an API.
This change makes the construction logic a template parameter to the
Pool. This enables standard allocation to be used from Verona.
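The shape of the change can be sketched as below; `Pool`, `Construct`, and the widget types are illustrative stand-ins, and a real pool would first try to reuse a pooled element before constructing a new one.

```cpp
// Hypothetical sketch: the construction logic is a template parameter, so an
// embedder (e.g. verona-rt) can plug in its own allocation strategy.
template<typename T, T* (*Construct)()>
struct Pool
{
  T* acquire()
  {
    // A real pool would check a free list of pooled elements first.
    return Construct();
  }
};

struct Widget
{
  int v = 42;
};

inline Widget* make_widget()
{
  return new Widget();
}
```

Usage would then be `Pool<Widget, make_widget>`, with the default construction strategy supplied as a default template argument.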
* Drop parameter from acquire
Pool::acquire took a list of parameters to initialise the object that it
constructed. But if this was serviced from the pool, the parameter
would be ignored. This is not an ideal API.
This PR removes the ability to pass a parameter.
* Benchmark for testing startup performance.
* Make pool pass spare space to pooled item
The pool results in power-of-2 allocations, as it doesn't have local
state when it is initially set up.
This commit passes the extra space to the constructor of the pooled
type, so that it can be fed into the freshly created allocator.
Co-authored-by: Nathaniel Wesley Filardo <nfilardo@microsoft.com>
FrontendSlabMetadata has a SeqSet::Node field that is in an unspecified
state on construction and which is valid only once inserted into a
SeqSet. This change allows such nodes to exist as fields without
invalidating the set.
This should make it possible to remove the undefined behaviour in the
creation of FrontendSlabMetadata, which is currently created via a
reinterpret_cast from a different-typed allocation.
* Fix#631
Add a wrapper override for _msize, the Windows variant of malloc_usable_size.
Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>
* Using exclusive mode prefetch
Prefetching is always used to move the cache line to the current
core for writing. This change makes it use exclusive-mode prefetch
and enables it as a feature flag for x64.
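With GCC and Clang, this maps onto the second argument of `__builtin_prefetch`: `rw = 1` requests the line with write intent, which compiles to an exclusive-mode prefetch (e.g. PREFETCHW on x64) where the target supports it. A minimal sketch, with illustrative naming:

```cpp
// Illustrative: prefetch a line with write intent using the GCC/Clang
// builtin; rw = 1 means "for writing" (exclusive), 3 means high locality.
inline void prefetch_for_write(void* p)
{
  __builtin_prefetch(p, 1, 3);
}
```

A prefetch is only a hint, so this is safe to call on any address; it never faults.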
* Debug platform for BSDs
* CI fixes
* More CI
* Update ARM prefetch
* Update x64 prefetch default
* Factor out libc code into a header.
This pulls the main definitions of the various libc malloc functions
into a header for easier use and inclusion in other projects.
* Clang-tidy fixes.
* Clang-tidy fixes really.
* More code quality changes
* Minor fix
* Clangformat
The current setup requires clang-format-9, which is now hard to obtain.
This commit moves to clang-format-15, the latest in Ubuntu 22.04, and
updates clang-tidy to 15 as well.
* Make a conditional range
This range allows a contained range to be disabled at runtime. For
example, thread-local caching can be disabled if the initial fixed-size
heap is below a threshold.
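The idea can be sketched as a wrapper that forwards to its contained range only while enabled; the names and the pointer-returning range interface here are illustrative, not snmalloc's actual range API.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch of a conditional range: forwards allocation requests
// to the contained range while enabled, and reports failure otherwise, so a
// layer can be switched off at runtime.
template<typename ParentRange>
struct CondRange
{
  ParentRange parent;
  bool enabled = true;

  void* alloc_range(std::size_t size)
  {
    if (!enabled)
      return nullptr; // disabled: caller must fall back elsewhere
    return parent.alloc_range(size);
  }
};

// Trivial backing range for illustration.
struct MallocRange
{
  void* alloc_range(std::size_t size)
  {
    return std::malloc(size);
  }
};
```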