Commit graph

1194 commits

Author SHA1 Message Date
David CARLIER fd560d472b
Revert "custom memmove implementation proposal. (#593)" (#692)
* Revert "custom memmove implementation proposal. (#593)"

This reverts commit 01885f5a04.

* disable memmove fuzzing
2024-11-20 10:36:42 +00:00
Matthew Parkinson e343232611
Add fuzz to CI (#690)
* Add fuzz to CI

* Change compiler to clang++

* Fix flag name

* Remove fuzzing from a requirement.

* Update to checkout v4
2024-11-19 14:28:55 +00:00
Schrodinger ZHU Yifan 0b53b9301e
fuzzing test and detect the memmove error (#688)
* fuzzing

* add an additional random walk test
2024-11-19 13:48:18 +00:00
David CARLIER 01885f5a04
custom memmove implementation proposal. (#593)
Mostly like memcpy, with optional bounds checking, but capable of
handling overlapping regions by using a reverse copy where needed.
2024-11-16 07:39:38 +00:00
Schrodinger ZHU Yifan f7fe702f77
implement polite waiting (#685)
* implement polite waiting

* Update src/snmalloc/pal/pal_linux.h

Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>

* fix build issues

* fix more build issues

* support _umtx_op for freebsd

* unify waiting style

* fix

* address CR

Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>

* support macos

* static dispatch os APIs for apple

* make wait_on_address configurable via cmake

* undo extra include

* fix macos build

* fix clang-tidy build

---------

Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>
2024-11-16 07:33:11 +00:00
Matthew Parkinson fe3fed4414
Remove unneeded template. (#687) 2024-11-12 19:56:14 +00:00
David CARLIER 43ad730c29
PAL::haiku finally supports getentropy. (#684)
Also fixes the build with PAL::posix, assuming the intent is to avoid
the std namespace for max and to use the in-house implementation
instead.
2024-10-31 11:54:27 +00:00
Matthew Parkinson 97b7675670
Remove some unneeded headers (#680)
* Removed unneeded headers

This removes some unneeded includes from the header files.

* Remove use of std::string

This stack-allocates and copies a C string to replace the uses of std::string.
2024-10-06 09:14:56 +01:00
Matthew Parkinson c77076983d
Add documentation for the combining lock (#683)
This adds some documentation to make the combining lock easier to understand.
This is working towards documenting the changes for the 0.7 release.
2024-10-05 07:31:00 +01:00
David CARLIER f1df3d43bb
override malloc, aligned_alloc calling libc::aligned_alloc (#681) 2024-09-28 22:02:40 +01:00
Matthew Parkinson ab4fe84804
Provide option to CMake for Page Size (#664)
* Pickup page size from unistd.h

This uses the PAGESIZE constant from the unistd.h on POSIX.
This should make the code more resilient to being compiled on platforms with
different page sizes.

* Allow pagesize to come from cmake.

* Update src/snmalloc/pal/pal_posix.h

Co-authored-by: Nathaniel Filardo <105816689+nwf-msr@users.noreply.github.com>

---------

Co-authored-by: Nathaniel Filardo <105816689+nwf-msr@users.noreply.github.com>
2024-09-25 11:27:31 +01:00
Nathaniel Wesley Filardo fb776da909
WIP: BatchIt (#677)
* Rename dealloc_local_object_slower to _meta

Unlike its brethren, `dealloc_local_object` and
`dealloc_local_object_slow`, the `dealloc_local_object_slower` method
does not take a pointer to free space.  Make this slightly more apparent
by renaming it and adding some commentary to both definition and call
site.

* corealloc: get meta in dealloc_local_object

Make both _fast() and _slow() arms take the meta as an argument; _meta()
already did.

* Introduce RemoteMessage structure

Plumb its use around remoteallocator and remotecache

* NFC: Plumb metadata to remotecache dealloc

* Initial steps in batched remote messages

This prepares the recipient to process a batched message.

* Initial dealloc-side batching machinery

Exercise recipient machinery by having the senders collect adjacent frees to
the same slab into a batch.

* Match free batch keying to slab freelist keying

* freelist: add append_segment

* SlabMetadata: machinery for returning multiple objects

This might involve multiple (I think at most two, at the moment) transitions in
the slab lifecycle state machine.  Towards that end, return indicators to the
caller that the slow path must be taken and how many objects of the original
set have not yet been counted as returned.

* corealloc: operate ring-at-a-time on remote queues

* RemoteCache associative cache of rings

* RemoteCache: N-set caching

* Initial CHERI support for free rings

* Matt's fix for slow-path codegen

* Try: remotecache: don't store allocator IDs

We can, as Matt so kindly reminds me, go get them from the pagemap.  Since we
need this value only when closing a ring, the read from over there is probably
not very onerous.  (We could also get the slab pointer from an object in the
ring, but we need that whenever inserting into the cache, so it's probably more
sensible to store that locally?)

* Make BatchIt optional

Move ring set bits and associativity knobs to allocconfig and expose them via
CMake.  If associativity is zero, use non-batched implementations of the
`RemoteMessage` and `RemoteDeallocCacheBatching` classes.

By default, kick BatchIt on when we have enough room in the minimum allocation
size to do it.  Exactly how much space is enough is a function of which
mitigations we have enabled and whether or not we are compiling with C++20.

This commit reverts the change to `MIN_ALLOC_SIZE` made in "Introduce
RemoteMessage structure" now that we have multiple types, and sizes, of
remote messages to choose from.

* RemoteDeallocCacheBatching: store metas as address

There's no need for a full pointer here, it'd just make the structure larger on
CHERI.

* NFC: plumb entropy from LocalAlloc to BatchIt

* BatchIt random eviction

In order not to thwart `mitigations(random_preserve)` too much, if it's on in
combination with BatchIt, roll the dice every time we append to a batch to
decide if we should stochastically evict this batch.  By increasing the number
of batches, we allow the recipient allocator increased opportunity to randomly
stripe batches across the two `freelist::Builder` segments associated with each
slab.

---------

Co-authored-by: Nathaniel Wesley Filardo <nfilardo@microsoft.com>
Co-authored-by: Matthew Parkinson <mattpark@microsoft.com>
2024-09-23 19:18:09 +01:00
Nathaniel Wesley Filardo 416fd39f6a gcc UAF warning in test/perf/singlethread -malloc
When building test/perf/singlethread to use the system allocator, gcc
(Debian 14.2.0-3) correctly sees that we were using the value of a
pointer after it had been passed to the privileged free(), which is UB.

Flip the check and dealloc, so that we query the set of pointers we're
tracking first, using the pointer while the allocation is still live.
2024-09-21 14:48:04 +00:00
Nathaniel Wesley Filardo 19259095c6 Further gcc -Werror=array-bounds fix
In test/perf/startup, gcc (Debian 14.2.0-3) seems to get confused about
the size of the counters vector as the code was written.  Rewrite the
code to pass the same value (`std::thread::hardware_concurrency()`, but
in a local) to both `counters.resize()` and the `ParallelTest` ctor.
2024-09-21 10:32:28 +00:00
David CARLIER d537d35268
export snmalloc::alloc_size to rust wrappers. (#676) 2024-09-19 15:54:56 +00:00
Nathaniel Filardo 8b95b9a916
Bottom commits from BatchIt (#675)
* msvc: set __cplusplus to the actual value in use

* ds_core/bits: add mask_bits; convert one_at_bit-s

* remotecache: enable reserve_space multiple objects

* nits

* Small changes to tracing

- Trace "Handling remote" once per batch, rather than per element

- Remote queue events also log the associated metaslab; we'll use this
  to assess the efficacy of https://github.com/microsoft/snmalloc/issues/634

* freelist builder: allow forcibly tracking length

* Try forward declaring freelist::Builder to appease macos-14

* freelist: tweak intra-slab obfuscation keys by meta address

* NFC: freelist: allow `next` to be arbitrary value

* Switch to a central, tweaked key for all free lists

* allocconfig: introduce some properties of slabs

We'll use these to pack values in message queues.

- Maximum distance between two objects in a single slab
- Maximum number of objects in a slab

* NFC: Templatize LocalCache on Config

* NFC: split dealloc_local_object_slow

We'll use the _slower form when we're just stepping a slab through
multiple rounds of state transition (to come), which can't involve
the actual memory object in question.

* NFC: make freelist::Object::T-s by placement new

* NFC: CoreAlloc: split dealloc_local_object

The pattern of `if (!fast()) { slow() }` occurs in a few places, including in
contexts where we already know the entry and so don't need to look it up.
2024-09-12 17:06:53 -04:00
Nathaniel Filardo 12f2b10122
Rework CI (#674)
- Split ubuntu and macos CI actions, even though they use very similar steps
- Remove macos-11, keep -12, and add -14
- Have all macos platforms build with and without C++17
- Remove duplicated dependency lines in ubuntu matrix entries; push this down
  to the steps
- Ensure that all added ubuntu matrix tuples have non-empty build-type
- Add all jobs to all-checks' "needs:" to ensure we wait for everything
2024-09-10 10:07:28 -04:00
Nathaniel Filardo 7fbca11527
0-length arrays in Buddy ranges (#672)
* backend_helpers: introduce NopRange

* Fix to Buddy MIN == MAX case

This fixes the 0-length arrays discussed (and made into assertion failures) in
the next commit.  This works because the Buddy's `MIN_SIZE_BITS` is instantiated
at `MIN_CHUNK_BITS`, and so we ensure that we instantiate the LargeBuddyRange
only with `max_page_chunk_size_bits` above `MIN_CHUNK_BITS`.

* Buddy range: assert that MAX > MIN

Now that the case leading to several 0-sized arrays in Buddy ranges
(which tripped gcc's -Warray-bounds) has been removed, add a static
assert so that we can catch this with better error messages next time.
2024-09-09 12:25:51 -04:00
John Baldwin fcad15456b
aligned_alloc: Permit alignment values smaller than sizeof(uintptr_t) (#671)
posix_memalign() requires alignment values of at least
sizeof(uintptr_t), but aligned_alloc() does not.  memalign() regressed
to require larger alignment in commit
6cbc50fe2c.

Fixes #668
2024-08-27 22:16:19 +01:00
Matthew Parkinson 6af38acd94
Implement MCS Combining lock (#666)
* Added lambda lock primitive

* Implement MCS Combining lock

This is hybrid of Flat Combining and the MCS queue lock. It uses
the queue like the MCS queue lock, but each item additionally
contains a thunk to perform the body of the lock. This enables
threads other than the one that initially issued the request to
perform the work.

* Add a fast path flag

This update adds a fast path flag for the uncontended case. This
reduces the number of atomic operations in the uncontended case.

* CR feedback
2024-06-28 15:43:17 +01:00
Nathaniel Filardo 6bd6db5f61
Remove the SNMALLOC_USE_CXX17 C preprocessor symbol (#667)
* Revise search for no-unique-address

* Remove SNMALLOC_USE_CXX17 as preprocessor symbol

Just test __cplusplus in the last place we were using it.
2024-06-28 10:19:32 -04:00
Matthew Parkinson 4620220080
Startup speed (#665)
* Refactor buddy allocator

Make it clearer the structure of add_block by pulling out remove_buddy.

* Give the buddy a few elements so we don't have to touch the pagemap early on.

* Only use do and dont dump on pagemap

The do and don't dump calls were costing a lot during start-up of snmalloc.  This reduces them so they are made only for the pagemap.
2024-06-26 21:34:22 +01:00
Nathaniel Filardo 835ab51863
msgpass benchmark and its refactoring dependencies (#659)
* NFC: split freelist_queue from remoteallocator

This lets us use freelists as message queues in contexts other than
the remoteallocator.  No functional change intended.

* freelist_queue: add and use destroy_and_iterate

* freelist: make backptr obfuscation key "tweakable"

* freelist: tweakable keys in forward direction, too

* test/perf/msgpass: ubench a producer-consumer app

Approximate a message-passing application as a set of producers, a set of
consumers, and a set of proxies that do both.  We'll use this for some initial
insight for https://github.com/microsoft/snmalloc/issues/634 but it seems worth
having in general.
2024-06-13 17:28:48 -04:00
Matthew Parkinson 2a7eabef6c
Configurable client meta-data (#662)
This provides a way to configure snmalloc to provide per-object meta-data that is out of band. This can be used to build different mitigations on top of snmalloc, such as storing memory tags in a compressed form, or providing a miracle-pointer-like feature.

This also includes a couple of TSAN fixes as it wasn't fully on in CI.
2024-06-13 09:32:07 -04:00
Matthew Parkinson 2dba088d24
Refactor new/delete overrides (#660)
Produce a static library that overrides only new and delete, so it can be linked in to replace just those operators.
2024-06-06 10:23:12 +01:00
Nathaniel Filardo 846a926155
NFC: sizeclass: differentiate minimum step size and minimum allocation sizes (#651)
* Move sizeclass debugging code to sizeclass test

The sizeclass was already testing most of this, so just add the missing bits.
Forgo some tests whose failure would have implied earlier failures.

This moves the last dynamic call of size_to_sizeclass_const into tests
(and so, too, to_exp_mant_const).  sizeclasstable.h still contains a static
call to compute NUM_SMALL_SIZECLASSES from MAX_SMALL_SIZECLASS_SIZE.

* Remove unused to_exp_mant

Only its _const sibling is used, and little at that, now that almost everything
to do with sizes and size classes is table-driven.

* test/memcpy: trap, if we can, before exiting

This just means I don't need to remember to set a breakpoint on exit

* test/memcpy: don't assume sizeclass 0 is allocable

* test/memory: don't assume sizeclass 0 is allocable

* test/sizeclass: handle nonzero minimum sizeclasses

* sizeclass: distinguish min alloc and step size

Add support for a minimum allocation size that isn't the minimum step of
the sizeclass table.

* Expose MIN_ALLOC_{,STEP}_SIZE through cmake

* test/sizeclass: report MIN_ALLOC_{STEP_,}SIZE
2024-05-24 18:49:39 +01:00
Matthew Parkinson b1d0d7dc78
Remove unnecessary assertion from aligned_alloc (#658)
With [DR460](https://open-std.org/JTC1/SC22/WG14/www/docs/summary.htm#dr_460) the undefined behaviour for the size not being a multiple of the alignment was removed.

[N2072](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2072.htm) allows this to return memory.

So we can remove this assert and keep the same conditions as memalign.
2024-05-23 16:07:01 +01:00
Javier Blazquez d9bca64426
Add Windows ARM64 support (#656)
* Add Windows ARM64 support

* Add Windows ARM64/ARM64EC CI workflows
2024-04-10 06:31:10 +01:00
Nathaniel Wesley Filardo b8e9e99cf0 NFC: Remove "Backend" from MetaEntry template arg
This parameter is in fact instantiated with FrontendSlabMetadata, so the use of
Backend here is confusing.
2024-01-05 17:29:08 +00:00
Nathaniel Wesley Filardo 24b79264df remotealloc: can_dequeue needs both domesticators
If we're running with the freelist_backward_edge mitigation turned on, then
we're going to follow the pointer, not just de-obfuscate it (in freelist's
atomic_read_next), so even if the queue heads are tame, we still need to
do this domestication.
2024-01-05 17:29:08 +00:00
Nathaniel Wesley Filardo fffc9453bc remotealloc: make can_dequeue match its tin
The introduction of the remote stub back in
https://github.com/microsoft/snmalloc/pull/604 renamed this function from
`is_empty` but did not flip the return value to match.  Do so now.
2024-01-05 17:29:08 +00:00
Nathaniel Wesley Filardo f3e470c3e4 realloc: ASSUME non-NULL source pointer for memcpy
NULL has size zero, which rounds up to zero, and so would have been handled.
2024-01-05 17:29:08 +00:00
Matthew Parkinson 640cacf90e
Updated CI workaround. (#650)
https://github.com/actions/runner-images/issues/8659
2024-01-02 10:02:27 +00:00
Matthew Parkinson 3c3739ddaf
Update NetBSD runner (#648) 2023-11-09 06:24:35 +00:00
Matthew Parkinson a781f96211
FreeBSD CI. (#647) 2023-11-08 10:15:20 +00:00
Matthew Parkinson 7d88f4d638
Workaround runner issue. (#645) 2023-11-08 10:14:33 +00:00
Matthew Parkinson f38ee89e72
Template construction of Pool elements (#641)
* Template construction of Pool elements

The Pool class is used by verona-rt.  The recent changes made this
less nice to consume as an API.

This change makes the construction logic a template parameter to the
Pool. This enables standard allocation to be used from Verona.

* Drop parameter from acquire

Pool::acquire took a list of parameters to initialise the object that it
constructed.  But if this was serviced from the pool, the parameter
would be ignored.  This is not an ideal API.

This PR removes the ability to pass a parameter.
2023-10-03 13:59:23 +00:00
Matthew Parkinson 5543347543
Startup improvements (#639)
* Benchmark for testing startup performance.

* Make pool pass spare space to pooled item

The pool will result in power of 2 allocations as it doesn't have a
local state when it is initially set up.

This commit passes this extra space to the constructor of the pooled
type, so that it can be fed into the freshly created allocator.

Co-authored-by: Nathaniel Wesley Filardo <nfilardo@microsoft.com>
2023-09-28 14:53:39 +01:00
David Chisnall 126e77f2a5
Add a default constructor to seqset nodes. (#636)
This allows them to exist as fields without invalidating the set.

This should make it possible to remove the undefined behaviour in the
creation of FrontendSlabMetadata, which is currently created via a
reinterpret_cast from a different-typed allocation.
FrontendSlabMetadata has a SeqSet::Node field that is in an unspecified
state on construction and which is valid only when inserted into a
SeqSet.
2023-09-18 10:15:49 +00:00
Matthew Parkinson 2a7670eb82
Add C++17 version for Mac. (#635) 2023-09-18 09:37:56 +01:00
Zhang 6b8f3338c7
Fix #631 (#633)
* Fix #631

Add a wrapper override for the Windows variant of malloc_usable_size: _msize

Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>
2023-09-14 11:04:12 +00:00
Matthew A Johnson 35eef33099
Adding an option to disable TLS (#632)
Signed-off-by: Matthew A Johnson <matjoh@microsoft.com>
2023-09-13 10:17:10 +00:00
Matthew Parkinson 7b597335ae
Disable empty static prefix for Windows. (#629) 2023-08-25 14:17:04 +01:00
Matthew Parkinson c2e4a12e21
Using exclusive mode prefetch (#627)
* Using exclusive mode prefetch

The prefetching is always used to move the cache line to the current
core for writing.  This change makes it use exclusive mode prefetch
and enables it as a feature flag for x64.

* Debug platform for BSDs

* CI fixes

* More CI

* Update ARM prefetch

* Update x64 prefetch default
2023-08-09 07:15:37 +01:00
Matthew Parkinson 6cbc50fe2c
Factor out libc code into a header. (#624)
* Factor out libc code into a header.

This pulls the main definitions of the various libc malloc functions
into a header for easier use and inclusion in other projects.

* Clang-tidy fixes.

* Clang-tidy fixes really.

* More code quality changes

* Minor fix

* Clangformat
2023-08-09 07:15:09 +01:00
Matthew Parkinson 9d4466093a
Move to clang-format 15 (#621)
The current version requires clang-format-9, which is now getting hard to obtain.
This commit moves to clang-format-15, the latest available in Ubuntu 22.04.

Also updates clang-tidy to 15.
2023-07-18 11:24:07 +01:00
Matthew Parkinson cdfedd8718
Update README.md (#622) 2023-07-17 15:09:36 +01:00
Matthew Parkinson dc1268886a
Improve CMake slightly (#620)
* Prefix build testing flag with SNMALLOC

* Only add the clangformat target if testing is enabled.
2023-06-28 11:42:19 +01:00
Matthew Parkinson 95bad423a7
Correct order of test based on #618 (#619) 2023-06-20 12:16:45 -04:00
Matthew Parkinson ce489cfffe
Conditional range (#617)
* Make a conditional range

This range allows for a contained range to be disabled at runtime.
This allows for thread local caching to be disabled if the initial fixed
size heap is below a threshold.
2023-06-20 12:00:34 -04:00