Граф коммитов

5 Коммитов

Автор SHA1 Сообщение Дата
Matthew Parkinson d900e29424
Improve slow path performance for allocation (#143)
* Remote dealloc refactor.

* Improve remote dealloc

Change remote to count down to 0, so fast path does not need a constant.

Use signed value so that branch does not depend on addition.

* Inline remote_dealloc

The fast path of remote_dealloc is sufficiently compact that it can be
inlined.

* Improve fast path in Slab::alloc

Turn the internal structure into tail calls, to improve fast path.
Should be no algorithmic changes.

* Refactor initialisation to help fast path.

Break lazy initialisation into two functions, so it is easier to codegen
fast paths.

* Minor tidy to statically sized dealloc.

* Refactor semi-slow path for alloc

Make the backup path a bit faster.  Only algorithmic change is to delay
checking for first allocation. Otherwise, should be unchanged.

* Test initial operation of a thread

The first operation a new thread takes is special.  It results in
allocating an allocator, and swinging it into the TLS.  This makes
this a very special path, that is rarely tested.  This test generates
a lot of threads to cover the first alloc and dealloc operations.

* Correctly handle reusing get_noncachable

* Fix large alloc stats

Large alloc stats aren't necessarily balanced on a thread, this changes
to tracking individual pushs and pops, rather than the net effect
(with an unsigned value).

* Fix TLS init on large alloc path

* Add Bump ptrs to allocator

Each allocator has a bump ptr for each size class.  This is no longer
slab local.

Slabs that haven't been fully allocated no longer need to be in the DLL
for this sizeclass.

* Change to a cycle non-empty list

This change reduces the branching in the case of finding a new free
list. Using a non-empty cyclic list enables branch free add, and a
single branch in remove to detect the empty case.

* Update differences

* Rename first allocation

Use needs initialisation as makes more sense for other scenarios.

* Use a ptrdiff to help with zero init.

* Make GlobalPlaceholder zero init

The GlobalPlaceholder allocator is now a zero init block of memory.
This removes various issues for when things are initialised. It is made read-only
to we detect write to it on some platforms.
2020-03-31 09:17:53 +01:00
Matthew Parkinson 0453e43cc5 Typo 2019-07-01 15:23:01 +01:00
Matthew Parkinson 1b26c6d163 Updated differences 2019-07-01 15:11:39 +01:00
Paul Liétar 37d8fcf47b Add link to difference.md from README 2019-05-23 18:49:19 +01:00
Matthew Parkinson 7de46182a8 Used start of a slab for link. 2019-05-21 09:47:23 +01:00