Commit Graph

2076 Commits

Author SHA1 Message Date
Eric Wong a19b2d59fc ruby_gc_set_params: update malloc_limit when env is set
During VM startup, rb_objspace_alloc sets malloc_limit
(objspace->malloc_params.limit) before ruby_gc_set_params is called, thus
nullifying the effect of RUBY_GC_MALLOC_LIMIT before the initial GC run.

The call sequence is as follows:

  main.c::main()
    ruby_init
      ruby_setup
        Init_BareVM
          rb_objspace_alloc // malloc_limit = gc_params.malloc_limit_min;
    ruby_options
      ruby_process_options
        process_options
          ruby_gc_set_params // RUBY_GC_MALLOC_LIMIT => gc_params.malloc_limit_min

With ruby_gc_set_params setting malloc_limit, RUBY_GC_MALLOC_LIMIT
affects the process sooner.

[ruby-core:107170]
2022-04-04 21:46:02 +00:00
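
A minimal standalone sketch of the pattern this fix applies, assuming the
hypothetical helper set_params_from_env: the default is copied into the live
value at startup, so reading the environment variable later must refresh the
live value too, not only the tunable default:

    #include <stdio.h>
    #include <stdlib.h>

    static size_t gc_params_malloc_limit_min = 16 * 1024 * 1024; /* default */
    static size_t malloc_limit; /* live value, set early at VM startup */

    /* hypothetical stand-in for ruby_gc_set_params */
    static void set_params_from_env(void)
    {
        const char *env = getenv("RUBY_GC_MALLOC_LIMIT");
        if (env != NULL) {
            gc_params_malloc_limit_min = strtoull(env, NULL, 10);
            /* the fix: also refresh the live value, which was copied
             * from the default before the env var was read */
            malloc_limit = gc_params_malloc_limit_min;
        }
    }

    int main(void)
    {
        malloc_limit = gc_params_malloc_limit_min; /* rb_objspace_alloc step */
        set_params_from_env();                     /* ruby_gc_set_params step */
        printf("malloc_limit = %zu\n", malloc_limit);
        return 0;
    }
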
Peter Zhu ea9c09a92c Disable mmap on WASM
WASM does not have proper support for mmap.
2022-04-04 09:27:14 -04:00
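
A sketch of how such a compile-time guard might look; the macro name
HEAP_PAGE_ALLOC_USE_MMAP and the exact conditions are assumptions, not
necessarily the gc.c code:

    /* sketch of a compile-time guard; the macro name is an assumption */
    #if defined(__wasm__) || defined(__wasi__)
    # define HEAP_PAGE_ALLOC_USE_MMAP 0 /* WASM lacks proper mmap support */
    #elif defined(HAVE_MMAP)
    # define HEAP_PAGE_ALLOC_USE_MMAP 1
    #else
    # define HEAP_PAGE_ALLOC_USE_MMAP 0
    #endif
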
Peter Zhu c482ee4025 Make heap page sizes 64KiB by default
Commit dde164e968 decoupled incremental
marking from page sizes. This commit changes Ruby heap page sizes to
64KiB. Doing so will have several benefits:

1. We can use compaction on systems with 64KiB system page sizes (e.g.
   PowerPC).
2. Larger page sizes will allow Variable Width Allocation to increase
   slot sizes and embed larger objects.
3. Since commit 002fa28599, macOS has 64
   KiB pages. Making page sizes 64 KiB will bring these systems to
   parity.

I have attached some benchmark results below.

Discourse:
    On Discourse, we saw much better p99 performance (e.g. for "categories"
    it went from 214ms on master to 134ms on branch, for "home" it went
    from 265ms to 251ms). We don’t see much change in p60, p75, and p90
performance. We also see a slight decrease in memory usage (about 1.04x).

    Branch RSS: 354.9MB
    Master RSS: 368.2MB

railsbench:
On railsbench, we don’t see a big change in RPS or p99
    performance. We don’t see a big difference in memory usage.

    Branch RPS: 826.27
    Master RPS: 824.85

    Branch p99: 1.67
    Master p99: 1.72

    Branch RSS: 88.72MB
    Master RSS: 88.48MB

liquid:
    We don’t see a significant change in liquid performance.

    Branch parse & render: 28.653 i/s
    Master parse & render: 28.563 i/s
2022-04-04 09:27:14 -04:00
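
For reference, a hedged sketch of how a 64 KiB page size is typically
expressed as an alignment log; the macro names are assumptions, not
necessarily gc.c's:

    /* sketch; macro names are assumptions */
    #define HEAP_PAGE_ALIGN_LOG 16                     /* 2^16 = 64 KiB */
    #define HEAP_PAGE_ALIGN     (1 << HEAP_PAGE_ALIGN_LOG)
    #define HEAP_PAGE_SIZE      HEAP_PAGE_ALIGN
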
Matt Valentine-House 651b832c1b extract magic number from gc_sweep_step 2022-04-01 10:52:18 -04:00
Peter Zhu fe21b7794a Use mmap for heap page allocation only
Currently, rb_aligned_malloc uses mmap if Ruby heap pages can be
allocated through mmap (when system heap page size <= Ruby heap page
size). If the Ruby heap page size is increased to 64KiB, then mmap will
be used on systems with 64KiB system page sizes. However, the transient
heap also uses rb_aligned_malloc and requires 32KiB alignment. This
would break in the current implementation since it would allocate sizes
through mmap that are not a multiple of the system page size.

This commit adds heap_page_body_allocate which will use mmap when
possible and changes rb_aligned_malloc to not use mmap (and only
use posix_memalign).
2022-04-01 10:27:18 -04:00
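
A self-contained sketch of the mmap-based aligned allocation technique
implied above: over-allocate by the alignment, then unmap the leading and
trailing slack so the returned block is aligned and every munmap stays
page-granular (the real heap_page_body_allocate may differ):

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    /* Allocate `size` bytes aligned to `alignment` via mmap; both are
     * assumed to be multiples of the system page size. */
    static void *
    aligned_mmap(size_t size, size_t alignment)
    {
        char *base = mmap(NULL, size + alignment, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED) return NULL;

        char *aligned = (char *)(((uintptr_t)base + alignment - 1)
                                 & ~((uintptr_t)alignment - 1));
        size_t head = (size_t)(aligned - base);
        size_t tail = alignment - head;

        if (head) munmap(base, head);           /* trim leading slack */
        if (tail) munmap(aligned + size, tail); /* trim trailing slack */
        return aligned;
    }
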
Matt Valentine-House d8352ff3ac [Feature #18619] remove FL_FROM_FREELIST 2022-04-01 08:45:52 -04:00
Matt Valentine-House c26a85fc96 [Feature #18619] Remove redundant compaction path 2022-04-01 08:45:52 -04:00
Matt Valentine-House 76572e5a7f [Feature #18619] Reverse the order of compaction movement
This commit changes the way compaction moves objects and sweeps pages in
order to better facilitate object movement between size pools.

Previously we would move the scan cursor first until we found an empty
slot and then we'd decrement the compact cursor until we found something
to move into that slot. We would sweep the page that contained the scan
cursor before trying to fill it.

In this algorithm we first move the compact cursor down until we find an
object to move. We then take a free page from the desired destination
heap (always the same heap in the current iteration of the code).

If there is no free page, we sweep the page at the sweeping_page cursor,
add it to the free pages, advance the cursor to the next page, and
try again.

We sweep one page from each size pool in this way, and then repeat that
process until all the size pools are compacted (all the cursors have
met), and then we update references and sweep the rest of the heap.
2022-04-01 08:45:52 -04:00
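
A pseudocode-level sketch of the loop described above; every function name
here is hypothetical and serves only to make the cursor movement concrete:

    /* sketch; all names hypothetical */
    while (!cursors_met(heap)) {
        VALUE src = advance_compact_cursor(heap);  /* object to move */
        struct heap_page *page = take_free_page(dest_heap);
        if (page == NULL) {
            /* no free page: sweep at the sweeping_page cursor, add the
             * result to the free pages, advance the cursor, try again */
            sweep_page(heap, heap->sweeping_page);
            heap->sweeping_page = next_page(heap->sweeping_page);
            continue;
        }
        move_object(src, page);
    }
    /* one page per size pool is swept this way per round; once all the
     * cursors meet, references are updated and the rest is swept */
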
Matt Valentine-House bb037f6d86 Remove hard-coded swept slots threshold 2022-03-31 14:39:59 -04:00
Peter Zhu dde164e968 Decouple incremental marking step from page sizes
Currently, the number of incremental marking steps is calculated based
on the number of pooled pages available. This means that if we make Ruby
heap pages larger, it would run fewer incremental marking steps (which
would mean each incremental marking step takes longer).

This commit changes incremental marking to run after every
INCREMENTAL_MARK_STEP_ALLOCATIONS number of allocations. This means that
the behaviour of incremental marking remains the same regardless of the
Ruby heap page size.

I've benchmarked against discourse benchmarks and did not get a
significant change in response times beyond the margin of error. This is
expected as this new incremental marking algorithm behaves very
similarly to the previous one.
2022-03-30 09:33:17 -04:00
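
A hedged sketch of the allocation-count trigger; INCREMENTAL_MARK_STEP_ALLOCATIONS
is named in the message above, while the counter field, the helper names, and
the concrete value are assumptions:

    #define INCREMENTAL_MARK_STEP_ALLOCATIONS 500 /* value assumed */

    /* sketch: called on each object allocation */
    static void
    newobj_mark_hook(rb_objspace_t *objspace)
    {
        if (is_incremental_marking(objspace)
            && ++objspace->allocation_count >= INCREMENTAL_MARK_STEP_ALLOCATIONS) {
            objspace->allocation_count = 0;
            gc_marks_continue(objspace); /* run one marking step */
        }
    }
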
Nobuyoshi Nakada 42a0bed351 Prefix ccan headers (#4568)
* Prefixed ccan headers

* Remove unprefixed names in ccan/build_assert

* Remove unprefixed names in ccan/check_type

* Remove unprefixed names in ccan/container_of

* Remove unprefixed names in ccan/list

Co-authored-by: Samuel Williams <samuel.williams@oriontransfer.co.nz>
2022-03-30 20:36:31 +13:00
Peter Zhu ae650f0372 Remove unneeded function declarations in gc.c 2022-03-28 10:02:45 -04:00
Peter Zhu 5f10bd634f Add ISEQ_BODY macro
Use ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.
2022-03-24 10:03:51 -04:00
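
The macro is presumably a thin accessor; a sketch of the indirection (the
exact definition in iseq.h may differ):

    /* sketch: a single point of change for the body's location */
    #define ISEQ_BODY(iseq) ((iseq)->body)

    /* Call sites read ISEQ_BODY(iseq)->iseq_size instead of
     * iseq->body->iseq_size, so relocating `body` (e.g. embedding it
     * in the VWA slot) only requires updating this macro. */
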
John Hawthorn 19f331f588 Dedup superclass array in leaf sibling classes
Previously, we would build a new `superclasses` array for each class,
even though for all immediate subclasses of a class, the array is
identical.

This avoids duplicating the arrays on leaf classes (those without
subclasses) by calculating and storing a "superclasses including self"
array on a class when it's first inherited and sharing that among all
subclasses.

An additional trick used is that the "superclass array including self"
is valid as "self"'s superclass array. It just has its own class at the
end. We can use this to avoid an extra pointer of storage and can use
one bit of a flag to track that we've "upgraded" the array.
2022-03-03 11:23:27 -08:00
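
A sketch of the sharing trick with assumed field names: a class builds its
"superclasses including self" array once, when first inherited, and every
immediate subclass reuses that same array as its own ancestor list:

    /* sketch; field names are assumptions */
    struct classext_sketch {
        VALUE *superclasses;       /* ordered ancestors, BasicObject at 0 */
        size_t superclass_depth;   /* entries that apply to this class */
        unsigned include_self : 1; /* set once the array ends with this
                                      class itself and is shared with
                                      its immediate subclasses */
    };
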
John Hawthorn b13a7c8e36 Constant time class to class ancestor lookup
Previously when checking ancestors, we would walk all the way up the
ancestry chain checking each parent for a matching class or module.

I believe this was especially unfriendly to CPU cache since for each
step we need to check two cache lines (the class and class ext).

This check is used quite often in:
* case statements
* rescue statements
* Calling protected methods
* Class#is_a?
* Module#===
* Module#<=>

I believe it's most common to check a class against a parent class, so
this commit aims to improve that case (unfortunately it does not help
when checking for an included Module).

This is done by storing on each class the number of ancestors and an array
of all parent classes, in order (BasicObject is at index 0). Using this we can
check whether a class is a subclass of another in constant time since we
know the location to expect it in the hierarchy.
2022-02-23 19:57:42 -08:00
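
A sketch of the constant-time membership test this layout enables (names
assumed, reusing the classext_sketch fields from above): since the array is
ordered with BasicObject at index 0, a candidate ancestor can only appear at
the index equal to its own depth:

    #include <stdbool.h>

    /* sketch; the self == ancestor case is assumed handled separately */
    static bool
    is_subclass_of(const struct classext_sketch *klass,
                   VALUE ancestor, size_t ancestor_depth)
    {
        /* the ancestor, if present, sits exactly at its own depth */
        return ancestor_depth < klass->superclass_depth
            && klass->superclasses[ancestor_depth] == ancestor;
    }
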
Peter Zhu 71afa8164d Change darray size to size_t and add functions that use GC malloc
Changes size and capacity of darray to size_t to support more
elements.

Adds functions to darray that use GC allocation functions.
2022-02-16 09:50:29 -05:00
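
A sketch of the corresponding header layout (names modeled on darray.h,
treated here as assumptions):

    /* sketch: size_t fields allow element counts beyond UINT32_MAX */
    typedef struct rb_darray_meta {
        size_t size; /* elements in use */
        size_t capa; /* allocated capacity */
    } rb_darray_meta_t;
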
Koichi Sasada 1ae630db26 `wmap#each` should check liveness of keys
`ObjectSpace::WeakMap#each*` should check each key's liveness.
fix [Bug #18586]
2022-02-16 13:31:46 +09:00
Koichi Sasada 76e594d515 fix GC event synchronization
(1) gc_verify_internal_consistency() uses barrier locking
    for consistency while `during_gc == true` at the end
    of the sweep on `RGENGC_CHECK_MODE >= 2`.
(2) `rb_objspace_reachable_objects_from()` is called without
    VM synchronization and it checks `during_gc != true`.

So (1) and (2) together cause a BUG because of `during_gc == true`.
To prevent this error, wait for the VM barrier while `during_gc == false`
and introduce VM locking in `rb_objspace_reachable_objects_from()`.

http://ci.rvm.jp/results/trunk-asserts@phosphorus-docker/3830088
2022-02-14 17:17:55 +09:00
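
A schematic of the shape of the fix, assuming the RB_VM_LOCK_ENTER and
RB_VM_LOCK_LEAVE macros from vm_sync.h; the signature and body are sketched
from the description above, not copied from gc.c:

    /* schematic sketch */
    void
    rb_objspace_reachable_objects_from(VALUE obj,
                                       void (*func)(VALUE, void *),
                                       void *data)
    {
        RB_VM_LOCK_ENTER();
        {
            /* walk references from obj; with the lock held, no GC can
             * flip during_gc underneath this traversal */
        }
        RB_VM_LOCK_LEAVE();
    }
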
Peter Zhu 2617532499 Free cached mark stack chunks when freeing objspace
Cached mark stack chunks should also be freed when freeing objspace.
2022-02-10 09:33:42 -05:00
Peter Zhu af321ea727 Move total_freed_pages to size pool 2022-02-03 15:06:55 -05:00
Peter Zhu a9221406aa Move total_allocated_pages to size pool 2022-02-03 15:06:55 -05:00
Peter Zhu 424374d330 Fix case when gc_marks_continue does not yield slots
gc_marks_continue will start sweeping when it finishes marking. However,
if the heap we are trying to allocate into is full, then the sweeping
may not yield any free slots. If we don't call gc_sweep_continue
immediately after this, then another GC will be started halfway through
lazy sweeping. gc_sweep_continue will either grow the heap or finish
sweeping.
2022-02-03 09:22:24 -05:00
Peter Zhu 7b77d46671 Decouple GC slot sizes from RVALUE
Add a new macro BASE_SLOT_SIZE that determines the slot size.

For Variable Width Allocation (compiled with USE_RVARGC=1), all slot
sizes are powers-of-2 multiples of BASE_SLOT_SIZE.

For USE_RVARGC=0, BASE_SLOT_SIZE is set to sizeof(RVALUE).
2022-02-02 09:52:04 -05:00
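
A sketch of the described layout; BASE_SLOT_SIZE and the powers-of-2
relationship come from the message, while the concrete base value and the
second macro are assumptions:

    /* sketch; the USE_RVARGC=1 base value is an assumption */
    #if USE_RVARGC
    # define BASE_SLOT_SIZE 40 /* assumed fixed base */
    #else
    # define BASE_SLOT_SIZE sizeof(RVALUE)
    #endif

    /* size pool i then uses slots of BASE_SLOT_SIZE << i */
    #define SIZE_POOL_SLOT_SIZE(i) (BASE_SLOT_SIZE << (i))
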
Peter Zhu 605f226142 Fix heap page iteration in gc_verify_heap_page
The for loops are not correctly iterating heap pages in
gc_verify_heap_page.
2022-01-31 09:42:20 -05:00
Nobuyoshi Nakada 67f4729ff0 [Bug #18556] Fallback `MAP_ANONYMOUS`
Define `MAP_ANONYMOUS` to `MAP_ANON` if undefined on old systems.
2022-01-29 19:07:38 +09:00
Peter Zhu e714163011 Fix typo in assertion in gc.c 2022-01-26 09:45:22 -05:00
Nobuyoshi Nakada 16e7585557 Unpoison the cached object in the exact size 2022-01-26 14:34:25 +09:00
Peter Zhu 82f0580aa4 Call rb_id_table_foreach_values instead
These places never replace the value, so call rb_id_table_foreach_values
instead of rb_id_table_foreach_values_with_replace.
2022-01-25 16:51:16 -05:00
Peter Zhu 4d9ad91a35 Rename rb_id_table_foreach_with_replace
Renames rb_id_table_foreach_with_replace to
rb_id_table_foreach_values_with_replace and passes only the value to the
callback. We can use this in GC compaction when we cannot access the
global symbol array.
2022-01-25 16:51:16 -05:00
Peter Zhu b07879e553 Remove redundant if statement in try_move
The if statement is redundant since if `index == 0` then
`BITS_BITLENGTH * index == 0`.
2022-01-25 09:38:17 -05:00
Peter Zhu 87784fdeb2 Keep right operand within width when right shifting
NUM_IN_PAGE could return a value much larger than 64. According to the
C11 spec 6.5.7 paragraph 3 this is undefined behavior:

> If the value of the right operand is negative or is greater than or
> equal to the width of the promoted left operand, the behavior is
> undefined.

On most platforms, this is usually not a problem as the architecture
will mask off all out-of-range bits.
2022-01-24 14:34:12 -05:00
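
A self-contained illustration of the fix: masking the shift count keeps it
below the width of the promoted left operand, so the result is defined
everywhere rather than only on architectures that mask in hardware:

    #include <inttypes.h>
    #include <stdio.h>

    #define BITS_BITLENGTH 64

    int main(void)
    {
        uint64_t bits = UINT64_MAX;
        unsigned int n = 70; /* e.g. a NUM_IN_PAGE-style result >= 64 */

        /* bits >>= n;  would be undefined behavior since n >= 64 */
        bits >>= n % BITS_BITLENGTH; /* defined: shift count is 6 */

        printf("%016" PRIx64 "\n", bits);
        return 0;
    }
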
Peter Zhu 663833b08f [wasm] Disallow compaction
WebAssembly doesn't support signals, so we can't use read
barriers, and therefore can't use compaction.
2022-01-24 09:21:08 -05:00
Nobuyoshi Nakada 8f3e29c849 Fix format size qualifier on IL32P64 2022-01-19 13:33:14 +09:00
Yuta Saito bf1c4d254b [wasm] gc.c: scan wasm locals and c stack to mark living objects
WebAssembly functions have (effectively unlimited) local registers and
stack values, but there is currently no way to scan the values in a call
stack. This implementation uses Asyncify to spill wasm locals into
linear memory.
2022-01-19 11:19:06 +09:00
Yuta Saito e7fb1fa041 [wasm] gc.c: disable read signal barrier for wasi
WASI does not yet support signals.
2022-01-19 11:19:06 +09:00
Yuta Saito 23de01c7aa [wasm] eval_intern.h gc.c vm_core.h: include wasm/setjmp.h instead of sysroot header 2022-01-19 11:19:06 +09:00
Peter Zhu 6b7eff9086 Separately allocate class_serial on 32-bit systems
On 32-bit systems, VWA causes class_serial to not be aligned (it only
guarantees 4 byte alignment but class_serial is 8 bytes and requires 8
byte alignment). This commit uses a hack to allocate class_serial
through malloc. Once VWA allocates with 8 byte alignment in the future,
we will revert this commit.
2022-01-14 14:36:33 -05:00
Peter Zhu d9ef711f29 Improve string info in rb_raw_obj_info
Improve rb_raw_obj_info to output additional info about strings,
including the length, capacity, and whether or not the string is embedded.
2022-01-07 14:22:32 -05:00
Peter Zhu 6f7e02bf46 Remove assertion causing read barrier to trigger
GET_HEAP_PAGE reads the page. If, during compaction, the page is
protected by a read barrier, this read triggers the barrier.
2022-01-05 09:32:53 -05:00
Matt Valentine-House ad007bc6ea Switch `is_pointer_to_heap` to use library bsearch
This commit switches from a custom implemented bsearch algorithm to
use the one provided by the C standard library.

Because `is_pointer_to_heap` will only return true if the pointer
being searched for is a valid slot starting address within the heap
page body, we've extracted the bsearch call site into a more general
function so we can use it elsewhere.

The new function `heap_page_for_ptr` returns the heap page for any heap
page pointer, regardless of whether that is at the start of a slot or
in the middle of one.

We then use this function as the basis of `is_pointer_to_heap`.
2022-01-04 10:27:46 -05:00
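
A hedged sketch of the library bsearch call site: given a sorted array of
heap page pointers, find the page whose body contains the searched pointer
(the struct layout and comparator are illustrative; the real
heap_page_for_ptr may differ):

    #include <stdlib.h>

    /* sketch; struct layout and names are assumptions */
    struct heap_page { char *start; size_t size; };

    static int
    ptr_to_page_cmp(const void *key, const void *elem)
    {
        const char *ptr = *(const char *const *)key;
        const struct heap_page *page = *(const struct heap_page *const *)elem;
        if (ptr < page->start) return -1;
        if (ptr >= page->start + page->size) return 1;
        return 0; /* ptr falls within this page's body */
    }

    static struct heap_page *
    heap_page_for_ptr_sketch(struct heap_page **sorted_pages, size_t n, char *ptr)
    {
        struct heap_page **found =
            bsearch(&ptr, sorted_pages, n, sizeof(*sorted_pages), ptr_to_page_cmp);
        return found ? *found : NULL;
    }
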
Peter Zhu 615e9b2865 [Feature #18364] Add GC.stat_heap to get stats for memory heaps
GC.stat_heap will return stats for memory heaps. This is
used for the Variable Width Allocation feature.
2022-01-04 09:46:36 -05:00
Nobuyoshi Nakada 069cca6f74 Negative RBOOL usage 2022-01-01 17:02:04 +09:00
Nobuyoshi Nakada 002fa28599 On 64bit macOS, enlarge heap pages to reduce mmap calls [Bug #18447] 2021-12-29 20:53:43 +09:00
Nobuyoshi Nakada 7c738ce5e6 Remove deprecated rb_cData [Bug #18433]
Also enable the warning for T_DATA allocator.
2021-12-26 23:28:54 +09:00
Koichi Sasada 2da53b1468 `finalize_deferred` doesn't need VM lock
`finalize_list()` acquires VM lock to manipulate objspace state.
2021-12-23 16:50:17 +09:00
Koichi Sasada ca032d5eea undef `rb_vm_lookup_overloaded_cme()`
Some callable method entries (cme) can be keys of `overloaded_cme_table`,
and the keys should be pinned because the table is a numtable (a VALUE is a key).
Before the patch, GC checked whether a cme is in `overloaded_cme_table` by
looking up the table, but that lookup needs VM locking.

It works well in normal GC marking because it is protected by the VM lock,
but it doesn't work in `rb_objspace_reachable_objects_from` because that
function doesn't take the VM lock.

Now, since the number of target cmes is small enough, I decided to pin down
all possible cmes instead of looking up the table.
2021-12-23 16:49:49 +09:00
Koichi Sasada ad450c9fe5 make `overloaded_cme_table` truly weak key map
`overloaded_cme_table` keeps cme -> monly_cme pairs to manage the
corresponding `monly_cme` for each `cme`. The lifetime of the `monly_cme`
should be longer than that of the corresponding `cme`, but the previous
patch lost the reference to the living `monly_cme`.

Now `overloaded_cme_table` values are always roots (keys are only weak
references), which means a `monly_cme` is not freed until the corresponding
`cme` is invalidated.

To make managing easy, move `overloaded_cme_table` to `rb_vm_t`.
2021-12-21 15:21:30 +09:00
Koichi Sasada df48db987d `mandatory_only_cme` should not be in `def`
`def` (`rb_method_definition_t`) is shared by multiple callable
method entries (cme, `rb_callable_method_entry_t`).

There are two issues:

* old -> young reference: with `cme1->def->mandatory_only_cme = monly_cme`,
  if `cme1` is young and `monly_cme` is young, there is no problem.
  However, another old `cme2` can refer to `def`; in this case, the old
  `cme2` points to the young `monly_cme`, which violates the gengc assumption.
* cmes can have different `defined_class` values, but a `monly_cme` only has
  one `defined_class`. This does not make sense, so a `monly_cme`
  should be created per cme (not per `def`).

To solve these issues, this patch allocates `monly_cme` per `cme`.
`cme` does not have another room to store a pointer to the `monly_cme`,
so this patch introduces `overloaded_cme_table`, which is weak key map
`[cme] -> [monly_cme]`.

`def::body::iseqptr::monly_cme` is deleted.

The first issue is reported by Alan Wu.
2021-12-21 11:03:09 +09:00
Alan Wu 39cf0b5314 Show whether object is garbage in rb_raw_obj_info()
When using `rp(obj)` for debugging during development, it may be
useful to know that an object is soon to be swept. Add a new letter to
the object dump for whether the object is garbage. It's easy to forget
about lazy sweep.
2021-12-20 16:13:34 -05:00
Peter Zhu 0e7d073914 Remove compaction support detection using sysconf
Except on Windows and MinGW, we can only use compaction on systems that
use mmap (only systems that use mmap can use the read barrier that
compaction requires). We don't need to separately detect whether we can
support compaction or not.
2021-12-14 09:16:18 -05:00