Since the callback defined in the objspace module might give up the GVL,
we need to make sure the right cr->mfd value is set back after the GVL
is re-obtained.
The previous implementation was using the pointer given
by `DATA_PTR` in all cases. But in the case of an embedded
TypedData, that pointer is garbage, so we need to use
`RTYPEDDATA_GET_DATA` to get the proper data pointer.
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
This commit adds a new flag RUBY_TYPED_EMBEDDABLE that allows the data
of a TypedData object to be embedded after the object itself. This will
improve cache locality and allow us to save the 8-byte data pointer.
Co-Authored-By: Jean Boussier <byroot@ruby-lang.org>
Track other callinfo entries that reference the same kwargs and free them when all references are cleared.
[Bug #19906]
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
fix memory leak in vm_method
This introduces a unified reference_count to clarify who is referencing a method.
This also allows us to treat the refinement method as the def owner since it counts itself as a reference.
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
By compacting into slots with pinned objects first, we improve the
efficiency of compaction, as it is less likely that pages containing
only pinned objects will remain after compaction. This increases the
number of free pages left after compaction and enables us to free them.
This used to be the default compaction method before it was removed
(inadvertently?) during the introduction of auto_compaction.
This commit will sort the pages by the pinned slot count at the start of
a major GC that has been triggered by explicitly calling GC.compact (and
thus setting objspace->flags.during_compaction).
It works using the same method by which we sort the heap by empty slot
count during GC.verify_compaction_references.
Previously the heap was only sorted during the verify-compaction-references
stage, so this would only happen during testing.
This commit allows us to sort the heap prior to each explicit GC.compact
run.
Previously, configuring any GC event hook would cause all allocations to
go through the newobj slowpath. We should only need to do that when the
newobj event specifically is subscribed to.
This renames flags.has_hook to flags.has_newobj_hook, to make this new
usage clear. newobj_of0 was the only place which previously checked this
flag.
This should help fix the following flaky test:
```
1) Failure:
TestProcess#test_warmup_frees_pages [test/ruby/test_process.rb:2751]:
<0> expected but was
<1>.
```
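From Ruby, one way a NEWOBJ hook ends up subscribed is `objspace`'s allocation tracing (an assumption about which internals register the hook); a minimal sketch of that usage:
```ruby
require "objspace"

# While this block is active an internal NEWOBJ hook is registered, so
# allocations may take the slow path; with this change, hooks for other
# events alone should no longer force it.
ObjectSpace.trace_object_allocations do
  obj = Object.new
  p ObjectSpace.allocation_sourcefile(obj)
  p ObjectSpace.allocation_sourceline(obj)
end
```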
During incremental marking, Ruby code can execute and deallocate memory
buffers that were passed to rb_gc_mark_weak, which can cause
use-after-free bugs.
The term "shady object" was renamed to "uncollectible write barrier
unprotected object", so rename `has_uncollectible_shady_objects` to
`has_uncollectible_wb_unprotected_objects` for consistency.
We always sweep at least 2048 slots per sweep step, but only pool one
page. For large size pools, 2048 slots is many pages but one page is
very few slots. This commit changes it so that at least 1024 slots are
placed in the pooled pages per sweep step.
We move all pooled pages to free pages at the start of incremental
marking, so we shouldn't run incremental marking only when we have run
out of free pages: doing so causes incremental marking to always complete
in a single step.
If we are in a minor GC and the object to mark is old, then the old
object should already be marked and cannot be reclaimed in this GC cycle
so we don't need to add it to the weak references list.
This is an internal only function not exposed to the C extension API.
Its only use so far is in rb_vm_mark, where it is used to mark the
values in the vm->trap_list.cmd array.
There shouldn't be any reason why these cannot move.
This commit allows them to move by updating their references during the
reference updating step of compaction.
To do this we've introduced another internal function
rb_gc_update_values as a partner to rb_gc_mark_values.
This allows us to refactor rb_gc_mark_values to not pin the values it marks.
The old algorithm could calculate an undercount for the initial pages
due to two issues:
1. It did not take into account that some heap pages will have one less
slot due to alignment. It assumed that every heap page would be able
to be fully filled with slots. Pages that are unaligned with the slot
size will lose one slot. The new algorithm assumes that every page
will be unaligned.
2. It performed integer division, which truncates down. This means that
the number of pages might not actually satisfy the number of slots.
This can cause the heap to grow in `gc_sweep_finish_size_pool` after
allocating all of the allocatable pages because the total number of
slots would be less than the initial configured number of slots.
This commit changes RUBY_GC_HEAP_INIT_SIZE_{40,80,160,320,640}_SLOTS to
RUBY_GC_HEAP_{0,1,2,3,4}_INIT_SLOTS. This is easier to use because the
user does not need to determine the slot sizes (which can vary between
32 and 64 bit systems). They now just use the heap names
(`GC.stat_heap.keys`).
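A minimal sketch of how the heap names line up with the new variable names (the printed values are assumptions for a typical 64-bit build):
```ruby
# Heap names are the indices used in the variable names, e.g.
# RUBY_GC_HEAP_0_INIT_SLOTS configures heap 0.
p GC.stat_heap.keys           # => [0, 1, 2, 3, 4]
p GC.stat_heap(0, :slot_size) # e.g. 40 on a 64-bit system
```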
If initial slots is set, then during a minor GC, if we have allocatable
pages but the heap is mostly full, then we will set `grow_heap` to true
since `total_slots` does not count allocatable pages so it will be less
than `init_slots`. This can cause `allocatable_pages` to grow much
higher than desired since it will appear that the heap is mostly full.
This commit adds `free_empty_pages` which frees all empty heap pages and
moves the number of pages freed to the allocatable pages counter. This
is used in Process.warmup to improve performance because page
invalidation from copy-on-write is slower than allocating a new page.
[Feature #19783]
This commit adds stats about weak references to `GC.latest_gc_info`.
It adds the following two keys:
- `weak_references_count`: number of weak references registered during
the last GC.
- `retained_weak_references_count`: number of weak references that
survived the last GC.
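A hypothetical reading of the new keys (the values depend entirely on how many weak references were registered during the last GC):
```ruby
GC.start
info = GC.latest_gc_info
p info[:weak_references_count]          # e.g. 3
p info[:retained_weak_references_count] # e.g. 2
```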
[Feature #19783]
This commit adds support for weak references in the GC through the
function `rb_gc_mark_weak`. Unlike a strong reference, a weak reference
does not mark the object, but rather lets the GC know that an object
refers to another one. If the child object is freed, the pointer in
the parent object is overwritten with `Qundef`.
Co-Authored-By: Jean Boussier <byroot@ruby-lang.org>
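A sketch of the observable behaviour, assuming `ObjectSpace::WeakMap` is one of the internal users of this mechanism (an assumption about the implementation):
```ruby
map = ObjectSpace::WeakMap.new
key = Object.new
map[key] = Object.new # the value has no other strong references
GC.start
p map[key]            # likely nil once the value has been reclaimed
```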
This commit adds key force_incremental_marking_finish_count to
GC.stat_heap. This statistic returns the number of times the size pool
has forced incremental marking to finish due to running out of slots.
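A small sketch for reading the new counter (key name as added by this commit; values are usually 0 unless a size pool ran out of slots mid-mark):
```ruby
GC.stat_heap.each do |heap, stats|
  puts "heap #{heap}: #{stats[:force_incremental_marking_finish_count]}"
end
```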
Functions rgengc_remembered, rgengc_remembered_sweep, and
rgengc_remembersetbits_get are just wrappers of RVALUE_REMEMBERED and
don't do much more. We can remove them all and use RVALUE_REMEMBERED
directly instead.
We don't need to check stack for moved objects after compaction because
the mutator cannot run between marking the stack and the end of
compaction. However, the stack may have moved objects leftover from
marking and sweeping phases. This means that their pages will be
invalidated and all objects moved back. We don't need to move these
objects back.
This also fixes the issue on Windows where some compaction tests
sometimes fail due to the page of the object being invalidated.
This commit stores the initial slots per size pool, configured with
the environment variables `RUBY_GC_HEAP_INIT_SIZE_%d_SLOTS`. This
ensures that the configured initial slots remain a lower bound for the
number of slots in the heap, which can prevent heaps from thrashing in
size.
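An illustrative check, assuming the process was started with something like `RUBY_GC_HEAP_INIT_SIZE_40_SLOTS=600000` (the counters below come from GC.stat_heap and are only a rough way to observe the lower bound):
```ruby
stats = GC.stat_heap(0)    # the 40-byte pool on a 64-bit build
p stats[:slot_size]        # e.g. 40
p stats[:heap_eden_slots]  # should not shrink far below the configured value
```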
Since Ruby 3.0, refined method invocations have been slow because
resolved methods are not stored in the inline cache due to a
conservative strategy. However, `using` clears all caches,
so it seems safe to cache resolved method entries.
This patch caches resolved method entries in the inline cache
and clears all inline method caches when `using` is called.
fix [Bug #18572]
```ruby
# without refinements
class C
def foo = :C
end
N = 1_000_000
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
__END__
              user     system      total        real
master    0.362859   0.002544   0.365403 (  0.365424)
modified  0.357251   0.000000   0.357251 (  0.357258)
```
```ruby
# with refinement but without using
class C
def foo = :C
end
module R
refine C do
def foo = :R
end
end
N = 1_000_000
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
__END__
              user     system      total        real
master    0.957182   0.000000   0.957182 (  0.957212)
modified  0.359228   0.000000   0.359228 (  0.359238)
```
```ruby
# with using
class C
def foo = :C
end
module R
refine C do
def foo = :R
end
end
N = 1_000_000
using R
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
```
`cc->klass` and `cc->cme_` can be freed during the last marking,
so they should be checked before updating the pointers.
Note that `T_MOVED` is living, but `is_live_object()` returns false.
cc is a callcache.
cc->klass (klass) should not be marked because if the klass is
freed, cc->klass will be cleared by `vm_cc_invalidate()`.
cc->cme (cme) should not be marked because the cc is invalidated
when the cme is freed:
- klass marks cme if klass uses cme.
- the caller class's ccs->cme marks cc->cme.
- if the cc is invalidated (the klass no longer refers to the cc),
  it is invalidated by `vm_cc_invalidate()` and cc->cme is
  not accessed.
- with multiple Ractors, the cme is collected by a global GC,
  so it is safe as long as the GC does not interleave with
  accesses to cc and cme.
fix [Bug #19436]
```ruby
10_000.times{|i|
# p i if (i%1_000) == 0
str = "x" * 1_000_000
def str.foo = nil
eval "def call#{i}(s) = s.foo"
send "call#{i}", str
}
```
Without this patch:
```
real 1m5.639s
user 0m6.637s
sys 0m58.292s
```
and with this patch:
```
real 0m2.045s
user 0m1.627s
sys 0m0.164s
```
This both saves time for when it will eventually be needed,
and avoids mutating heap pages after a potential fork.
Instrumenting a large Rails app, I've witnessed up to
58% of String instances having their coderange still unknown.
[Feature #18885]
For now, the optimizations performed are:
- Run a major GC
- Compact the heap
- Promote all surviving objects to oldgen
Other optimizations may follow.
Closes [Feature #19729]
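A typical call site for the optimizations listed above might look like this (the `require` is a hypothetical application boot, not part of this change):
```ruby
require "myapp"  # hypothetical: load the application
Process.warmup   # major GC, compaction, and promotion happen once here
# fork worker processes here, after the heap has been warmed up
```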
Previously, 2 bits of the flags on each RVALUE were reserved to store the
number of GC cycles that each object has survived. This commit
introduces a new bit array on the heap page, called age_bits, to store
that information instead.
This patch still reserves one of the age bits in the flags (the old
FL_PROMOTED0 bit, now renamed FL_PROMOTED).
This is set to 0 for young objects and 1 for old objects, and is used as
a performance optimisation for the write barrier. Fetching the age_bits
from the heap page and doing the required math to calculate if the
object was old or not would slow down the write barrier. So we keep this
bit synced in the flags for fast access.
According to the C99 specification section 7.20.3.2 paragraph 2:
> If ptr is a null pointer, no action occurs.
So we do not need to check that the pointer is a null pointer.
This reverts commit 10621f7cb9.
This was reverted because the gc integrity build started failing. We
have figured out a fix so I'm reopening the PR.
Original commit message:
Fix cvar caching when class is cloned
The class variable cache that was added in
ruby#4544 changed the behavior of class
variables on cloned classes. As reported, when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
The class variable cache that was added in
https://github.com/ruby/ruby/pull/4544 changed the behavior of class
variables on cloned classes. As reported, when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
[Feature #19678]
References from an old object to a write barrier protected young object
will not immediately promote the young object. Instead, the young object
will age just like any other object, meaning that it has to survive
three collections before being promoted to the old generation.
References from an old object to a write barrier unprotected object will
place the parent object in the remember set for marking during minor
collections. This allows the child object to be reclaimed in minor
collections at the cost of increased time for minor collections.
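A rough illustration of the new behaviour using only public APIs (an assumption based on the description above, not a precise test):
```ruby
parent = Object.new
3.times { GC.start }                          # parent survives enough GCs to become old
child = Object.new                            # young, write barrier protected
parent.instance_variable_set(:@child, child)  # old -> young reference
GC.start(full_mark: false)                    # child is no longer promoted right away;
GC.start(full_mark: false)                    # it has to survive three collections
GC.start(full_mark: false)                    # before reaching the old generation
```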
On one of [Shopify's highest traffic Ruby apps, Storefront
Renderer](https://shopify.engineering/how-shopify-reduced-storefront-response-times-rewrite),
we saw significant improvements after deploying this feature in
production. We compare the GC time and response time of web workers that
have the original behaviour (non-experimental group) and this new
behaviour (experimental group). We see that with this feature we spend
significantly less time in the GC, 0.81x on average, 0.88x on p99, and
0.45x on p99.9.
This translates to improvements in average response time (0.96x) and p99
response time (0.92x).
[Feature #19571]
This commit adds the environment variable
`RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO` which is
used to calculate the `remembered_wb_unprotected_objects_limit` using a
ratio of `old_objects`. This should improve performance by reducing the
number of major GCs: since a major GC marks all of the old objects, we
should allow more uncollectible WB unprotected objects to accumulate
before starting one. The default has been set to 0.01 (1% of old objects).
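A sketch of how one might observe the ratio, assuming the process was started with `RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO=0.01` (counter names are from GC.stat; the relationship shown is based on this description):
```ruby
GC.start
p GC.stat(:old_objects)
p GC.stat(:remembered_wb_unprotected_objects_limit) # roughly 1% of old_objects
```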
On one of [Shopify's highest traffic Ruby apps, Storefront Renderer](https://shopify.engineering/how-shopify-reduced-storefront-response-times-rewrite),
we saw significant improvements after deploying this patch in
production. In the graphs below, we have the `tuned` group which uses
`RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO=0.01` (the
default value), and an `untuned` group, which turns this feature off
with `RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO=0`. We
see that the tuned group spends significantly less time in GC, on
average 0.67x of the time compared to the untuned group and 0.49x for
p99. We see this improvement in GC time translate to improvements in
response times. The average response time is now 0.96x of the time
compared to the untuned group and 0.86x for p99.
https://user-images.githubusercontent.com/15860699/229559078-e23e8ce4-5f1f-4a2f-b5ef-5769f92b8c70.png
[Bug #19584]
Some C extensions pass a pointer to a global variable to
rb_gc_register_address. However, if a GC is triggered inside of
rb_gc_register_address, then the object could get swept since it does
not exist on the stack.
[Bug #19580]
The real-world scenario motivating this change is libxml2's pthread
code which uses `pthread_key_create` to set up a destructor that is
called at thread exit to free thread-local storage.
There is a small window of time -- after ruby_vm_destruct but before
the process exits -- in which a pthread may exit and the destructor is
called, leading to a segfault.
Please note that this window of time may be relatively large if
`atexit` is being used.
Remove !USE_RVARGC code
[Feature #19579]
The Variable Width Allocation feature was turned on by default in Ruby
3.2. Since then, we haven't received bug reports or backports to the
non-Variable Width Allocation code paths, so we assume that nobody is
using it. We also don't plan on maintaining the non-Variable Width
Allocation code, so we are going to remove it.