By compacting into slots with pinned objects first, we improve the
efficiency of compaction, since it is less likely that pages containing
only pinned objects will remain after compaction. This increases the
number of free pages left after compaction and enables us to free them.
This used to be the default compaction method before it was removed
(inadvertently?) during the introduction of auto_compaction.
This commit will sort the pages by the pinned slot count at the start of
a major GC that has been triggered by explicitly calling GC.compact (and
thus setting objspace->flags.during_compaction).
It works using the same method by which we sort the heap by empty slot
count during GC.verify_compaction_references.
Previously, the heap was only sorted during the verify compaction
references stage, so this would only happen during testing.
This commit allows us to sort the heap prior to each explicit GC.compact
run.
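For example, the sort now happens at the start of every explicit run (a
minimal usage sketch):
```ruby
# Explicitly triggered compaction now sorts pages by pinned slot count first.
GC.compact
# Statistics about the run that just finished.
pp GC.latest_compact_info
```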
Previously, configuring any GC event hook would cause all allocations to
go through the newobj slowpath. We should only need to do that when the
newobj event specifically is subscribed to.
This renames flags.has_hook to flags.has_newobj_hook, to make this new
usage clear. newobj_of0 was the only place which previously checked this
flag.
This should help fix the following flaky test:
```
1) Failure:
TestProcess#test_warmup_frees_pages [test/ruby/test_process.rb:2751]:
<0> expected but was
<1>.
```
During incremental marking, Ruby code can run and deallocate memory
buffers that have been passed to rb_gc_mark_weak, which can cause
use-after-free bugs.
The term "shady object" was renamed to "uncollectible write barrier
unprotected object", so rename `has_uncollectible_shady_objects` to
`has_uncollectible_wb_unprotected_objects` for consistency.
We always sweep at least 2048 slots per sweep step, but only pool one
page. For large size pools, 2048 slots is many pages but one page is
very few slots. This commit changes it so that at least 1024 slots are
placed in the pooled pages per sweep step.
We move all pooled pages to free pages at the start of incremental
marking, so we shouldn't run incremental marking steps only when we have
run out of free pages; doing so causes incremental marking to always
complete in a single step.
If we are in a minor GC and the object to mark is old, then the old
object should already be marked and cannot be reclaimed in this GC cycle,
so we don't need to add it to the weak references list.
This is an internal-only function not exposed to the C extension API.
Its only use so far is from rb_vm_mark, where it's used to mark the
values in the vm->trap_list.cmd array.
There shouldn't be any reason why these cannot move.
This commit allows them to move by updating their references during the
reference updating step of compaction.
To do this we've introduced another internal function
rb_gc_update_values as a partner to rb_gc_mark_values.
This allows us to refactor rb_gc_mark_values to not pin the values it
marks.
The old algorithm could calculate an undercount for the initial pages
due to two issues:
1. It did not take into account that some heap pages will have one less
slot due to alignment. It assumed that every heap page would be able
to be fully filled with slots. Pages that are unaligned with the slot
size will lose one slot. The new algorithm assumes that every page
will be unaligned.
2. It performed integer division, which truncates down. This means that
the number of pages might not actually satisfy the number of slots.
This can cause the heap to grow in `gc_sweep_finish_size_pool` after
allocating all of the allocatable pages because the total number of
slots would be less than the initial configured number of slots.
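A sketch of the corrected calculation (names are hypothetical, not the
actual C code): every page is assumed to lose one slot to alignment, and
the division rounds up instead of truncating.
```ruby
# Hypothetical illustration of the fixed page-count math.
def minimum_pages_for(init_slots, slots_per_page)
  # Assume the worst case: every page is unaligned and loses one slot.
  usable_slots_per_page = slots_per_page - 1
  # Round up so that the pages always satisfy the configured slot count.
  (init_slots + usable_slots_per_page - 1) / usable_slots_per_page
end

minimum_pages_for(10_000, 409) # => 25; truncating 10_000 / 409 gives only 24
```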
This commit changes RUBY_GC_HEAP_INIT_SIZE_{40,80,160,320,640}_SLOTS to
RUBY_GC_HEAP_{0,1,2,3,4}_INIT_SLOTS. This is easier to use because the
user does not need to determine the slot sizes (which can vary between
32 and 64 bit systems). They now just use the heap names
(`GC.stat_heap.keys`).
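For example (slot sizes shown are for a 64-bit system):
```ruby
# Configure the smallest heap without knowing its slot size:
#   RUBY_GC_HEAP_0_INIT_SLOTS=600000 ruby app.rb
GC.stat_heap.keys           # => [0, 1, 2, 3, 4] -- the heap names
GC.stat_heap(0, :slot_size) # => 40 on 64-bit systems
```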
If initial slots is set, then during a minor GC, if we have allocatable
pages but the heap is mostly full, we will set `grow_heap` to true
because `total_slots` does not count allocatable pages, so it will be
less than `init_slots`. This can cause `allocatable_pages` to grow much
higher than desired since it will appear that the heap is mostly full.
This commit adds `free_empty_pages` which frees all empty heap pages and
moves the number of pages freed to the allocatable pages counter. This
is used in Process.warmup to improve performance because page
invalidation from copy-on-write is slower than allocating a new page.
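A sketch of the intended use in a preforking server (the worker body is
a stand-in):
```ruby
# Boot the application, then declare warmup finished before forking so
# children share the compacted heap via copy-on-write.
app = -> { :response } # stand-in for the real application setup
Process.warmup
3.times do
  fork { app.call }
end
Process.waitall
```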
[Feature #19783]
This commit adds stats about weak references to `GC.latest_gc_info`.
It adds the following two keys:
- `weak_references_count`: number of weak references registered during
the last GC.
- `retained_weak_references_count`: number of weak references that
survived the last GC.
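For example:
```ruby
GC.start
GC.latest_gc_info(:weak_references_count)          # e.g. => 1152
GC.latest_gc_info(:retained_weak_references_count) # e.g. => 1022
```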
[Feature #19783]
This commit adds support for weak references in the GC through the
function `rb_gc_mark_weak`. Unlike strong references, weak references
do not mark the object, but rather let the GC know that an object
refers to another one. If the child object is freed, the pointer from
the parent object is overwritten with `Qundef`.
Co-Authored-By: Jean Boussier <byroot@ruby-lang.org>
This commit adds key force_incremental_marking_finish_count to
GC.stat_heap. This statistic returns the number of times the size pool
has forced incremental marking to finish due to running out of slots.
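For example, to read the counter for a single size pool or for all of
them:
```ruby
GC.stat_heap(0, :force_incremental_marking_finish_count) # e.g. => 0
GC.stat_heap.transform_values { _1[:force_incremental_marking_finish_count] }
```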
Functions rgengc_remembered, rgengc_remembered_sweep, and
rgengc_remembersetbits_get are just wrappers of RVALUE_REMEMBERED and
don't do much more. We can remove them all and use RVALUE_REMEMBERED
directly instead.
We don't need to check the stack for moved objects after compaction
because the mutator cannot run between marking the stack and the end of
compaction. However, the stack may have moved objects left over from the
marking and sweeping phases. If we check for them, their pages will be
invalidated and all of the objects moved back, which is unnecessary.
This also fixes the issue on Windows where some compaction tests
sometimes fail due to the page of the object being invalidated.
This commit stores the initial slots per size pool, configured with
the environment variables `RUBY_GC_HEAP_INIT_SIZE_%d_SLOTS`. This
ensures that the configured initial slots remains a low bound for the
number of slots in the heap, which can prevent heaps from thrashing in
size.
Since Ruby 3.0, refined method invocations have been slow because
resolved methods are not stored in the inline cache due to a
conservative strategy. However, `using` clears all caches, so it seems
safe to cache resolved method entries.
This patch caches resolved method entries in the inline cache
and clears all inline method caches when `using` is called.
fix [Bug #18572]
```ruby
# without refinements
class C
def foo = :C
end
N = 1_000_000
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
__END__
user system total real
master 0.362859 0.002544 0.365403 ( 0.365424)
modified 0.357251 0.000000 0.357251 ( 0.357258)
```
```ruby
# with refinement but without using
class C
def foo = :C
end
module R
refine C do
def foo = :R
end
end
N = 1_000_000
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
__END__
user system total real
master 0.957182 0.000000 0.957182 ( 0.957212)
modified 0.359228 0.000000 0.359228 ( 0.359238)
```
```ruby
# with using
class C
def foo = :C
end
module R
refine C do
def foo = :R
end
end
N = 1_000_000
using R
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
`cc->klass` and `cc->cme_` can be freed during the last marking, so
they should be checked before updating the pointers.
Note that `T_MOVED` is living, but `is_live_object()` returns false.
cc is callcache.
cc->klass (klass) should not be marked because if the klass is
freed, cc->klass will be cleared by `vm_cc_invalidate()`.
cc->cme (cme) should not be marked because the cc is invalidated
when the cme is freed.
- klass marks cme if klass uses cme.
- the caller class's ccs->cme marks cc->cme.
- if the cc is invalidated (the klass no longer refers to the cc),
  it is invalidated by `vm_cc_invalidate()` and cc->cme is no longer
  accessed.
- with multiple Ractors, the cme will be collected by global GC,
  so it is safe as long as GC does not interleave while accessing
  cc and cme.
fix [Bug #19436]
```ruby
10_000.times{|i|
# p i if (i%1_000) == 0
str = "x" * 1_000_000
def str.foo = nil
eval "def call#{i}(s) = s.foo"
send "call#{i}", str
}
```
Without this patch:
```
real 1m5.639s
user 0m6.637s
sys 0m58.292s
```
and with this patch:
```
real 0m2.045s
user 0m1.627s
sys 0m0.164s
```
This both saves time for when it will eventually be needed,
and avoids mutating heap pages after a potential fork.
Instrumenting a large Rails app, I've witnessed up to
58% of String instances having their coderange still unknown.
[Feature #18885]
For now, the optimizations performed are:
- Run a major GC
- Compact the heap
- Promote all surviving objects to oldgen
Other optimizations may follow.
Closes [Feature #19729]
Previously, 2 bits of the flags on each RVALUE were reserved to store the
number of GC cycles that each object has survived. This commit
introduces a new bit array on the heap page, called age_bits, to store
that information instead.
This patch still reserves one of the age bits in the flags (the old
FL_PROMOTED0 bit, now renamed FL_PROMOTED).
This is set to 0 for young objects and 1 for old objects, and is used as
a performance optimisation for the write barrier. Fetching the age_bits
from the heap page and doing the required math to calculate if the
object was old or not would slow down the write barrier. So we keep this
bit synced in the flags for fast access.
According to the C99 specification section 7.20.3.2 paragraph 2:
> If ptr is a null pointer, no action occurs.
So we do not need to check that the pointer is a null pointer.
This reverts commit 10621f7cb9.
This was reverted because the gc integrity build started failing. We
have figured out a fix so I'm reopening the PR.
Original commit message:
Fix cvar caching when class is cloned
The class variable cache that was added in
ruby#4544 changed the behavior of class
variables on cloned classes. As reported, when a class was cloned AND a
class variable had been set and read from the original class, reading
that class variable from the cloned class would return the value from
the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
The class variable cache that was added in
https://github.com/ruby/ruby/pull/4544 changed the behavior of class
variables on cloned classes. As reported, when a class was cloned AND a
class variable had been set and read from the original class, reading
that class variable from the cloned class would return the value from
the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
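A minimal sketch of the reported behaviour (class names are
hypothetical):
```ruby
class Original
  @@value = :original
  def self.value = @@value
end

Original.value # populates the inline cache on the shared ISEQ
Cloned = Original.clone
Cloned.class_variable_set(:@@value, :cloned)

# Before the fix, the shared cache could return :original here; storing
# the cref in the cache makes the cloned class miss it and read its own
# storage.
Cloned.value # => :cloned
```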
[Feature #19678]
References from an old object to a write barrier protected young object
will not immediately promote the young object. Instead, the young object
will age just like any other object, meaning that it has to survive
three collections before being promoted to the old generation.
References from an old object to a write barrier unprotected object will
place the parent object in the remember set for marking during minor
collections. This allows the child object to be reclaimed in minor
collections at the cost of increased time for minor collections.
On one of [Shopify's highest traffic Ruby apps, Storefront
Renderer](https://shopify.engineering/how-shopify-reduced-storefront-response-times-rewrite),
we saw significant improvements after deploying this feature in
production. We compare the GC time and response time of web workers that
have the original behaviour (non-experimental group) and this new
behaviour (experimental group). We see that with this feature we spend
significantly less time in the GC, 0.81x on average, 0.88x on p99, and
0.45x on p99.9.
This translates to improvements in average response time (0.96x) and p99
response time (0.92x).
[Feature #19571]
This commit adds the environment variable
`RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO` which is
used to calculate the `remembered_wb_unprotected_objects_limit` using a
ratio of `old_objects`. This should improve performance by reducing the
number of major GCs: since a major GC marks all of the old objects, we
can afford more uncollectible WB unprotected objects before starting a
major GC. The default has been set to 0.01 (1% of old objects).
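The resulting limit is observable through `GC.stat` (values here are
illustrative):
```ruby
# With RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO=0.01,
# the limit tracks 1% of the old object count after each major GC.
GC.start
GC.stat(:old_objects)                             # e.g. => 120_000
GC.stat(:remembered_wb_unprotected_objects_limit) # e.g. => 1_200
```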
On one of [Shopify's highest traffic Ruby apps, Storefront Renderer](https://shopify.engineering/how-shopify-reduced-storefront-response-times-rewrite),
we saw significant improvements after deploying this patch in
production. In the graphs below, we have the `tuned` group which uses
`RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO=0.01` (the
default value), and an `untuned` group, which turns this feature off
with `RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO=0`. We
see that the tuned group spends significantly less time in GC, on
average 0.67x of the time compared to the untuned group and 0.49x for
p99. We see this improvement in GC time translate to improvements in
response times. The average response time is now 0.96x of the time
compared to the untuned group and 0.86x for p99.
https://user-images.githubusercontent.com/15860699/229559078-e23e8ce4-5f1f-4a2f-b5ef-5769f92b8c70.png
[Bug #19584]
Some C extensions pass a pointer to a global variable to
rb_gc_register_address. However, if a GC is triggered inside of
rb_gc_register_address, then the object could get swept since it does
not exist on the stack.
[Bug #19580]
The real-world scenario motivating this change is libxml2's pthread
code which uses `pthread_key_create` to set up a destructor that is
called at thread exit to free thread-local storage.
There is a small window of time -- after ruby_vm_destruct but before
the process exits -- in which a pthread may exit and the destructor is
called, leading to a segfault.
Please note that this window of time may be relatively large if
`atexit` is being used.
Remove !USE_RVARGC code
[Feature #19579]
The Variable Width Allocation feature was turned on by default in Ruby
3.2. Since then, we haven't received bug reports or backports to the
non-Variable Width Allocation code paths, so we assume that nobody is
using it. We also don't plan on maintaining the non-Variable Width
Allocation code, so we are going to remove it.
Make sure the transient heap is in the right mode when we finish warming
the heap. Also ensure the GC isn't allowed to run while we iterate and
mutate the heap.
[Feature #18885]
For now, the optimizations performed are:
- Run a major GC
- Compact the heap
- Promote all surviving objects to oldgen
Other optimizations may follow.
[Bug #19550]
If !RCLASS_EXT_EMBEDDED (e.g. on 32-bit systems), then the rb_classext_t
is allocated through malloc, so it must be freed.
The issue can be seen in the following script:
```
20.times do
100_000.times do
mod = Module.new
Class.new do
include mod
end
end
# Output the Resident Set Size (memory usage, in KB) of the current Ruby process
puts `ps -o rss= -p #{$$}`
end
```
Before this fix, the max RSS is 280MB, while after this change, it's
30MB.
st tables maintain insertion order, so we can marshal dump/load objects
with instance variables in the same order they were set on that
particular instance.
[ruby-core:112926] [Bug #19535]
Co-Authored-By: Jemma Issroff <jemmaissroff@gmail.com>
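For example:
```ruby
obj = Object.new
obj.instance_variable_set(:@first, 1)
obj.instance_variable_set(:@second, 2)

copy = Marshal.load(Marshal.dump(obj))
copy.instance_variables # => [:@first, :@second], insertion order preserved
```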
When using rb_data_type_struct to wrap a C struct, that C struct can
contain VALUE references to other Ruby objects.
If this is the case then one must also define dmark and optionally
dcompact callbacks in order to allow these objects to be correctly
handled by the GC. This is suboptimal as it requires GC related logic to
be implemented by extension developers. This can be a cause of subtle
bugs when references are not marked or updated correctly inside these
callbacks.
This commit provides an alternative approach, useful in the simple case
where the C struct contains VALUE members (i.e. there isn't any
conditional logic, or data structure manipulation required to traverse
these references).
In this case references can be defined using a declarative syntax
as a list of edges (or pointers to references).
A flag can be set on the rb_data_type_struct to notify the GC that
declarative references are being used, and a list of those references
can be assigned to the dmark pointer instead of a function callback, on
the rb_data_type_struct.
Macros are also provided for simple declaration of the reference list,
and building edges.
To avoid having to also find space in the struct to define a length for
the references list, I've chosen to always terminate the references list
with RUBY_REF_END - defined as UINTPTR_MAX. My assumption is that no
single struct will ever be large enough that UINTPTR_MAX is actually a
valid reference.
These classes don't belong in gc.c as they're not actually part of the
GC. This commit refactors the code by moving all the code into a
weakmap.c file.
This makes the behavior of classes and modules with too many instance
variables match the behavior of objects with too many instance
variables.
When a Ractor is created whilst a tracepoint for
RUBY_INTERNAL_EVENT_NEWOBJ is active, the interpreter crashes. This is
because during the early setup of the Ractor, the stdio objects are
created, which allocates Ruby objects, which fires the tracepoint.
However, the tracepoint machinery tries to dereference the control frame
(ec->cfp->pc), which isn't set up yet and so crashes with a null pointer
dereference.
Fix this by not firing GC tracepoints if cfp isn't yet set up.
We need to zero out the whole slot when running the newobj hook for a
newly allocated class because the slot could be filled with garbage,
which would cause a crash if a GC runs inside of the newobj hook.
For example, the following script crashes:
```
require "objspace"
GC.stress = true
ObjectSpace.trace_object_allocations {
100.times do
Class.new
end
}
```
[Bug #19482]
This commit adds a function rb_data_free used by obj_free and
rb_objspace_call_finalizer to free T_DATA objects. This change also
means that RUBY_TYPED_FREE_IMMEDIATELY objects can be freed immediately
in rb_objspace_call_finalizer rather than being turned into zombies.
This feature was introduced in commit 2ccf6e5, but I realized that
using rb_warn is a bad idea because it allocates objects, which causes
a different crash ("object allocation during garbage collection phase").
We should just hard crash here instead.
If the previous instruction is not a leaf instruction, then the PC was
incremented before the instruction ran (meaning the currently
executing instruction is actually the previous instruction), so we
should not increment the PC; otherwise we will calculate the source
line of the next instruction.
This bug can be reproduced in the following script:
```
require "objspace"
ObjectSpace.trace_object_allocations_start
a =
1.0 / 0.0
p [ObjectSpace.allocation_sourceline(a), ObjectSpace.allocation_sourcefile(a)]
```
Which outputs: [4, "test.rb"]
This is incorrect because the object was allocated on line 10 and not
line 4. The behaviour is correct when we use a leaf instruction (e.g.
if we replaced `1.0 / 0.0` with `"hello"`), then the output is:
[10, "test.rb"].
[Bug #19456]
We shouldn't run gc_verify_internal_consistency after every GC step
when RGENGC_CHECK_MODE >= 2, only when GC has finished. Running it
on every GC step makes it too slow.
There is a `time` key in GC.stat that gives us the total time spent in
GC. However, we don't know what proportion of that time is spent
marking versus sweeping. This makes it difficult to tune the GC, as
we're not sure where to focus our efforts.
This PR adds keys `marking_time` and `sweeping_time` to GC.stat for the
time spent marking and sweeping, in milliseconds.
[Feature #19437]
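For example:
```ruby
GC.start
GC.stat(:time)          # total GC time in ms
GC.stat(:marking_time)  # ms spent marking
GC.stat(:sweeping_time) # ms spent sweeping
```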
This macro is broken when set to anything other than 0, and it has had
a comment saying that it's broken for 3 years.
This commit deletes it and the associated logging code. It's clearly
not being used.
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
Given that singleton classes don't have an allocator,
we can re-use these bytes to store the attached object
in `rb_classext_struct` without making it larger.
The old RUBY_GC_HEAP_INIT_SLOTS isn't really usable anymore, as
it initializes all the pools by the same factor, but it's unlikely
that pools will need similar sizes.
In production our 40B pool is 5 to 6 times bigger than our 80B pool.
It's not uncommon for simple bindings to wrap structs without
any Ruby object references, and hence without a `mark` function.
We might as well mark them as protected by a write barrier.
There's a memory leak in ObjectSpace::WeakMap due to not freeing
the `struct weakmap`. It can be seen in the following script:
```
100.times do
10000.times do
ObjectSpace::WeakMap.new
end
# Output the Resident Set Size (memory usage, in KB) of the current Ruby process
puts `ps -o rss= -p #{$$}`
end
```
Instance variables held in gen_ivtbl are marked with rb_gc_mark. It
prevents the referenced objects from moving, which is bad for copying
garbage collectors.
This commit allows those instance variables to be updated during
gc_update_object_references.
This commit adds rb_gc_mark_and_move, which takes a pointer to an object,
marks it during the marking phase, and updates references during compaction.
This allows marking and reference updating to be combined into a
single function, which reduces code duplication and prevents bugs if
marking and reference updating go out of sync.
This commit also implements rb_gc_mark_and_move on iseq as an example.
This commit moves the classpath (and tmp_classpath) from instance
variables to the rb_classext_t. This improves performance as we no
longer need to set an instance variable when assigning a classpath to
a class.
I benchmarked with the following script:
```ruby
name = :MyClass
puts(Benchmark.measure do
10_000_000.times do |i|
Object.const_set(name, Class.new)
Object.send(:remove_const, name)
end
end)
```
Before this patch:
```
5.440119 0.025264 5.465383 ( 5.467105)
```
After this patch:
```
4.889646 0.028325 4.917971 ( 4.942678)
```
The reference updating code for strings is not re-embedding strings
because the code is incorrectly wrapped inside an
`if (STR_SHARED_P(obj))` clause. Shared strings can't be re-embedded
so this ends up being a no-op. This means that strings can be moved to a
large size pool during compaction, but won't be re-embedded, which would
waste the space.
There is an integer underflow when the environment variable
RUBY_GC_HEAP_INIT_SLOTS is less than the number of slots currently
in the Ruby heap.
[Bug #19284]
If a size pool is small, then `min_free_slots < heap_init_slots` is true.
This means that min_free_slots will be set to heap_init_slots. This
causes `swept_slots < min_free_slots` to be true in a later if statement.
The if statement could trigger a major GC which could cause major GC
thrashing.
gc_compact_move incorrectly returns false when the destination heap is
full after sweeping. It returns false even if the destination heap is
different from the source heap (returning false means that the source
heap has finished compacting). This causes the source page to get
locked, which causes the read barrier to fire when we try to compact the
source heap again.
We eagerly set the new shape of an object when moving an object during
compaction. This new shape may have a different capacity than the
current original shape capacity. This means that we cannot copy from the
original buffer using the size of the new capacity. Instead, we should use
the ivar count (which is less than or equal to both the new and original
capacities).
Co-Authored-By: Matt Valentine-House <matt@eightbitraptor.com>
Allocating memory (xmalloc and xrealloc) during GC could cause GC to
trigger, which would crash with `[BUG] during_gc != 0`. This is an
intermittent bug which could be hard to debug.
This commit changes it so that any memory allocation during GC will
emit a warning. When debug flags are enabled it will also cause a crash.
When moving Objects between size pools we have to assign a new shape.
This happened during reference updating: we tried to create a new shape
tree that mirrored the existing tree, but based on the root shape of the
new size pool.
This causes allocations to happen if the new tree doesn't already exist,
potentially triggering a GC while we are already inside GC.
This commit changes object movement to look for a pre-existing new tree
during object movement, and if that tree does not exist, we don't move
the object to the new pool.
This allows us to remove the shape allocation from update references.
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
When an object becomes "too complex" (in other words it has too many
variations in the shape tree), we transition it to use a "too complex"
shape and use a hash for storing instance variables.
Without this patch, there were rare cases where shape tree growth could
"explode" and cause performance degradation on what would otherwise have
been cached fast paths.
This patch puts a limit on shape tree growth, and gracefully degrades in
the rare case where there could be a factorial growth in the shape tree.
For example:
```ruby
class NG; end
HUGE_NUMBER.times do
NG.new.instance_variable_set(:"@unique_ivar_#{_1}", 1)
end
```
We consider objects to be "too complex" when the object's class has more
than SHAPE_MAX_VARIATIONS (currently 8) leaf nodes in the shape tree and
the object introduces a new variation (a new leaf node) associated with
that class.
For example, new variations on instances of the following class would be
considered "too complex" because those instances create more than 8
leaves in the shape tree:
```ruby
class Foo; end
9.times { Foo.new.instance_variable_set(:"@uniq_#{_1}", 1) }
```
However, the following class is *not* too complex because it only has
one leaf in the shape tree:
```ruby
class Foo
def initialize
@a = @b = @c = @d = @e = @f = @g = @h = @i = nil
end
end
9.times { Foo.new }
```
This case is rare, so we don't expect this change to impact performance
of most applications, but it needs to be handled.
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
When moving Objects between size pools we have to assign a new shape.
This happened during reference updating: we tried to create a new shape
tree that mirrored the existing tree, but based on the root shape of the
new size pool.
This causes allocations to happen if the new tree doesn't already exist,
potentially triggering a GC while we are already inside GC.
This commit changes object movement to look for a pre-existing new tree
during object movement, and if that tree does not exist, we don't move
the object to the new pool.
This allows us to remove the shape allocation from update references.
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
We can loosely predict the number of ivar sets on a class based on the
number of iv set instructions in the initialize method. This should give
us a more accurate estimate to use for initial size pool allocation,
which should in turn give us more cache hits.
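For instance, a class like the one below has two instance-variable-set
instructions in `initialize`, which hints that its instances will likely
want room for about two ivars (a sketch of the heuristic, not the
implementation):
```ruby
class Point
  def initialize(x, y)
    @x = x # one setinstancevariable instruction
    @y = y # a second setinstancevariable instruction
  end
end

# Each setinstancevariable entry in the disassembly is one hint toward
# the initial size pool estimate.
puts RubyVM::InstructionSequence.of(Point.new(0, 0).method(:initialize)).disasm
```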
This commit adds RVALUE_OVERHEAD for storing metadata at the end of the
slot. This commit moves the ractor_belonging_id in debug builds from the
flags to RVALUE_OVERHEAD which frees the 16 bits in the headers for
object shapes.
We would like to differentiate types of objects via their shape. This
commit adds a special T_OBJECT shape when we allocate an instance of
T_OBJECT. This allows us to avoid testing whether an object is an
instance of T_OBJECT or not; we can just check the shape.
Since object shapes store the capacity of an object, we no longer
need the numiv field on RObjects. This gives us one extra slot which
we can use to give embedded objects one more instance variable (for a
total of 3 ivs). This commit removes the concept of numiv from RObject.
This commit adds a `capacity` field to shapes, and adds shape
transitions whenever an object's capacity changes. Objects which are
allocated out of a bigger size pool will also make a transition from the
root shape to the shape with the correct capacity for their size pool
when they are allocated.
This commit will allow us to remove numiv from objects completely, and
will also mean we can guarantee that if two objects share shapes, their
IVs are in the same positions (an embedded and extended object cannot
share shapes). This will enable us to implement ivar sets in YJIT using
object shapes.
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
The fiber machine stack is placed outside of the C stack allocated by
wasm-ld, so the highest stack address recorded by
`rb_wasm_record_stack_base` is invalid when running on a non-main fiber.
Therefore, we should scan `stack_{start,end}`, which always point to a
valid stack range in any context.
We were previously incrementing the max_iv_count on a class during GC
freeing. By the time we free an object, though, we're not guaranteed its
class is still valid. Instead, we can do this when marking, when we're
guaranteed the object still knows its class.
* Avoid RCLASS_IV_TBL in marshal.c
* Avoid RCLASS_IV_TBL for class names
* Avoid RCLASS_IV_TBL for autoload
* Avoid RCLASS_IV_TBL for class variables
* Avoid copying RCLASS_IV_TBL onto ICLASSes
* Use object shapes for Class and Module IVs
`iv_count` is a misleading name because when IVs are unset, the new
shape doesn't decrement this value. `next_iv_count` is an accurate and
more descriptive name.
Before object shapes, we were using class serial to invalidate
inline caches. Now that we use shape_id for inline cache keys,
the class serial is unnecessary.
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Shapes gives us an almost exact count of instance variables on an
object. Since we know the number of instance variables that have been
set, we will never access slots that haven't been initialized with an
IV.
Shapes provides us with an (almost) exact count of instance variables.
We only need to check for Qundef when an IV has been "undefined".
Prefer to use ROBJECT_IV_COUNT when iterating IVs
GCC 12 introduced a new warning flag, `-Wuse-after-free`. However, it
has a false positive at `realloc` when optimization is disabled, since
the memory requested for reallocation is guaranteed to not be touched.
It is unclear why wrapping the call in a statement-expression GCC
extension suppresses the false warning, but this workaround does.
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects. Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness"). Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree. Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.
For example:
```ruby
class Foo
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
class Bar
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```
Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.
This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.
This commit also adds some methods for debugging shapes on objects. See
`RubyVM::Shape` for more details.
For more context on Object Shapes, see [Feature: #18776]
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Tabs were expanded because the file did not have any tab indentation in unedited lines.
Please update your editor config, and use misc/expand_tabs.rb in the pre-commit hook.
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects. Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness"). Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree. Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.
For example:
```ruby
class Foo
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
class Bar
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```
Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.
This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.
This commit also adds some methods for debugging shapes on objects. See
`RubyVM::Shape` for more details.
For more context on Object Shapes, see [Feature: #18776]
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Poisoned regions cannot be accessed outside gc.c without first being
unpoisoned. Specifically, debug.gem is terminated by AddressSanitizer:
```
SUMMARY: AddressSanitizer: use-after-poison iseq_collector.c:39 in iseq_i
```
Tabs were expanded because the file did not have any tab indentation in unedited lines.
Please update your editor config, and use misc/expand_tabs.rb in the pre-commit hook.
rb_ary_tmp_new suggests that the array is temporary in some way, but
that's not true, it just creates an array that's hidden and not on the
transient heap. This commit renames it to rb_ary_hidden_new.
Before this commit, if we didn't have enough slots after sweeping but
had pages on the tomb heap, the GC would frequently allocate and
deallocate pages. This is because, after sweeping, it would set
allocatable pages (since there were not enough slots) but free the
pages on the tomb heap.
This commit reuses pages on the tomb heap if there's not enough slots
after sweeping.
Prior to this commit it was possible to call `ObjectSpace._id2ref` with
an offset static symbol object_id and get back a new, incorrectly tagged
symbol:
```
> sensible_sym = ObjectSpace._id2ref(:a.object_id)
=> :a
> nonsense_sym = ObjectSpace._id2ref(:a.object_id + 40)
=> :a
> sensible_sym == nonsense_sym
=> false
```
`nonsense_sym` ends up tagged with `RUBY_ID_INSTANCE` instead of
`RB_ID_LOCAL`. That means we can do silly things like:
```
> foo = Object.new
> foo.instance_variable_set(:a, 123)
(irb):2:in `instance_variable_set': `a' is not allowed as an instance variable name (NameError)
> foo.instance_variable_set(ObjectSpace._id2ref(:a.object_id + 40), 123)
=> 123
> foo.instance_variables
=> [:a]
```
This was happening because `get_id_entry` ignores the tag bits when
looking up the symbol. So `rb_id2str(symid)` would return a value and
then we'd continue on with the nonsense `symid`.
This commit prevents the situation by checking that the `symid` actually
matches what we get back from `get_id_entry`. Now we get a `RangeError`
for the nonsense id:
```
> ObjectSpace._id2ref(:a.object_id)
=> :a
> ObjectSpace._id2ref(:a.object_id + 40)
(irb):1:in `_id2ref': 0x000000000013f408 is not symbol id value (RangeError)
```
Co-authored-by: John Hawthorn <jhawthorn@github.com>
In wmap_live_p, if is_pointer_to_heap returns false, then the page is
either in the tomb or has already been freed, so the object is dead. In
this case, wmap_live_p should return false.
This commit implements Objects on Variable Width Allocation. This allows
Objects with more ivars to be embedded (i.e. contents directly follow the
object header) which improves performance through better cache locality.
This commit enables Arrays to move between size pools during compaction.
This can occur if the array is mutated such that it would fit in a
different size pool when embedded.
The move is carried out in two stages:
1. The RVALUE is moved to a destination heap during the object movement
   phase of compaction.
2. The array data is re-embedded and the original buffer freed, if
   required. This happens during the update references step.
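A sketch of when such a move can happen:
```ruby
ary = Array.new(100) { _1 } # too big to embed: data lives in a malloc'd buffer
ary.slice!(2..)             # only 2 elements remain, small enough to embed

# During the next compaction the RVALUE may move to a different size pool
# and the remaining elements are re-embedded, freeing the old buffer.
GC.verify_compaction_references(expand_heap: true, toward: :empty)
```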
In order to reliably test compaction we need to be able to move objects
between size pools.
In order for this to happen there must be pages in a size pool into
which we can allocate.
The existing implementation of `double_heap` only doubled the existing
number of pages in the heap, so if a size pool had a low number of pages
(or 0) it's not guaranteed that enough space will be created to move
objects into that size pool.
This commit deprecates the `double_heap` option and replaces it with
`expand_heap` instead.
`expand_heap` will expand each heap by enough pages to hold a number of
slots defined by `GC_HEAP_INIT_SLOTS` or by `heap->total_pages`,
whichever is larger.
If both `double_heap` and `expand_heap` are present, a deprecation
warning will be shown for `double_heap` and the `expand_heap` behaviour
will take precedence.
Given that this is an API intended for debugging and testing GC
compaction I'm not concerned about the extra memory usage or time taken
to create the pages. However, for completeness:
Running the following `test.rb` and using `time` on my Macbook Pro shows
the following memory usage and time impact:
```ruby
pp "RSS (kb): #{`ps -o rss #{Process.pid}`.lines.last.to_i}"
GC.verify_compaction_references(double_heap: true, toward: :empty)
pp "RSS (kb): #{`ps -o rss #{Process.pid}`.lines.last.to_i}"
```
```
❯ time make run
./miniruby -I./lib -I. -I.ext/common -r./arm64-darwin21-fake ./test.rb
"RSS (kb): 24000"
<internal:gc>:251: warning: double_heap is deprecated and will be removed
"RSS (kb): 25232"
________________________________________________________
Executed in 124.37 millis    fish    external
usr time  82.22 millis   0.09 millis   82.12 millis
sys time  28.76 millis   2.61 millis   26.15 millis

❯ time make run
./miniruby -I./lib -I. -I.ext/common -r./arm64-darwin21-fake ./test.rb
"RSS (kb): 24000"
"RSS (kb): 49040"
________________________________________________________
Executed in 150.13 millis    fish    external
usr time 103.32 millis   0.10 millis  103.22 millis
sys time  35.73 millis   2.59 millis   33.14 millis
```
If the page_body is a null pointer, then read_barrier_handler will
crash with an unrelated message. This commit improves the error message.
Before:
```
test.rb:1: [BUG] Couldn't unprotect page 0x0000000000000000, errno: Cannot allocate memory
```
After:
```
test.rb:1: [BUG] read_barrier_handler: segmentation fault at 0x14
```
The GC compaction mechanism implements a kind of read barrier by marking
some (OS) pages as unreadable, and installing a SIGBUS/SIGSEGV handler
to detect when they're accessed and invalidate an attempt to move the
object.
Unfortunately, when a debugger is attached to the Ruby interpreter on
Mac OS, the debugger will trap the EXC_BAD_ACCESS mach exception before
the runtime can transform that into a SIGBUS signal and dispatch it.
Thus, execution gets stuck; any attempt to continue from the debugger
re-executes the line that caused the exception and no forward progress
can be made.
This makes it impossible to debug either the Ruby interpreter or a C
extension whilst compaction is in use.
To fix this, we disable the EXC_BAD_ACCESS handler when installing the
SIGBUS/SIGSEGV handlers, and re-enable them once the compaction is done.
The debugger will still trap on the attempt to read the bad page, but it
will be trapping the SIGBUS signal, rather than the EXC_BAD_ACCESS mach
exception. It's possible to continue from this in the debugger, which
invokes the signal handler and allows forward progress to be made.
Commit 0c36ba5319 changed GC compaction
methods to not be implemented when not supported. However, that commit
only does compile time checks (which currently only checks for WASM),
but there are additional compaction support checks during run time.
This commit changes it so that GC compaction methods aren't defined
during run time if the platform does not support GC compaction.
[Bug #18829]
Only growth heaps are allowed to start major GCs. Before this patch,
growth heaps were defined as size pools that freed more slots than they
had empty slots (i.e. there were more dead objects than empty space).
But if the size pool is relatively stable and tightly packed with mostly
old objects and has allocatable pages, then it would be incorrectly
classified as a growth heap and trigger major GC. But since it's stable,
it would not use any of the allocatable pages and forever be classified
as a growth heap, causing major GC thrashing. This commit changes the
definition of growth heap to require that the size pool to have no
allocatable pages.
Having a while loop over `heap_prepare` makes the GC logic difficult to
understand (it is difficult to understand when and why `heap_prepare`
yields a free page). It is also a source of bugs and can cause an
infinite loop if `heap_prepare` never yields a free page.
Fixes [Bug #18779]
Define the following methods as `rb_f_notimplement` on unsupported
platforms:
- GC.compact
- GC.auto_compact
- GC.auto_compact=
- GC.latest_compact_info
- GC.verify_compaction_references
This change allows users to call `GC.respond_to?(:compact)` to
properly test for compaction support. Previously, it was necessary to
invoke `GC.compact` or `GC.verify_compaction_references` and check if
those methods raised `NotImplementedError` to determine if compaction
was supported.
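For example:
```ruby
if GC.respond_to?(:compact)
  GC.compact
else
  warn "GC compaction is not supported on this platform"
end
```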
This follows the precedent set in `process.c` for other
platform-specific methods such as `Process.fork`, `Process.setpgid`,
and `Process.getpriority`.
These methods are removed from gc.rb and added to gc.c:
- GC.compact
- GC.auto_compact
- GC.auto_compact=
- GC.latest_compact_info
- GC.verify_compaction_references
This is a prefactor to allow setting these methods to
`rb_f_notimplement` in a followup commit.
Some size pools may not have any pages/slots, so total_slots is 0. This
causes a divide-by-zero in the calculation. This commit adds a special
case to catch the case when total_slots is 0 and returns the number of
pages for heap_init_slots.
If the size pool has no or few pages/slots, then min_free_slots will
be a very small number (or even 0). Then the heap won't be eligible to
grow, causing GC thrashing or infinite loops.
Size pools with no pages won't be swept so gc_sweep_finish_size_pool
will never be called on it, but gc_sweep_finish_size_pool must be called
to grow the size pool.
Depending on alignment, the last bitmap plane may not be used, in which
case it will appear as if all of the objects on that plane are unmarked,
causing a buffer overrun when we try to free them. This commit
changes the loop to calculate the number of planes actually used
(bitmap_plane_count).
Since 4d8f76286b, we need to dereference
the includer field on iclasses, so we need to mark it to make sure
it's alive.
Sometimes during compaction we crash because the field is dangling,
though I have a hard time constructing such a situation. See
http://ci.rvm.jp/results/trunk@ruby-iga/3947725
We didn't update the includer field during compaction so it could become
a dangling pointer after compaction. It's only recently that we started
to dereference the field, and we were only comparing the pointer before
then, so the omission only recently started to cause crashes.
By instrumenting object.c:833 with `rp(includer);`, you can see the
includer field become `T_NONE` with the following script:
```ruby
mod = Module.new do
protected def foo = 1
end
klass = Class.new do
include Module.new
def run
foo
end
end
klass.include(mod)
GC.verify_compaction_references(double_heap: true, toward: :empty)
klass.new.run
```
I found a crash in a private application that this patch fixes, but
wasn't able to develop a small reproducer. Hence the above demo that
requires instrumentation.
During VM startup, rb_objspace_alloc sets malloc_limit
(objspace->malloc_params.limit) before ruby_gc_set_params is called, thus
nullifying the effect of RUBY_GC_MALLOC_LIMIT before the initial GC run.
The call sequence is as follows:
```
main.c::main()
  ruby_init
    ruby_setup
      Init_BareVM
        rb_objspace_alloc // malloc_limit = gc_params.malloc_limit_min;
  ruby_options
    ruby_process_options
      process_options
        ruby_gc_set_params // RUBY_GC_MALLOC_LIMIT => gc_params.malloc_limit_min
```
With ruby_gc_set_params setting malloc_limit, RUBY_GC_MALLOC_LIMIT
affects the process sooner.
[ruby-core:107170]
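With the fix, the environment variable is reflected in the initial limit
(a quick check; 16 MiB is the default):
```ruby
# Run as: RUBY_GC_MALLOC_LIMIT=33554432 ruby check_limit.rb
p GC.stat(:malloc_increase_bytes_limit) # => 33554432 instead of 16777216
```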
Commit dde164e968 decoupled incremental
marking from page sizes. This commit changes Ruby heap page sizes to
64KiB. Doing so will have several benefits:
1. We can use compaction on systems with 64KiB system page sizes (e.g.
PowerPC).
2. Larger page sizes will allow Variable Width Allocation to increase
slot sizes and embed larger objects.
3. Since commit 002fa28599, macOS has 64
KiB pages. Making page sizes 64 KiB will bring these systems to
parity.
I have attached some benchmark results below.
Discourse:
On Discourse, we saw much better p99 performance (e.g. for "categories"
it went from 214ms on master to 134ms on branch, for "home" it went
from 265ms to 251ms). We don’t see much change in p60, p75, and p90
performance. We also see a slight decrease in memory usage by 1.04x.
Branch RSS: 354.9MB
Master RSS: 368.2MB
railsbench:
On rails bench, we don’t see a big change in RPS or p99
performance. We don’t see a big difference in memory usage.
Branch RPS: 826.27
Master RPS: 824.85
Branch p99: 1.67
Master p99: 1.72
Branch RSS: 88.72MB
Master RSS: 88.48MB
liquid:
We don’t see a significant change in liquid performance.
Branch parse & render: 28.653 i/s
Master parse & render: 28.563 i/s
Currently, rb_aligned_malloc uses mmap if Ruby heap pages can be
allocated through mmap (when system heap page size <= Ruby heap page
size). If the Ruby heap page size is increased to 64KiB, then mmap will
be used on systems with 64KiB system page sizes. However, the transient
heap also uses rb_aligned_malloc and requires 32KiB alignment. This
would break in the current implementation, since it would allocate sizes
through mmap that are not a multiple of the system page size.
This commit adds heap_page_body_allocate which will use mmap when
possible and changes rb_aligned_malloc to not use mmap (and only
use posix_memalign).