This commit adds `GC.auto_compact = :empty` which will run
auto-compaction sorting pages by empty slots so the most amount of
objects will be moved. This will make it easier to write tests for
auto-compaction.
pinned_slots is not being reset every GC, which causes this assertion to
fail:
```
Assertion Failed: gc.c:7076:gc_pin:GET_HEAP_PAGE(obj)->pinned_slots <= GET_HEAP_PAGE(obj)->total_slots
```
This commit changes it to reset it at the beginning of every compaction
GC cycle.
Previously with RUBY_FREE_ON_EXIT, ractors where being xfree-ed which is incorrect since they are not xmalloced.
Instead we can free ractors with ractor free during shutdown. This change only effects main ractor freeing when RUBY_FREE_ON_EXIT is set.
Co-authored-by: John Hawthorn <john@hawthorn.email>
Previously, T_DATA and T_FILE objects did not have their instance
variables freed on exit which would be reported as a memory leak with
RUBY_FREE_ON_EXIT. This commit changes it to use obj_free which also
frees the generic instance variables.
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
When compiling with cppflags=-DRGENGC_CHECK_MODE, the following crashes:
```
$ RUBY_FREE_ON_EXIT=1 ./miniruby -e 0
-e: [BUG] obj_free: RVALUE_MARKED(0x0000000103570020 [3LM ] T_CLASS (anon)) != FALSE
```
This commit clears the mark bits when rb_free_on_exit is enabled.
Our current implementation of rb_postponed_job_register suffers from
some safety issues that can lead to interpreter crashes (see bug #1991).
Essentially, the issue is that jobs can be called with the wrong
arguments.
We made two attempts to fix this whilst keeping the promised semantics,
but:
* The first one involved masking/unmasking when flushing jobs, which
was believed to be too expensive
* The second one involved a lock-free, multi-producer, single-consumer
ringbuffer, which was too complex
The critical insight behind this third solution is that essentially the
only user of these APIs are a) internal, or b) profiling gems.
For a), none of the usages actually require variable data; they will
work just fine with the preregistration interface.
For b), generally profiling gems only call a single callback with a
single piece of data (which is actually usually just zero) for the life
of the program. The ringbuffer is complex because it needs to support
multi-word inserts of job & data (which can't be atomic); but nobody
actually even needs that functionality, really.
So, this comit:
* Introduces a pre-registration API for jobs, with a GVL-requiring
rb_postponed_job_prereigster, which returns a handle which can be
used with an async-signal-safe rb_postponed_job_trigger.
* Deprecates rb_postponed_job_register (and re-implements it on top of
the preregister function for compatability)
* Moves all the internal usages of postponed job register
pre-registration
when the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
need_major_gc is set when a major GC is required. However, if
gc_stress_no_major is also set, then it will not actually run a major
GC.
For example, the following script will sometimes crash:
```
GC.stress = 1
50000.times.map { [] }
```
With the following message:
```
[BUG] cannot create a new page after major GC
```
The intention of GC.verify_compaction_references is, I believe, to force
every single movable object to be moved, so that it's possible to debug
native extensions which not correctly updating their references to
objects they mark as movable.
To do this, it doubles the number of allocated pages for each size pool,
and sorts the heap pages so that the free ones are swept first; thus,
every object in an old page should be moved into a free slot in one of
the new pages.
This worked fine until movement of objects _between_ size pools during
compaction was implemented. That causes some problems for
verify_compaction_references:
* We were doubling the number of pages in each size pool, but actually
if some objects need to move into a _different_ pool, there's no
guarantee that they'll be enough room in that one.
* It's possible for the sweep & compact cursors to meet in one size pool
before all the objects that want to move into that size pool from
another are processed by the compaction.
You can see these problems by changing some of the movement tests in
test_gc_compact.rb to try and move e.g. 50,000 objects instead of
500; the test is not able to actually move all of the objects in a
single compaction run.
To fix this, we do two things in verify_compaction_references:
* Firstly, we add enough pages to every size pool to make them the same
size. This ensures that their compact cursors will all have space to
move during compaction (even if that means empty pages are
pointlessly compacted)
* Then, we examine every object and determine where it _wants_ to be
compacted into. We use this information to add additional pages to
each size pool to handle all objects which should live there.
With these two changes, we can move arbitrary amounts of objects into
the correct size pool in a single call to verify_compaction_references.
My _motivation_ for performing this work was to try and fix some test
stability issues in test_gc_compact.rb. I now no longer think that we
actually see this particular bug in rubyci.org, but I also think
verify_compaction_references should do what it says on the tin, so it's
worth keeping.
[Bug #20022]
This works like objspace_each_obj, except instead of being called with
the start & end address of each page, it's called with the page
structure itself.
[Bug #20022]
Objects with the same shape must always have the same "embeddedness"
(either embedded or heap allocated) because YJIT assumes so. However,
using remove_instance_variable, it's possible that some objects are
embedded and some are heap allocated because it does not re-embed heap
allocated objects.
This commit changes remove_instance_variable to re-embed Object
instance variables when it becomes small enough.
Embedded shared strings cannot be moved because strings point into the
slot of the shared string. There may be code using the RSTRING_PTR on
the stack, which would pin the string but not pin the shared string,
causing it to move.
When generic instance variable has a shape, it is marked movable. If it
it transitions to too complex, it needs to update references otherwise
it may have incorrect references.
This is required for the same reason that super CC needs it.
See 36023d5cb7.
Reproducer:
def cached_foo_callsite(obj) = obj.foo
class Foo
def foo = :v1
module R
refine Foo do
def foo = :unused
end
end
end
obj = Foo.new
cached_foo_callsite(obj) # set up cc with cme for foo=:v1
class Foo
def foo = :v2
end
GC.start # cme for foo=:v1 collected, if not reachable by cached_foo_callsite
cached_foo_callsite(obj)
[Bug #19994]
On large Ruby applications, shutdown may be slow if a major GC has just
started because rb_objspace_call_finalizer completes the GC.
This commit adds gc_abort which discards the mark stack if during
incremental marking and stops sweeping if during lazy sweeping.
Previously, because gc_update_object_references() did not update the
VALUEs in the too_complex ivar st_table for T_CLASS and T_MODULE
objects, GC compaction could finish with corrupted objects.
- start with `klass`, not too_complex
- GC incremental step marks `klass` and its ivars
- ruby code makes `klass` too_complex
- GC compaction runs and move `klass` ivars, but because `klass` is
too_complex, its ivars are not updated by gc_update_object_references(),
leaving T_NONE or T_MOVED objects in the ivar table.
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
Marking both keys and values versus marking just values is an important
distinction, but previously, gc_update_tbl_refs() and gc_update_table_refs()
had names that were too similar.
The st_table storing ivars for too_complex T_OBJECTs have IDs as keys,
but we were marking the IDs unnecessary previously, maybe due to the
confusing naming.
Previously, it tripped the assert about too_complex in
ROBJECT_IV_CAPACITY(). This fixes double faults for some crashes and
helps with use during development.
Since the callback defined in the objspace module might give up the GVL,
we need to make sure the right cr->mfd value is set back after the GVL
is re-obtained.
The previous implementation was using the pointer given
by `DATA_PTR` in all cases. But in the case of an embedded
TypedData, that pointer is garbage, we need to use RTYPEDDATA_GET_DATA
to get the proper data pointer.
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
This commit adds a new flag RUBY_TYPED_EMBEDDABLE that allows the data
of a TypedData object to be embedded after the object itself. This will
improve cache locality and allow us to save the 8 byte data pointer.
Co-Authored-By: Jean Boussier <byroot@ruby-lang.org>
Tracks other callinfo that references the same kwargs and frees them when all references are cleared.
[bug #19906]
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
fix memory leak in vm_method
This introduces a unified reference_count to clarify who is referencing a method.
This also allows us to treat the refinement method as the def owner since it counts itself as a reference
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
By compacting into slots with pinned objects first, we improve the
efficiency of compaction. As it is less likely that there will exist
pages containing only pinned objects after compaction. This will
increase the number of free pages left after compaction and enable us to
free them.
This used to be the default compaction method before it was removed
(inadvertently?) during the introduction of auto_compaction.
This commit will sort the pages by the pinned slot count at the start of
a major GC that has been triggered by explicitly calling GC.compact (and
thus setting objspace->flags.during_compaction).
It works using the same method by which we sort the heap by empty slot
count during GC.verify_compaction_references.
Previously it was only being sorted during the verify compaction
references stage - so would only happen during testing.
This commit allows us to sort the heap prior to each explicit GC.compact
run
Previously, configuring any GC event hook would cause all allocations to
go through the newobj slowpath. We should only need to do that when the
newobj specifically is subscribed to.
This renames flags.has_hook to flags.has_newobj_hook, to make this new
usage clear. newobj_of0 was the only place which previously checked this
flag.
This should help fix the following flaky test:
```
1) Failure:
TestProcess#test_warmup_frees_pages [test/ruby/test_process.rb:2751]:
<0> expected but was
<1>.
```
If we're during incremental marking, then Ruby code can execute that
deallocates certain memory buffers that have been called with
rb_gc_mark_weak, which can cause use-after-free bugs.
The term "shady object" was renamed to "uncollectible write barrier
unprotected object", so rename `has_uncollectible_shady_objects` to
`has_uncollectible_wb_unprotected_objects` for consistency.
We always sweep at least 2048 slots per sweep step, but only pool one
page. For large size pools, 2048 slots is many pages but one page is
very few slots. This commit changes it so that at least 1024 slots are
placed in the pooled pages per sweep step.
We move all pooled pages to free pages at the start of incremental
marking, so we shouldn't run incremental marking only when we have run
out of free pages. This causes incremental marking to always complete
in a single step.
If we are in a minor GC and the object to mark is old, then the old
object should already be marked and cannot be reclaimed in this GC cycle
so we don't need to add it to the weak refences list.
This is an internal only function not exposed to the C extension API.
It's only use so far is from rb_vm_mark, where it's used to mark the
values in the vm->trap_list.cmd array.
There shouldn't be any reason why these cannot move.
This commit allows them to move by updating their references during the
reference updating step of compaction.
To do this we've introduced another internal function
rb_gc_update_values as a partner to rb_gc_mark_values.
This allows us to refactor rb_gc_mark_values to not pin
The old algorithm could calculate an undercount for the initial pages
due to two issues:
1. It did not take into account that some heap pages will have one less
slot due to alignment. It assumed that every heap page would be able
to be fully filled with slots. Pages that are unaligned with the slot
size will lose one slot. The new algorithm assumes that every page
will be unaligned.
2. It performed integer division, which truncates down. This means that
the number of pages might not actually satisfy the number of slots.
This can cause the heap to grow in `gc_sweep_finish_size_pool` after
allocating all of the allocatable pages because the total number of
slots would be less than the initial configured number of slots.
This commit changes RUBY_GC_HEAP_INIT_SIZE_{40,80,160,320,640}_SLOTS to
RUBY_GC_HEAP_{0,1,2,3,4}_INIT_SLOTS. This is easier to use because the
user does not need to determine the slot sizes (which can vary between
32 and 64 bit systems). They now just use the heap names
(`GC.stat_heap.keys`).
If initial slots is set, then during a minor GC, if we have allocatable
pages but the heap is mostly full, then we will set `grow_heap` to true
since `total_slots` does not count allocatable pages so it will be less
than `init_slots`. This can cause `allocatable_pages` to grow to much
higher than desired since it will appear that the heap is mostly full.
This commit adds `free_empty_pages` which frees all empty heap pages and
moves the number of pages freed to the allocatable pages counter. This
is used in Process.warmup to improve performance because page
invalidation from copy-on-write is slower than allocating a new page.
[Feature #19783]
This commit adds stats about weak references to `GC.latest_gc_info`.
It adds the following two keys:
- `weak_references_count`: number of weak references registered during
the last GC.
- `retained_weak_references_count`: number of weak references that
survived the last GC.
[Feature #19783]
This commit adds support for weak references in the GC through the
function `rb_gc_mark_weak`. Unlike strong references, weak references
does not mark the object, but rather lets the GC know that an object
refers to another one. If the child object is freed, the pointer from
the parent object is overwritten with `Qundef`.
Co-Authored-By: Jean Boussier <byroot@ruby-lang.org>
This commit adds key force_incremental_marking_finish_count to
GC.stat_heap. This statistic returns the number of times the size pool
has forced incremental marking to finish due to running out of slots.
Functions rgengc_remembered, rgengc_remembered_sweep, and
rgengc_remembersetbits_get are just wrappers of RVALUE_REMEMBERED and
doesn't do much more. We can remove all those and use RVALUE_REMEMBERED
directly instead.
We don't need to check stack for moved objects after compaction because
the mutator cannot run between marking the stack and the end of
compaction. However, the stack may have moved objects leftover from
marking and sweeping phases. This means that their pages will be
invalidated and all objects moved back. We don't need to move these
objects back.
This also fixes the issue on Windows where some compaction tests
sometimes fail due to the page of the object being invalidated.
This commit stores the initial slots per size pool, configured with
the environment variables `RUBY_GC_HEAP_INIT_SIZE_%d_SLOTS`. This
ensures that the configured initial slots remains a low bound for the
number of slots in the heap, which can prevent heaps from thrashing in
size.
From Ruby 3.0, refined method invocations are slow because
resolved methods are not cached by inline cache because of
conservertive strategy. However, `using` clears all caches
so that it seems safe to cache resolved method entries.
This patch caches resolved method entries in inline cache
and clear all of inline method caches when `using` is called.
fix [Bug #18572]
```ruby
# without refinements
class C
def foo = :C
end
N = 1_000_000
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
_END__
user system total real
master 0.362859 0.002544 0.365403 ( 0.365424)
modified 0.357251 0.000000 0.357251 ( 0.357258)
```
```ruby
# with refinment but without using
class C
def foo = :C
end
module R
refine C do
def foo = :R
end
end
N = 1_000_000
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
__END__
user system total real
master 0.957182 0.000000 0.957182 ( 0.957212)
modified 0.359228 0.000000 0.359228 ( 0.359238)
```
```ruby
# with using
class C
def foo = :C
end
module R
refine C do
def foo = :R
end
end
N = 1_000_000
using R
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
`cc->klass` and `cc->cme_` can be free'ed while last marking
so that it should be checked bofore updating the pointers.
Note that `T_MOVED` is living, but `is_live_object()` returns false.
cc is callcache.
cc->klass (klass) should not be marked because if the klass is
free'ed, the cc->klass will be cleared by `vm_cc_invalidate()`.
cc->cme (cme) should not be marked because if cc is invalidated
when cme is free'ed.
- klass marks cme if klass uses cme.
- caller classe's ccs->cme marks cc->cme.
- if cc is invalidated (klass doesn't refer the cc),
cc is invalidated by `vm_cc_invalidate()` and cc->cme is
not be accessed.
- On the multi-Ractors, cme will be collected with global GC
so that it is safe if GC is not interleaving while accessing
cc and cme.
fix [Bug #19436]
```ruby
10_000.times{|i|
# p i if (i%1_000) == 0
str = "x" * 1_000_000
def str.foo = nil
eval "def call#{i}(s) = s.foo"
send "call#{i}", str
}
```
Without this patch:
```
real 1m5.639s
user 0m6.637s
sys 0m58.292s
```
and with this patch:
```
real 0m2.045s
user 0m1.627s
sys 0m0.164s
```
This both save time for when it will be eventually needed,
and avoid mutating heap pages after a potential fork.
Instrumenting some large Rails app, I've witnessed up to
58% of String instances having their coderange still unknown.
[Feature #18885]
For now, the optimizations performed are:
- Run a major GC
- Compact the heap
- Promote all surviving objects to oldgen
Other optimizations may follow.
Closes [Feature #19729]
Previously 2 bits of the flags on each RVALUE are reserved to store the
number of GC cycles that each object has survived. This commit
introduces a new bit array on the heap page, called age_bits, to store
that information instead.
This patch still reserves one of the age bits in the flags (the old
FL_PROMOTED0 bit, now renamed FL_PROMOTED).
This is set to 0 for young objects and 1 for old objects, and is used as
a performance optimisation for the write barrier. Fetching the age_bits
from the heap page and doing the required math to calculate if the
object was old or not would slow down the write barrier. So we keep this
bit synced in the flags for fast access.
According to the C99 specification section 7.20.3.2 paragraph 2:
> If ptr is a null pointer, no action occurs.
So we do not need to check that the pointer is a null pointer.
This reverts commit 10621f7cb9.
This was reverted because the gc integrity build started failing. We
have figured out a fix so I'm reopening the PR.
Original commit message:
Fix cvar caching when class is cloned
The class variable cache that was added in
ruby#4544 changed the behavior of class
variables on cloned classes. As reported when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
The class variable cache that was added in
https://github.com/ruby/ruby/pull/4544 changed the behavior of class
variables on cloned classes. As reported when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
[Feature #19678]
References from an old object to a write barrier protected young object
will not immediately promote the young object. Instead, the young object
will age just like any other object, meaning that it has to survive
three collections before being promoted to the old generation.
References from an old object to a write barrier unprotected object will
place the parent object in the remember set for marking during minor
collections. This allows the child object to be reclaimed in minor
collections at the cost of increased time for minor collections.
On one of [Shopify's highest traffic Ruby apps, Storefront
Renderer](https://shopify.engineering/how-shopify-reduced-storefront-response-times-rewrite),
we saw significant improvements after deploying this feature in
production. We compare the GC time and response time of web workers that
have the original behaviour (non-experimental group) and this new
behaviour (experimental group). We see that with this feature we spend
significantly less time in the GC, 0.81x on average, 0.88x on p99, and
0.45x on p99.9.
This translates to improvements in average response time (0.96x) and p99
response time (0.92x).
[Feature #19571]
This commit adds the environment variable
`RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO` which is
used to calculate the `remembered_wb_unprotected_objects_limit` using a
ratio of `old_objects`. This should improve performance by reducing
major GC because, in a major GC, we mark all of the old objects, so we
should have more uncollectible WB unprotected objects before starting a
major GC. The default has been set to 0.01 (1% of old objects).
On one of [Shopify's highest traffic Ruby apps, Storefront Renderer](https://shopify.engineering/how-shopify-reduced-storefront-response-times-rewrite),
we saw significant improvements after deploying this patch in
production. In the graphs below, we have the `tuned` group which uses
`RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO=0.01` (the
default value), and an `untuned` group, which turns this feature off
with `RUBY_GC_HEAP_REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO=0`. We
see that the tuned group spends significantly less time in GC, on
average 0.67x of the time compared to the untuned group and 0.49x for
p99. We see this improvement in GC time translate to improvements in
response times. The average response time is now 0.96x of the time
compared to the untuned group and 0.86x for p99.
https://user-images.githubusercontent.com/15860699/229559078-e23e8ce4-5f1f-4a2f-b5ef-5769f92b8c70.png
[Bug #19584]
Some C extensions pass a pointer to a global variable to
rb_gc_register_address. However, if a GC is triggered inside of
rb_gc_register_address, then the object could get swept since it does
not exist on the stack.
[Bug #19580]
The real-world scenario motivating this change is libxml2's pthread
code which uses `pthread_key_create` to set up a destructor that is
called at thread exit to free thread-local storage.
There is a small window of time -- after ruby_vm_destruct but before
the process exits -- in which a pthread may exit and the destructor is
called, leading to a segfault.
Please note that this window of time may be relatively large if
`atexit` is being used.
Remove !USE_RVARGC code
[Feature #19579]
The Variable Width Allocation feature was turned on by default in Ruby
3.2. Since then, we haven't received bug reports or backports to the
non-Variable Width Allocation code paths, so we assume that nobody is
using it. We also don't plan on maintaining the non-Variable Width
Allocation code, so we are going to remove it.
Make sure the transient heap is in the right mode when we finish warming
the heap. Also ensure the GC isn't allowed to run while we iterate and
mutate the heap.
[Feature #18885]
For now, the optimizations performed are:
- Run a major GC
- Compact the heap
- Promote all surviving objects to oldgen
Other optimizations may follow.
[Bug #19550]
If !RCLASS_EXT_EMBEDDED (e.g. 32 bit systems) then the rb_classext_t is
allocated throug malloc so it must be freed.
The issue can be seen in the following script:
```
20.times do
100_000.times do
mod = Module.new
Class.new do
include mod
end
end
# Output the Resident Set Size (memory usage, in KB) of the current Ruby process
puts `ps -o rss= -p #{$$}`
end
```
Before this fix, the max RSS is 280MB, while after this change, it's
30MB.
st tables will maintain insertion order so we can marshal dump / load
objects with instance variables in the same order they were set on that
particular instance
[ruby-core:112926] [Bug #19535]
Co-Authored-By: Jemma Issroff <jemmaissroff@gmail.com>
When using rb_data_type_struct to wrap a C struct, that C struct can
contain VALUE references to other Ruby objects.
If this is the case then one must also define dmark and optionally
dcompact callbacks in order to allow these objects to be correctly
handled by the GC. This is suboptimal as it requires GC related logic to
be implemented by extension developers. This can be a cause of subtle
bugs when references are not marked of updated correctly inside these
callbacks.
This commit provides an alternative approach, useful in the simple case
where the C struct contains VALUE members (ie. there isn't any
conditional logic, or data structure manipulation required to traverse
these references).
In this case references can be defined using a declarative syntax
as a list of edges (or, pointers to references).
A flag can be set on the rb_data_type_struct to notify the GC that
declarative references are being used, and a list of those references
can be assigned to the dmark pointer instead of a function callback, on
the rb_data_type_struct.
Macros are also provided for simple declaration of the reference list,
and building edges.
To avoid having to also find space in the struct to define a length for
the references list, I've chosed to always terminate the references list
with RUBY_REF_END - defined as UINTPTR_MAX. My assumption is that no
single struct will ever be large enough that UINTPTR_MAX is actually a
valid reference.
These classes don't belong in gc.c as they're not actually part of the
GC. This commit refactors the code by moving all the code into a
weakmap.c file.
This makes the behavior of classes and modules when there are too many instance variables match the behavior of objects with too many instance variables.
When a Ractor is created whilst a tracepoint for
RUBY_INTERNAL_EVENT_NEWOBJ is active, the interpreter crashes. This is
because during the early setup of the Ractor, the stdio objects are
created, which allocates Ruby objects, which fires the tracepoint.
However, the tracepoint machinery tries to dereference the control frame
(ec->cfp->pc), which isn't set up yet and so crashes with a null pointer
dereference.
Fix this by not firing GC tracepoints if cfp isn't yet set up.
We need to zero out the whole slot when running the newobj hook for a
newly allocated class because the slot could be filled with garbage,
which would cause a crash if a GC runs inside of the newobj hook.
For example, the following script crashes:
```
require "objspace"
GC.stress = true
ObjectSpace.trace_object_allocations {
100.times do
Class.new
end
}
```
[Bug #19482]
This commit adds a function rb_data_free used by obj_free and
rb_objspace_call_finalizer to free T_DATA objects. This change also
means that RUBY_TYPED_FREE_IMMEDIATELY objects can be freed immediately
in rb_objspace_call_finalizer rather than being created into a zombie.
This feature was introduced in commit 2ccf6e5, but I realized that
using rb_warn is a bad idea because it allocates objects, which causes
a different crash ("object allocation during garbage collection phase").
We should just hard crash here instead.
If the previous instruction is not a leaf instruction, then the PC was
incremented before the instruction was ran (meaning the currently
executing instruction is actually the previous instruction), so we
should not increment the PC otherwise we will calculate the source
line for the next instruction.
This bug can be reproduced in the following script:
```
require "objspace"
ObjectSpace.trace_object_allocations_start
a =
1.0 / 0.0
p [ObjectSpace.allocation_sourceline(a), ObjectSpace.allocation_sourcefile(a)]
```
Which outputs: [4, "test.rb"]
This is incorrect because the object was allocated on line 10 and not
line 4. The behaviour is correct when we use a leaf instruction (e.g.
if we replaced `1.0 / 0.0` with `"hello"`), then the output is:
[10, "test.rb"].
[Bug #19456]
We shouldn't run gc_verify_internal_consistency after every GC step
when RGENGC_CHECK_MODE >= 2, only when GC has finished. Running it
on every GC step makes it too slow.
There is a `time` key in GC.stat that gives us the total time spent in
GC. However, we don't know what proportion of the time is spent between
marking and sweeping. This makes it difficult to tune the GC as we're
not sure where to focus our efforts on.
This PR adds keys `marking_time` and `sweeping_time` to GC.stat for the
time spent marking and sweeping, in milliseconds.
[Feature #19437]
This macro is broken when set to anything other than 0. And has had a
comment saying that it's broken for 3 years.
This commit deletes it and the associated logging code. It's clearly
not being used.
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
Given that signleton classes don't have an allocator,
we can re-use these bytes to store the attached object
in `rb_classext_struct` without making it larger.
The old RUBY_GC_HEAP_INIT_SLOTS isn't really usable anymore as
it initalize all the pools by the same factor, but it's unlikely
that pools will need similar sizes.
In production our 40B pool is 5 to 6 times bigger than our 80B pool.
It's not uncommon for simple binding to wrap structs without
any Ruby object references. Hence with no `mark` function.
Might as well mark them as protected by a write barrier.
There's a memory leak in ObjectSpace::WeakMap due to not freeing
the `struct weakmap`. It can be seen in the following script:
```
100.times do
10000.times do
ObjectSpace::WeakMap.new
end
# Output the Resident Set Size (memory usage, in KB) of the current Ruby process
puts `ps -o rss= -p #{$$}`
end
```
Instance variables held in gen_ivtbl are marked with rb_gc_mark. It
prevents the referenced objects from moving, which is bad for copying
garbage collectors.
This commit allows those instance variables to be updated during
gc_update_object_references.
This commit adds rb_gc_mark_and_move which takes a pointer to an object
and marks it during marking phase and updates references during compaction.
This allows for marking and reference updating to be combined into a
single function, which reduces code duplication and prevents bugs if
marking and reference updating goes out of sync.
This commit also implements rb_gc_mark_and_move on iseq as an example.
This commit moves the classpath (and tmp_classpath) from instance
variables to the rb_classext_t. This improves performance as we no
longer need to set an instance variable when assigning a classpath to
a class.
I benchmarked with the following script:
```ruby
name = :MyClass
puts(Benchmark.measure do
10_000_000.times do |i|
Object.const_set(name, Class.new)
Object.send(:remove_const, name)
end
end)
```
Before this patch:
```
5.440119 0.025264 5.465383 ( 5.467105)
```
After this patch:
```
4.889646 0.028325 4.917971 ( 4.942678)
```
The reference updating code for strings is not re-embedding strings
because the code is incorrectly wrapped inside of a
`if (STR_SHARED_P(obj))` clause. Shared strings can't be re-embedded
so this ends up being a no-op. This means that strings can be moved to a
large size pool during compaction, but won't be re-embedded, which would
waste the space.
There is an integer underflow when the environment variable
RUBY_GC_HEAP_INIT_SLOTS is less than the number of slots currently
in the Ruby heap.
[Bug #19284]
If a size pooll is small, then `min_free_slots < heap_init_slots` is true.
This means that min_free_slots will be set to heap_init_slots. This
causes `swept_slots < min_free_slots` to be true in a later if statement.
The if statement could trigger a major GC which could cause major GC
thrashing.
gc_compact_move incorrectly returns false when destination heap is full
after sweeping. It returns false even if destination heap is different
than source heap (returning false means that the source heap has
finished compacting). This causes the source page to get locked, which
causes a read barrier fire when we try to compact the source heap again.
We eagerly set the new shape of an object when moving an object during
compaction. This new shape may have a different capacity than the
current original shape capacity. This means that we cannot copy from the
original buffer using size of the new capacity. Instead, we should use
the ivar count (which is less than or equal to both the new and original
capacities).
Co-Authored-By: Matt Valentine-House <matt@eightbitraptor.com>
Allocating memory (xmalloc and xrealloc) during GC could cause GC to
trigger, which would crash with `[BUG] during_gc != 0`. This is an
intermittent bug which could be hard to debug.
This commit changes it so that any memory allocation during GC will
emit a warning. When debug flags are enabled it will also cause a crash.
When moving Objects between size pools we have to assign a new shape.
This happened during updating references - we tried to create a new shape
tree that mirrored the existing tree, but based on the root shape of the
new size pool.
This causes allocations to happen if the new tree doesn't already exist,
potentially triggering a GC, during GC.
This commit changes object movement to look for a pre-existing new tree
during object movement, and if that tree does not exist, we don't move
the object to the new pool.
This allows us to remove the shape allocation from update references.
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
When an object becomes "too complex" (in other words it has too many
variations in the shape tree), we transition it to use a "too complex"
shape and use a hash for storing instance variables.
Without this patch, there were rare cases where shape tree growth could
"explode" and cause performance degradation on what would otherwise have
been cached fast paths.
This patch puts a limit on shape tree growth, and gracefully degrades in
the rare case where there could be a factorial growth in the shape tree.
For example:
```ruby
class NG; end
HUGE_NUMBER.times do
NG.new.instance_variable_set(:"@unique_ivar_#{_1}", 1)
end
```
We consider objects to be "too complex" when the object's class has more
than SHAPE_MAX_VARIATIONS (currently 8) leaf nodes in the shape tree and
the object introduces a new variation (a new leaf node) associated with
that class.
For example, new variations on instances of the following class would be
considered "too complex" because those instances create more than 8
leaves in the shape tree:
```ruby
class Foo; end
9.times { Foo.new.instance_variable_set(":@uniq_#{_1}", 1) }
```
However, the following class is *not* too complex because it only has
one leaf in the shape tree:
```ruby
class Foo
def initialize
@a = @b = @c = @d = @e = @f = @g = @h = @i = nil
end
end
9.times { Foo.new }
``
This case is rare, so we don't expect this change to impact performance
of most applications, but it needs to be handled.
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
When moving Objects between size pools we have to assign a new shape.
This happened during updating references - we tried to create a new shape
tree that mirrored the existing tree, but based on the root shape of the
new size pool.
This causes allocations to happen if the new tree doesn't already exist,
potentially triggering a GC, during GC.
This commit changes object movement to look for a pre-existing new tree
during object movement, and if that tree does not exist, we don't move
the object to the new pool.
This allows us to remove the shape allocation from update references.
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>