Current behavior - caches depend on a global counter. All constant mutations cause caches to be invalidated.
```ruby
class A
B = 1
end
def foo
A::B # inline cache depends on global counter
end
foo # populate inline cache
foo # hit inline cache
C = 1 # global counter increments, all caches are invalidated
foo # misses inline cache due to `C = 1`
```
Proposed behavior - caches depend on name components. Only constant mutations with corresponding names will invalidate the cache.
```ruby
class A
B = 1
end
def foo
A::B # inline cache depends constants named "A" and "B"
end
foo # populate inline cache
foo # hit inline cache
C = 1 # caches that depend on the name "C" are invalidated
foo # hits inline cache because IC only depends on "A" and "B"
```
Examples of breaking the new cache:
```ruby
module C
# Breaks `foo` cache because "A" constant is set and the cache in foo depends
# on "A" and "B"
class A; end
end
B = 1
```
We expect the new cache scheme to be invalidated less often because names aren't frequently reused. With the cache being invalidated less, we can rely on its stability more to keep our constant references fast and reduce the need to throw away generated code in YJIT.
During lazy sweeping, the iclass could be a dead object that has not yet
been swept. However, the chain of superclasses of the iclass could
already have been swept (and become a new object), which would cause a
crash when trying to read the object.
Previously, we would build a new `superclasses` array for each class,
even though for all immediate subclasses of a class, the array is
identical.
This avoids duplicating the arrays on leaf classes (those without
subclasses) by calculating and storing a "superclasses including self"
array on a class when it's first inherited and sharing that among all
superclasses.
An additional trick used is that the "superclass array including self"
is valid as "self"'s superclass array. It just has it's own class at the
end. We can use this to avoid an extra pointer of storage and can use
one bit of a flag to track that we've "upgraded" the array.
Previously when checking ancestors, we would walk all the way up the
ancestry chain checking each parent for a matching class or module.
I believe this was especially unfriendly to CPU cache since for each
step we need to check two cache lines (the class and class ext).
This check is used quite often in:
* case statements
* rescue statements
* Calling protected methods
* Class#is_a?
* Module#===
* Module#<=>
I believe it's most common to check a class against a parent class, to
this commit aims to improve that (unfortunately does not help checking
for an included Module).
This is done by storing on each class the number and an array of all
parent classes, in order (BasicObject is at index 0). Using this we can
check whether a class is a subclass of another in constant time since we
know the location to expect it in the hierarchy.
On 32-bit systems, VWA causes class_serial to not be aligned (it only
guarantees 4 byte alignment but class_serial is 8 bytes and requires 8
byte alignment). This commit uses a hack to allocate class_serial
through malloc. Once VWA allocates with 8 byte alignment in the future,
we will revert this commit.
This is to allow Module subclasses that include modules before
calling super in the subclass's initialize.
Remove rb_module_check_initializable from Module#initialize.
Module#initialize only calls module_exec if a block is passed,
it doesn't have other issues that would cause problems if
called multiple times or with an already initialized module.
Move initialization of super to Module#allocate, though I'm not
sure it is required there. However, it's needed to be removed
from Module#initialize for this to work.
Fixes [Bug #18292]
This commit adds a Ractor cache for every size pool. Previously, all VWA
allocated objects used the slowpath and locked the VM.
On a micro-benchmark that benchmarks String allocation:
VWA turned off:
29.196591 0.889709 30.086300 ( 9.434059)
VWA before this commit:
29.279486 41.477869 70.757355 ( 12.527379)
VWA after this commit:
16.782903 0.557117 17.340020 ( 4.255603)
Updating RCLASS_PARENT_SUBCLASSES and RCLASS_MODULE_SUBCLASSES while
compacting can trigger the read barrier. This commit makes
RCLASS_SUBCLASSES a doubly linked list with a dedicated head object so
that we can add and remove entries from the list without having to touch
an object in the Ruby heap
With RVARGC we always store the rb_classext_t in the same slot as the
RClass struct that refers to it. So we don't need to store the pointer
or access through the pointer anymore and can switch the RCLASS_EXT
macro to use an offset
Follow up of 428227472f. The previous fix
uses `rb_ary_new_from_values` to create the result array, but it may
trigger the GC.
This second try is to create the result array by `rb_ary_new_capa`
before the second iteration, and assume that `rb_ary_push` does not
trigger GC. This assumption is very fragile, so should be improved in
future.
[Bug #18282] [Feature #14394]
GC must not be triggered during callback of rb_class_foreach_subclass.
To prevent GC, we can not use rb_ary_push. Instead, this changeset calls
rb_class_foreach_subclass twice: first counts the subclasses, then
allocates a buffer (which may cause GC and reduce subclasses, but not
increase), and finally stores the subclasses to the buffer.
[Bug #18282] [Feature #14394]
Refinement#import_methods imports methods from modules.
Unlike Module#include, it copies methods and adds them into the refinement,
so the refinement is activated in the imported methods.
[Bug #17429] [ruby-core:101639]
Must not be a bad idea to improve documents. [ci skip]
In fact many functions declared in the header file are already
documented more or less. They were just copy & pasted, with applying
some style updates.
Must not be a bad idea to improve documents. [ci skip]
In fact many functions declared in the header file are already
documented more or less. They were just copy & pasted, with applying
some style updates.
This commits implements size classes in the GC for the Variable Width
Allocation feature. Unless `USE_RVARGC` compile flag is set, only a
single size class is created, maintaining current behaviour. See the
redmine ticket for more details.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
This commits implements size classes in the GC for the Variable Width
Allocation feature. Unless `USE_RVARGC` compile flag is set, only a
single size class is created, maintaining current behaviour. See the
redmine ticket for more details.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Instead of on read. Once it's in the inline cache we never have to make
one again. We want to eventually put the value into the cache, and the
best opportunity to do that is when you write the value.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
To invalidate some callable method entries, we replace the entry in the
class. Most types of method entries are on the method table of the
origin class, but refinement entries without an orig_me are housed in
the method table of the class itself. They are there because refinements
take priority over prepended methods.
By unconditionally inserting a copy of the refinement entry into the
origin class, clearing the method cache created situations where there
are refinement entry duplicates in the lookup chain, leading to infinite
loops and other problems.
Update the replacement logic to use the right class that houses the
method entry. Also, be more selective about cache invalidation when
moving refinement entries for prepend. This avoids calling
clear_method_cache_by_id_in_class() before refinement entries are in the
place it expects.
[Bug #17806]
In every caller of `rb_class_ivar_set` it checks for the `RCLASS_IV_TBL`
and then creates it if it doesn't exist. Instead of repeating this in
every caller, this can be done once in `rb_class_ivar_set`.
It's important to only make the origin when the prepend goes
through, as the precense of the origin informs whether to do an
origin backfill.
This plus 2d877327e fix [Bug #17590].
Check for cyclic prepend before making any changes. This requires
scanning the module ancestor chain twice, but in general modules
do not have large numbers of ancestors.
Previously, if a class included a module and then prepended the
same module, the prepend had no effect. This changes the behavior
so that the prepend has an effect unless the module is already
prepended the receiver.
While here, rename the origin_seen variable in include_modules_at,
since it is misleading. The variable tracks whether c has been seen,
not whether the origin of klass has been.
Fixes [Bug #17423]
cee02d754d resets pCMC and `me`
will be a invalidated and continuing the invalidated `me`,
it will break the data structure. This patch tris to clear
all methods of specified class before manipulating the `me`s.
[Issue #17417]
Module#include should only be able to insert modules after the origin,
otherwise it ends up working like Module#prepend.
This fixes the case where one of the modules in the included module
chain is included in a module that is already prepended to the receiver.
Fixes [Bug #7844]
Before this commit, `clone` gave different results depending on whether the original object
had an attached singleton class or not.
Consider the following setup:
```
class Foo; end
Foo.singleton_class.define_method(:foo) {}
obj = Foo.new
obj.singleton_class if $call_singleton
clone = obj.clone
```
When `$call_singleton = false`, neither `obj.singleton_class.singleton_class` nor
`clone.singleton_class.singleton_class` own any methods.
However, when `$call_singleton = true`, `clone.singleton_class.singleton_class` would own a copy of
`foo` from `Foo.singleton_class`, even though `obj.singleton_class.singleton_class` does not.
The latter case is unexpected and results in a visibly different clone, depending on if the original object
had an attached class or not.
Co-authored-by: Ufuk Kayserilioglu <ufuk.kayserilioglu@shopify.com>
Before this commit, iclasses were "shady", or not protected by write
barriers. Because of that, the GC needs to spend more time marking these
objects than otherwise.
Applications that make heavy use of modules should see reduction in GC
time as they have a significant number of live iclasses on the heap.
- Put logic for iclass method table ownership into a function
- Remove calls to WB_UNPROTECT and insert write barriers for iclasses
This commit relies on the following invariant: for any non oirigin
iclass `I`, `RCLASS_M_TBL(I) == RCLASS_M_TBL(RBasic(I)->klass)`. This
invariant did not hold prior to 98286e9 for classes and modules that
have prepended modules.
[Feature #16984]
98286e9850 made it so that
`Module#include` allocates an origin iclass on each use. Since `include`
is widely used, the extra allocation can contribute significantly to
memory usage.
Instead of always allocating in anticipation of prepend, this change
takes a different approach. The new setup inserts a origin iclass into
the super chains of all the children of the module when prepend happens
for the first time.
rb_ensure_origin is made static again since now that adding an origin
now means walking over all usages, we want to limit the number of places
where we do it.
3556a834a2 added support for
Module#include to affect the iclasses of the module. It didn't add
support for Module#prepend because there were bugs in the object model
and GC at the time that prevented it. Those problems have been
addressed in ad729a1d11 and
98286e9850, and now adding support for
it is straightforward and does not break any tests or specs.
Fixes [Bug #9573]
This fixes various issues when a module is included in or prepended
to a module or class, and then refined, or refined and then included
or prepended to a module or class.
Implement by renaming ensure_origin to rb_ensure_origin, making it
non-static, and calling it when refining a module.
Fix Module#initialize_copy to handle origins correctly. Previously,
Module#initialize_copy did not handle origins correctly. For example,
this code:
```ruby
module B; end
class A
def b; 2 end
prepend B
end
a = A.dup.new
class A
def b; 1 end
end
p a.b
```
Printed 1 instead of 2. This is because the super chain for
a.singleton_class was:
```
a.singleton_class
A.dup
B(iclass)
B(iclass origin)
A(origin) # not A.dup(origin)
```
The B iclasses would not be modified, so the includer entry would be
still be set to A and not A.dup.
This modifies things so that if the class/module has an origin,
all iclasses between the class/module and the origin are duplicated
and have the correct includer entry set, and the correct origin
is created.
This requires other changes to make sure all tests still pass:
* rb_undef_methods_from doesn't automatically handle classes with
origins, so pass it the origin for Comparable when undefing
methods in Complex. This fixed a failure in the Complex tests.
* When adding a method, the method cache was not cleared
correctly if klass has an origin. Clear the method cache for
the klass before switching to the origin of klass. This fixed
failures in the autoload tests related to overridding require,
without breaking the optimization tests. Also clear the method
cache for both the module and origin when removing a method.
* Module#include? is fixed to skip origin iclasses.
* Refinements are fixed to use the origin class of the module that
has an origin.
* RCLASS_REFINED_BY_ANY is removed as it was only used in a single
place and is no longer needed.
* Marshal#dump is fixed to skip iclass origins.
* rb_method_entry_make is fixed to handled overridden optimized
methods for modules that have origins.
Fixes [Bug #16852]
If a module has an origin, and that module is included in another
module or class, previously the iclass created for the module had
an origin pointer to the module's origin instead of the iclass's
origin.
Setting the origin pointer correctly requires using a stack, since
the origin iclass is not created until after the iclass itself.
Use a hidden ruby array to implement that stack.
Correctly assigning the origin pointers in the iclass caused a
use-after-free in GC. If a module with an origin is included
in a class, the iclass shares a method table with the module
and the iclass origin shares a method table with module origin.
Mark iclass origin with a flag that notes that even though the
iclass is an origin, it shares a method table, so the method table
should not be garbage collected. The shared method table will be
garbage collected when the module origin is garbage collected.
I've tested that this does not introduce a memory leak.
This change caused a VM assertion failure, which was traced to callable
method entries using the incorrect defined_class. Update
rb_vm_check_redefinition_opt_method and find_defined_class_by_owner
to treat iclass origins different than class origins to avoid this
issue.
This also includes a fix for Module#included_modules to skip
iclasses with origins.
Fixes [Bug #16736]
If a module has an origin, and that module is included in another
module or class, previously the iclass created for the module had
an origin pointer to the module's origin instead of the iclass's
origin.
Setting the origin pointer correctly requires using a stack, since
the origin iclass is not created until after the iclass itself.
Use a hidden ruby array to implement that stack.
Correctly assigning the origin pointers in the iclass caused a
use-after-free in GC. If a module with an origin is included
in a class, the iclass shares a method table with the module
and the iclass origin shares a method table with module origin.
Mark iclass origin with a flag that notes that even though the
iclass is an origin, it shares a method table, so the method table
should not be garbage collected. The shared method table will be
garbage collected when the module origin is garbage collected.
I've tested that this does not introduce a memory leak.
This also includes a fix for Module#included_modules to skip
iclasses with origins.
Fixes [Bug #16736]
When calling Module#include, if the receiver is a module,
walk the subclasses list and include the argument module in each
iclass.
This does not affect Module#prepend, as fixing that is significantly
more involved.
Fixes [Bug #9573]
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
1. By substituting `n_var` with its initializer, `0 < n_var` is
equivalent to `argc > argi + n_trail`.
2. As `argi` is non-negative, so `argi + n_trail >= n_trail`, and
the above expression is equivalent to `argc > n_trail`.
3. Therefore, `f_last` is always false, and `last_hash` is no
longer used.