Renames rb_id_table_foreach_with_replace to
rb_id_table_foreach_values_with_replace and passes only the value to the
callback. We can use this in GC compaction when we cannot access the
global symbol array.
This reverts commit 530e485265.
`rb_id_table_foreach_with_replace` is used during GC compaction,
and the global symbols array may have been moved by that time.
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up the ancestor tree before returning the cvar value. The deeper the
ancestor tree, the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows that, with the cache, this
branch is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
We also ran Benchmark.ips calling `ActiveRecord::Base.logger` from within
a Rails application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the clearer the speed increase; i.e. if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
  x.report "logger" do
    ActiveRecord::Base.logger
  end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly, we ran a benchmark to demonstrate the difference between master
and our cache as the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
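The script that produced these numbers is not part of this message; a minimal sketch of how such a comparison can be set up (the class, method, and helper names here are invented for illustration, not the original benchmark) might look like this:
```ruby
require "benchmark/ips"

# Build a class whose ancestor chain is padded with `n` anonymous modules,
# so an uncached class-variable read has to walk past all of them.
def klass_with_modules(n)
  klass = Class.new
  klass.class_eval do
    n.times { include Module.new }
  end
  # Define the cvar and a reader with literal @@ syntax so the read goes
  # through the ordinary class-variable instruction.
  klass.class_eval <<~RUBY
    @@cvar = 1
    def self.read_cvar
      @@cvar
    end
  RUBY
  klass
end

one     = klass_with_modules(1)
thirty  = klass_with_modules(30)
hundred = klass_with_modules(100)

Benchmark.ips do |x|
  x.report("1 module")    { one.read_cvar }
  x.report("30 modules")  { thirty.read_cvar }
  x.report("100 modules") { hundred.read_cvar }
  x.compare!
end
```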
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method caching
* Make the call-cache (CC) an RVALUE (a GC-managed object) and allocate a
new CC on a cache miss.
* This technique allows race-free access from parallel processing
elements, like RCU.
(2) Introduce a per-class method cache (pCMC)
* Instead of the fixed-size global method cache (GMC), pCMC allows a
flexible cache size.
* Caching CCs reduces CC allocation and allows sharing a CC's fast path
between call sites with the same call-info (CI).
(3) Invalidate an inline method cache by invalidating the corresponding
method entries (MEs)
* Instead of using class serials, we set an "invalidated" flag on the
method entry itself to represent cache invalidation.
* Compared with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidates all method caches of the class and
its subclasses.
* The proposed approach invalidates the method cache of only the
affected ME.
See [Feature #16614] for more details.
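As a rough mental model of idea (3) only (plain Ruby; `MethodEntry`, `InlineCache`, and `redefine` are invented names, not the C structures):
```ruby
# Toy model: an inline cache trusts its cached method entry only while
# that entry has not been flagged as invalidated.
MethodEntry = Struct.new(:owner, :name, :body) do
  attr_accessor :invalidated
end

class InlineCache
  def initialize
    @entry = nil
  end

  # `lookup` is a callable performing the full (slow) method search.
  def call(lookup, recv)
    if @entry.nil? || @entry.invalidated
      @entry = lookup.call          # cache miss: refill with a fresh entry
    end
    @entry.body.call(recv)
  end
end

# Redefining a method flips the flag on that single entry only, so caches
# for every other method entry (including those in subclasses) stay valid,
# unlike a class-serial bump which would throw them all away.
def redefine(old_entry, new_body)
  old_entry.invalidated = true
  MethodEntry.new(old_entry.owner, old_entry.name, new_body)
end
```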
This reverts commits: 10d6a3aca7, 8ba48c1b85, fba8627dc1, dd883de5ba,
6c6a25feca, 167e6b48f1, 7cb96d41a5, 3207979278, 595b3c4fdd, 1521f7cf89,
c11c5e69ac, cf33608203, 3632a812c0, f56506be0d, 86427a3219.
The reason for the revert is that we observed an ABA problem around the
inline method cache. When a cache miss occurs, we search for a method
entry, and if the entry is identical to what was cached before, we reuse
the cache. But the commits we are reverting here introduced situations
where a method entry is freed and then the identical memory region is
reused for another method entry. An inline method cache cannot detect
that ABA.
Here is code that reproduces such a situation:
```ruby
require 'prime'
class << Integer
  alias org_sqrt sqrt
  def sqrt(n)
    raise
  end
  GC.stress = true
  Prime.each(7*37){} rescue nil # <- Here we populate the CC
  class << Object.new; end
  # This adjacent remove-then-alias maneuver
  # frees a method entry, then immediately
  # reuses it for another.
  remove_method :sqrt
  alias sqrt org_sqrt
end
Prime.each(7*37).to_a # <- SEGV
```
For some reason symbols (or classes) are being overridden in trunk
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67598 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This commit adds the new method `GC.compact` and compacting GC support.
Please see this issue for caveats:
https://bugs.ruby-lang.org/issues/15626
[Feature #15626]
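For reference, a minimal usage sketch (the contents of the returned statistics have varied across Ruby versions, so treat the output as illustrative):
```ruby
# Trigger a major GC and compact the heap; the return value reports
# compaction statistics (e.g. how many objects were considered and moved).
stats = GC.compact
p stats
```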
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67576 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
It is hard to single out only the commits related to r67479, so please
commit again.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67499 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This commit adds the new method `GC.compact` and compacting GC support.
Please see this issue for caveats:
https://bugs.ruby-lang.org/issues/15626
[Feature #15626]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67479 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* id_table.c: switch to "simple open addressing with quadratic probing"
  by Yura Sokolov (a toy sketch of the probing scheme follows below).
  For more detailed measurements, see [Feature #12180].
* id_table.c: remove the other algorithms to simplify the source code.
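A toy sketch of that probing scheme in plain Ruby (illustrative only, not the id_table.c code; it omits resizing and deletion):
```ruby
class ProbingTable
  EMPTY = Object.new

  def initialize(capa = 8)            # capa must be a power of two here
    @keys   = Array.new(capa, EMPTY)
    @values = Array.new(capa)
  end

  def [](id)
    i = find_slot(id)
    @keys[i].equal?(EMPTY) ? nil : @values[i]
  end

  def []=(id, value)
    i = find_slot(id)
    @keys[i]   = id
    @values[i] = value
  end

  private

  # Start at hash(id) & mask, then on each collision step 1, 2, 3, ...
  # slots further (triangular-number offsets), which visits every slot
  # when the capacity is a power of two.
  def find_slot(id)
    mask = @keys.size - 1
    i    = id.hash & mask
    step = 0
    until @keys[i].equal?(EMPTY) || @keys[i] == id
      step += 1
      i = (i + step) & mask
    end
    i
  end
end

# t = ProbingTable.new
# t[:foo] = 1
# t[:foo]  # => 1
```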
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57416 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* id_table.c (hash_table_extend): should not shrink the table below
the previous capacity. [ruby-core:76534] [Bug #12614]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55896 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* id_table.h (rb_id_table_iterator_result): add dummy sentinel
member because C standard prohibits a trailing comma.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55821 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* id_table.c (list_id_table_init): When unaligned word access is
prohibited and sizeof(VALUE) is 8 (64-bit machines),
capa should always be an even number so that the values of the table
stay 8-byte aligned. This code assumes that sizeof(ID) is 4,
sizeof(VALUE) is 8, and xmalloc() returns 8-byte aligned memory.
This fixes a bus error on 64-bit SPARC Solaris 10.
[Bug #12406][ruby-dev:49631]
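A quick arithmetic check of that constraint, assuming the layout the entry implies (the VALUE slots start right after `capa` 4-byte IDs inside a single 8-byte-aligned allocation):
```ruby
# With sizeof(ID) == 4 and sizeof(VALUE) == 8, the VALUE area would begin
# at byte offset capa * 4 from the 8-byte-aligned start of the block.
[4, 5, 6, 7].each do |capa|
  offset = capa * 4
  puts "capa=#{capa}: VALUE offset #{offset}, 8-byte aligned: #{offset % 8 == 0}"
end
# Only even capa values keep the VALUEs 8-byte aligned, which is what
# strict-alignment SPARC hardware requires.
```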
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55086 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* error.c (rb_assert_failure): assertion with stack dump.
* ruby_assert.h (RUBY_ASSERT): new header for the assertion.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53615 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This allows us to swap in rb_id_table_memsize for st_memsize
(which takes a "const st_table *") more easily.
It also makes sense to do the same for rb_id_table_size,
since the table cannot be altered when accessing its size.
* id_table.h (rb_id_table_size): const arg
(rb_id_table_memsize): ditto
* id_table.c (st_id_table_size): ditto
(st_id_table_memsize): ditto
(list_id_table_size): ditto
(list_id_table_memsize): ditto
(hash_id_table_size): ditto
(hash_id_table_memsize): ditto
(mix_id_table_size): ditto
(mix_id_table_memsize): ditto
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52428 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
list->hash transition because GC can run during transition.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52421 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* id_table.c (UNUSED): mark implementation functions maybe-unused
to suppress warnings by old gcc.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51948 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* id_table.c (IMPL_TYPE, IMPL_VOID): make aliases if supported on
the platform.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51683 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* id_table.h (rb_id_table_foreach_func_t): define callback
function type for rb_id_table_foreach().
* id_table.h (rb_id_table_foreach_values_func_t): ditto for
rb_id_table_foreach_values().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51682 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* id_table.c (IMPL1): use TOKEN_PASTE, and prevent `op' from
expansion.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51562 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* id_table.c (IMPL): prepend id_table to the argument before its
expansion.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51550 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because jemalloc seems to replace the word `free' with `je_free'.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51549 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
[Feature #11420]
This table manages only an ID->VALUE mapping, to reduce the overhead of st.
Some functions prefixed with rb_id_table_* are provided.
* id_table.c: implement rb_id_table_*.
There are several algorithms to implement it.
Now, there are roughly 4 types:
* st
* array
* hash (implemented by Yura Sokolov)
* mix of array and hash (a toy sketch of this mix appears at the end of
this entry)
The macro ID_TABLE_IMPL chooses the implementation.
You can see details about them at the head of id_table.c.
By default, I chose 34 (mix of list and hash).
This is not a final decision.
Please report your suitable parameters or
your own data structure.
* symbol.c: introduce rb_id_serial_t and rb_id_to_serial()
to represent ID by serial number.
* internal.h: use id_table for method tables.
* class.c, gc.c, marshal.c, vm.c, vm_method.c: ditto.
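As noted above, a toy sketch of the list/hash mix in plain Ruby (invented names and threshold; not the C implementation, and it ignores deletion and iteration):
```ruby
class MixTable
  LIST_LIMIT = 8 # illustrative threshold for the list->hash transition

  def initialize
    @list = []   # small tables: a flat list of [id, value] pairs
    @hash = nil  # large tables: a real hash
  end

  def [](id)
    return @hash[id] if @hash
    pair = @list.assoc(id)
    pair && pair[1]
  end

  def []=(id, value)
    if @hash
      @hash[id] = value
    elsif (pair = @list.assoc(id))
      pair[1] = value
    elsif @list.size < LIST_LIMIT
      @list << [id, value]
    else
      # transition: promote the list into a hash, then insert
      @hash = @list.to_h
      @hash[id] = value
    end
  end
end
```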
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51541 b2dd03c8-39d4-4d8f-98ff-823fe69b080e