This commit adds an assertion after `gc_page_sweep` to
verify that the freelist length is equal to the number of free slots in
the page.
When a Ractor is removed, the freelist in the Ractor cache is not
returned to the GC, leaving the freelist permanently lost. This commit
recycles the freelist when the Ractor is destroyed, preventing a memory
leak from occurring.
If we force recycle an object before the page is swept, we should clear
it in the mark bitmap. If we don't clear it in the bitmap, then during
sweeping we won't account for this free slot so the `free_slots` count
of the page will be incorrect.
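The accounting problem can be seen in a toy model. The sketch below is not gc.c: `ToyPage`, its methods, and the `clear_mark:` flag are hypothetical names. It assumes only what is stated above, namely that sweeping skips any slot whose mark bit is set.

```ruby
# Toy model of one heap page; hypothetical names, not the real gc.c structures.
class ToyPage
  attr_reader :free_slots

  def initialize(slot_count)
    @occupied = Array.new(slot_count, true) # every slot starts out holding an object
    @mark_bits = Array.new(slot_count, false)
    @free_slots = 0
  end

  def mark(index)
    @mark_bits[index] = true
  end

  # Free a slot outside of sweeping, optionally clearing its mark bit.
  def force_recycle(index, clear_mark:)
    @occupied[index] = false
    @mark_bits[index] = false if clear_mark
  end

  # Sweeping skips marked slots, so only unmarked slots are counted as free.
  def sweep
    @mark_bits.each_with_index do |marked, index|
      next if marked
      @occupied[index] = false
      @free_slots += 1
    end
    @free_slots
  end

  # Ground truth: slots that no longer hold an object.
  def unoccupied_slots
    @occupied.count(false)
  end
end

# Returns [free_slots counted by sweep, slots that are really free].
def sweep_after_recycle(clear_mark:)
  page = ToyPage.new(4)
  page.mark(0)
  page.mark(1)
  page.force_recycle(1, clear_mark: clear_mark)
  page.sweep
  [page.free_slots, page.unoccupied_slots]
end

p sweep_after_recycle(clear_mark: false) # => [2, 3], the count is off by one
p sweep_after_recycle(clear_mark: true)  # => [3, 3], the count matches
```

With the mark bit left set, the recycled slot is skipped by the sweep and the page's count ends up one lower than the number of slots that are really free.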
Running btest crashes when Ruby is compiled with
RGENGC_CHECK_MODE=4. The crash happens because `during_gc` is not
turned off before `gc_marks_check` is called, causing the marking to
happen on the main mark stack instead of the mark stack created in
`objspace_allrefs`.
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up the ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the clearer the speed increase. That is, if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstrate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
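The script for this comparison is not included in this message. A minimal sketch of what such a setup could look like follows. `build_class_with_modules` and `reads_per_second` are hypothetical helpers, and the timing uses `Process.clock_gettime` instead of benchmark-ips so that it runs without extra gems. The numbers it prints are not comparable to the results reported here.

```ruby
# Build an anonymous class that includes `count` empty modules and reads a
# class variable through a singleton method. Hypothetical helper.
def build_class_with_modules(count)
  klass = Class.new
  count.times { klass.include(Module.new) }
  # A string eval is used so that @@value is resolved against klass.
  klass.class_eval <<~RUBY
    @@value = 42
    def self.value
      @@value
    end
  RUBY
  klass
end

# Rough throughput measurement of the class variable read. Hypothetical helper.
def reads_per_second(klass, iterations = 100_000)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { klass.value }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  iterations / elapsed
end

[1, 30, 100].each do |count|
  klass = build_class_with_modules(count)
  puts format("%3d modules: %.0f reads/s", count, reads_per_second(klass))
end
```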
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
heap_set_increment essentially only calls heap_allocatable_pages_set.
They only differ in behaviour when `additional_pages == 0`. However,
this is only possible because heap_extend_pages may return 0. This
commit also changes heap_extend_pages to always return at least 1.
If incremental sweeping is in progress when gc_set_initial_pages is
called, an assertion fails. The following patch will artificially
produce the bug:
```
diff --git a/gc.c b/gc.c
index c3157dbe2c..d7282cf8f0 100644
--- a/gc.c
+++ b/gc.c
@@ -404,7 +404,7 @@ int ruby_rgengc_debug;
* 5: show all references
*/
#ifndef RGENGC_CHECK_MODE
-#define RGENGC_CHECK_MODE 0
+#define RGENGC_CHECK_MODE 1
#endif
// Note: using RUBY_ASSERT_WHEN() extend a macro in expr (info by nobu).
@@ -10821,6 +10821,10 @@ gc_set_initial_pages(void)
void
ruby_gc_set_params(void)
{
+ for (int i = 0; i < 10000; i++) {
+ rb_ary_new();
+ }
+
/* RUBY_GC_HEAP_FREE_SLOTS */
if (get_envparam_size("RUBY_GC_HEAP_FREE_SLOTS", &gc_params.heap_free_slots, 0)) {
/* ok */
```
The crash looks like:
```
Assertion Failed: ../gc.c:2038:heap_add_page:!(heap == heap_eden && heap->sweeping_page)
```
NUM_IN_PAGE(page->start) will sometimes return a 0 or a 1 depending on
how the alignment of the 40 byte slots works out. This commit uses the
NUM_IN_PAGE function to shift the bitmap down on the first bitmap plane.
Iterating on the first bitmap plane is "special", but this commit allows
us to align object addresses on something besides 40 bytes, and also
eliminates the need to fill guard bits.
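The 0-or-1 result can be worked through with plain arithmetic. The sketch below is a simplified restatement, not the gc.c code. It assumes page bodies aligned to 16 KiB, an 8-byte page header, and a first slot rounded up to a multiple of the 40-byte slot size. `first_slot_address` and `num_in_page` are hypothetical Ruby helpers standing in for the C logic.

```ruby
PAGE_ALIGN  = 16 * 1024 # assumed page alignment in bytes
SLOT_SIZE   = 40        # slot size stated in the message
HEADER_SIZE = 8         # assumed size of the page header

# First slot address: skip the header, then round up to a multiple of 40.
def first_slot_address(page_body)
  start = page_body + HEADER_SIZE
  remainder = start % SLOT_SIZE
  remainder.zero? ? start : start + (SLOT_SIZE - remainder)
end

# Index of an address within its page, counted in 40-byte slots.
def num_in_page(address)
  (address & (PAGE_ALIGN - 1)) / SLOT_SIZE
end

indexes = (0...5).map { |n| num_in_page(first_slot_address(n * PAGE_ALIGN)) }
p indexes # => [1, 0, 0, 0, 0]
```

Under these assumptions the first slot starts between 8 and 47 bytes into the page, so the index is 1 only when the rounding pushes it to byte 40 or later.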
We are only iterating over the eden heap so `heap_eden->total_pages`
contains the exact number of pages we need to allocate for.
`heap_allocated_pages` may contain pages in the tomb.
Instead of keeping track of the current bit plane, keep track of the
actual slot when compacting. This means we don't need to re-scan
objects inside the same bit plane when we continue with movement.
When objects are popped from the mark stack, we check that the object is
the right type (otherwise an rb_bug happens). The problem is that when
we pop a bad object from the stack, we have no idea what pushed the bad
object on the stack.
This change makes an error happen when a bad object is pushed on the
mark stack, so that we can track down the source of the bug.
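The benefit of checking at push time can be illustrated with a toy stack. This is a sketch only: `CheckedMarkStack`, the `BAD_TYPES` list, and `buggy_marker` are hypothetical and do not mirror the C implementation.

```ruby
# Toy mark stack that validates objects when they are pushed.
class CheckedMarkStack
  BAD_TYPES = %i[none zombie moved].freeze # hypothetical "bad object" types

  def initialize
    @items = []
  end

  def push(obj)
    if BAD_TYPES.include?(obj[:type])
      raise ArgumentError, "bad object pushed on mark stack: #{obj.inspect}"
    end
    @items.push(obj)
  end

  def pop
    @items.pop
  end
end

# Stand-in for a marking function with a bug.
def buggy_marker(stack)
  stack.push({ type: :none })
end

# True when the error's backtrace names the function that pushed the object.
def pusher_in_backtrace?
  buggy_marker(CheckedMarkStack.new)
  false
rescue ArgumentError => e
  e.backtrace.any? { |frame| frame.include?("buggy_marker") }
end

puts pusher_in_backtrace? # prints "true"
```

Because the error is raised inside `push`, the backtrace still contains the caller that supplied the bad object. A check at pop time would only show the code that drains the stack.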
Since compaction can be concurrent, the machine stack is allowed to
change while compaction is happening. When compaction finishes, there
may be references on the machine stack that need to be reverted so that
we can remove the read barrier.
The current fix for PAGE_SIZE macro detection in autoconf does not work
correctly. I see the following output when running configure on Linux:
```
checking PAGE_SIZE is defined... no
```
Linux has a PAGE_SIZE macro. This is happening because the macro exists in
sys/user.h and not in the malloc headers.
This allows us to allocate the right size for the object in advance,
meaning that we don't have to pay the cost of ivar table extension
later. The idea is that if an object type ever became "extended" at
some point, then it is very likely it will become extended again. So we
may as well allocate the ivar table up front.
Previously imemo_ast was handled as WB-protected, which caused a segfault
in the following code:
    # shareable_constant_value: literal
    M0 = {}
    M1 = {}
    ...
    M100000 = {}
My analysis is here: `shareable_constant_value: literal` creates many
Hash instances during parsing, and adds them to node_buffer of imemo_ast.
However, the contents are missed because imemo_ast is incorrectly
WB-protected.
This changeset makes imemo_ast WB-unprotected.
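A reproduction file of the shape shown above can be generated rather than written by hand. The sketch below is one way to do that. `shareable_constant_repro` is a hypothetical helper, and the small count in the example is only for display.

```ruby
# Build source text with the magic comment followed by M0 .. M<last_index>,
# each assigned an empty Hash literal. Hypothetical helper.
def shareable_constant_repro(last_index)
  lines = ["# shareable_constant_value: literal"]
  (0..last_index).each { |i| lines << "M#{i} = {}" }
  lines.join("\n") + "\n"
end

puts shareable_constant_repro(2)
```

To approximate the case described in this message, the output of `shareable_constant_repro(100_000)` could be written to a file with `File.write` and run with an affected Ruby build.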