If the iseq only contains `opt_invokebuiltin_delegate_leave` insn and
the builtin-function (bf) is inline-able, the caller doesn't need to
build a method frame.
`vm_call_single_noarg_inline_builtin` is fast path for such cases.
* Revert "Remove special handling of `SIGCHLD`. (#7482)"
This reverts commit 44a0711eab.
* Revert "Remove prototypes for functions that are no longer used. (#7497)"
This reverts commit 4dce12bead.
* Revert "Remove SIGCHLD `waidpid`. (#7476)"
This reverts commit 1658e7d966.
* Fix change to rjit variable name.
We used to require MJIT is supported when YJIT is supported. However,
now that RJIT dropped some platforms that YJIT supports, it no longer
makes sense. We should be able to enable only YJIT, and vice versa.
This patch is follo-up of 0a82bfe.
Without this patch, if env is escaped (Proc'ed), strange svar
can be touched.
This patch tracks escaped env and use it.
The interrupt check will unintentionally release the VM lock when loading an iseq.
And this will cause issues with the `debug` gem's
[`ObjectSpace.each_iseq` method](0fcfc28aca/ext/debug/iseq_collector.c (L61-L67)),
which wraps iseqs with a wrapper and exposes their internal states when they're actually not ready to be used.
And when that happens, errors like this would occur and kill the `debug` gem's thread:
```
DEBUGGER: ReaderThreadError: uninitialized InstructionSequence
┃ DEBUGGER: Disconnected.
┃ ["/opt/rubies/ruby-3.2.0/lib/ruby/gems/3.2.0/gems/debug-1.7.1/lib/debug/breakpoint.rb:247:in `absolute_path'",
┃ "/opt/rubies/ruby-3.2.0/lib/ruby/gems/3.2.0/gems/debug-1.7.1/lib/debug/breakpoint.rb:247:in `block in iterate_iseq'",
┃ "/opt/rubies/ruby-3.2.0/lib/ruby/gems/3.2.0/gems/debug-1.7.1/lib/debug/breakpoint.rb:246:in `each_iseq'",
...
```
A way to reproduce the issue is to satisfy these conditions at the same time:
1. `debug` gem calling `ObjectSpace.each_iseq` (e.g. [activating a `LineBreakpoint`](0fcfc28aca/lib/debug/breakpoint.rb (L246))).
2. A large amount of iseq being loaded from another thread (possibly through the `bootsnap` gem).
3. 1 and 2 iterating through the same iseq(s) at the same time.
Because this issue requires external dependencies and a rather complicated timing setup to reproduce, I wasn't able to write a test case for it.
But here's some pseudo code to help reproduce it:
```rb
require "debug/session"
Thread.new do
100.times do
ObjectSpace.each_iseq do |iseq|
iseq.absolute_path
end
end
end
sleep 0.1
load_a_bunch_of_iseq
possibly_through_bootsnap
```
[Bug #19348]
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
On the cfunc methods, if a splat argument is given, all array elements
are expanded on the VM stack and it can cause SystemStackError.
The idea to avoid it is making a hidden array to contain all parameters
and use this array as an argv.
This patch is reviesed version of https://github.com/ruby/ruby/pull/6816
The main change is all changes are closed around calling cfunc logic.
Fixes [Bug #4040]
Co-authored-by: Jeremy Evans <code@jeremyevans.net>
With this change, we're storing the iv name on an inline cache on
setinstancevariable instructions. This allows us to check the inline
cache to count instance variables set in initialize and give us an
estimate of iv capacity for an object.
For the purpose of estimating the number of instance variables required
for an object, we're assuming that all initialize methods will call
`super`.
This change allows us to estimate the number of instance variables
required without disassembling instruction sequences.
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
This commit moves ruby_basic_operators and the unredefined macros out of
vm_core.h and into basic_operators.h so that we can use them more
broadly in places where we currently use a method look up via
`rb_method_basic_definition_p` (e.g. object.c, numeric.c, complex.c,
enum.c, but also in internal/compar.h after introducing BOP_CMP and
elsewhere if we introduce more BOPs)
The most controversial part of this change is probably moving
redefined_flag out of rb_vm_t. [vm_opt_method_def_table and
vm_opt_mid_table](9da2a5204f/vm.c)
are not part of rb_vm_t either, and I think this fits well with those.
But more significantly it seems to result in one fewer instruction. For
example:
Before:
```
(lldb) disassemble -n vm_opt_str_freeze
miniruby`vm_exec_core:
miniruby[0x10028233e] <+14558>: movq 0x11a86b(%rip), %rax ; ruby_current_vm_ptr
miniruby[0x100282345] <+14565>: testb $0x4, 0x242c(%rax)
```
After:
```
(lldb) disassemble -n vm_opt_str_freeze
ruby`vm_exec_core:
ruby[0x100280ebe] <+14510>: testb $0x4, 0x120147(%rip) ; ruby_vm_redefined_flag + 43
```
Co-authored-by: John Hawthorn <jhawthorn@github.com>
By moving the two bools into a packing gap above the mark_bits
pointer/union we can save 8 bytes in the struct and avoid an extra cache
line (328 bytes vs 320 bytes).
Co-authored-by: Adam Hess <HParker@github.com>
Previously, for statically-linked extensions, we used
`vm->loading_table` to delay calling the init function until the
extensions are required. This caused the extensions to look like they
are in the middle of being loaded even before they're required.
(`rb_feature_p()` returned true with a loading path output.) Combined
with autoload, queries like `defined?(CONST)` and `Module#autoload?`
were confused by this and returned nil incorrectly. RubyGems uses
`defined?` to detect if OpenSSL is available and failed when OpenSSL was
available in builds using `--with-static-linked-ext`.
Use a dedicated table for the init functions instead of adding them to
the loading table. This lets us remove some logic from non-EXTSTATIC
builds.
[Bug #19115]
We can loosely predict the number of ivar sets on a class based on the
number of iv set instructions in the initialize method. This should give
us a more accurate estimate to use for initial size pool allocation,
which should in turn give us more cache hits.
Since object shapes store the capacity of an object, we no longer
need the numiv field on RObjects. This gives us one extra slot which
we can use to give embedded objects one more instance variable (for a
total of 3 ivs). This commit removes the concept of numiv from RObject.
This commit adds a `capacity` field to shapes, and adds shape
transitions whenever an object's capacity changes. Objects which are
allocated out of a bigger size pool will also make a transition from the
root shape to the shape with the correct capacity for their size pool
when they are allocated.
This commit will allow us to remove numiv from objects completely, and
will also mean we can guarantee that if two objects share shapes, their
IVs are in the same positions (an embedded and extended object cannot
share shapes). This will enable us to implement ivar sets in YJIT using
object shapes.
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
This patch pushes dummy frames when loading code for the
profiling purpose.
The following methods push a dummy frame:
* `Kernel#require`
* `Kernel#load`
* `RubyVM::InstructionSequence.compile_file`
* `RubyVM::InstructionSequence.load_from_binary`
https://bugs.ruby-lang.org/issues/18559
Prior to this commit, we were reading and writing ivar index and
shape ID in inline caches in two separate instructions when
getting and setting ivars. This meant there was a race condition
with ractors and these caches where one ractor could change
a value in the cache while another was still reading from it.
This commit instead reads and writes shape ID and ivar index to
inline caches atomically so there is no longer a race condition.
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects. Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness"). Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree. Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.
For example:
```ruby
class Foo
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
class Bar
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```
Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.
This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.
This commit also adds some methods for debugging shapes on objects. See
`RubyVM::Shape` for more details.
For more context on Object Shapes, see [Feature: #18776]
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects. Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness"). Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree. Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.
For example:
```ruby
class Foo
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
class Bar
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```
Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.
This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.
This commit also adds some methods for debugging shapes on objects. See
`RubyVM::Shape` for more details.
For more context on Object Shapes, see [Feature: #18776]
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Previously YARV bytecode implemented constant caching by having a pair
of instructions, opt_getinlinecache and opt_setinlinecache, wrapping a
series of getconstant calls (with putobject providing supporting
arguments).
This commit replaces that pattern with a new instruction,
opt_getconstant_path, handling both getting/setting the inline cache and
fetching the constant on a cache miss.
This is implemented by storing the full constant path as a
null-terminated array of IDs inside of the IC structure. idNULL is used
to signal an absolute constant reference.
$ ./miniruby --dump=insns -e '::Foo::Bar::Baz'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
0000 opt_getconstant_path <ic:0 ::Foo::Bar::Baz> ( 1)[Li]
0002 leave
The motivation for this is that we had increasingly found the need to
disassemble the instructions between the opt_getinlinecache and
opt_setinlinecache in order to determine the constant we are fetching,
or otherwise store metadata.
This disassembly was done:
* In opt_setinlinecache, to register the IC against the constant names
it is using for granular invalidation.
* In rb_iseq_free, to unregister the IC from the invalidation table.
* In YJIT to find the position of a opt_getinlinecache instruction to
invalidate it when the cache is populated
* In YJIT to register the constant names being used for invalidation.
With this change we no longe need disassemly for these (in fact
rb_iseq_each is now unused), as the list of constant names being
referenced is held in the IC. This should also make it possible to make
more optimizations in the future.
This may also reduce the size of iseqs, as previously each segment
required 32 bytes (on 64-bit platforms) for each constant segment. This
implementation only stores one ID per-segment.
There should be no significant performance change between this and the
previous implementation. Previously opt_getinlinecache was a "leaf"
instruction, but it included a jump (almost always to a separate cache
line). Now opt_getconstant_path is a non-leaf (it may
raise/autoload/call const_missing) but it does not jump. These seem to
even out.
catch_excep_t is a field that exists for MJIT. In the process of
rewriting MJIT in Ruby, I added API to convert 1/0 of _Bool to
true/false, and it seemed confusing and hard to maintain if you
don't use _Bool for *_p fields.
* Simplify around `USE_YJIT` macro
- Use `USE_YJIT` macro only instead of `YJIT_BUILD`.
- An intermediate macro `YJIT_SUPPORTED_P` is no longer used.
* Bail out if YJIT is enabled on unsupported platforms
This commit adds a bitfield to the iseq body that stores offsets inside
the iseq buffer that contain values we need to mark. We can use this
bitfield to mark objects instead of disassembling the instructions.
This commit also groups inline storage entries and adds a counter for
each entry. This allows us to iterate and mark each entry without
disassembling instructions
Since we have a bitfield and grouped inline caches, we can mark all
VALUE objects associated with instructions without actually
disassembling the instructions at mark time.
[Feature #18875] [ruby-core:109042]
`NON_SCALAR_THREAD_ID` shows `pthread_t` is non-scalar (non-pointer)
and only s390x is known platform. However, the supporting code is
very complex and it is only used for deubg print information.
So this patch removes the support of `NON_SCALAR_THREAD_ID`
and make the code simple.
`rb_thread_t::serial` is auto-incremented serial number for
threads and it can overflow, it means the serial is not a ID
for each thread, it is only for debug print.
`RUBY_DEBUG_LOG` shows this information.
Also skip EC related information if EC is NULL. This patch
enable to use `RUBY_DEBUG_LOG` without setup EC.
In December 2021, we opened an [issue] to solicit feedback regarding the
porting of the YJIT codebase from C99 to Rust. There were some
reservations, but this project was given the go ahead by Ruby core
developers and Matz. Since then, we have successfully completed the port
of YJIT to Rust.
The new Rust version of YJIT has reached parity with the C version, in
that it passes all the CRuby tests, is able to run all of the YJIT
benchmarks, and performs similarly to the C version (because it works
the same way and largely generates the same machine code). We've even
incorporated some design improvements, such as a more fine-grained
constant invalidation mechanism which we expect will make a big
difference in Ruby on Rails applications.
Because we want to be careful, YJIT is guarded behind a configure
option:
```shell
./configure --enable-yjit # Build YJIT in release mode
./configure --enable-yjit=dev # Build YJIT in dev/debug mode
```
By default, YJIT does not get compiled and cargo/rustc is not required.
If YJIT is built in dev mode, then `cargo` is used to fetch development
dependencies, but when building in release, `cargo` is not required,
only `rustc`. At the moment YJIT requires Rust 1.60.0 or newer.
The YJIT command-line options remain mostly unchanged, and more details
about the build process are documented in `doc/yjit/yjit.md`.
The CI tests have been updated and do not take any more resources than
before.
The development history of the Rust port is available at the following
commit for interested parties:
1fd9573d8b
Our hope is that Rust YJIT will be compiled and included as a part of
system packages and compiled binaries of the Ruby 3.2 release. We do not
anticipate any major problems as Rust is well supported on every
platform which YJIT supports, but to make sure that this process works
smoothly, we would like to reach out to those who take care of building
systems packages before the 3.2 release is shipped and resolve any
issues that may come up.
[issue]: https://bugs.ruby-lang.org/issues/18481
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Co-authored-by: Noah Gibbs <the.codefolio.guy@gmail.com>
Co-authored-by: Kevin Newton <kddnewton@gmail.com>
`rb_thread_t` contained `native_thread_data_t` to represent
thread implementation dependent data. This patch separates
them and rename it `rb_native_thread` and point it from
`rb_thraed_t`.
Now, 1 Ruby thread (`rb_thread_t`) has 1 native thread (`rb_native_thread`).
Now GVL is not process *Global* so this patch try to use
another words.
* `rb_global_vm_lock_t` -> `struct rb_thread_sched`
* `gvl->owner` -> `sched->running`
* `gvl->waitq` -> `sched->readyq`
* `rb_gvl_init` -> `rb_thread_sched_init`
* `gvl_destroy` -> `rb_thread_sched_destroy`
* `gvl_acquire` -> `thread_sched_to_running` # waiting -> ready -> running
* `gvl_release` -> `thread_sched_to_waiting` # running -> waiting
* `gvl_yield` -> `thread_sched_yield`
* `GVL_UNLOCK_BEGIN` -> `THREAD_BLOCKING_BEGIN`
* `GVL_UNLOCK_END` -> `THREAD_BLOCKING_END`
* removed
* `rb_ractor_gvl`
* `rb_vm_gvl_destroy` (not used)
There are GVL functions such as `rb_thread_call_without_gvl()` yet
but I don't have good name to replace them. Maybe GVL stands for
"Greate Valuable Lock" or something like that.
This commit reintroduces finer-grained constant cache invalidation.
After 8008fb7 got merged, it was causing issues on token-threaded
builds (such as on Windows).
The issue was that when you're iterating through instruction sequences
and using the translator functions to get back the instruction structs,
you're either using `rb_vm_insn_null_translator` or
`rb_vm_insn_addr2insn2` depending if it's a direct-threading build.
`rb_vm_insn_addr2insn2` does some normalization to always return to
you the non-trace version of whatever instruction you're looking at.
`rb_vm_insn_null_translator` does not do that normalization.
This means that when you're looping through the instructions if you're
trying to do an opcode comparison, it can change depending on the type
of threading that you're using. This can be very confusing. So, this
commit creates a new translator function
`rb_vm_insn_normalizing_translator` to always return the non-trace
version so that opcode comparisons don't have to worry about different
configurations.
[Feature #18589]
This reverts commits for [Feature #18589]:
* 8008fb7352
"Update formatting per feedback"
* 8f6eaca2e1
"Delete ID from constant cache table if it becomes empty on ISEQ free"
* 629908586b
"Finer-grained inline constant cache invalidation"
MSWin builds on AppVeyor have been crashing since the merger.
Current behavior - caches depend on a global counter. All constant mutations cause caches to be invalidated.
```ruby
class A
B = 1
end
def foo
A::B # inline cache depends on global counter
end
foo # populate inline cache
foo # hit inline cache
C = 1 # global counter increments, all caches are invalidated
foo # misses inline cache due to `C = 1`
```
Proposed behavior - caches depend on name components. Only constant mutations with corresponding names will invalidate the cache.
```ruby
class A
B = 1
end
def foo
A::B # inline cache depends constants named "A" and "B"
end
foo # populate inline cache
foo # hit inline cache
C = 1 # caches that depend on the name "C" are invalidated
foo # hits inline cache because IC only depends on "A" and "B"
```
Examples of breaking the new cache:
```ruby
module C
# Breaks `foo` cache because "A" constant is set and the cache in foo depends
# on "A" and "B"
class A; end
end
B = 1
```
We expect the new cache scheme to be invalidated less often because names aren't frequently reused. With the cache being invalidated less, we can rely on its stability more to keep our constant references fast and reduce the need to throw away generated code in YJIT.
Use ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.
`overloaded_cme_table` keeps cme -> monly_cme pairs to manage
corresponding `monly_cme` for `cme`. The lifetime of the `monly_cme`
should be longer than `monly_cme`, but the previous patch losts the
reference to the living `monly_cme`.
Now `overloaded_cme_table` values are always root (keys are only weak
reference), it means `monly_cme` does not freed until corresponding
`cme` is invalidated.
To make managing easy, move `overloaded_cme_table` to `rb_vm_t`.
It free `rb_hook_list_t` itself if needed. To recognize the
need, this patch introduced `rb_hook_list_t::is_local` flag.
This patch is succession of https://github.com/ruby/ruby/pull/4652
Compare with the C methods, A built-in methods written in Ruby is
slower if only mandatory parameters are given because it needs to
check the argumens and fill default values for optional and keyword
parameters (C methods can check the number of parameters with `argc`,
so there are no overhead). Passing mandatory arguments are common
(optional arguments are exceptional, in many cases) so it is important
to provide the fast path for such common cases.
`Primitive.mandatory_only?` is a special builtin function used with
`if` expression like that:
```ruby
def self.at(time, subsec = false, unit = :microsecond, in: nil)
if Primitive.mandatory_only?
Primitive.time_s_at1(time)
else
Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
end
end
```
and it makes two ISeq,
```
def self.at(time, subsec = false, unit = :microsecond, in: nil)
Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
end
def self.at(time)
Primitive.time_s_at1(time)
end
```
and (2) is pointed by (1). Note that `Primitive.mandatory_only?`
should be used only in a condition of an `if` statement and the
`if` statement should be equal to the methdo body (you can not
put any expression before and after the `if` statement).
A method entry with `mandatory_only?` (`Time.at` on the above case)
is marked as `iseq_overload`. When the method will be dispatch only
with mandatory arguments (`Time.at(0)` for example), make another
method entry with ISeq (2) as mandatory only method entry and it
will be cached in an inline method cache.
The idea is similar discussed in https://bugs.ruby-lang.org/issues/16254
but it only checks mandatory parameters or more, because many cases
only mandatory parameters are given. If we find other cases (optional
or keyword parameters are used frequently and it hurts performance),
we can extend the feature.
`RubyVM.keep_script_lines` enables to keep script lines
for each ISeq and AST. This feature is for debugger/REPL
support.
```ruby
RubyVM.keep_script_lines = true
RubyVM::keep_script_lines = true
eval("def foo = nil\ndef bar = nil")
pp RubyVM::InstructionSequence.of(method(:foo)).script_lines
```
When YJIT make calls to routines without reconstructing interpreter
state through jit_prepare_routine_call(), it relies on the routine to
never allocate, raise, and push/pop control frames. Comment about this
on the routines that YJTI calls.
This is probably something we should dynamically verify on debug builds.
It's hard to statically verify this as it requires verifying all
functions in the call tree. Maybe something to look at in the future.
Make sure `opt_getinlinecache` is in a block all on its own, and
invalidate it from the interpreter when `opt_setinlinecache`.
It will recompile with a filled cache the second time around.
This lets YJIT runs well when the IC for constant is cold.
* Tie lifetime of uJIT blocks to iseqs
Blocks weren't being freed when iseqs are collected.
* Add rb_dary. Use it for method dependency table
* Keep track of blocks per iseq
Remove global version_tbl
* Block version bookkeeping fix
* dary -> darray
* free ujit_blocks
* comment about size of ujit_blocks
This fixes issues with paths being loaded twice in certain cases
when symlinks are used.
It took me multiple attempts to get this working. My original
attempt tried to convert paths to realpaths before adding them
to $LOADED_FEATURES. Unfortunately, this doesn't work well
with the loaded feature index, which is based off load paths
and not realpaths. While I was able to get require working, I'm
fairly sure the loaded feature index was not being used as
expected, which would have significant performance implications.
Additionally, I was never able to get that approach working with
autoload when autoloading a non-realpath file. It also broke
some specs.
This takes a more conservative approach. Directly before loading the
file, if the file with the same realpath has been required, the
loading of the file is skipped. The realpaths are stored as
fstrings in a hidden hash.
When rebuilding the loaded feature index, the hash of realpaths
is also rebuilt. I'm guessing this makes rebuilding process
slower, but I don think that is a hot path. In general, modifying
loaded features is only done when reloading, and that tends to be
in non-production environments.
Change test_require_with_loaded_features_pop test to use 30 threads
and 300 iterations, instead of 4 threads and 1000 iterations.
I saw only sporadic failures with 4/1000, but consistent failures
30/300 threads. These failures were due to the fact that the
concurrent deletions from $LOADED_FEATURES in other threads can
result in rb_ary_entry returning nil when rebuilding the loaded
features index.
To avoid concurrency issues when rebuilding the loaded features
index, the building of the index itself is left alone, and
afterwards, a separate loop is done on a copy of the loaded feature
snapshot in order to rebuild the realpaths hash.
Fixes [Bug #17885]
This fixes issues with paths being loaded twice in certain cases
when symlinks are used.
It took me multiple attempts to get this working. My original
attempt tried to convert paths to realpaths before adding them
to $LOADED_FEATURES. Unfortunately, this doesn't work well
with the loaded feature index, which is based off load paths
and not realpaths. While I was able to get require working, I'm
fairly sure the loaded feature index was not being used as
expected, which would have significant performance implications.
Additionally, I was never able to get that approach working with
autoload when autoloading a non-realpath file. It also broke
some specs.
This takes a more conservative approach. Directly before loading the
file, if the file with the same realpath has been required, the
loading of the file is skipped. The realpaths are stored as
fstrings in a hidden hash.
When rebuilding the loaded feature index, the hash of realpaths
is also rebuilt. I'm guessing this makes rebuilding process
slower, but I don think that is a hot path. In general, modifying
loaded features is only done when reloading, and that tends to be
in non-production environments.
Change test_require_with_loaded_features_pop test to use 30 threads
and 300 iterations, instead of 4 threads and 1000 iterations.
I saw only sporadic failures with 4/1000, but consistent failures
30/300 threads. These failures were due to the fact that the
concurrent deletions from $LOADED_FEATURES in other threads can
result in rb_ary_entry returning nil when rebuilding the loaded
features index.
To avoid concurrency issues when rebuilding the loaded features
index, the building of the index itself is left alone, and
afterwards, a separate loop is done on a copy of the loaded feature
snapshot in order to rebuild the realpaths hash.
Fixes [Bug #17885]
The altstack memory of a thread may be free'ed even after the VM is
destructed. After that, GC is no longer available, so calling xfree
may lead to a segfault.
This changeset uses the bare free function to free the altstack memory
instead of xfree. [Bug #18126]
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>