`overloaded_cme_table` keeps cme -> monly_cme pairs to manage the
corresponding `monly_cme` for each `cme`. The lifetime of the `monly_cme`
should be longer than that of the corresponding `cme`, but the previous
patch lost the reference to the living `monly_cme`.
Now `overloaded_cme_table` values are always roots (keys are only weak
references), which means a `monly_cme` is not freed until the corresponding
`cme` is invalidated.
To make management easier, move `overloaded_cme_table` to `rb_vm_t`.
Previously, when redefining an alias of a BOP, we would unnecessarily
invalidate the BOP. For example:
class String
alias len length
private :len
end
This commit avoids this by checking that the called_id on the method
entry matches the original_id on the definition.
* Lazily create singletons on instance_{exec,eval}
Previously when instance_exec or instance_eval was called on an object,
that object would be given a singleton class so that method
definitions inside the block would be added to the object rather than
its class.
This commit aims to improve performance by delaying the creation of the
singleton class unless/until one is needed for method definition. Most
of the time instance_eval is used without any method definition.
This was implemented by adding a flag to the cref indicating that it
represents a singleton of the object rather than a class itself. In this
case CREF_CLASS returns the object's existing class, but in cases where
we are defining a method (either via definemethod or
VM_SPECIAL_OBJECT_CBASE, which is used for undef and alias), the
singleton class is created and used.
This also happens to fix what I believe is a bug. Previously
instance_eval behaved differently with regards to constant access for
true/false/nil than for all other objects. I don't think this was
intentional.
String::Foo = "foo"
"".instance_eval("Foo") # => "foo"
Integer::Foo = "foo"
123.instance_eval("Foo") # => "foo"
TrueClass::Foo = "foo"
true.instance_eval("Foo") # NameError: uninitialized constant Foo
This also slightly changes the error message when trying to define a method
through instance_eval on an object which can't have a singleton class.
Before:
$ ruby -e '123.instance_eval { def foo; end }'
-e:1:in `block in <main>': no class/module to add method (TypeError)
After:
$ ./ruby -e '123.instance_eval { def foo; end }'
-e:1:in `block in <main>': can't define singleton (TypeError)
IMO this error is a small improvement on the original and better matches
the (both old and new) message when defining a method using `def self.`
$ ruby -e '123.instance_eval{ def self.foo; end }'
-e:1:in `block in <main>': can't define singleton (TypeError)
Co-authored-by: Matthew Draper <matthew@trebex.net>
* Remove "under" argument from yield_under
* Move CREF_SINGLETON_SET into vm_cref_new
* Simplify vm_get_const_base
* Fix leaf VM_SPECIAL_OBJECT_CONST_BASE
Co-authored-by: Matthew Draper <matthew@trebex.net>
The main impetus for this change is to fix [Bug #13392]. Previously, we
fired the "return" TracePoint event after popping the stack frame for
the block running as a method (BMETHOD). This gave undesirable source
location output, as the return event normally fires right before the
frame goes away.
The iseq for each block can run both as a block and as a method. To
accommodate that, this commit makes vm_trace() fire call/return events for
instructions that have b_call/b_return events attached when the iseq is
running as a BMETHOD. The logic for rewriting to "trace_*" instruction
is tweaked so that when the user listens to call/return events,
instructions with b_call/b_return become trace variants.
To continue to provide the return value for non-local returns done using
the "return" or "break" keyword inside BMETHODs, the stack unwinding
code is tweaked. b_return events now provide the same return value as
return events for these non-local cases. A pre-existing test had deemed not
providing a return value for these b_return events to be a limitation.
This commit removes the checks for call/return TracePoint events that
happen when calling into BMETHODs when no TracePoints are active.
Technically, migrating just the return event is enough to fix the bug,
but migrating both call and return removes our reliance on
`VM_FRAME_FLAG_FINISH` and re-entering the interpreter when the caller
is already in the interpreter.
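A minimal sketch of the user-visible effect (class and method names are
illustrative): with this change, the :return event for a method defined
from a block via define_method reports the block's own location and
return value.

```ruby
class Foo
  define_method(:bar) { :done }   # bar runs as a BMETHOD
end

tp = TracePoint.new(:return) do |t|
  puts "#{t.method_id} returned #{t.return_value.inspect} at #{t.path}:#{t.lineno}"
end
tp.enable { Foo.new.bar }
```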
Local variable tables were represented as `ID*`, but the representation
was very hacky: the first element is not an ID but the size of the
table, and the last element is (sometimes) a link to the next local
table, used only when the ID tables form a linked list.
This change converts the hacky implementation to a normal struct.
As the ID serial is a 32-bit value and internal IDs created in the
parser are assigned starting from its maximum value, a Symbol converted
from such an ID will exceed 32 bits and overflow on 32-bit platforms.
Refinement#import_methods imports methods from modules.
Unlike Module#include, it copies methods and adds them into the refinement,
so the refinement is activated in the imported methods.
[Bug #17429] [ruby-core:101639]
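A minimal usage sketch (module and method names are illustrative,
assuming Ruby 3.1+): the imported method can call other methods of the
refinement because the refinement is active inside it.

```ruby
module Greeter
  def greet
    "HELLO FROM #{shout}"   # `shout` resolves to the refined String#shout
  end
end

module Loud
  refine String do
    import_methods Greeter  # copies Greeter's Ruby-defined methods into the refinement
    def shout
      upcase
    end
  end
end

using Loud
"world".greet  #=> "HELLO FROM WORLD"
```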
`RubyVM.keep_script_lines` enables keeping the script lines
for each ISeq and AST. This feature is for debugger/REPL
support.
```ruby
RubyVM.keep_script_lines = true
eval("def foo = nil\ndef bar = nil")
pp RubyVM::InstructionSequence.of(method(:foo)).script_lines
```
The interpreter instruction count was enabled based on RUBY_DEBUG as
opposed to YJIT_STATS. In builds with YJIT_STATS=1 but RUBY_DEBUG=0,
the count was not available.
Move YJIT_STATS into yjit.h, where declarations are exposed to code outside
of YJIT. Also reduce the changes made to the interpreter for calling
into YJIT's instruction counting function.
env_copy() uses rb_ary_delete_at() with a loop counting up while
iterating through the list of read-only locals. rb_ary_delete_at() can
shift elements in the array to an index lower than the loop index,
causing locals to be missed and set to Qfalse in the returned
environment.
Iterate through the locals in reverse instead; this way the shifting
never happens for locals that are yet to be visited, and we process all
the locals in the array.
[Bug #18023]
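The pitfall is the classic delete-while-iterating-forward pattern; a
Ruby-level sketch of the same effect:

```ruby
ary = [:a, :b, :c, :d]
i = 0
while i < ary.length
  ary.delete_at(i)   # later elements shift left into slot i
  i += 1             # ...so the element now at slot i is never visited
end
ary  #=> [:b, :d]    # half the elements were skipped
```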
This fixes issues with paths being loaded twice in certain cases
when symlinks are used.
It took me multiple attempts to get this working. My original
attempt tried to convert paths to realpaths before adding them
to $LOADED_FEATURES. Unfortunately, this doesn't work well
with the loaded feature index, which is based off load paths
and not realpaths. While I was able to get require working, I'm
fairly sure the loaded feature index was not being used as
expected, which would have significant performance implications.
Additionally, I was never able to get that approach working with
autoload when autoloading a non-realpath file. It also broke
some specs.
This takes a more conservative approach. Directly before loading the
file, if the file with the same realpath has been required, the
loading of the file is skipped. The realpaths are stored as
fstrings in a hidden hash.
When rebuilding the loaded feature index, the hash of realpaths
is also rebuilt. I'm guessing this makes the rebuilding process
slower, but I don't think that is a hot path. In general, modifying
loaded features is only done when reloading, and that tends to be
in non-production environments.
Change test_require_with_loaded_features_pop test to use 30 threads
and 300 iterations, instead of 4 threads and 1000 iterations.
I saw only sporadic failures with 4/1000, but consistent failures
with 30/300. These failures were due to the fact that the
concurrent deletions from $LOADED_FEATURES in other threads can
result in rb_ary_entry returning nil when rebuilding the loaded
features index.
To avoid concurrency issues when rebuilding the loaded features
index, the building of the index itself is left alone, and
afterwards, a separate loop is done on a copy of the loaded feature
snapshot in order to rebuild the realpaths hash.
Fixes [Bug #17885]
[0] => [0, *, a]
#=> [0] length mismatch (given 1, expected 2+) (NoMatchingPatternError)
Ignore test failures of typeprof caused by this change for now.
If the thread termination invokes user code after `th->status` becomes
`THREAD_KILLED`, and the user unblock function causes that `th->status` to
become something else (e.g. `THREAD_RUNNING`), threads waiting in
`thread_join_sleep` will hang forever. We move the unblock function call
to before the thread status is updated, and allow threads to join as soon
as `th->value` becomes defined.
This reverts commit 6505c77501.
`vm_opt_method_table` is a me=>bop table to manage the methods optimized
by specialized instructions. However, an `me` can be invalidated
in order to invalidate the method cache entry.
[Bug #17725]
To solve the issue, use `me->def` instead of `me`, because `me->def` is
simply copied at invalidation time.
A test by @jeremyevans https://github.com/ruby/ruby/pull/4376
VM patch from wanabe.
Test based on example from buzztaiki (Taiki Sugawara).
The test fails when compiled with -DRUBY_DEBUG, as that can
use rb_bug instead of NoMemoryError, which doesn't
allow testing this case. The test also fails on MinGW, as
RangeError is used instead of NoMemoryError. Skip the
test in either case.
Fixes [Bug #15779]
If the thread termination invokes user code after `th->status` becomes
`THREAD_KILLED`, and the user unblock function causes that `th->status` to
become something else (e.g. `THREAD_RUNNING`), threads waiting in
`thread_join_sleep` will hang forever. We move the unblock function call
to before the thread status is updated, and allow threads to join as soon
as `th->value` becomes defined.
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
by merging `rb_ast_body_t#line_count` and `#script_lines`.
Fortunately `line_count == RARRAY_LEN(script_lines)` was always
satisfied. When script_lines is saved, it has an array of lines, and
when not saved, it has a Fixnum that represents the old line_count.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up the ancestor tree before returning the cvar value. The deeper the
ancestor tree, the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the clearer the speed increase; i.e., if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly, we ran a benchmark to demonstrate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
We can take advantage of fstrings to de-duplicate the defined strings.
This means we don't need to keep the list of defined strings on the VM
(or register them as mark objects).
The rb_funcall* functions (rb_funcall(), rb_funcallv(), ...) invoke
Ruby methods on a given receiver. Ruby 2.7 introduced an inline method
cache for them using a static memory area. However, Ruby 3.0 reimplemented
the method cache data structures and the inline cache was removed.
Without an inline cache, rb_funcall* searches for the method every time.
In most cases the per-class method cache (pCMC) helps, but
the pCMC requires VM-wide locking, which hurts performance on
multi-Ractor execution, especially when all Ractors call methods
with rb_funcall*.
This patch introduces a Global Call-Cache Cache Table (gccct) for
rb_funcall*. Call-caches were introduced in Ruby 3.0 to manage
method cache entries atomically, and the gccct enables method caching
without VM-wide locking. This table solves the performance issue
on multi-Ractor execution.
[Bug #17497]
Ruby-level method invocation does not use the gccct because it has an
inline method cache, and the table size is limited. Basically
rb_funcall* is not used frequently, so 1023 entries should be enough.
We will revisit the table size if it is not.
because the name "MJIT" is an internal code name, it's inconsistent with
--jit while they are related to each other, and I want to discourage future
JIT implementation-specific (e.g. MJIT-specific) APIs by this rename.
[Feature #17490]
"experimental_everything" makes the assigned value, it means
the assignment change the state of assigned value.
"experimental_copy" tries to make a deep copy and make copyied object
sharable.
When `literal`, check if the literal about to be assigned to a
constant is ractor-shareable, otherwise raise `Ractor::Error` at
runtime instead of `SyntaxError`.
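A minimal sketch of the `literal` mode (assuming Ruby 3.0+ and the
`shareable_constant_value` magic comment):

```ruby
# shareable_constant_value: literal
TABLE = { "a" => 1, "b" => 2 }   # a literal: deeply frozen and shareable
Ractor.shareable?(TABLE)         #=> true
```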
Ractor has several restrictions to keep each Ractor isolated, and some
operations, such as `CONST = "foo"` in a non-main Ractor, raise
an exception. Such operations raise an error, but there has been
confusion (some code raises RuntimeError and some code raises
NameError).
To make this clear, we introduce Ractor::IsolationError, which is raised
when the isolation between Ractors is violated.
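An illustrative sketch (the constant name is arbitrary): the operation
now fails with the dedicated error class.

```ruby
r = Ractor.new do
  Object.const_set(:CONF, [])  # [] is not shareable, so this is rejected
rescue Ractor::IsolationError => e
  e.class
end
r.take  #=> Ractor::IsolationError
```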
to avoid SEGV on mjit_recompile and compact_all_jit_code.
For some reason, ISeqs on the stack are sometimes GC-ed (why?), and therefore
mjit_recompile may run on a GC-ed ISeq, which I expected d07183ec85
to fix, but apparently it may refer to random things if already GC-ed.
Marking active_units works around the situation.
http://ci.rvm.jp/results/trunk-mjit-wait@phosphorus-docker/3292740
Also, while compact_all_jit_code was executed, we saw some SEGVs where
CCs seemed to be already GC-ed, meaning their owner ISeq was not marked
properly. Even if units are still in active_units, it's not guaranteed
that their ISeqs are in use. So in this case we need to mark active_units
for a legitimate reason.
http://ci.rvm.jp/results/trunk-mjit-wait@phosphorus-docker/3293277
http://ci.rvm.jp/results/trunk-mjit-wait@phosphorus-docker/3293090
The original motivation of this marking was https://github.com/k0kubun/yarv-mjit/issues/20.
As wanabe said, there are multiple options to mitigate the issue, and
Eric Wong introduced another fix at 143776f6fe by checking unit->iseq
inside the lock.
Therefore this particular condition has been covered in two ways, and
the script given by wanabe no longer crashes without mjit_mark().
On Windows, MJIT doesn't work without this patch because of
the declaration of ruby_single_main_ractor. This patch fixes the
issue and moves its definition from ractor.c to vm.c, placing it
near ruby_current_vm_ptr.
C extensions can violate Ractor-safety, so only ractor-safe
C extensions (C methods) can run on non-main Ractors.
rb_ext_ractor_safe(true) declares that the subsequently
defined methods are ractor-safe. Otherwise, defined methods
check that they are invoked on the main Ractor and raise an error
if invoked on a non-main Ractor.
[Feature #17307]
The vm mark function should only check if the current frame is a local
or not and then mark values in that frame. Since it's walking up the
stack looking at each cfp, then all ep's should be examined.
This fixes a bug in the Rails tests where we're seeing segv in railties.
Thanks Yasuo Honda for giving me a reliable repro!
To make it possible to build Ractor-related extensions, some functions
should be exposed.
* include/ruby/thread_native.h
* rb_native_mutex_*
* rb_native_cond_*
* include/ruby/ractor.h
* RB_OBJ_SHAREABLE_P(obj)
* rb_ractor_shareable_p(obj)
* rb_ractor_std*()
* rb_cRactor
and rm ractor_pub.h
and rename srcdir/ractor.h to srcdir/ractor_core.h
(to avoid conflict with include/ruby/ractor.h)
* `GC.auto_compact=`, `GC.auto_compact` can be used to control when
compaction runs. Setting `auto_compact=` to true will cause
compaction to occur during major collections. At the moment,
compaction adds significant overhead to major collections, so please
test first!
[Feature #17176]
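A minimal usage sketch:

```ruby
GC.auto_compact = true   # compact the heap during major GC from now on
GC.auto_compact          #=> true
GC.start                 # force a full collection; compaction runs as part of it
```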
Ractor.make_shareable() supports Proc objects if
(1) the Proc only reads outer local variables (no assignments), and
(2) the outer local variables it reads are shareable.
The read local variables are stored in a snapshot, so after the Proc is
made shareable, later assignments do not affect it, like this:
```ruby
a = 1
pr = Ractor.make_shareable(Proc.new{p a})
pr.call #=> 1
a = 2
pr.call #=> 1 # `a = 2` doesn't affect the snapshot
```
[Feature #17284]
To access TLS, it is faster to use a language-level TLS specifier
than the pthread_getspecific/pthread_setspecific functions.
The original proposal is: Use native thread locals. #3665
We should let fibers update their own references on compaction. I don't
think we need the thread to update the associated fiber because there
will be a fiber object on the heap that knows how to update itself.
This commit introduces the Ractor mechanism to run Ruby programs in
parallel. See doc/ractor.md for more details about Ractor.
See ticket [Feature #17100] for the implementation details
and discussions.
[Feature #17100]
This commit does not complete the implementation. You may find
many bugs when using Ractor. Also, the specification may still change,
so this feature is experimental. You will see a warning when
you create the first Ractor with `Ractor.new`.
I hope this feature can help protect programmers from thread-safety issues.
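A minimal example of creating a Ractor (the first `Ractor.new` prints
the experimental-feature warning mentioned above):

```ruby
r = Ractor.new(21) do |n|
  n * 2        # runs in parallel with the main Ractor
end
r.take  #=> 42
```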
If a module has an origin, and that module is included in another
module or class, previously the iclass created for the module had
an origin pointer to the module's origin instead of the iclass's
origin.
Setting the origin pointer correctly requires using a stack, since
the origin iclass is not created until after the iclass itself.
Use a hidden ruby array to implement that stack.
Correctly assigning the origin pointers in the iclass caused a
use-after-free in GC. If a module with an origin is included
in a class, the iclass shares a method table with the module
and the iclass origin shares a method table with module origin.
Mark iclass origin with a flag that notes that even though the
iclass is an origin, it shares a method table, so the method table
should not be garbage collected. The shared method table will be
garbage collected when the module origin is garbage collected.
I've tested that this does not introduce a memory leak.
This change caused a VM assertion failure, which was traced to callable
method entries using the incorrect defined_class. Update
rb_vm_check_redefinition_opt_method and find_defined_class_by_owner
to treat iclass origins differently from class origins to avoid this
issue.
This also includes a fix for Module#included_modules to skip
iclasses with origins.
Fixes [Bug #16736]
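An illustrative Ruby-level setup of the situation described above (a
module with an origin, created by prepend, included into a class):

```ruby
module Helper; end

module Core
  prepend Helper      # gives Core an origin iclass
end

class Thing
  include Core        # creates iclasses for Core and for Core's origin
end

Thing.ancestors         #=> [Thing, Helper, Core, Object, Kernel, BasicObject]
Thing.included_modules  #=> [Helper, Core, Kernel]  (origin iclasses are skipped)
```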
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side. Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.
This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side. Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side. On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash. If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it. So this commit
should make it so that a maximum of a single hash is allocated
during method calls.
To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable. If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.
In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.
On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled. Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not. Then, unless the callinfo flag is set,
the hash needs to be duplicated. The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again. If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can include duplicating the
positional splat array if one was passed. CALLER_SETUP_ARG
and a couple of other places need to be modified to handle
similar issues for other types of calls.
This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.
Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:
def kw(a: 1) a end
def kws(**kw) kw end
h = {a: 1}
kw(a: 1) # about same
kw(**h) # 2.37x faster
kws(a: 1) # 1.30x faster
kws(**h) # 2.19x faster
kw(a: 1, **h) # 1.03x slower
kw(**h, **h) # about same
kws(a: 1, **h) # 1.16x faster
kws(**h, **h) # 1.14x faster
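A small illustration of the callee-side guarantee (using the benchmark's
`kws` definition): the keyword rest parameter is a distinct object, so
the caller's hash is never mutated.

```ruby
def kws(**kw) kw end

h = {a: 1}
kws(**h).equal?(h)  #=> false  (the callee gets its own hash)
kws(**h)[:b] = 2
h                   #=> {:a=>1}  (the caller's hash is untouched)
```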
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for a race-free inline method cache
* Make the call-cache (CC) an RVALUE (a GC-managed object) and allocate a new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements, similar to RCU.
(2) Introduce a per-class method cache (pCMC)
* Instead of the fixed-size global method cache (GMC), the pCMC allows a
flexible cache size.
* Caching CCs reduces CC allocation and allows sharing a CC's fast path
between call-sites with the same call-info (CI).
(3) Invalidate an inline method cache by invalidating the corresponding method
entries (MEs)
* Instead of using class serials, we set an "invalidated" flag on the method
entry itself to represent cache invalidation.
* Compared with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidates all method caches of the class and its
subclasses.
* The proposed approach invalidates the method cache of only one ME.
See [Feature #16614] for more details.
Now, rb_call_info contains how to call the method as a tuple of
(mid, orig_argc, flags, kwarg). In most cases kwarg == NULL and
mid+argc+flags only require 64 bits. So this patch packs
rb_call_info into a VALUE (1 word) in such cases. If it cannot be
represented in a VALUE, then use imemo_callinfo, which contains a
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because every callinfo is now VALUE
size (a packed ci or a pointer to an imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
are temporarily removed because cd->ci should be marked.
`make leaked-globals` points out that this function is leaked. This has
not been detected in our CI because it is defined only when
VM_CHECK_MODE is nonzero.
Just make it static and everything goes well.
It was set to 1000 in a4a2b9be7a.
However, on ruby-2.7.0p0, there are far more than 1k frozen strings right after boot:
```
$ ruby -robjspace -e 'p ObjectSpace.each_object(String).select { |s| s.frozen? && ObjectSpace.dump(s).include?(%{"fstring":true})}.uniq.count'
5948
```
This removes the warnings added in 2.7, and changes the behavior
so that a final positional hash is not treated as keywords or
vice-versa.
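A short sketch of the Ruby 3 behavior (the method `m` is hypothetical):

```ruby
def m(a, kw: 0)
  [a, kw]
end

m({kw: 1})  #=> [{:kw=>1}, 0]  # a final positional hash stays positional
m(kw: 1)    # ArgumentError: keywords are no longer converted to a positional argument
```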
To handle the arg_setup_block splat case correctly with keyword
arguments, we need to check if we are taking a keyword hash.
That case didn't have a test, but it affects real-world code,
so add a test for it.
This removes rb_empty_keyword_given_p() and related code, as
that is not needed in Ruby 3. The empty keyword case is the
same as the no keyword case in Ruby 3.
This changes rb_scan_args to implement keyword argument
separation for C functions when the : character is used.
For backwards compatibility, it returns a duped hash.
This is a bad idea for performance, but not duping the hash
breaks at least Enumerator::ArithmeticSequence#inspect.
Instead of having RB_PASS_CALLED_KEYWORDS be a number,
simplify the code by just making it be rb_keyword_given_p().
Saves committers' daily effort by not #include-ing everything from
internal.h; each file now includes what it needs instead. This should
significantly speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are the win32-related headers, which tend not to be self-
contained (they have inclusion-order dependencies).
Previously every time a method was defined on a module, we would
recursively walk all subclasses to see if the module was included in a
class which the VM optimizes for (such as Integer#+).
For most method definitions we can tell immediately that this won't be
the case based on the method's name. To do this we just keep a hash with
method IDs of optimized methods and if our new method isn't in that list
we don't need to check subclasses at all.
This removes the related tests, and puts the related specs behind
version guards. This affects all code in lib, including some
libraries that may want to support older versions of Ruby.
Support loading builtin features written in Ruby, which are implemented
with C builtin functions.
[Feature #16254]
Several features:
(1) Load .rb files at boot time from the native binary.
Now, prelude.rb is loaded at boot time. However, this file is embedded
in the interpreter as text and we need to compile it.
This patch contains a feature to load it from a binary format.
(2) __builtin_func() in Ruby calls func() written in C
(see the sketch after this list).
In a Ruby file, we can write `__builtin_func()` like a method call.
However, this is not a method call, but special syntax to call
a function `func()` written in C. C functions should be defined
in the file (same compile unit) which loads this .rb file.
Functions (`func` in the example above) should be defined with
(a) 1st parameter: rb_execution_context_t *ec
(b) rest parameters (0 to 15).
(c) VALUE return type.
These requirements are very similar to those for functions used by
rb_define_method(), however `rb_execution_context_t *ec`
is a new requirement.
(3) Automatic C code generation from .rb files.
tool/mk_builtin_loader.rb creates C code to load the .rb files
needed by miniruby and the ruby command. This script is run by
BASERUBY, so *.rb should be written in BASERUBY-compatible
syntax. This script loads a .rb file, finds all __builtin_-prefixed
method calls, and generates a part of the C code to export the
functions.
tool/mk_builtin_binary.rb creates C code which contains the
binary-compiled Ruby files needed by the ruby command.
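A hypothetical sketch of the call style from point (2); `__builtin_my_func`
and the surrounding names are made up, and this syntax is only meaningful
inside core .rb files processed by the tools above:

```ruby
# In a core .rb file compiled by tool/mk_builtin_loader.rb:
class Integer
  def my_feature(x)
    # Not a real method call: this dispatches to the C function
    #   static VALUE my_func(rb_execution_context_t *ec, VALUE self, VALUE x)
    # defined in the same compile unit that loads this file.
    __builtin_my_func(x)
  end
end
```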
This reverts commits: 10d6a3aca78ba48c1b85fba8627dc1dd883de5ba6c6a25feca167e6b48f17cb96d41a53207979278595b3c4fdd1521f7cf89c11c5e69accf336082033632a812c0f56506be0d86427a3219 .
The reason for the revert is that we observed an ABA problem around the
inline method cache. When a cache misses, we search for a
method entry, and if the entry is identical to what was cached
before, we reuse the cache. But the commits we are reverting here
introduced situations where a method entry is freed, then the
identical memory region is used for another method entry. An
inline method cache cannot detect that ABA.
Here is code that reproduces such a situation:
```ruby
require 'prime'
class << Integer
alias org_sqrt sqrt
def sqrt(n)
raise
end
GC.stress = true
Prime.each(7*37){} rescue nil # <- Here we populate CC
class << Object.new; end
# These adjacent remove-then-alias maneuver
# frees a method entry, then immediately
# reuses it for another.
remove_method :sqrt
alias sqrt org_sqrt
end
Prime.each(7*37).to_a # <- SEGV
```
Now that we have eliminated most destructive operations over the
rb_method_entry_t / rb_callable_method_entry_t, let's make them
mostly immutable and mark them const.
One exception is rb_export_method(), which destructively modifies
visibilities of method entries. I have left that operation as is
because I suspect that destructiveness is the nature of that
function.
This fixes instance_exec and similar methods. It also fixes
Enumerator::Yielder#yield, rb_yield_block, and a couple of cases
with Proc#{<<,>>}.
This support requires the addition of rb_yield_values_kw, similar to
rb_yield_values2, for passing the keyword flag.
Unlike earlier attempts at this, this does not modify the rb_block_call_func
type or add a separate function type. The functions of type
rb_block_call_func are called by Ruby with a separate VM frame, and we can
get the keyword flag information from the VM frame flags, so it doesn't need
to be passed as a function argument.
These changes require the following VM functions accept a keyword flag:
* vm_yield_with_cref
* vm_yield
* vm_yield_with_block
It says:
vm.c:2519:34: warning: expression does not compute the number of elements in this array; element type is 'const struct __jmp_buf_tag', not 'VALUE' (aka 'unsigned long') [-Wsizeof-array-div]
sizeof(ec->machine.regs) / sizeof(VALUE));
~~~~~~~~~~~~~~~~ ^
vm.c:2519:34: note: place parentheses around the 'sizeof(VALUE)' expression to silence this warning
Also add keyword argument separation warnings for Class#new and Method#call.
To allow for keyword argument to required positional hash conversion in
cfuncs, add a vm frame flag indicating the cfunc was called with an empty
keyword hash (which was removed before calling the cfunc). The cfunc can
check this frame flag and add back an empty hash if it is passing its
arguments to another Ruby method. Add rb_empty_keyword_given_p function
for checking if called with an empty keyword hash, and
rb_add_empty_keyword for adding back an empty hash to argv.
All of this empty keyword argument support is only for 2.7. It will be
removed in 3.0 as Ruby 3 will not convert empty keyword arguments to
required positional hash arguments. Comment all of the relevant code
to make it obvious this is expected to be removed.
Add rb_funcallv_kw as a public C-API function, just like rb_funcallv
but with a keyword flag. This is used by rb_obj_call_init (the internals
of Class#new). This also required adding a new call_type enum value,
CALL_FCALL_KW, similar to the recent addition of CALL_PUBLIC_KW.
Add rb_vm_call_kw as an internal function, used by call_method_data
(internals of Method#call and UnboundMethod#bind_call). Add tests
for UnboundMethod#bind_call keyword handling.
The kw_splat flag indicates whether the original call passes keywords or not.
Some types of methods (e.g., bmethod and sym_proc) drop the
information. This change tries to propagate the flag to the final
callee, as far as possible.
Treat the ** syntax as passing a copy of the hash as the last
positional argument. If the hash being double splatted is empty, do
not add a positional argument.
Remove rb_no_keyword_hash, no longer needed.
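A brief illustration of the behavior described above (the method `m` is
hypothetical):

```ruby
def m(*args) args end

h = {a: 1}
m(**{})                 #=> []         (an empty double splat adds no positional argument)
m(**h)                  #=> [{:a=>1}]  (a copy of the hash is passed positionally)
m(**h).first.equal?(h)  #=> false
```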
We can check the function pointer passed to rb_define_method_id
like how we do so in rb_define_method. This method is relatively
rarely used, so fewer problems were found than with the other APIs.
* Make it clear as possible that RubyVM is MRI-specific and only exists on MRI
* See [Bug #15743].
* Use "CRuby VM" instead of "Ruby VM" for clarity.
* Use YARV rather than "CRuby VM" for documenting RubyVM::InstructionSequence
* Avoid introducing a new "CRuby VM" term in documentation
Renaming this function. "No pin" leaks some implementation details. We
just want users to know that if they mark this object, the reference may
move and they'll need to update the reference accordingly.
Without this patch, the "raise" event was invoked twice when raising an
exception in a "load"ed script.
This patch is by danielwaterworth (Daniel Waterworth).
[Bug #15877]
Add RubyVM::USAGE_ANALYSIS_INSN_CLEAR, RubyVM::USAGE_ANALYSIS_OPERAND_CLEAR,
and RubyVM::USAGE_ANALYSIS_REGISTER_CLEAR to clear VM instruction hash constants.
Closes: https://github.com/ruby/ruby/pull/2258
Add RubyVM::USAGE_ANALYSIS_INSN_START, RubyVM::USAGE_ANALYSIS_OPERAND_START,
and RubyVM::USAGE_ANALYSIS_REGISTER_START to begin collecting VM instructions.
Add RubyVM::USAGE_ANALYSIS_INSN_RUNNING, RubyVM::USAGE_ANALYSIS_OPERAND_RUNNING,
and RubyVM::USAGE_ANALYSIS_REGISTER_RUNNING to check if VM instructions
are being collected.
Closes: https://github.com/ruby/ruby/pull/2258
I am trying to study debug counters inside a Rails application.
Accessing debug counters by killing the process is hard because child
processes don't get the same TRAP as the parent, and Rails seems to
intercept calls to `exit`. Adding this method lets me print the debug
counters when I want (at the end of requests, for example).
Previously, attr* methods could be private even if not in the
private section of a class/module block.
This uses the same approach that ruby started using for define_method
in 1fc3319973.
Fixes [Bug #4537]
If `vm_stack` is left dangling in a forked process, the gc attempts to scan
it, but it is invalid and will cause a segfault. Therefore, we clear it
before forking.
In order to simplify this, `rb_ec_clear_vm_stack` was introduced.
only when its receiver and the argument are both Integers.
Since 6bedbf4625, Integer#[] has supported a range extraction.
This means that Integer#[] now accepts multiple arguments, which
unfortunately made the method very slow.
This change fixes the performance issue by adding a special handling for
its traditional use case: `num[idx]` where both `num` and `idx` are
Integers.
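The fast path applies to the traditional one-argument form; the newer
multi-argument and range forms take the general path:

```ruby
n = 0b1010
n[1]      #=> 1   # single Integer index: bit extraction, the optimized case
n[0, 3]   #=> 2   # extract 3 bits starting at bit 0 (0b010)
n[0..2]   #=> 2   # range form
```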
* internal.h (UNALIGNED_MEMBER_ACCESS, UNALIGNED_MEMBER_PTR):
moved from eval_intern.h.
* compile.c, iseq.c, vm.c: use UNALIGNED_MEMBER_PTR for `entries`
in `struct iseq_catch_table`.
* vm_eval.c, vm_insnhelper.c: use UNALIGNED_MEMBER_PTR for `body`
in `rb_method_definition_t`.
`vm->orig_progname` can be different from `vm->progname` when user
code assigns to `$0`. While `vm->progname` is kept alive by the
global table, nothing marked `vm->orig_progname`.
[Bug #15887]
* variable.c: make the hidden ivars `classpath` and `tmp_classpath` the source
of truth for module and constant names. Assign to them when modules are bound
to constants.
* variable.c: remove references to module name cache, as what used to be the cache
is now the source of truth. Remove rb_class_path_no_cache().
* variable.c: remove the hidden ivar `classid`. This existed for the purposes of
module name search, which is now replaced. Also, remove the associated
rb_name_class().
* class.c: use rb_set_class_path_string to set the name of Object during boot.
Must use a fstring as this runs before rb_cString is initialized and
creating a normal string leads to a VALUE without a class.
* spec/ruby/core/module/name_spec.rb: add a few specs to specify what happens
to Module#name across multiple operations. These specs pass without other
code changes in this commit.
[Feature #15765]
Before this commit, classes and modules would be registered with the
VM's `defined_module_hash`. The key was the ID of the class, but that
meant that it was possible for hash collisions to occur. The compactor
doesn't allow classes in the `defined_module_hash` to move, but if there
is a conflict, then it's possible a class would be removed from the hash
and not get pinned.
This commit changes the key / value of the hash just to be the class
itself, thus preventing movement.
For some reason symbols (or classes) are being overridden in trunk
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67598 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This commit adds the new method `GC.compact` and compacting GC support.
Please see this issue for caveats:
https://bugs.ruby-lang.org/issues/15626
[Feature #15626]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67576 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
is defined. It's 0 by default and so it disappears in an actual build.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67544 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Because it is hard to specify the commits related to r67479 only.
So please commit them again.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67499 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This commit adds the new method `GC.compact` and compacting GC support.
Please see this issue for caveats:
https://bugs.ruby-lang.org/issues/15626
[Feature #15626]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67479 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def: add definemethod and definesmethod (singleton method)
instructions. Old YARV contained these instructions, but they were moved
to methods of the FrozenCore class because reducing the number of instructions
can improve performance for some techniques (static stack caching
and so on). However, we don't employ these techniques and it is hard
to optimize/analyze the definition sequence. So I decided to reintroduce
them (and remove the definition methods). The `putiseq` insn is also removed.
* vm_method.c (rb_scope_visibility_get): renamed to
`vm_scope_visibility_get()` and make it accept `ec`.
Same for `vm_scope_module_func_check()`.
These fixes are the result of refactoring `vm_define_method`.
* vm_insnhelper.c (rb_vm_get_cref): renamed to `vm_get_cref`
because of consistency with other functions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67442 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
In addition to detecting a dead canary, we try to detect the very moment
when we smash the stack top. Requested by k0kubun:
https://twitter.com/k0kubun/status/1085180749899194368
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66981 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* include/ruby/backward.h (rb_frame_method_id_and_class): we had labeled
`rb_frame_method_id_and_class()` as deprecated because MRI internals
don't use it, but we found there are users of this API in external
C extensions. We don't have a proper alternative API yet and no time
to make one, so I removed the "deprecated" label.
[Bug #15300]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66522 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_args.c (refine_sym_proc_call): resolve refinements when the
proc is invoked, instead of resolving at making the proc, to
enable refinements on symbol-proc in ruby-level methods
* vm.c (vm_cref_dup): clear cached symbol-procs when duplicating.
[Bug #15114] [Fix GH-2039]
From: manga_osyo <manga.osyo@gmail.com>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66439 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* iseq.c: before this patch, RubyVM::InstructionSequence.of(src) (ISeq in
short) returns different ISeq (wrapper) objects that point to one internal
ISeq object. This patch changes this behavior to cache the created ISeq
(wrapper) objects and return the same ISeq object for a given internal
ISeq object.
* iseq.h (ISEQ_EXECUTABLE_P): introduced to check executable ISeq objects.
* iseq.h (ISEQ_COMPILE_DATA_ALLOC): reorder the flag-setting line to avoid
the case where ISEQ_USE_COMPILE_DATA is set but compiled_data == NULL.
* vm_core.h (rb_iseq_t): introduce `rb_iseq_t::wrapper` and
`rb_iseq_t::aux::exec`. Move `rb_iseq_t::local_hooks` to
`rb_iseq_t::aux::exec::local_hooks`.
* test/ruby/test_iseq.rb: add ISeq.of() tests.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66246 b2dd03c8-39d4-4d8f-98ff-823fe69b080e