A `return` statement in a Proc inside a lambda, like:
`lambda{ proc{ return }.call }`
should return from the outer lambda block. However, the inner Proc can become
orphaned from the lambda block. In that case the `return` escaped the outer
scope like a method-level return, but this behavior was decided to be a bug.
[Bug #17105]
This patch raises LocalJumpError by checking whether the Proc is orphaned
from its lambda block before escaping via `return`.
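A minimal sketch of the orphaned case (my example, not taken from the linked PR):
```ruby
def make_orphan
  # The lambda returns the inner proc without calling it, so the proc
  # outlives the lambda frame that its `return` targets.
  lambda { proc { return } }.call
end

orphan = make_orphan
begin
  orphan.call
rescue LocalJumpError => e
  # With this patch the orphaned `return` raises instead of escaping
  # the outer scope like a method-level return.
  p e
end
```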
Most of the tests were written by Jeremy Evans:
https://github.com/ruby/ruby/pull/4223
This patch improves the performance of sequential and parallel
execution of rb_equal() (and rb_eql()).
[Bug #17497]
rb_equal_opt() (and rb_eql_opt()) does not have its own call data (cd), so
time is wasted initializing a cd on every call. This patch introduces
opt_equality_by_mid() to check equality without a cd.
Furthermore, the current master uses a "static" cd in rb_equal_opt()
(and rb_eql_opt()), which hurts CPU caches under multi-threaded execution.
These are now gone, so there is no bottleneck for parallel execution.
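For context, a rough Ruby-level sketch (mine, not from the patch) of operations whose element comparisons are routed through rb_equal() in C and therefore benefit from the cheaper fast path:
```ruby
# Each of these compares elements via rb_equal() under the hood.
haystack = Array.new(1_000) { |i| "item-#{i}" }
haystack.include?("item-999")
haystack.index("item-500")
haystack == haystack.dup
```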
The rb_funcall* functions (rb_funcall(), rb_funcallv(), ...) invoke a
Ruby method on a given receiver. Ruby 2.7 introduced an inline method
cache for them backed by static memory. However, Ruby 3.0 reimplemented the
method cache data structures and that inline cache was removed.
Without an inline cache, rb_funcall* searches for the method every time.
In most cases the per-class method cache (pCMC) helps, but
pCMC requires VM-wide locking, which hurts performance on
multi-Ractor execution, especially when all Ractors call methods
via rb_funcall*.
This patch introduces a global call-cache cache table (gccct) for
rb_funcall*. Call caches were introduced in Ruby 3.0 to manage
method cache entries atomically, and the gccct enables method caching
without VM-wide locking. This table solves the performance issue
on multi-Ractor execution.
[Bug #17497]
Ruby-level method invocation does not use the gccct because it has its own
inline method cache, and the table size is limited. rb_funcall* is
generally not used frequently, so 1023 entries should be enough.
We will revisit the table size if it turns out not to be.
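As a rough illustration (my sketch, not from the patch), here is a workload meant to exercise rb_funcall* from several Ractors at once; it assumes that sorting elements of a user-defined class dispatches #<=> from C via rb_funcall*:
```ruby
class Item
  attr_reader :n
  def initialize(n)
    @n = n
  end
  def <=>(other)
    n <=> other.n
  end
end

ractors = 4.times.map do
  Ractor.new do
    # Each Ractor sorts its own data; the C-level #<=> dispatch is assumed
    # to go through rb_funcall*, hitting the gccct in parallel.
    data = Array.new(1_000) { |i| Item.new((i * 2_654_435_761) % 1_000) }
    data.sort.size
  end
end
p ractors.map(&:take) #=> [1000, 1000, 1000, 1000]
```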
e7fc353f04 reverted vm_ic_hit_p's signature change made in 53babf35ef,
which broke JIT compilation of getinlinecache.
To make sure it doesn't happen again, I separated vm_inlined_ic_hit_p to
make the intention clear.
The constant cache `IC` is accessed in a non-atomic manner and has
thread-safety issues, so Ruby 3.0 disables the constant cache on
non-main Ractors.
This patch enables it by introducing `imemo_constcache`, which is allocated
on every refill of the constant cache, similar to `imemo_callcache`.
[Bug #17510]
Now `IC` has only one entry, `IC::entry`, which points to an
`iseq_inline_constant_cache_entry` managed as a T_IMEMO object.
`IC` is now an atomic data structure, so `rb_mjit_before_vm_ic_update()` and
`rb_mjit_after_vm_ic_update()` are no longer needed.
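A small sketch (mine) of the case this enables: constant references executed inside a non-main Ractor can use the inline constant cache again, provided the constant's value is shareable:
```ruby
LIMIT = 1_000 # Integer value, so it is shareable and readable from any Ractor

r = Ractor.new do
  total = 0
  # Each reference to LIMIT is a constant read that can now hit the
  # inline constant cache instead of re-resolving the constant.
  LIMIT.times { total += LIMIT }
  total
end
p r.take #=> 1000000
```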
Separate some fields from rb_ractor_t into rb_ractor_pub, put it
at the beginning of rb_ractor_t, and declare it in vm_core.h so that
vm_core.h can access the rb_ractor_pub fields.
Now rb_ec_ractor_hooks() is a complete inline function and there is no
MJIT-related issue.
Ractor has several restrictions to keep each Ractor isolated, and some
operations such as `CONST = "foo"` in a non-main Ractor raise an
exception. However, there was confusion about which error class such
operations should use (some code raised RuntimeError and some raised
NameError).
To make this clear, we introduce Ractor::IsolationError, which is raised
when the isolation between Ractors is violated.
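For example (sketch; the assignment mirrors the commit message, and the error surfaces through Ractor#take wrapped in a Ractor::RemoteError):
```ruby
begin
  Ractor.new { CONST = "foo" }.take
rescue Ractor::RemoteError => e
  p e.cause.class #=> Ractor::IsolationError
end
```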
Some tunings.
* add `inline` for vm_sendish()
* pass enum instead of func ptr to vm_sendish()
* reorder the initialization order of the `calling` struct.
* add ALWAYS_INLINE for vm_search_method_fastpath()
* call vm_search_method_fastpath() from vm_sendish()
Add cc_found_in_ccs (renamed from cc_found_ccs), cc_not_found_in_ccs,
call0_public, and call0_other debug counters to measure more details.
This also contains several other modifications.
`cd` is passed from method call functions to method invocation
functions, but `cd` can be manipulated by other Ractors simultaneously,
so this has a thread-safety issue.
To solve it, this patch stores `ci` and the found `cc` in `calling`
and stops passing `cd`.
This speeds up all instance variable access, even when not in
verbose mode. Uninitialized instance variable warnings were
rarely helpful, and resulted in slower code if you wanted to
avoid warnings when run in verbose mode.
Implements [Feature #17055]
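For example (my sketch), this read of an uninitialized instance variable used to warn under -w and now simply returns nil:
```ruby
class Point
  def initialize(x)
    @x = x
  end

  def y
    # Previously `ruby -w` printed "instance variable @y not initialized" here;
    # now the read is warning-free and takes the same fast path as any ivar read.
    @y
  end
end

p Point.new(1).y #=> nil
```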
C extensions can violate Ractor-safety, so only Ractor-safe
C extensions (C methods) can run on non-main Ractors.
rb_ext_ractor_safe(true) declares that the subsequently
defined methods are Ractor-safe. Otherwise, defined methods
check that they are invoked on the main Ractor and raise an error
if invoked on a non-main Ractor.
[Feature #17307]
Performance is probably improved?
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' --repeat-count=12 --alternate --output=all benchmark.yml
before --jit: ruby 3.0.0dev (2020-11-27T04:37:47Z master 69e77e81dc) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-11-27T05:28:19Z master df6b05c6dd) +JIT [x86_64-linux]
last_commit=Set VM_FRAME_FLAG_FINISH at once
Calculating -------------------------------------
                              before --jit        after --jit
Optcarrot Lan_Master.nes  80.89292998533379  82.19497327502751 fps
                          80.93130641142331  85.13943315260148
                          81.06214830270119  87.43757879797808
                          82.29172808453910  87.89942441487113
                          84.61206450455929  87.91309779491075
                          85.44545883567997  87.98026086648694
                          86.02923132404449  88.03081060383973
                          86.07411817365879  88.14650206137341
                          86.34348799602836  88.32791633649961
                          87.90257338977324  88.57599644892220
                          88.58006509876580  88.67426384743277
                          89.26611118140011  88.81669430874207
```
This should have no negative impact on the VM because the function is ALWAYS_INLINE.
Allocating an instance of a class uses the allocator for the class. When
the class has no allocator set, Ruby looks for it in the super class
(see rb_get_alloc_func()).
It's uncommon for classes created from Ruby code to ever have an
allocator set, so it's common during the allocation process to search
all the way to BasicObject from the class with which the allocation is
being performed. This makes creating instances of classes that have
long ancestry chains more expensive than creating instances of classes
that have shorter ancestry chains.
Setting the allocator at class creation time removes the need to perform
a search for the allocator during allocation.
This is a breaking change for C-extensions that assume that classes
created from Ruby code have no allocator set. Libraries that set up a
class hierarchy in Ruby code and then set the allocator on some parent
class, for example, can experience breakage. This seems like an unusual
use case and hopefully it is rare or non-existent in practice.
Rails has many classes that have upwards of 60 elements in the ancestry
chain, and the benchmark below shows a significant improvement when
allocating instances of a class that includes 64 modules.
```
pre: ruby 3.0.0dev (2020-11-12T14:39:27Z master 6325866421)
post: ruby 3.0.0dev (2020-11-12T20:15:30Z cut-allocator-lookup)
Comparison:
allocate_8_deep
post: 10336985.6 i/s
pre: 8691873.1 i/s - 1.19x slower
allocate_32_deep
post: 10423181.2 i/s
pre: 6264879.1 i/s - 1.66x slower
allocate_64_deep
post: 10541851.2 i/s
pre: 4936321.5 i/s - 2.14x slower
allocate_128_deep
post: 10451505.0 i/s
pre: 3031313.5 i/s - 3.45x slower
```
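A sketch (mine) of the shape being benchmarked, a class whose ancestry chain includes 64 modules:
```ruby
mods = Array.new(64) { Module.new }
klass = Class.new do
  mods.each { |m| include m }
end

# Allocation no longer walks this ancestry chain looking for an allocator;
# the allocator is resolved once, when the class is created.
1_000.times { klass.new }
```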
This commit adds a debug counter for the case where the inline cache
*missed* but the ivar index table has an entry for that ivar. This is a
case where a polymorphic cache could help.
iv tables cannot shrink. If the inline cache was ever set, then there
must be an entry for the instance variable in the iv table. Just set
the iv list on the object to be equal to the iv index table size, then
set the iv.
When the inline cache is written, the iv table will contain an entry for
the instance variable. If we get an inline cache hit, then we know the
iv table must contain a value for the index written to the inline cache.
If the index in the inline cache is larger than the list on the object,
but *smaller* than the iv index table on the class, then we can just
eagerly allocate the iv list to be the same size as the iv index table.
This avoids the duplicate work of checking whether the object is frozen as
well as looking up the index for the particular instance variable name.
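A Ruby-level sketch (mine) of the shape that hits this path; the effect is only a C-level fast path, so the example just shows the cache being warmed and reused:
```ruby
class Widget
  def fill
    @a = 1
    @b = 2
    @c = 3
  end
end

Widget.new.fill # warms the class's iv index table and the inline caches

# On later instances, an inline cache hit whose index exceeds the object's
# current iv list can size the list directly from the iv index table and
# store the value, skipping the repeated frozen check and index lookup.
Widget.new.fill
```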
Ractor.make_shareable() supports Proc objects if
(1) the Proc only reads outer local variables (no assignments to them), and
(2) the outer local variables it reads are shareable.
The read outer local variables are stored in a snapshot, so assignments made
after the Proc has been made shareable do not affect it, like this:
```ruby
a = 1
pr = Ractor.make_shareable(Proc.new{p a})
pr.call #=> 1
a = 2
pr.call #=> 1 # `a = 2` doesn't affect the snapshot
```
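Conversely, a Proc that violates condition (1) by assigning to an outer local cannot be made shareable (sketch; the exact error class and message come from the implementation):
```ruby
x = 0
begin
  Ractor.make_shareable(Proc.new { x += 1 }) # writes an outer local variable
rescue => e
  p e.class # an isolation-related error is raised; the Proc cannot be snapshotted
end
```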
[Feature #17284]
iv_index_tbl manages instance variable indexes (ID -> index).
This data structure needs to be synchronized across Ractors,
so this patch introduces some VM locks.
This patch also introduces an atomic ivar cache used by the
instance variable access instructions (get/setinstancevariable). To make
ivar cache (IVC) updates atomic, we changed the iv_index_tbl data structure
to manage (ID -> entry), where an entry holds a serial and an index. The IVC
points to this entry so that cache updates become atomic.
Buggy native extensions could have mark functions that cause stack
overflow. When a stack overflow happens during GC, Ruby used to recover
by raising an exception, which runs the interpreter. It's not safe to
run the interpreter during GC since the GC is in an inconsistent state.
This could cause object allocation during GC, for example.
Instead of running the interpreter and potentially causing a crash down
the line, fail fast and abort.
generic_ivtbl is a process-global table that maintains instance variables
for non-T_OBJECT/T_CLASS/... objects, so we need to protect it
for multi-Ractor execution.
Hint: we could make it Ractor-local for unshareable objects, but
for now that would be premature optimization.
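For reference, a small sketch (mine) of the kind of object whose instance variables live in generic_ivtbl:
```ruby
# A String is not T_OBJECT, so its instance variables are stored in the
# process-global generic_ivtbl rather than inside the object itself.
s = +"payload"
s.instance_variable_set(:@tag, :seen)
p s.instance_variable_get(:@tag) #=> :seen
```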
The changes here include:
* Using `FL_TEST_RAW` instead of `FL_TEST` in the first check in
`vm_search_super_method`. While the profile showed us spending a fair
amount of time here, the subsequent benchmarks didn't show much
improvement when adding this. Regardless, we know this does less work
than `FL_TEST` and we know that `FL_TEST_RAW` is safe due to the
previous check so it's a small but accurate optimization.
* Set `mid` only once. Both `vm_ci_new_runtime` and `vm_ci_mid` were
getting the `original_id` for the method entry. We can do this once
and pass the variable to the 2 callers that need it. This also doesn't
have a huge performance improvement but cleans up the code a bit.
Benchmark:
```
| |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|vm_iclass_super | 3.540M| 3.940M|
| | -| 1.11x|
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>