Previously, each of these methods returned self, but it is
more useful to return arguments, to allow for simpler method
decorators, such as:
```ruby
cached private def foo; some_long_calculation; end
```
Where cached sets up caching for the method.
For each of these methods, the following behavior is used:
1) No arguments returns nil
2) Single argument is returned
3) Multiple arguments are returned as an array
The single argument case is really the case we are trying to
optimize for, for the same reason that def was changed to return
a symbol for the method.
Idea and initial patch from Herwin Quarantainenet.
Implements [Feature #12495]
Builtin methods do not always have their mandatory_only_cme created (it
is only created when called with only mandatory parameters), so it could
be null. If we try to clear the cme, it will crash because it is null.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Compare with the C methods, A built-in methods written in Ruby is
slower if only mandatory parameters are given because it needs to
check the argumens and fill default values for optional and keyword
parameters (C methods can check the number of parameters with `argc`,
so there are no overhead). Passing mandatory arguments are common
(optional arguments are exceptional, in many cases) so it is important
to provide the fast path for such common cases.
`Primitive.mandatory_only?` is a special builtin function used with
`if` expression like that:
```ruby
def self.at(time, subsec = false, unit = :microsecond, in: nil)
if Primitive.mandatory_only?
Primitive.time_s_at1(time)
else
Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
end
end
```
and it makes two ISeq,
```
def self.at(time, subsec = false, unit = :microsecond, in: nil)
Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
end
def self.at(time)
Primitive.time_s_at1(time)
end
```
and (2) is pointed by (1). Note that `Primitive.mandatory_only?`
should be used only in a condition of an `if` statement and the
`if` statement should be equal to the methdo body (you can not
put any expression before and after the `if` statement).
A method entry with `mandatory_only?` (`Time.at` on the above case)
is marked as `iseq_overload`. When the method will be dispatch only
with mandatory arguments (`Time.at(0)` for example), make another
method entry with ISeq (2) as mandatory only method entry and it
will be cached in an inline method cache.
The idea is similar discussed in https://bugs.ruby-lang.org/issues/16254
but it only checks mandatory parameters or more, because many cases
only mandatory parameters are given. If we find other cases (optional
or keyword parameters are used frequently and it hurts performance),
we can extend the feature.
I'm looking through the places where YJIT needs notifications. It looks
like these changes to gc.c and vm_callinfo.h have become unnecessary
since 84ab77ba59. This commit just makes the diff against upstream
smaller, but otherwise shouldn't change any behavior.
Added UJIT_CHECK_MODE. Set to 1 to double check method dispatch in
generated code.
It's surprising to me that we need to watch both cc and cme. There might
be opportunities to simplify there.
Point out that the method should be used for backwards compatibility
with code prior to Ruby 3.0 instead of Ruby 2.7. It's still needed
in Ruby 2.7. It isn't needed in Ruby 3.0, as the methods using it
could switch to delegating both positional and keyword arguments.
Add a link to the www.ruby-lang.org web page that goes into detail
describing when and how ruby2_keywords should be used.
Since refinement search is always performed, these entries should always
be public. The method entry that the refinement search returns decides
the visibility.
Fixes [Bug #17822]
To invalidate some callable method entries, we replace the entry in the
class. Most types of method entries are on the method table of the
origin class, but refinement entries without an orig_me are housed in
the method table of the class itself. They are there because refinements
take priority over prepended methods.
By unconditionally inserting a copy of the refinement entry into the
origin class, clearing the method cache created situations where there
are refinement entry duplicates in the lookup chain, leading to infinite
loops and other problems.
Update the replacement logic to use the right class that houses the
method entry. Also, be more selective about cache invalidation when
moving refinement entries for prepend. This avoids calling
clear_method_cache_by_id_in_class() before refinement entries are in the
place it expects.
[Bug #17806]
If a class has been refined but does not have an origin class,
there is a single method entry marked with VM_METHOD_TYPE_REFINED,
but it contains the original method entry. If the original method
entry is present, we shouldn't skip the method when searching even
when skipping refined methods.
Fixes [Bug #17519]
Previously, attempting to change the visibility of a method in a
singleton class for a class/module that is prepended to and refined
would raise a NoMethodError.
Fixes [Bug #17519]
negative cache on a class which does not have subclasses was not
invalidated, but it should be invalidated because other classes
can cache this negative cache.
[Bug #17553]
rb_funcall* (rb_funcall(), rb_funcallv(), ...) functions invokes
Ruby's method with given receiver. Ruby 2.7 introduced inline method
cache with static memory area. However, Ruby 3.0 reimplemented the
method cache data structures and the inline cache was removed.
Without inline cache, rb_funcall* searched methods everytime.
Most of cases per-Class Method Cache (pCMC) will be helped but
pCMC requires VM-wide locking and it hurts performance on
multi-Ractor execution, especially all Ractors calls methods
with rb_funcall*.
This patch introduced Global Call-Cache Cache Table (gccct) for
rb_funcall*. Call-Cache was introduced from Ruby 3.0 to manage
method cache entry atomically and gccct enables method-caching
without VM-wide locking. This table solves the performance issue
on multi-ractor execution.
[Bug #17497]
Ruby-level method invocation does not use gccct because it has
inline-method-cache and the table size is limited. Basically
rb_funcall* is not used frequently, so 1023 entries can be enough.
We will revisit the table size if it is not enough.
Method cache can be cleared during lazy sweeping. An object that will
be collected during lazy sweep *should not* have it's method cache
cleared. Soon-to-be-collected objects can be in an inconsistent state and
this can lead to a crash. This patch just leaves early if the object is
going to be collected.
Fixes [Bug #17536]
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
separate some fields from rb_ractor_t to rb_ractor_pub and put it
at the beggining of rb_ractor_t and declare it in vm_core.h so
vm_core.h can access rb_ractor_pub fields.
Now rb_ec_ractor_hooks() is a complete inline function and no
MJIT related issue.
Also document that both :deprecated and :experimental are supported
:category option values.
The locations where warnings were marked as deprecation warnings
was previously reviewed by shyouhei.
Comment a couple locations where deprecation warnings should probably
be used but are not currently used because deprecation warning
enablement has not occurred at the time they are called
(RUBY_FREE_MIN, RUBY_HEAP_MIN_SLOTS, -K).
Add assert_deprecated_warn to test assertions. Use this to simplify
some tests, and fix failing tests after marking some warnings with
deprecated category.
C extensions can violate the ractor-safety, so only ractor-safe
C extensions (C methods) can run on non-main ractors.
rb_ext_ractor_safe(true) declares that the successive
defined methods are ractor-safe. Otherwiwze, defined methods
checked they are invoked in main ractor and raise an error
if invoked at non-main ractors.
[Feature #17307]
Before this commit, iclasses were "shady", or not protected by write
barriers. Because of that, the GC needs to spend more time marking these
objects than otherwise.
Applications that make heavy use of modules should see reduction in GC
time as they have a significant number of live iclasses on the heap.
- Put logic for iclass method table ownership into a function
- Remove calls to WB_UNPROTECT and insert write barriers for iclasses
This commit relies on the following invariant: for any non oirigin
iclass `I`, `RCLASS_M_TBL(I) == RCLASS_M_TBL(RBasic(I)->klass)`. This
invariant did not hold prior to 98286e9 for classes and modules that
have prepended modules.
[Feature #16984]
Doing so modifies the class's method table, but not in a way that should
be detectable from Ruby, so it may be safe to avoid checking if the
class is frozen.
Fixes [Bug #11669]
This fixes various issues when a module is included in or prepended
to a module or class, and then refined, or refined and then included
or prepended to a module or class.
Implement by renaming ensure_origin to rb_ensure_origin, making it
non-static, and calling it when refining a module.
Fix Module#initialize_copy to handle origins correctly. Previously,
Module#initialize_copy did not handle origins correctly. For example,
this code:
```ruby
module B; end
class A
def b; 2 end
prepend B
end
a = A.dup.new
class A
def b; 1 end
end
p a.b
```
Printed 1 instead of 2. This is because the super chain for
a.singleton_class was:
```
a.singleton_class
A.dup
B(iclass)
B(iclass origin)
A(origin) # not A.dup(origin)
```
The B iclasses would not be modified, so the includer entry would be
still be set to A and not A.dup.
This modifies things so that if the class/module has an origin,
all iclasses between the class/module and the origin are duplicated
and have the correct includer entry set, and the correct origin
is created.
This requires other changes to make sure all tests still pass:
* rb_undef_methods_from doesn't automatically handle classes with
origins, so pass it the origin for Comparable when undefing
methods in Complex. This fixed a failure in the Complex tests.
* When adding a method, the method cache was not cleared
correctly if klass has an origin. Clear the method cache for
the klass before switching to the origin of klass. This fixed
failures in the autoload tests related to overridding require,
without breaking the optimization tests. Also clear the method
cache for both the module and origin when removing a method.
* Module#include? is fixed to skip origin iclasses.
* Refinements are fixed to use the origin class of the module that
has an origin.
* RCLASS_REFINED_BY_ANY is removed as it was only used in a single
place and is no longer needed.
* Marshal#dump is fixed to skip iclass origins.
* rb_method_entry_make is fixed to handled overridden optimized
methods for modules that have origins.
Fixes [Bug #16852]
search_method() can return invalid method, but vm_defined() checks
it as valid method entry. This is why defined?(foo) if foo is undef'ed.
To solve this problem, check invalidation and return NULL.
[Bug #16669]
https://twitter.com/kamipo/status/1252881930103558144
Tests will be merged by nobu soon.
To invalidate cached method entry, existing method entry (ment)
is marked as invalidated and replace with copied ment. However,
complemented method entry (method entries in Module) should not
be set to Module's m_tbl.
[Bug #16669]
Starting GCC 7, warnings about uninitialized variables are issued around
them. Such warnings could be false positives (all versions of clang do
not warn), but adding initializers there could never be bad things.
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
Now, rb_call_info contains how to call the method with tuple of
(mid, orig_argc, flags, kwarg). Most of cases, kwarg == NULL and
mid+argc+flags only requires 64bits. So this patch packed
rb_call_info to VALUE (1 word) on such cases. If we can not
represent it in VALUE, then use imemo_callinfo which contains
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because all of callinfo is VALUE
size (packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
is temporary removed because cd->ci should be marked.
Starting clang 11, casts between pointer and (narrower-than-pointer) int
are now warned. However all such thing in our repository are guaranteed
safe. Let's suppress the warnings.
VALUE and rb_serial_t do not agree with their width. We have to be
consistent. Assigning an rb_serial_t value to a VALUE variable is
practically a problem on a ILP32 environment.
Methods and their definitions can be allocated/deallocated on-the-fly.
One pathological situation is when a method is deallocated then another
one is allocated immediately after that. Address of those old/new method
entries/definitions can be the same then, depending on underlying
malloc/free implementation.
So pointer comparison is insufficient. We have to check the contents.
To do so we introduce def->method_serial, which is an integer unique to
that specific method definition.
PS: Note that method_serial being uintptr_t rather than rb_serial_t is
intentional. This is because rb_serial_t can be bigger than a pointer
on a 32bit system (rb_serial_t is at least 64bit). In order to preserve
old packing of struct rb_call_cache, rb_serial_t is inappropriate.
Previously every time a method was defined on a module, we would
recursively walk all subclasses to see if the module was included in a
class which the VM optimizes for (such as Integer#+).
For most method definitions we can tell immediately that this won't be
the case based on the method's name. To do this we just keep a hash with
method IDs of optimized methods and if our new method isn't in that list
we don't need to check subclasses at all.
We know that this is a heap-allocated object (a CLASS, MODULE, or
ICLASS) so we don't need to check if it is an immediate value. This
should be very slightly faster.
Avoids genereating a "throwaway" sentinel class serial. There wasn't any
read harm in doing so (we're at no risk of exhaustion and there'd be no
measurable performance impact), but if feels cleaner that all class
serials actually end up assigned and used (especially now that we won't
overwrite them in a single method definition).
rb_clear_method_cache_by_class calls rb_class_clear_method_cache
recursively on subclasses, where it will bump the class serial and clear
some other data (callable_m_tbl, and some mjit data).
Previously this could end up taking a long time to clear all the classes
if the module was included a few levels deep and especially if there
were multiple paths to it in the dependency tree (ie. a class includes
two modules which both include the same other module) as we end up
revisiting class/iclass/module objects multiple times.
This commit avoids revisiting the same object, by short circuiting when
revisit the same object. We can check this efficiently by comparing the
class serial of each object we visit with the next class serial at the
start. We know that any objects with a higher class serial have already
been visited.
These functions are used from within a compilation unit so we can
make them static, for better binary size. This changeset reduces
the size of generated ruby binary from 26,590,128 bytes to
26,584,472 bytes on my macihne.
Noticed that rb_method_basic_definition_p is frequently called.
Its callers include vm_caller_setup_args_block(),
rb_hash_default_value(), rb_num_neative_int_p(), and a lot more.
It seems worth caching the method resolution part. Majority of
rb_method_basic_definion_p() usages take fixed class and fixed
method id combinations.
Calculating -------------------------------------
ours trunk
so_matrix 2.379 2.115 i/s - 1.000 times in 0.420409s 0.472879s
Comparison:
so_matrix
ours: 2.4 i/s
trunk: 2.1 i/s - 1.12x slower
There are libraries that use define_method with argument splats
where they would like to pass keywords through the method. To
more easily allow such libraries to use ruby2_keywords to handle
backwards compatibility, it is necessary for ruby2_keywords to
support bmethods.
This reverts commits: 10d6a3aca78ba48c1b85fba8627dc1dd883de5ba6c6a25feca167e6b48f17cb96d41a53207979278595b3c4fdd1521f7cf89c11c5e69accf336082033632a812c0f56506be0d86427a3219 .
The reason for the revert is that we observe ABA problem around
inline method cache. When a cache misshits, we search for a
method entry. And if the entry is identical to what was cached
before, we reuse the cache. But the commits we are reverting here
introduced situations where a method entry is freed, then the
identical memory region is used for another method entry. An
inline method cache cannot detect that ABA.
Here is a code that reproduce such situation:
```ruby
require 'prime'
class << Integer
alias org_sqrt sqrt
def sqrt(n)
raise
end
GC.stress = true
Prime.each(7*37){} rescue nil # <- Here we populate CC
class << Object.new; end
# These adjacent remove-then-alias maneuver
# frees a method entry, then immediately
# reuses it for another.
remove_method :sqrt
alias sqrt org_sqrt
end
Prime.each(7*37).to_a # <- SEGV
```
Now that we have eliminated most destructive operations over the
rb_method_entry_t / rb_callable_method_entry_t, let's make them
mostly immutabe and mark them const.
One exception is rb_export_method(), which destructively modifies
visibilities of method entries. I have left that operation as is
because I suspect that destructiveness is the nature of that
function.
Tired of rb_method_entry_create(..., rb_method_definition_create(
..., &(rb_method_foo_t) {...})) maneuver. Provide a function that
does the thing to reduce copy&paste.
The deleted function was to destructively overwrite existing method
entries, which is now considered to be a bad idea. Delete it, and
assign a newly created method entry instead.
Before this changeset rb_method_definition_create only allocated a
memory region and we had to destructively initialize it later.
That is not a good design so we change the API to return a complete
struct instead.
Did you know that C functions can return structs? That has been
true since the beginning, but not that very useful until C99. Now
that we can write compound literals, this feature is much easier
to use. By allowing struct-returning functions, some formerly big
functions can be split into smaller ones, like this changeset.
At least GCC is smart enough to inline the split functions back
into one. No performance penalty is observed.
Most (if not all) of the fields of rb_method_definition_t are never
meant to be modified once after they are stored. Marking them const
makes it possible for compilers to warn on unintended modifications.
This approach uses a flag bit on the final hash object in the regular splat,
as opposed to a previous approach that used a VM frame flag. The hash flag
approach is less invasive, and handles some cases that the VM frame flag
approach does not, such as saving the argument splat array and splatting it
later:
ruby2_keywords def foo(*args)
@args = args
bar
end
def bar
baz(*@args)
end
def baz(*args, **kw)
[args, kw]
end
foo(a:1) #=> [[], {a: 1}]
foo({a: 1}, **{}) #=> [[{a: 1}], {}]
foo({a: 1}) #=> 2.7: [[], {a: 1}] # and warning
foo({a: 1}) #=> 3.0: [[{a: 1}], {}]
It doesn't handle some cases that the VM frame flag handles, such as when
the final hash object is replaced using Hash#merge, but those cases are
probably less common and are unlikely to properly support keyword
argument separation.
Use ruby2_keywords to handle argument delegation in the delegate library.