We started to use fastpath on invokesuper when a method is not refinements
since 5c27681813, but we shouldn't have used fastpath for attr_writer either.
`cc->aux_.attr_index` is for an actual receiver class, while we store
its superclass in `cc->klass` and therefore there's no way to properly
invalidate attr_writer's inline cache when it's called by super.
[Bug #16785]
I suspect the same bug also exists in attr_reader. I'll address that in
another commit.
Fastpath has not been used for invokesuper since it has set vm_call_super_method on every invocation.
Because it seems to be blocked only by refinements, try enabling fastpath on invokesuper when cme is not for refinements.
While this patch itself should be helpful for VM performance, a part of this patch's motivation is to unblock inlining invokesuper on JIT.
$ benchmark-driver -v --rbenv 'before;after' benchmark/vm2_super.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-11T15:19:58Z master a01bda5949) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-12T02:00:08Z invokesuper-fastpath c171984ee3) [x86_64-linux]
Calculating -------------------------------------
before after
vm2_super 20.031M 32.860M i/s - 6.000M times in 0.299534s 0.182593s
Comparison:
vm2_super
after: 32859885.2 i/s
before: 20031097.3 i/s - 1.64x slower
This changes the following warnings:
* warning: class variable access from toplevel
* warning: class variable @foo of D is overtaken by C
into RuntimeErrors. Handle defined?(@@foo) at toplevel
by returning nil instead of raising an exception (the previous
behavior warned before returning nil when defined? was used).
Refactor the specs to avoid the warnings even in older versions.
The specs were checking for the warnings, but the purpose of
the related specs as evidenced from their description is to
test for behavior, not for warnings.
Fixes [Bug #14541]
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side. Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.
This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side. Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side. On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash. If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it. So this commit
should make it so that a maximum of a single hash is allocated
during method calls.
To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable. If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.
In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.
On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled. Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not. Then, unless the callinfo flag is set,
the hash needs to be duplicated. The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again. If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can including duplicating the
positional splat array if one was passed. CALLER_SETUP_ARG
and a couple other places needs to be modified to handle
similar issues for other types of calls.
This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.
Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:
def kw(a: 1) a end
def kws(**kw) kw end
h = {a: 1}
kw(a: 1) # about same
kw(**h) # 2.37x faster
kws(a: 1) # 1.30x faster
kws(**h) # 2.19x faster
kw(a: 1, **h) # 1.03x slower
kw(**h, **h) # about same
kws(a: 1, **h) # 1.16x faster
kws(**h, **h) # 1.14x faster
send() has special method launcher in VM and it has special
method_missing caller. This path doesn't set
ec->method_missing_reason which is used at exception creation,
so setup this information. Without this setting, NoMethodError
exception becomes NameError.
This patch will fix:
http://ci.rvm.jp/results/trunk-random1@phosphorus-docker/2761643
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
Now, rb_call_info contains how to call the method with tuple of
(mid, orig_argc, flags, kwarg). Most of cases, kwarg == NULL and
mid+argc+flags only requires 64bits. So this patch packed
rb_call_info to VALUE (1 word) on such cases. If we can not
represent it in VALUE, then use imemo_callinfo which contains
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because all of callinfo is VALUE
size (packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
is temporary removed because cd->ci should be marked.
This removes the warnings added in 2.7, and changes the behavior
so that a final positional hash is not treated as keywords or
vice-versa.
To handle the arg_setup_block splat case correctly with keyword
arguments, we need to check if we are taking a keyword hash.
That case didn't have a test, but it affects real-world code,
so add a test for it.
This removes rb_empty_keyword_given_p() and related code, as
that is not needed in Ruby 3. The empty keyword case is the
same as the no keyword case in Ruby 3.
This changes rb_scan_args to implement keyword argument
separation for C functions when the : character is used.
For backwards compatibility, it returns a duped hash.
This is a bad idea for performance, but not duping the hash
breaks at least Enumerator::ArithmeticSequence#inspect.
Instead of having RB_PASS_CALLED_KEYWORDS be a number,
simplify the code by just making it be rb_keyword_given_p().
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead. This would significantly
speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
With these macros implemented we can write codes just like we can assume
the compiler being clang. MSC_VERSION_SINCE is defined to implement
those macros, but turned out to be handy for other places. The -fdeclspec
compiler flag is necessary for clang to properly handle __has_declspec().
Methods and their definitions can be allocated/deallocated on-the-fly.
One pathological situation is when a method is deallocated then another
one is allocated immediately after that. Address of those old/new method
entries/definitions can be the same then, depending on underlying
malloc/free implementation.
So pointer comparison is insufficient. We have to check the contents.
To do so we introduce def->method_serial, which is an integer unique to
that specific method definition.
PS: Note that method_serial being uintptr_t rather than rb_serial_t is
intentional. This is because rb_serial_t can be bigger than a pointer
on a 32bit system (rb_serial_t is at least 64bit). In order to preserve
old packing of struct rb_call_cache, rb_serial_t is inappropriate.
The equation shall hold for every call cache. However prior to this
changeset cc->me could be updated without also updating cc->def. Let's
make it sure by introducing new macro named CC_SET_ME which sets cc->me
and cc->def at once.
This makes behavior the same as super in instance_eval in method
in class. The reason this wasn't implemented before is that
there is a check to determine if the self in the current context
is of the expected class, and a module itself can be included
in multiple classes, so it doesn't have an expected class.
Implementing this requires giving iclasses knowledge of which
class created them, so that super call in the module method
knows the expected class for super calls. This reference
is called includer, and should only be set for iclasses.
Note that the approach Ruby uses in this check is not robust. If
you instance_eval another object of the same class and call super,
instead of an TypeError, you get super called with the
instance_eval receiver instead of the method receiver. Truly
fixing super would require keeping a reference to the super object
(method receiver) in each frame where scope has changed, and using
that instead of current self when calling super.
Fixes [Bug #11636]
This commit introduces an "inline ivar cache" struct. The reason we
need this is so compaction can differentiate from an ivar cache and a
regular inline cache. Regular inline caches contain references to
`VALUE` and ivar caches just contain references to the ivar index. With
this new struct we can easily update references for inline caches (but
not inline var caches as they just contain an int)
Asynchronous events such as signal trap, finalization timing,
thread switching and so on are managed by "interrupt_flag".
Ruby's threads check this flag periodically and if a thread
does not check this flag, above events doesn't happen.
This checking is CHECK_INTS() (related) macro and it is placed
at some places (laeve instruction and so on). However, at the end
of C methods, C blocks (IMEMO_IFUNC) etc there are no checking
and it can introduce uninterruptible thread.
To modify this situation, we decide to place CHECK_INTS() at
vm_pop_frame(). It increases interrupt checking points.
[Bug #16366]
This patch can introduce unexpected events...
By this change, the following code prints only one warning.
```
def foo(**opt); end
100.times { foo({kw:1}) }
```
A global variable `st_table *caller_to_callees` is a map from caller to
a set of callee methods. It remembers that a warning is already printed
for each pair of caller and callee.
[Feature #16289]
opt_invokebuiltin_delegate and opt_invokebuiltin_delegate_leave
invokes builtin functions with same parameters of the method.
This technique eliminate stack push operations. However, delegation
parameters should be completely same as given parameters.
(e.g. `def foo(a, b, c) __builtin_foo(a, b, c)` is okay, but
__builtin_foo(b, c) is not allowed)
This patch relaxes this restriction. ISeq has a local variables
table which includes parameters. For example, the method defined
as `def foo(a, b, c) x=y=nil`, then local variables table contains
[a, b, c, x, y]. If calling builtin-function with arguments which
are sub-array of the lvar table, use opt_invokebuiltin_delegate
instruction with start index. For example, `__builtin_foo(b, c)`,
`__builtin_bar(c, x, y)` is okay, and so on.
rb_vm_lvar_exposed() is prepared for __builtin_inline!(), needed for
mini_builtin.c and builtin.c. However, it's only on builtin.c.
So move it to make it as a part of VM.
Fixes [Bug #16332]
Constant access was changed to no longer allow top-level constant access
through `nil`, but `defined?` wasn't changed at the same time to stay
consistent.
Use a separate defined type to distinguish between a constant
referenced from the current lexical scope and one referenced from
another namespace.
vm_invoke_builtin() accesses VM stack via cfp->sp. However, MJIT
can use their own stack. To access them appropriately, we need to
use STACK_ADDR_FROM_TOP().
A method which has keyword parameters has an implicit local variable
to specify which keywords are (un)specified.
vm_call_iseq_setup_kwparm_nokwarg() is special function to invoke
a ISeq method without any keyword arguments. However, it should
also initialize the special local var. Without this initialization,
the implicit lvar can points a freed (T_NONE) object.
Support loading builtin features written in Ruby, which implement
with C builtin functions.
[Feature #16254]
Several features:
(1) Load .rb file at boottime with native binary.
Now, prelude.rb is loaded at boottime. However, this file is contained
into the interpreter as a text format and we need to compile it.
This patch contains a feature to load from binary format.
(2) __builtin_func() in Ruby call func() written in C.
In Ruby file, we can write `__builtin_func()` like method call.
However this is not a method call, but special syntax to call
a function `func()` written in C. C functions should be defined
in a file (same compile unit) which load this .rb file.
Functions (`func` in above example) should be defined with
(a) 1st parameter: rb_execution_context_t *ec
(b) rest parameters (0 to 15).
(c) VALUE return type.
This is very similar requirements for functions used by
rb_define_method(), however `rb_execution_context_t *ec`
is new requirement.
(3) automatic C code generation from .rb files.
tool/mk_builtin_loader.rb creates a C code to load .rb files
needed by miniruby and ruby command. This script is run by
BASERUBY, so *.rb should be written in BASERUBY compatbile
syntax. This script load a .rb file and find all of __builtin_
prefix method calls, and generate a part of C code to export
functions.
tool/mk_builtin_binary.rb creates a C code which contains
binary compiled Ruby files needed by ruby command.
Prior to this changeset, majority of inline cache mishits resulted
into the same method entry when rb_callable_method_entry() resolves
a method search. Let's not call the function at the first place on
such situations.
In doing so we extend the struct rb_call_cache from 44 bytes (in
case of 64 bit machine) to 64 bytes, and fill the gap with
secondary class serial(s). Call cache's class serials now behavies
as a LRU cache.
Calculating -------------------------------------
ours 2.7 2.6
vm2_poly_same_method 2.339M 1.744M 1.369M i/s - 6.000M times in 2.565086s 3.441329s 4.381386s
Comparison:
vm2_poly_same_method
ours: 2339103.0 i/s
2.7: 1743512.3 i/s - 1.34x slower
2.6: 1369429.8 i/s - 1.71x slower
Noticed that rb_method_basic_definition_p is frequently called.
Its callers include vm_caller_setup_args_block(),
rb_hash_default_value(), rb_num_neative_int_p(), and a lot more.
It seems worth caching the method resolution part. Majority of
rb_method_basic_definion_p() usages take fixed class and fixed
method id combinations.
Calculating -------------------------------------
ours trunk
so_matrix 2.379 2.115 i/s - 1.000 times in 0.420409s 0.472879s
Comparison:
so_matrix
ours: 2.4 i/s
trunk: 2.1 i/s - 1.12x slower
This mirrors the behavior when manually splatting a hash. This
mirrors the changes made in setup_parameters_complex in
6081ddd6e6, so that splatting to a
non-iseq method works the same as splatting to an iseq method.
To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.
Combining the structures also saves 8 bytes per call site as the current
layout uses separate two pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.
This change improves the Optcarrot benchmark at least 3%. For more
details, see attached bugs.ruby-lang.org ticket.
Complications:
- A new instruction attribute `comptime_sp_inc` is introduced to
calculate SP increase at compile time without using call caches. At
compile time, a `TS_CALLDATA` operand points to a call info struct, but
at runtime, the same operand points to a call data struct. Instruction
that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.
[Misc #16258]
This reverts commits: 10d6a3aca78ba48c1b85fba8627dc1dd883de5ba6c6a25feca167e6b48f17cb96d41a53207979278595b3c4fdd1521f7cf89c11c5e69accf336082033632a812c0f56506be0d86427a3219 .
The reason for the revert is that we observe ABA problem around
inline method cache. When a cache misshits, we search for a
method entry. And if the entry is identical to what was cached
before, we reuse the cache. But the commits we are reverting here
introduced situations where a method entry is freed, then the
identical memory region is used for another method entry. An
inline method cache cannot detect that ABA.
Here is a code that reproduce such situation:
```ruby
require 'prime'
class << Integer
alias org_sqrt sqrt
def sqrt(n)
raise
end
GC.stress = true
Prime.each(7*37){} rescue nil # <- Here we populate CC
class << Object.new; end
# These adjacent remove-then-alias maneuver
# frees a method entry, then immediately
# reuses it for another.
remove_method :sqrt
alias sqrt org_sqrt
end
Prime.each(7*37).to_a # <- SEGV
```
return directly in class/module is an error, so return in
proc in class/module should also be an error. I believe the
previous behavior was an unintentional oversight during the
addition of top-level return in 2.4.
At last, not only myself but also your compiler are fully confident
that the method entries pointed from call caches are immutable. We
don't have to worry about silent updates. Just delete the branch
that is now always false.
Calculating -------------------------------------
ours trunk
vm2_poly_same_method 2.142M 2.070M i/s - 6.000M times in 2.801148s 2.898994s
Comparison:
vm2_poly_same_method
ours: 2141979.2 i/s
trunk: 2069683.8 i/s - 1.03x slower
Now that we have eliminated most destructive operations over the
rb_method_entry_t / rb_callable_method_entry_t, let's make them
mostly immutabe and mark them const.
One exception is rb_export_method(), which destructively modifies
visibilities of method entries. I have left that operation as is
because I suspect that destructiveness is the nature of that
function.
Tired of rb_method_entry_create(..., rb_method_definition_create(
..., &(rb_method_foo_t) {...})) maneuver. Provide a function that
does the thing to reduce copy&paste.
The deleted function was to destructively overwrite existing method
entries, which is now considered to be a bad idea. Delete it, and
assign a newly created method entry instead.
Before this changeset rb_method_definition_create only allocated a
memory region and we had to destructively initialize it later.
That is not a good design so we change the API to return a complete
struct instead.
Most (if not all) of the fields of rb_method_definition_t are never
meant to be modified once after they are stored. Marking them const
makes it possible for compilers to warn on unintended modifications.
If a method accepts no keywords and was called with a keyword, an
ArgumentError was not always issued previously. Force methods that
accept no keywords to go through setup_parameters_complex so that
an ArgumentError is raised if keywords are provided.
This approach uses a flag bit on the final hash object in the regular splat,
as opposed to a previous approach that used a VM frame flag. The hash flag
approach is less invasive, and handles some cases that the VM frame flag
approach does not, such as saving the argument splat array and splatting it
later:
ruby2_keywords def foo(*args)
@args = args
bar
end
def bar
baz(*@args)
end
def baz(*args, **kw)
[args, kw]
end
foo(a:1) #=> [[], {a: 1}]
foo({a: 1}, **{}) #=> [[{a: 1}], {}]
foo({a: 1}) #=> 2.7: [[], {a: 1}] # and warning
foo({a: 1}) #=> 3.0: [[{a: 1}], {}]
It doesn't handle some cases that the VM frame flag handles, such as when
the final hash object is replaced using Hash#merge, but those cases are
probably less common and are unlikely to properly support keyword
argument separation.
Use ruby2_keywords to handle argument delegation in the delegate library.
I noticed that in case of cache misshit, re-calculated cc->me can
be the same method entry than the pevious one. That is an okay
situation but can't we partially reuse the cache, because cc->call
should still be valid then?
One thing that has to be special-cased is when the method entry
gets amended by some refinements. That happens behind-the-scene
of call cache mechanism. We have to check if cc->me->def points to
the previously saved one.
Calculating -------------------------------------
trunk ours
vm2_poly_same_method 1.534M 2.025M i/s - 6.000M times in 3.910203s 2.962752s
Comparison:
vm2_poly_same_method
ours: 2025143.9 i/s
trunk: 1534447.2 i/s - 1.32x slower
Make sure that vm_yield_with_cfunc can correctly set the empty keyword
flag by passing 2 as the kw_splat value when calling it in
vm_invoke_ifunc_block. Make sure calling.kw_splat is set to 1 and not
128 in vm_sendish, so we can safely check for different kw_splat values.
vm_args.c needs to call add_empty_keyword, and to make JIT happy, the
function needs to be exported. Rename the function to
rb_adjust_argv_kw_splat to more accurately reflect what it does, and
mark it as MJIT exported.
Also add keyword argument separation warnings for Class#new and Method#call.
To allow for keyword argument to required positional hash converstion in
cfuncs, add a vm frame flag indicating the cfunc was called with an empty
keyword hash (which was removed before calling the cfunc). The cfunc can
check this frame flag and add back an empty hash if it is passing its
arguments to another Ruby method. Add rb_empty_keyword_given_p function
for checking if called with an empty keyword hash, and
rb_add_empty_keyword for adding back an empty hash to argv.
All of this empty keyword argument support is only for 2.7. It will be
removed in 3.0 as Ruby 3 will not convert empty keyword arguments to
required positional hash arguments. Comment all of the relevent code
to make it obvious this is expected to be removed.
Add rb_funcallv_kw as an public C-API function, just like rb_funcallv
but with a keyword flag. This is used by rb_obj_call_init (internals
of Class#new). This also required expected call_type enum with
CALL_FCALL_KW, similar to the recent addition of CALL_PUBLIC_KW.
Add rb_vm_call_kw as a internal function, used by call_method_data
(internals of Method#call and UnboundMethod#bind_call). Add tests
for UnboundMethod#bind_call keyword handling.
This is the same as the bmethod, sym proc, and send cases,
where we don't remove the keyword splat, so later code can
move it to a required positional parameter and warn.
This is the same as the bmethod and send cases, where we don't
remove the keyword splat, so later code can move it to to a
a required positional parameter and warn.
The lambda case is similar to the attr_writer case, except we have
to determine the number of required parameters from the iseq
instead of being able to assume a single required parameter.
This fixes a lot of lambda tests which were switched to require
warnings for all usage of keyword arguments. Similar to method
handling, we do not warn when passing keyword arguments to
lambdas that do not accept keyword arguments, the argument is
just passed as a positional hash in that case, unless it is empty.
If it is empty and not the final required parameter, then we
ignore it. If it is empty and the final required parameter, then
we pass it for backwards compatibility and emit a warning, as in
Ruby 3 we will not pass it.
The bmethod case is similar to the send case, in that we do not
want to remove empty keyword splats in vm_call_bmethod, as that
prevents later call handling from moving them to required
positional arguments and warning.
In general, we want to ignore empty keyword hashes. The only case
where we want to allow them for backwards compatibility is when
they are necessary to satify the final required positional argument.
In that case, we want to not ignore them, but we do want to warn,
as that will be going away in Ruby 3.
This commit implements this support for regular methods and
attr_writer methods.
In order to allow send to forward arguments correctly, send no
longer removes empty keyword hashes. It is the responsibility of
the final method to remove the empty keyword hashes now. This
change was necessary as otherwise send could remove the empty
keyword hashes before the regular or attr_writer methods could
move them to required positional arguments.
For completeness, add tests for keyword handling regular
methods calls.
This makes rb_warn_keyword_to_last_hash non-static in vm_args.c
so it can be reused in vm_insnhelper.c, and also moves declarations
before statements in the rb_warn_* functions in vm_args.c.
While doing so is not backwards compatible with Ruby 2.6, it is
necessary for generic argument forwarding to work for all methods:
```ruby
def foo(*args, **kw, &block)
bar(*args, **kw, &block)
end
```
If you do not remove empty keyword hashes, and bar does not accept
keyword arguments, then a call to foo without keyword arguments
calls bar with an extra positional empty hash argument.
Actually, the following call is wrongly warned without this change.
```
class C
def method_missing(x, *args, **opt)
end
end
C.new.foo(k: 1)
# warning: The last argument is used as the keyword parameter
# warning: for `method_missing' defined here
```
...only when a "remove_empty_keyword_hash" flag is specified.
After CALLER_SETUP_ARG is called, `ci->flag & VM_CALL_KW_SPLAT` must not
be used. Instead. use `calling->kw_splat`. This is because
CALLER_SETUP_ARG may modify argv and update `calling->kw_splat`, and
`ci->flag & VM_CALL_KW_SPLAT` may be inconsistent with the result.
There are two styles that argv contains keyword arguments: one is
VM_CALL_KWARG which contains value elements in argv (to avoid a hash
object creation if possible), and the other is VM_CALL_KW_SPLAT which
contains one last hash in argv.
vm_caller_setup_arg_kw translates argv from the VM_CALL_KWARG style to
the VM_CALL_KW_SPLAT style.
`calling->kw_splat` means that argv is the VM_CALL_KW_SPLAT style.
So, instead of setting `calling->kw_splat` at many places, it would be
better to do so when vm_caller_setup_arg_kw is called.
This is needed for C functions to call methods with keyword arguments.
This is a copy of rb_funcall_with_block with an extra argument for
the keyword flag.
There isn't a clean way to implement this that doesn't involve
changing a lot of function signatures, because rb_call doesn't
support a way to mark that the call has keyword arguments. So hack
this in using a CALL_PUBLIC_KW call_type, which we switch for
CALL_PUBLIC later in the call stack.
We do need to modify rm_vm_call0 to take an argument for whether
keyword arguments are used, since the call_type is no longer
available at that point. Use the passed in value to set the
appropriate keyword flag in both calling and ci_entry.
The kw_splat flag is whether the original call passes keyword or not.
Some types of methods (e.g., bmethod and sym_proc) drops the
information. This change tries to propagate the flag to the final
callee, as far as I can.
After 5e86b005c0, I now think ANYARGS is
dangerous and should be extinct. This commit deletes ANYARGS from
struct vm_ifunc, but in doing so we also have to decouple the usage
of this struct in compile.c, which (I think) is an abuse of ANYARGS.
This was an intentional bug added in 1.9.
The approach taken here is to add a second operand to the
getconstant instruction for whether nil should be allowed and
treated as current scope.
Fixes [Bug #11718]
Methods on duplicated class/module refer same constant inline
cache (IC). Constant access lookup should be done for cloned
class/modules but inline cache doesn't check it.
To check it, this patch introduce new RCLASS_CLONED flag which
are set when if class/module is cloned (both orig and dst).
[Bug #15877]
only when its receiver and the argument are both Integers.
Since 6bedbf4625, Integer#[] has supported a range extraction.
This means that Integer#[] now accepts multiple arguments, which made
the method very slow unfortunately.
This change fixes the performance issue by adding a special handling for
its traditional use case: `num[idx]` where both `num` and `idx` are
Integers.
* internal.h (UNALIGNED_MEMBER_ACCESS, UNALIGNED_MEMBER_PTR):
moved from eval_intern.h.
* compile.c iseq.c, vm.c: use UNALIGNED_MEMBER_PTR for `entries`
in `struct iseq_catch_table`.
* vm_eval.c, vm_insnhelper.c: use UNALIGNED_MEMBER_PTR for `body`
in `rb_method_definition_t`.
ec->cfp->iseq might not exist at the very beginning of a thread.
=================================================================
==82954==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fc86f334810 at pc 0x55ceaf013125 bp 0x7ffe2eddbbf0 sp 0x7ffe2eddbbe8
READ of size 8 at 0x7fc86f334810 thread T0
#0 0x55ceaf013124 in vm_check_canary vm_insnhelper.c:217:24
#1 0x55ceaefb4796 in vm_push_frame vm_insnhelper.c:276:5
#2 0x55ceaf0124bd in th_init vm.c:2661:5
#3 0x55ceaf00d5eb in ruby_thread_init vm.c:2690:5
#4 0x55ceaf00d4b1 in rb_thread_alloc vm.c:2703:5
#5 0x55ceaef0038b in thread_s_new thread.c:872:20
#6 0x55ceaf04d8c1 in call_cfunc_m1 vm_insnhelper.c:2041:12
#7 0x55ceaf03118d in vm_call_cfunc_with_frame vm_insnhelper.c:2207:11
#8 0x55ceaf017985 in vm_call_cfunc vm_insnhelper.c:2225:12
#9 0x55ceaf01548b in vm_call_method_each_type vm_insnhelper.c:2560:9
#10 0x55ceaf014c96 in vm_call_method vm_insnhelper.c:2686:13
#11 0x55ceaefb5de4 in vm_call_general vm_insnhelper.c:2730:12
#12 0x55ceaf03c868 in vm_sendish vm_insnhelper.c:3623:11
#13 0x55ceaefc95bb in vm_exec_core insns.def:771:11
#14 0x55ceaf006700 in rb_vm_exec vm.c:1892:22
#15 0x55ceaf00acbf in rb_iseq_eval_main vm.c:2151:11
#16 0x55ceaea250ca in ruby_exec_internal eval.c:262:2
#17 0x55ceaea2498b in ruby_exec_node eval.c:326:12
#18 0x55ceaea247d0 in ruby_run_node eval.c:318:25
#19 0x55ceae88c486 in main main.c:42:9
#20 0x7fc874330b96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
#21 0x55ceae7e5289 in _start (miniruby+0x15f289)
0x7fc86f334810 is located 16 bytes to the right of 1048576-byte region [0x7fc86f234800,0x7fc86f334800)
allocated by thread T0 here:
#0 0x55ceae85d56d in malloc (miniruby+0x1d756d)
#1 0x55ceaea71d12 in objspace_xmalloc0 gc.c:9416:5
#2 0x55ceaea71cd2 in ruby_xmalloc2_body gc.c:9623:12
#3 0x55ceaea7d09c in ruby_xmalloc2 gc.c:11479:12
#4 0x55ceaf00c3b7 in rb_thread_recycle_stack vm.c:2462:12
#5 0x55ceaf012256 in th_init vm.c:2656:29
#6 0x55ceaf00d5eb in ruby_thread_init vm.c:2690:5
#7 0x55ceaf00d4b1 in rb_thread_alloc vm.c:2703:5
#8 0x55ceaef0038b in thread_s_new thread.c:872:20
#9 0x55ceaf04d8c1 in call_cfunc_m1 vm_insnhelper.c:2041:12
#10 0x55ceaf03118d in vm_call_cfunc_with_frame vm_insnhelper.c:2207:11
#11 0x55ceaf017985 in vm_call_cfunc vm_insnhelper.c:2225:12
#12 0x55ceaf01548b in vm_call_method_each_type vm_insnhelper.c:2560:9
#13 0x55ceaf014c96 in vm_call_method vm_insnhelper.c:2686:13
#14 0x55ceaefb5de4 in vm_call_general vm_insnhelper.c:2730:12
#15 0x55ceaf03c868 in vm_sendish vm_insnhelper.c:3623:11
#16 0x55ceaefc95bb in vm_exec_core insns.def:771:11
#17 0x55ceaf006700 in rb_vm_exec vm.c:1892:22
#18 0x55ceaf00acbf in rb_iseq_eval_main vm.c:2151:11
#19 0x55ceaea250ca in ruby_exec_internal eval.c:262:2
#20 0x55ceaea2498b in ruby_exec_node eval.c:326:12
#21 0x55ceaea247d0 in ruby_run_node eval.c:318:25
#22 0x55ceae88c486 in main main.c:42:9
#23 0x7fc874330b96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
SUMMARY: AddressSanitizer: heap-buffer-overflow vm_insnhelper.c:217:24 in vm_check_canary
Shadow bytes around the buggy address:
0x0ff98de5e8b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff98de5e8c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff98de5e8d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff98de5e8e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff98de5e8f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0ff98de5e900: fa fa[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff98de5e910: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff98de5e920: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff98de5e930: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff98de5e940: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff98de5e950: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==82954==ABORTING
When reviewing r66565, I overlooked that `GET_ISEQ()` and `GET_EP()` are
NOT `ec->cfp->iseq` and `ec->cfp->ep` but `reg_cfp->iseq` and
`reg_cfp->ep`.
`vm_push_frame` updates `ec->cfp` and in this case we want to check the
callee's cfp and so `ec->cfp` should be checked instead.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67522 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def: add definemethod and definesmethod (singleton method)
instructions. Old YARV contains these instructions, but it is moved
to methods of FrozenCore class because remove number of instructions
can improve performance for some techniques (static stack caching
and so on). However, we don't employ these technique and it is hard
to optimize/analysis definition sequence. So I decide to introduce
them (and remove definition methods). `putiseq` insn is also removed.
* vm_method.c (rb_scope_visibility_get): renamed to
`vm_scope_visibility_get()` and make it accept `ec`.
Same for `vm_scope_module_func_check()`.
These fixes are result of refactoring `vm_define_method`.
* vm_insnhelper.c (rb_vm_get_cref): renamed to `vm_get_cref`
because of consistency with other functions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67442 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
cfp->bp was (re-)introduced by Kokubun san, but VM doesn't use it
because I (ko1) want to remove it in a future. But using it make
leave instruction fast because of sp consisntency check.
So now VM uses cfp->bp.
To use cfp->bp, I checked the value and I found that it is not a
"initial value of sp" but a "initial value of ep". Fix this problem
and fix all bp references (this is why bp is renamed to bp_).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67342 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Add counters to count ccf (call cache fastpath) usage.
These counters will help which kind of method dispatch
is important to optimize.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67336 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
similar idea to r67315, provide the following optimization
for method dispatch with lead and kw parameters.
(1) add a special branch to check passing kw arguments to
a method which has lead and kw parameters.
ex) def foo(x, k:1); end; foo(0, k:1)
(2) add a special branch to check passing no-kw arguments to
a method which has lead and kw parameters.
ex) def foo(x, k:1); end; foo(0)
For (1) and (2) cases, provide special dispatchers. For (2) case,
this patch only use the special dispatcher if all default
kw parameters are literal values (nil, 1, and so on. In other case,
kw->default_values does not contains Qundef) (and no required kw
parameters becaseu they don't pass any keyword parameters).
Passing keyword arguments with a hash object is not a scope of
this patch.
Without this patch, (1) and (2) cases use `setup_parameters_complex()`.
Especially, (2) seems frequent case for methods which extend a normal
usecase with keyword parameters (like: `exception: true`).
We can measure the performance with benchmark-driver:
With methods: def kw k1:1, k2:2; end
def m; end
With the following binaries:
clean-miniruby: unmodified trunk.
opt_miniruby1: use special branches for lead/kw parameters.
opt_miniruby2: use special dispatchers for lead/kw parameters.
opt_cc_miniruby: apply step (2).
Result with benchmark-driver:
m
opt_miniruby2: 75222278.0 i/s
clean-miniruby: 73177896.5 i/s - 1.03x slower
opt_miniruby1: 62466783.3 i/s - 1.20x slower
kw
opt_miniruby2: 52044504.4 i/s
opt_miniruby1: 29142025.7 i/s - 1.79x slower
clean-miniruby: 20515235.4 i/s - 2.54x slower
kw k1: 10
opt_miniruby2: 26492219.5 i/s
opt_miniruby1: 25409484.9 i/s - 1.04x slower
clean-miniruby: 20235113.7 i/s - 1.31x slower
kw k1: 10, k2: 20
opt_miniruby1: 24159534.0 i/s
opt_miniruby2: 23470527.5 i/s - 1.03x slower
clean-miniruby: 17822621.5 i/s - 1.36x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67333 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because it's not used outside vm*.c, and also having non-static function
without MJIT_STATIC is harmful for mswin JIT system.
I hope this fix mswin test failure starting from r67315.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67328 b2dd03c8-39d4-4d8f-98ff-823fe69b080e