To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.
Combining the structures also saves 8 bytes per call site as the current
layout uses separate two pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.
This change improves the Optcarrot benchmark at least 3%. For more
details, see attached bugs.ruby-lang.org ticket.
Complications:
- A new instruction attribute `comptime_sp_inc` is introduced to
calculate SP increase at compile time without using call caches. At
compile time, a `TS_CALLDATA` operand points to a call info struct, but
at runtime, the same operand points to a call data struct. Instruction
that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.
[Misc #16258]
This reverts commits: 10d6a3aca78ba48c1b85fba8627dc1dd883de5ba6c6a25feca167e6b48f17cb96d41a53207979278595b3c4fdd1521f7cf89c11c5e69accf336082033632a812c0f56506be0d86427a3219 .
The reason for the revert is that we observe ABA problem around
inline method cache. When a cache misshits, we search for a
method entry. And if the entry is identical to what was cached
before, we reuse the cache. But the commits we are reverting here
introduced situations where a method entry is freed, then the
identical memory region is used for another method entry. An
inline method cache cannot detect that ABA.
Here is a code that reproduce such situation:
```ruby
require 'prime'
class << Integer
alias org_sqrt sqrt
def sqrt(n)
raise
end
GC.stress = true
Prime.each(7*37){} rescue nil # <- Here we populate CC
class << Object.new; end
# These adjacent remove-then-alias maneuver
# frees a method entry, then immediately
# reuses it for another.
remove_method :sqrt
alias sqrt org_sqrt
end
Prime.each(7*37).to_a # <- SEGV
```
return directly in class/module is an error, so return in
proc in class/module should also be an error. I believe the
previous behavior was an unintentional oversight during the
addition of top-level return in 2.4.
At last, not only myself but also your compiler are fully confident
that the method entries pointed from call caches are immutable. We
don't have to worry about silent updates. Just delete the branch
that is now always false.
Calculating -------------------------------------
ours trunk
vm2_poly_same_method 2.142M 2.070M i/s - 6.000M times in 2.801148s 2.898994s
Comparison:
vm2_poly_same_method
ours: 2141979.2 i/s
trunk: 2069683.8 i/s - 1.03x slower
Now that we have eliminated most destructive operations over the
rb_method_entry_t / rb_callable_method_entry_t, let's make them
mostly immutabe and mark them const.
One exception is rb_export_method(), which destructively modifies
visibilities of method entries. I have left that operation as is
because I suspect that destructiveness is the nature of that
function.
Tired of rb_method_entry_create(..., rb_method_definition_create(
..., &(rb_method_foo_t) {...})) maneuver. Provide a function that
does the thing to reduce copy&paste.
The deleted function was to destructively overwrite existing method
entries, which is now considered to be a bad idea. Delete it, and
assign a newly created method entry instead.
Before this changeset rb_method_definition_create only allocated a
memory region and we had to destructively initialize it later.
That is not a good design so we change the API to return a complete
struct instead.
Most (if not all) of the fields of rb_method_definition_t are never
meant to be modified once after they are stored. Marking them const
makes it possible for compilers to warn on unintended modifications.
If a method accepts no keywords and was called with a keyword, an
ArgumentError was not always issued previously. Force methods that
accept no keywords to go through setup_parameters_complex so that
an ArgumentError is raised if keywords are provided.
This approach uses a flag bit on the final hash object in the regular splat,
as opposed to a previous approach that used a VM frame flag. The hash flag
approach is less invasive, and handles some cases that the VM frame flag
approach does not, such as saving the argument splat array and splatting it
later:
ruby2_keywords def foo(*args)
@args = args
bar
end
def bar
baz(*@args)
end
def baz(*args, **kw)
[args, kw]
end
foo(a:1) #=> [[], {a: 1}]
foo({a: 1}, **{}) #=> [[{a: 1}], {}]
foo({a: 1}) #=> 2.7: [[], {a: 1}] # and warning
foo({a: 1}) #=> 3.0: [[{a: 1}], {}]
It doesn't handle some cases that the VM frame flag handles, such as when
the final hash object is replaced using Hash#merge, but those cases are
probably less common and are unlikely to properly support keyword
argument separation.
Use ruby2_keywords to handle argument delegation in the delegate library.
I noticed that in case of cache misshit, re-calculated cc->me can
be the same method entry than the pevious one. That is an okay
situation but can't we partially reuse the cache, because cc->call
should still be valid then?
One thing that has to be special-cased is when the method entry
gets amended by some refinements. That happens behind-the-scene
of call cache mechanism. We have to check if cc->me->def points to
the previously saved one.
Calculating -------------------------------------
trunk ours
vm2_poly_same_method 1.534M 2.025M i/s - 6.000M times in 3.910203s 2.962752s
Comparison:
vm2_poly_same_method
ours: 2025143.9 i/s
trunk: 1534447.2 i/s - 1.32x slower
Make sure that vm_yield_with_cfunc can correctly set the empty keyword
flag by passing 2 as the kw_splat value when calling it in
vm_invoke_ifunc_block. Make sure calling.kw_splat is set to 1 and not
128 in vm_sendish, so we can safely check for different kw_splat values.
vm_args.c needs to call add_empty_keyword, and to make JIT happy, the
function needs to be exported. Rename the function to
rb_adjust_argv_kw_splat to more accurately reflect what it does, and
mark it as MJIT exported.
Also add keyword argument separation warnings for Class#new and Method#call.
To allow for keyword argument to required positional hash converstion in
cfuncs, add a vm frame flag indicating the cfunc was called with an empty
keyword hash (which was removed before calling the cfunc). The cfunc can
check this frame flag and add back an empty hash if it is passing its
arguments to another Ruby method. Add rb_empty_keyword_given_p function
for checking if called with an empty keyword hash, and
rb_add_empty_keyword for adding back an empty hash to argv.
All of this empty keyword argument support is only for 2.7. It will be
removed in 3.0 as Ruby 3 will not convert empty keyword arguments to
required positional hash arguments. Comment all of the relevent code
to make it obvious this is expected to be removed.
Add rb_funcallv_kw as an public C-API function, just like rb_funcallv
but with a keyword flag. This is used by rb_obj_call_init (internals
of Class#new). This also required expected call_type enum with
CALL_FCALL_KW, similar to the recent addition of CALL_PUBLIC_KW.
Add rb_vm_call_kw as a internal function, used by call_method_data
(internals of Method#call and UnboundMethod#bind_call). Add tests
for UnboundMethod#bind_call keyword handling.
This is the same as the bmethod, sym proc, and send cases,
where we don't remove the keyword splat, so later code can
move it to a required positional parameter and warn.
This is the same as the bmethod and send cases, where we don't
remove the keyword splat, so later code can move it to to a
a required positional parameter and warn.
The lambda case is similar to the attr_writer case, except we have
to determine the number of required parameters from the iseq
instead of being able to assume a single required parameter.
This fixes a lot of lambda tests which were switched to require
warnings for all usage of keyword arguments. Similar to method
handling, we do not warn when passing keyword arguments to
lambdas that do not accept keyword arguments, the argument is
just passed as a positional hash in that case, unless it is empty.
If it is empty and not the final required parameter, then we
ignore it. If it is empty and the final required parameter, then
we pass it for backwards compatibility and emit a warning, as in
Ruby 3 we will not pass it.
The bmethod case is similar to the send case, in that we do not
want to remove empty keyword splats in vm_call_bmethod, as that
prevents later call handling from moving them to required
positional arguments and warning.
In general, we want to ignore empty keyword hashes. The only case
where we want to allow them for backwards compatibility is when
they are necessary to satify the final required positional argument.
In that case, we want to not ignore them, but we do want to warn,
as that will be going away in Ruby 3.
This commit implements this support for regular methods and
attr_writer methods.
In order to allow send to forward arguments correctly, send no
longer removes empty keyword hashes. It is the responsibility of
the final method to remove the empty keyword hashes now. This
change was necessary as otherwise send could remove the empty
keyword hashes before the regular or attr_writer methods could
move them to required positional arguments.
For completeness, add tests for keyword handling regular
methods calls.
This makes rb_warn_keyword_to_last_hash non-static in vm_args.c
so it can be reused in vm_insnhelper.c, and also moves declarations
before statements in the rb_warn_* functions in vm_args.c.
While doing so is not backwards compatible with Ruby 2.6, it is
necessary for generic argument forwarding to work for all methods:
```ruby
def foo(*args, **kw, &block)
bar(*args, **kw, &block)
end
```
If you do not remove empty keyword hashes, and bar does not accept
keyword arguments, then a call to foo without keyword arguments
calls bar with an extra positional empty hash argument.
Actually, the following call is wrongly warned without this change.
```
class C
def method_missing(x, *args, **opt)
end
end
C.new.foo(k: 1)
# warning: The last argument is used as the keyword parameter
# warning: for `method_missing' defined here
```
...only when a "remove_empty_keyword_hash" flag is specified.
After CALLER_SETUP_ARG is called, `ci->flag & VM_CALL_KW_SPLAT` must not
be used. Instead. use `calling->kw_splat`. This is because
CALLER_SETUP_ARG may modify argv and update `calling->kw_splat`, and
`ci->flag & VM_CALL_KW_SPLAT` may be inconsistent with the result.
There are two styles that argv contains keyword arguments: one is
VM_CALL_KWARG which contains value elements in argv (to avoid a hash
object creation if possible), and the other is VM_CALL_KW_SPLAT which
contains one last hash in argv.
vm_caller_setup_arg_kw translates argv from the VM_CALL_KWARG style to
the VM_CALL_KW_SPLAT style.
`calling->kw_splat` means that argv is the VM_CALL_KW_SPLAT style.
So, instead of setting `calling->kw_splat` at many places, it would be
better to do so when vm_caller_setup_arg_kw is called.
This is needed for C functions to call methods with keyword arguments.
This is a copy of rb_funcall_with_block with an extra argument for
the keyword flag.
There isn't a clean way to implement this that doesn't involve
changing a lot of function signatures, because rb_call doesn't
support a way to mark that the call has keyword arguments. So hack
this in using a CALL_PUBLIC_KW call_type, which we switch for
CALL_PUBLIC later in the call stack.
We do need to modify rm_vm_call0 to take an argument for whether
keyword arguments are used, since the call_type is no longer
available at that point. Use the passed in value to set the
appropriate keyword flag in both calling and ci_entry.
The kw_splat flag is whether the original call passes keyword or not.
Some types of methods (e.g., bmethod and sym_proc) drops the
information. This change tries to propagate the flag to the final
callee, as far as I can.
After 5e86b005c0, I now think ANYARGS is
dangerous and should be extinct. This commit deletes ANYARGS from
struct vm_ifunc, but in doing so we also have to decouple the usage
of this struct in compile.c, which (I think) is an abuse of ANYARGS.
This was an intentional bug added in 1.9.
The approach taken here is to add a second operand to the
getconstant instruction for whether nil should be allowed and
treated as current scope.
Fixes [Bug #11718]
Methods on duplicated class/module refer same constant inline
cache (IC). Constant access lookup should be done for cloned
class/modules but inline cache doesn't check it.
To check it, this patch introduce new RCLASS_CLONED flag which
are set when if class/module is cloned (both orig and dst).
[Bug #15877]
only when its receiver and the argument are both Integers.
Since 6bedbf4625, Integer#[] has supported a range extraction.
This means that Integer#[] now accepts multiple arguments, which made
the method very slow unfortunately.
This change fixes the performance issue by adding a special handling for
its traditional use case: `num[idx]` where both `num` and `idx` are
Integers.
* internal.h (UNALIGNED_MEMBER_ACCESS, UNALIGNED_MEMBER_PTR):
moved from eval_intern.h.
* compile.c iseq.c, vm.c: use UNALIGNED_MEMBER_PTR for `entries`
in `struct iseq_catch_table`.
* vm_eval.c, vm_insnhelper.c: use UNALIGNED_MEMBER_PTR for `body`
in `rb_method_definition_t`.
ec->cfp->iseq might not exist at the very beginning of a thread.
=================================================================
==82954==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fc86f334810 at pc 0x55ceaf013125 bp 0x7ffe2eddbbf0 sp 0x7ffe2eddbbe8
READ of size 8 at 0x7fc86f334810 thread T0
#0 0x55ceaf013124 in vm_check_canary vm_insnhelper.c:217:24
#1 0x55ceaefb4796 in vm_push_frame vm_insnhelper.c:276:5
#2 0x55ceaf0124bd in th_init vm.c:2661:5
#3 0x55ceaf00d5eb in ruby_thread_init vm.c:2690:5
#4 0x55ceaf00d4b1 in rb_thread_alloc vm.c:2703:5
#5 0x55ceaef0038b in thread_s_new thread.c:872:20
#6 0x55ceaf04d8c1 in call_cfunc_m1 vm_insnhelper.c:2041:12
#7 0x55ceaf03118d in vm_call_cfunc_with_frame vm_insnhelper.c:2207:11
#8 0x55ceaf017985 in vm_call_cfunc vm_insnhelper.c:2225:12
#9 0x55ceaf01548b in vm_call_method_each_type vm_insnhelper.c:2560:9
#10 0x55ceaf014c96 in vm_call_method vm_insnhelper.c:2686:13
#11 0x55ceaefb5de4 in vm_call_general vm_insnhelper.c:2730:12
#12 0x55ceaf03c868 in vm_sendish vm_insnhelper.c:3623:11
#13 0x55ceaefc95bb in vm_exec_core insns.def:771:11
#14 0x55ceaf006700 in rb_vm_exec vm.c:1892:22
#15 0x55ceaf00acbf in rb_iseq_eval_main vm.c:2151:11
#16 0x55ceaea250ca in ruby_exec_internal eval.c:262:2
#17 0x55ceaea2498b in ruby_exec_node eval.c:326:12
#18 0x55ceaea247d0 in ruby_run_node eval.c:318:25
#19 0x55ceae88c486 in main main.c:42:9
#20 0x7fc874330b96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
#21 0x55ceae7e5289 in _start (miniruby+0x15f289)
0x7fc86f334810 is located 16 bytes to the right of 1048576-byte region [0x7fc86f234800,0x7fc86f334800)
allocated by thread T0 here:
#0 0x55ceae85d56d in malloc (miniruby+0x1d756d)
#1 0x55ceaea71d12 in objspace_xmalloc0 gc.c:9416:5
#2 0x55ceaea71cd2 in ruby_xmalloc2_body gc.c:9623:12
#3 0x55ceaea7d09c in ruby_xmalloc2 gc.c:11479:12
#4 0x55ceaf00c3b7 in rb_thread_recycle_stack vm.c:2462:12
#5 0x55ceaf012256 in th_init vm.c:2656:29
#6 0x55ceaf00d5eb in ruby_thread_init vm.c:2690:5
#7 0x55ceaf00d4b1 in rb_thread_alloc vm.c:2703:5
#8 0x55ceaef0038b in thread_s_new thread.c:872:20
#9 0x55ceaf04d8c1 in call_cfunc_m1 vm_insnhelper.c:2041:12
#10 0x55ceaf03118d in vm_call_cfunc_with_frame vm_insnhelper.c:2207:11
#11 0x55ceaf017985 in vm_call_cfunc vm_insnhelper.c:2225:12
#12 0x55ceaf01548b in vm_call_method_each_type vm_insnhelper.c:2560:9
#13 0x55ceaf014c96 in vm_call_method vm_insnhelper.c:2686:13
#14 0x55ceaefb5de4 in vm_call_general vm_insnhelper.c:2730:12
#15 0x55ceaf03c868 in vm_sendish vm_insnhelper.c:3623:11
#16 0x55ceaefc95bb in vm_exec_core insns.def:771:11
#17 0x55ceaf006700 in rb_vm_exec vm.c:1892:22
#18 0x55ceaf00acbf in rb_iseq_eval_main vm.c:2151:11
#19 0x55ceaea250ca in ruby_exec_internal eval.c:262:2
#20 0x55ceaea2498b in ruby_exec_node eval.c:326:12
#21 0x55ceaea247d0 in ruby_run_node eval.c:318:25
#22 0x55ceae88c486 in main main.c:42:9
#23 0x7fc874330b96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
SUMMARY: AddressSanitizer: heap-buffer-overflow vm_insnhelper.c:217:24 in vm_check_canary
Shadow bytes around the buggy address:
0x0ff98de5e8b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff98de5e8c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff98de5e8d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff98de5e8e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff98de5e8f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0ff98de5e900: fa fa[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff98de5e910: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff98de5e920: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff98de5e930: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff98de5e940: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff98de5e950: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==82954==ABORTING
When reviewing r66565, I overlooked that `GET_ISEQ()` and `GET_EP()` are
NOT `ec->cfp->iseq` and `ec->cfp->ep` but `reg_cfp->iseq` and
`reg_cfp->ep`.
`vm_push_frame` updates `ec->cfp` and in this case we want to check the
callee's cfp and so `ec->cfp` should be checked instead.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67522 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def: add definemethod and definesmethod (singleton method)
instructions. Old YARV contains these instructions, but it is moved
to methods of FrozenCore class because remove number of instructions
can improve performance for some techniques (static stack caching
and so on). However, we don't employ these technique and it is hard
to optimize/analysis definition sequence. So I decide to introduce
them (and remove definition methods). `putiseq` insn is also removed.
* vm_method.c (rb_scope_visibility_get): renamed to
`vm_scope_visibility_get()` and make it accept `ec`.
Same for `vm_scope_module_func_check()`.
These fixes are result of refactoring `vm_define_method`.
* vm_insnhelper.c (rb_vm_get_cref): renamed to `vm_get_cref`
because of consistency with other functions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67442 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
cfp->bp was (re-)introduced by Kokubun san, but VM doesn't use it
because I (ko1) want to remove it in a future. But using it make
leave instruction fast because of sp consisntency check.
So now VM uses cfp->bp.
To use cfp->bp, I checked the value and I found that it is not a
"initial value of sp" but a "initial value of ep". Fix this problem
and fix all bp references (this is why bp is renamed to bp_).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67342 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Add counters to count ccf (call cache fastpath) usage.
These counters will help which kind of method dispatch
is important to optimize.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67336 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
similar idea to r67315, provide the following optimization
for method dispatch with lead and kw parameters.
(1) add a special branch to check passing kw arguments to
a method which has lead and kw parameters.
ex) def foo(x, k:1); end; foo(0, k:1)
(2) add a special branch to check passing no-kw arguments to
a method which has lead and kw parameters.
ex) def foo(x, k:1); end; foo(0)
For (1) and (2) cases, provide special dispatchers. For (2) case,
this patch only use the special dispatcher if all default
kw parameters are literal values (nil, 1, and so on. In other case,
kw->default_values does not contains Qundef) (and no required kw
parameters becaseu they don't pass any keyword parameters).
Passing keyword arguments with a hash object is not a scope of
this patch.
Without this patch, (1) and (2) cases use `setup_parameters_complex()`.
Especially, (2) seems frequent case for methods which extend a normal
usecase with keyword parameters (like: `exception: true`).
We can measure the performance with benchmark-driver:
With methods: def kw k1:1, k2:2; end
def m; end
With the following binaries:
clean-miniruby: unmodified trunk.
opt_miniruby1: use special branches for lead/kw parameters.
opt_miniruby2: use special dispatchers for lead/kw parameters.
opt_cc_miniruby: apply step (2).
Result with benchmark-driver:
m
opt_miniruby2: 75222278.0 i/s
clean-miniruby: 73177896.5 i/s - 1.03x slower
opt_miniruby1: 62466783.3 i/s - 1.20x slower
kw
opt_miniruby2: 52044504.4 i/s
opt_miniruby1: 29142025.7 i/s - 1.79x slower
clean-miniruby: 20515235.4 i/s - 2.54x slower
kw k1: 10
opt_miniruby2: 26492219.5 i/s
opt_miniruby1: 25409484.9 i/s - 1.04x slower
clean-miniruby: 20235113.7 i/s - 1.31x slower
kw k1: 10, k2: 20
opt_miniruby1: 24159534.0 i/s
opt_miniruby2: 23470527.5 i/s - 1.03x slower
clean-miniruby: 17822621.5 i/s - 1.36x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67333 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because it's not used outside vm*.c, and also having non-static function
without MJIT_STATIC is harmful for mswin JIT system.
I hope this fix mswin test failure starting from r67315.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67328 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
There is a special optimization for "only lead parameters"
method dispatch using specialized dispatcher functions
`vm_call_iseq_setup_normal_0start...`.
Other cases (opt, rest, post, ...) we don't use specialized
dispatcher and call with `setup_parameters_complex` to
satisfy Ruby's complex parameter specification.
This commit introduce a specialize dispatcher for
methods which use only lead and optional parameters.
Two step improvements:
(1) prepare "lead/opt" only check pass.
It is to skip the `setup_parameters_complex` function.
(2) introduce specialized dispatcher for only "lead/opt"
parameters methods (vm_call_iseq_setup_normal_opt_start).
With these improvements, we achieved good micro-benchmark
results:
With a method: `def opt2 a, b=nil; end`
With the following binaries:
clean-miniruby: unmodified trunk.
opt_miniruby: apply step (1).
opt_cc_miniruby: apply step (2).
Result with benchmark-driver:
opt2(1)
opt_cc_miniruby: 42269409.1 i/s
opt_miniruby: 36304428.3 i/s - 1.16x slower
clean-miniruby: 25897409.5 i/s - 1.63x slower
opt2(1, 2)
opt_cc_miniruby: 45935145.7 i/s
opt_miniruby: 40513196.9 i/s - 1.13x slower
clean-miniruby: 29976057.6 i/s - 1.53x slower
This improvement may be trivial (difficult to improve practical
cases). However, this is enough small patch so I decide to
introduce it.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67315 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c: change `call_cfunc_*` parameters order
and specify a function type for the passed func ptr.
This fix reduce the number of asm instructions, such as:
# before this patch
0000000000000110 <call_cfunc_0>:
110: 48 89 fa mov %rdi,%rdx
113: 31 c0 xor %eax,%eax
115: 48 89 f7 mov %rsi,%rdi
118: ff e2 jmpq *%rdx
11a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
# after this patch
0000000000000110 <call_cfunc_0>:
110: ff e1 jmpq *%rcx
However, this kind of instruction reduction doesn't affect
any performance because of great CPU architectures :p
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67122 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
In addition to detect dead canary, we try to detect the very moment
when we smash the stack top. Requested by k0kubun:
https://twitter.com/k0kubun/status/1085180749899194368
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66981 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
and functions to clarify the intention and make sure it's not used in a
surprising way (like using 2, 3, ... other than 0, 1 even while it seems
to be a boolean).
This is a retry of r66775. It included some typos...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66778 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This reverts commit bb1a1aeab0.
We hit something on ci.rvm.jp, reverting until investigation is done.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66776 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
and functions to clarify the intention and make sure it's not used in a
surprising way (like using 2, 3, ... other than 0, 1 even while it seems
to be a boolean).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66775 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This changeset should fix the 32bit failures.
See also: https://travis-ci.org/ruby/ruby/jobs/472855470
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66601 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
- FIXNUM_2_P: moved to vm_insnhelper.c because that is the only
place this macro is used.
- FLONUM_2_P: ditto.
- FLOAT_HEAP_P: not used anywhere.
- FLOAT_INSTANCE_P: ditto.
- GET_TOS: ditto.
- USE_IC_FOR_SPECIALIZED_METHOD: ditto.
- rb_obj_hidden_p: ditto.
- REG_A: ditto.
- REG_B: ditto.
- GET_CONST_INLINE_CACHE: ditto.
- vm_regan_regtype: moved inside of VM_COLLECT_USAGE_DETAILS
because that os the only place this enum is used.
- vm_regan_acttype: ditto.
- GET_GLOBAL: used only once. Removed with replacing that usage.
- SET_GLOBAL: ditto.
- rb_method_definition_create: declaration moved to
vm_insnhelper.c because that is the only place this declaration
makes sense.
- rb_method_definition_set: ditto.
- rb_method_definition_eq: ditto.
- rb_make_no_method_exception: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66597 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
COLDFUNC is introduced in r66228. Use it for pre-existing
__attribute__((__cold__)) usages.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66538 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c (vm_call_method_each_type): we should use me->defined_class
instead of me->owner because me->owner doesn't has correct ancestors list.
[Bug #15427]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66436 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* iseq.c: before this patch, RubyVM::InstructionSequence.of(src) (ISeq in
short) returns different ISeq (wrapper) objects point to one ISeq internal
object. This patch changes this behavior to cache created ISeq (wrapper)
objects and return same ISeq object for an internal ISeq object.
* iseq.h (ISEQ_EXECUTABLE_P): introduced to check executable ISeq objects.
* iseq.h (ISEQ_COMPILE_DATA_ALLOC): reordr setting flag line to avoid
ISEQ_USE_COMPILE_DATA but compiled_data == NULL case.
* vm_core.h (rb_iseq_t): introduce `rb_iseq_t::wrapper` and
`rb_iseq_t::aux::exec`. Move `rb_iseq_t::local_hooks` to
`rb_iseq_t::aux::exec::local_hooks`.
* test/ruby/test_iseq.rb: add ISeq.of() tests.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66246 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_trace.c (rb_tracepoint_enable_for_target): support targetting
TracePoint. [Feature #15289]
Tragetting TracePoint is only enabled on specified method, proc
and so on, example: `tp.enable(target: code)`.
`code` should be consisted of InstructionSeuqnece (iseq)
(RubyVM::InstructionSeuqnece.of(code) should not return nil)
If code is a tree of iseq, TracePoint is enabled on all of
iseqs in a tree.
Enabled tragetting TracePoints can not enabled again with
and without target.
* vm_core.h (rb_iseq_t): introduce `rb_iseq_t::local_hooks`
to store local hooks.
`rb_iseq_t::aux::trace_events` is renamed to
`global_trace_events` to contrast with `local_hooks`.
* vm_core.h (rb_hook_list_t): add `rb_hook_list_t::running`
to represent how many Threads/Fibers are used this list.
If this field is 0, nobody using this hooks and we can
delete it.
This is why we can remove code from cont.c.
* vm_core.h (rb_vm_t): because of above change, we can eliminate
`rb_vm_t::trace_running` field.
Also renamed from `rb_vm_t::event_hooks` to `global_hooks`.
* vm_core.h, vm.c (ruby_vm_event_enabled_global_flags): renamed
from `ruby_vm_event_enabled_flags.
* vm_core.h, vm.c (ruby_vm_event_local_num): added to count
enabled targetting TracePoints.
* vm_core.h, vm_trace.c (rb_exec_event_hooks): accepts
hook list.
* vm_core.h (rb_vm_global_hooks): added for convinience.
* method.h (rb_method_bmethod_t): added to maintain Proc
and `rb_hook_list_t` for bmethod (defined by define_method).
* prelude.rb (TracePoint#enable): extracet a keyword parameter
(because it is easy than writing in C).
It calls `TracePoint#__enable` internal method written in C.
* vm_insnhelper.c (vm_trace): check also iseq->local_hooks.
* vm.c (invoke_bmethod): check def->body.bmethod.hooks.
* vm.c (hook_before_rewind): check iseq->local_hooks
and def->body.bmethod.hooks before rewind by exception.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66003 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Here, recv can be INT2FIX(-1), which is 0xFFFF_FFFFul.
INT2FIX(1) is 3ul. So `recv - 1 + INT2FIX(1)` is:
recv 0xFFFF_FFFFul
recv-1 0xFFFF_FFFEul (note: unsigned)
recv-1+INT2FIX(1) 0x0000_0001ul Here is the overflow.
Given recv is a Fixnum, it can never be 0xFFFF_FFFD. 0xFFFF_FFFF is
the only value that can overflow this way, so special-casing this
value should just suffice.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65828 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
space_size can be zero here, under the following script. We would
better bail out before bptr calculation.
% ./miniruby --dump=i -e '* = nil'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,7)> (catch: FALSE)
0000 putnil ( 1)[Li]
0001 dup
0002 expandarray 0, 0
0005 leave
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65685 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c (vm_yield_with_cfunc): use passed me as bmethod.
We also need to set `VM_FRAME_FLAG_BMETHOD` if needed.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65639 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_core.h: remove `rb_execution_context_t::passed_bmethod_me`
and fix functions to pass the `me` directly.
`passed_bmethod_me` was used to make bmethod (methods defined by
`defined_method`). `rb_vm_invoke_bmethod` invoke `Proc` with `me`
information as method frame (`lambda` frame, actually).
If the proc call is not bmethod call, `passed_bmethod_me` should
be NULL. However, there is a bug which passes wrong `me` for
normal block call.
http://ci.rvm.jp/results/trunk-asserts@silicon-docker/1449470
This is because wrong `me` was remained in `passed_bmethod_me`
(and used incorrectly it after collected by GC).
We need to clear `passed_bmethod_me` just after bmethod call,
but clearing is not enough.
To solve this issue, I removed `passed_bmethod_me` and pass `me`
information as a function parameter of `rb_vm_invoke_bmethod`,
`invoke_block_from_c_proc` and `invoke_iseq_block_from_c` in vm.c.
* vm.c (invoke_iseq_block_from_c): the number of parameters is too
long so that I try to specify `ALWAYS_INLINE`.
* vm.c (invoke_block_from_c_proc): ditto.
* vm_insnhelper.c (vm_yield_with_cfunc): now there are no pathes
to use bmethod here.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65636 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* hash.c, internal.h: support theap for small Hash.
Introduce RHASH_ARRAY (li_table) besides st_table and small Hash
(<=8 entries) are managed by an array data structure.
This array data can be managed by theap.
If st_table is needed, then converting array data to st_table data.
For st_table using code, we prepare "stlike" APIs which accepts hash value
and are very similar to st_ APIs.
This work is based on the GSoC achievement
by tacinight <tacingiht@gmail.com> and refined by ko1.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65454 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transient_heap.c, transient_heap.h: implement TransientHeap (theap).
theap is designed for Ruby's object system. theap is like Eden heap
on generational GC terminology. theap allocation is very fast because
it only needs to bump up pointer and deallocation is also fast because
we don't do anything. However we need to evacuate (Copy GC terminology)
if theap memory is long-lived. Evacuation logic is needed for each type.
See [Bug #14858] for details.
* array.c: Now, theap for T_ARRAY is supported.
ary_heap_alloc() tries to allocate memory area from theap. If this trial
sccesses, this array has theap ptr and RARRAY_TRANSIENT_FLAG is turned on.
We don't need to free theap ptr.
* ruby.h: RARRAY_CONST_PTR() returns malloc'ed memory area. It menas that
if ary is allocated at theap, force evacuation to malloc'ed memory.
It makes programs slow, but very compatible with current code because
theap memory can be evacuated (theap memory will be recycled).
If you want to get transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT()
instead of RARRAY_CONST_PTR(). If you can't understand when evacuation
will occur, use RARRAY_CONST_PTR().
(re-commit of r65444)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65449 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transient_heap.c, transient_heap.h: implement TransientHeap (theap).
theap is designed for Ruby's object system. theap is like Eden heap
on generational GC terminology. theap allocation is very fast because
it only needs to bump up pointer and deallocation is also fast because
we don't do anything. However we need to evacuate (Copy GC terminology)
if theap memory is long-lived. Evacuation logic is needed for each type.
See [Bug #14858] for details.
* array.c: Now, theap for T_ARRAY is supported.
ary_heap_alloc() tries to allocate memory area from theap. If this trial
sccesses, this array has theap ptr and RARRAY_TRANSIENT_FLAG is turned on.
We don't need to free theap ptr.
* ruby.h: RARRAY_CONST_PTR() returns malloc'ed memory area. It menas that
if ary is allocated at theap, force evacuation to malloc'ed memory.
It makes programs slow, but very compatible with current code because
theap memory can be evacuated (theap memory will be recycled).
If you want to get transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT()
instead of RARRAY_CONST_PTR(). If you can't understand when evacuation
will occur, use RARRAY_CONST_PTR().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65444 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because it's not supported by this file. Also, shared `def_iseq_ptr`
instead of copying the main definition of it.
vm_core.h: moved `def_iseq_ptr` to this place. added `inline` to avoid
compiler warnings since it's not used in some files including vm_core.h.
vm_insnhelper.c: moved `def_iseq_ptr` to vm_core.h.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65440 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The instructions were used only for branch coverage.
Instead, it now uses a trace framework [Feature #14104].
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65225 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
We have several options to ensure there's no race condition between main
thread and MJIT thead about IC reference:
1) Give up caching ivar for multiple classes (or multiple versions of the
same class) in the same getinstancevariable (This commit's approach)
2) Allocate new inline cache every time
Other ideas we could think of couldn't eliminate possibilities of race
condition.
In 2, it's memory allocation would be slow and it may trigger JIT
cancellation frequently. So 1 would be fast for both VM and JIT
situations.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c (vm_push_frame): validate prev_frame because
prev_frame can be the end of frame.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65162 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
No major performance impact, but just in case for some platform
that matters.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65062 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
That optimization is already reverted and we're not retrying the
optimization soon. Let me simplify the code of vm_getivar.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65061 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This change resolves most of major remaining MJIT bugs on mswin.
Since Visual Studio doesn't support generating pre-processed code
preserving macros, we can't use transform_mjit_header approach for mswin.
So we need to transform MJIT header using macro like this.
vm.c: use MJIT_STATIC for non-static functions that exist on MJIT header
and cause conflict on link.
vm_insnhelper.c: ditto
test_jit.rb: remove many skips for mswin.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64940 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c: remove `vm_profile_counter` because
it is replaced with debug_counters.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64890 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* debug_counter.h: add debug counters to count frame state transitions:
* frame_R2R: Ruby frame to Ruby frame
* frame_R2C: Ruby frame to C frame
* frame_C2C: C frame to C frame
* frame_C2R: C frame to Ruby frame
* vm_insnhelper.c (vm_push_frame): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64871 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* debug_counter.h: add the following counters.
* frame_push: control frame counts (total counts).
* frame_push_*: control frame counts per every frame type.
* obj_*: add free'ed counts for each type.
* gc.c: ditto.
* vm_insnhelper.c (vm_push_frame): ditto.
* debug_counter.c (rb_debug_counter_show_results): widen counts field
to show >10G numbers.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64867 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because r64849 seems to fix issues which we were confused about.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64850 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This reverts commit r64829. I'll prepare another temporary fix, but I'll
separately commit that to make it easier to revert that later.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64838 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
not optimizing Array#& and Array#| because vm_insnhelper.c can't easily
inline it (large amount of array.c code would be needed in vm_insnhelper.c)
and the method body is a little complicated compared to Integer's ones.
So I thought only Integer#& and Integer#| have a significant impact,
and eliminating unnecessary branches would contribute to JIT's performance.
vm_insnhelper.c: ditto
tool/transform_mjit_header.rb: make sure these instructions are inlined
on JIT.
compile.c: compile vm_opt_and and vm_opt_or.
id.def: define id for them to be used in compile.c and vm*.c
vm.c: track redefinition of Integer#& and Integer#|
vm_core.h: allow detecting redefinition of & and |
test/ruby/test_jit.rb: test new insns
test/ruby/test_optimization.rb: ditto
* Optcarrot benchmark
This is a kind of experimental thing but I'm committing this since the
performance impact is significant especially on Optcarrot with JIT.
$ benchmark-driver benchmark.yml --rbenv 'before::before --disable-gems;before+JIT::before --disable-gems --jit;after::after --disable-gems;after+JIT::after --disable-gems --jit' -v --repeat-count 24
before: ruby 2.6.0dev (2018-09-24 trunk 64821) [x86_64-linux]
before+JIT: ruby 2.6.0dev (2018-09-24 trunk 64821) +JIT [x86_64-linux]
after: ruby 2.6.0dev (2018-09-24 opt_and 64821) [x86_64-linux]
last_commit=opt_or
after+JIT: ruby 2.6.0dev (2018-09-24 opt_and 64821) +JIT [x86_64-linux]
last_commit=opt_or
Calculating -------------------------------------
before before+JIT after after+JIT
Optcarrot Lan_Master.nes 51.460 66.315 53.023 71.173 fps
Comparison:
Optcarrot Lan_Master.nes
after+JIT: 71.2 fps
before+JIT: 66.3 fps - 1.07x slower
after: 53.0 fps - 1.34x slower
before: 51.5 fps - 1.38x slower
[close https://github.com/ruby/ruby/pull/1963]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64824 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
by sharing vm_call_iseq_setup_normal. This is a retry of r64280.
vm_insnhelper.c: Remove unused argument `ci` and pass `me` instead of
`cc` to share this with JIT. Declare this with ALWAYS_INLINE to make
sure this function is inlined in JIT.
tool/mk_call_iseq_optimized.rb: deal with the interface change of
vm_call_iseq_setup_normal.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64820 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
for CC_SET_FASTPATH condition. Just a cosmetic change to unify the
styling with other lines.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64775 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because it's actually setting fastpath to cc instead of ci since r51903.
vm_insnhelper.c: ditto
mjit_compile.c: ditto
tool/ruby_vm/views/_mjit_compile_send.erb: ditto
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64772 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because cc->call is NULL by default and it is not overridden by
vm_search_super_method if OPT_CALL_FASTPATH is 0. So this macro is not
just a switch for optimization but now it's mandatory.
vm_insnhelper.c: cosmetic change. Use boolean-ish `TRUE` instead of 1 to
specify `enabled` flag.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64735 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Simply use DISPATCH_ORIGINAL_INSN instead of rb_funcall. This is,
when possible, overall performant because method dispatch results are
cached inside of CALL_CACHE. Should also be good for JIT.
----
trunk: ruby 2.6.0dev (2018-09-12 trunk 64689) [x86_64-darwin15]
ours: ruby 2.6.0dev (2018-09-12 leaf-insn 64688) [x86_64-darwin15]
last_commit=make opt_str_freeze leaf
Calculating -------------------------------------
trunk ours
vm2_freezestring 5.440M 31.411M i/s - 6.000M times in 1.102968s 0.191017s
Comparison:
vm2_freezestring
ours: 31410864.5 i/s
trunk: 5439865.4 i/s - 5.77x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64690 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This instruction can be written without rb_funcall. It not only boosts
performance of case statements, but also makes room of future JIT
improvements. Because opt_case_dispatch is about optimization this
should not be a bad thing to have.
----
trunk: ruby 2.6.0dev (2018-09-05 trunk 64634) [x86_64-darwin15]
ours: ruby 2.6.0dev (2018-09-12 leaf-insn 64688) [x86_64-darwin15]
last_commit=make opt_case_dispatch leaf
Calculating -------------------------------------
trunk ours
vm2_case_lit 1.366 2.012 i/s - 1.000 times in 0.731839s 0.497008s
Comparison:
vm2_case_lit
ours: 2.0 i/s
trunk: 1.4 i/s - 1.47x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64689 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
as well, to make CI succeed with VM_CHECK_MODE > 1.
vm_insnhelper.c: drop unnecessary MJIT_HEADER ifdef. This is intended to
be ignored by having `static inline`. Removing that by macro would be
helpful for minimizing compilation time, but the impact is not so big
and having many MJIT_HEADER check would be bad for maintainability.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64682 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
An instruction is leaf if it has no rb_funcall inside. In order to
check this property, we introduce stack canary which is a random
number collected at runtime. Stack top is always filled with this
number and checked for stack smashing operations, when VM_CHECK_MODE.
[GH-1947]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64677 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Line coverage was based on special instruction "tracecoverage".
Now, instead, it uses the mechanism of trace hook [Feature #14104].
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64509 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This commit caused test-all failure with --jit-wait.
I don't know the reason yet, but let me revert it to normalize CI.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64314 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
to resolve warning:
c:\projects\ruby\vm_insnhelper.c(1661) : warning C4141: 'inline' : used more than once
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64312 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
implementation. This had no major performance impact by effort to keep
them inlined.
vm_insnhelper.c: ditto
mjit_compile.c: just update the comment about opt_pc=0 assumption
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64280 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This is just a refactoring.
The receiver of "invokesuper" was a boolean to represent if it is ZSUPER
or not. This was used in vm_search_super_method to prohibit ZSUPER call
in define_method. (It is currently prohibited because of the limitation
of the implementation.)
This change removes the hack by introducing an explicit flag,
VM_CALL_SUPER, to signal the information. Now, the implementation of
"invokesuper" is consistent with "send" instruction.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64268 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because such inconsistency may result in the regression fixed in r64034.
vm_exec is not touched since renaming it may be controversial...
vm_args.c: ditto.
vm_eval.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64035 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This optimization was reverted on r63863, but this commit resurrects the
optimization to skip some sp motions on JIT execution.
tool/ruby_vm/views/_mjit_compile_insn_body.erb: ditto
tool/ruby_vm/views/_mjit_compile_insn.erb: ditto
insns.def: resurrect handles_frame as handles_stack, which was deleted
on r63763.
tool/ruby_vm/models/bare_instructions.rb: ditto
vm_insnhelper.c: prevent moving sp outside insns.def to allow modifying
it by JIT.
* Optcarrot benchmark
$ benchmark-driver benchmark.yml --rbenv 'before --jit;after --jit' --repeat-count 12 -v
before --jit: ruby 2.6.0dev (2018-07-17 trunk 63987) +JIT [x86_64-linux]
after --jit: ruby 2.6.0dev (2018-07-17 local-stack 63987) +JIT [x86_64-linux]
last_commit=mjit_compile.c: resurrect local variable stack
Calculating -------------------------------------
before --jit after --jit
Optcarrot Lan_Master.nes 70.518 72.144 fps
Comparison:
Optcarrot Lan_Master.nes
after --jit: 72.1 fps
before --jit: 70.5 fps - 1.02x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63988 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
I introduced this mechanism in r62051 to speed things up. Later it
was reported that the change causes problems. I searched for
workarounds but nothing seemed appropriate. I hereby officially
give it up. The idea to move ADD_PC around was a mistake.
Fixes [Bug #14809] and [Bug #14834].
Signed-off-by: Urabe, Shyouhei <shyouhei@ruby-lang.org>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This is a pure refactoring. I see no difference in this change.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63756 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This reverts r63249 (revert r63212) and fixes a bug in it. The test to
prevent the bug is added as well.
vm_insnhelper.c: add `index` argument to vm_getivar. The argument is
created so that MJIT can pass the value of `cc->aux.index` on compilation
time. The cache invalidation in _mjit_compile_send_guard.erb is only
working for the cache value on compilation time.
Note: As `index` is always passed as constant and it's force-inlined,
the performance of `vm_getivar` won't be degraded in VM.
_mjit_compile_send_guard.erb: New. Used to invalidate inlined values of cc.
common.mk: update dependencies for _mjit_compile_send_guard.erb
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63333 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
We need to mark default values for kwarg methods. This also fixes
Bootsnap. IBF iseq loading needed to mark iseqs as "having markable
objects".
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62851 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Directly marking iseq operands allows us to eliminate the "mark array"
stored on ISEQ objects, which will reduce the amount of memory ISEQ
objects consume. This patch changes the iseq mark function to:
* Directly marks ISEQ operands
* Iterate over and mark child ISEQs
It also introduces two flags on the ISEQ object. In order to mark
instruction operands, we have to disassemble the instructions and find
the instruction parameters and types. Instructions may also be
translated to jump addresses. Instruction sequences may get marked by
the GC *while* they're mid flight (being compiled). The
`ISEQ_TRANSLATED` flag is used to indicate whether or not the
instructions have been translated to jump addresses so that when we
decode the instructions we know whether or not we need to go from jump
location back to original instruction or not.
Not all ISEQ objects have any markable objects embedded in their
instructions. We can detect whether or not an ISEQ has markable objects
in the instructions at compile time. If the instructions contain
markable objects, we set a flag `ISEQ_MARKABLE_ISEQ` on the ISEQ object.
This means that during the mark phase, we can skip decompilation if the
flag is *not* set. In other words, we can avoid decompilation of we
know in advance there is nothing to mark.
`once` instructions have an operand that contains the result of a
one-time compilation of a regex. Before this patch, that operand was
called an "inline cache", even though the struct was actually an "inline
storage". This patch changes the operand to be an "inline storage" so
that we can differentiate between caches that need marking (the inline
storage) and caches that don't need marking (inline cache).
[ruby-core:84909]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62706 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c (vm_call_opt_block_call): get block handler from
the method local frame. fix segfault at calling the proxy in
rescue. http://twitter.com/wannabe53/status/970955247626567680
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62680 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
if catch_except_p is FALSE. If catch_except_p is TRUE, stack values
should be on VM's stack when exception is thrown and the JIT-ed frame
is re-executed by VM's exception handler. If it's FALSE, the JIT-ed
frame won't be re-executed and don't need to keep values on VM's stack.
Using local variables allows us to reduce cfp->sp motion. Moving cfp->sp
is needed only for insns whose handles_frame? is false. So it improves
performance.
_mjit_compile_insn.erb: Prepare `stack_size` variable for GET_SP,
STACK_ADDR_FROM_TOP, TOPN macros. Share pc and sp motion partial view.
Use cancel handler created in mjit_compile.c.
_mjit_compile_send.erb: ditto. Also, when iseq->body->catch_except_p is
TRUE, this stops to call mjit_exec directly. I described the reason in
vm_insnhelper.h's comment for EXEC_EC_CFP.
_mjit_compile_pc_and_sp.erb: Shared logic for moving sp and pc. As you
can see from thsi file, when status->local_stack_p is TRUE and
insn.handles_frame? is false, moving sp is skipped. But if
insn.handles_frame? is true, values should be rolled back to VM's stack.
common.mk: add dependency for the file
_mjit_compile_insn_body.erb: Set sp value before canceling JIT on
DISPATCH_ORIGINAL_INSN. Replace GET_SP, STACK_ADDR_FROM_TOP, TOPN macros
for the case ocal_stack_p is TRUE and insn.handles_frame? is false.
In that case, values are not available on VM's stack and those macros
should be replaced.
mjit_compile.inc.erb: updated comments of macros which are supported by
JIT compiler. All references to `cfp->sp` should be replaced and thus
INC_SP, SET_SV, PUSH are no longer supported for now, because they are
not used now.
vm_exec.h: moved EXEC_EC_CFP definition to vm_insnhelper.h because it's
tighly coupled to CALL_METHOD.
vm_insnhelper.h: Have revised EXEC_EC_CFP definition moved from vm_exec.h.
Now it triggers mjit_exec for VM, and has the guard for catch_except_p
on JIT-ed code. See comments for details. CALL_METHOD delegates
triggering mjit_exec to EXEC_EC_CFP.
insns.def: Stopped using EXEC_EC_CFP for the case we don't want to
trigger mjit_exec. Those insns (defineclass, opt_call_c_function) are
not supported by JIT and it's safe to use RESTORE_REGS(), NEXT_INSN().
expandarray is changed to pass GET_SP() to replace the macro in
_mjit_compile_insn_body.erb.
vm_insnhelper.c: change to take sp for the above reason.
[close https://github.com/ruby/ruby/pull/1828]
This patch resurrects the performance which was attached in
[Feature #14235].
* Benchmark
Optcarrot (with configuration for benchmark_driver.gem)
https://github.com/benchmark-driver/optcarrot
$ benchmark-driver benchmark.yml --verbose 1 --rbenv 'before;before+JIT::before,--jit;after;after+JIT::after,--jit' --repeat-count 10
before: ruby 2.6.0dev (2018-03-04 trunk 62652) [x86_64-linux]
before+JIT: ruby 2.6.0dev (2018-03-04 trunk 62652) +JIT [x86_64-linux]
after: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) [x86_64-linux]
last_commit=mjit_compile.c: use local variables for stack
after+JIT: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) +JIT [x86_64-linux]
last_commit=mjit_compile.c: use local variables for stack
Calculating -------------------------------------
before before+JIT after after+JIT
optcarrot 53.552 59.680 53.697 63.358 fps
Comparison:
optcarrot
after+JIT: 63.4 fps
before+JIT: 59.7 fps - 1.06x slower
after: 53.7 fps - 1.18x slower
before: 53.6 fps - 1.18x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62655 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c: instructions info are not used in jit source
code. resolved a warning by transform_mjit_header.rb.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62505 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm.c: include dummy dtrace probes header in jit header.
* vm_insnhelper.c: probes headers are included by vm.c.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62489 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c (vm_get_ev_const): add flag argument of
`rb_autoloading_value`.
* constant.h (rb_autoloading_value): moved the declaration from
vm_core.h for `rb_const_flag_t`. [ruby-core:85516] [Bug #14469]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62394 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def (getinlinecache): Qnil is a valid value as a constant.
this can be observable when accessing a deprecated constant
which is nil. non-nil constant is warned just once for each
location, but every time if it is nil.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62350 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
which is started to be used by mjit_compile.c in r62197.
Related to r62235, this intends to transform the function to static.
Of course we shouldn't pollute the namespace anyway.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62237 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
which has been developed by Takashi Kokubun <takashikkbn@gmail> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, and even with JIT. Not only it's perfectly safe
with JIT disabled because it does not replace VM instructions unlike
MJIT, but also with JIT enabled it stably runs Ruby applications
including Rails applications.
I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids to include some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alis to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than clang. The benchmark may be worse
in macOS. Following benchmark result is provided by gcc w/ Linux.
* Performance is decreased when Google Chrome is running
* JIT can work on MinGW, but it doesn't improve performance at least
in short running benchmark.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark reslts
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 17
75: 18
90: 22
99: 29
home_admin:
50: 21
75: 21
90: 27
99: 40
topic_admin:
50: 17
75: 18
90: 22
99: 32
categories:
50: 35
75: 41
90: 43
99: 77
home:
50: 39
75: 46
90: 49
99: 95
topic:
50: 46
75: 52
90: 56
99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 19
75: 21
90: 25
99: 33
home_admin:
50: 24
75: 26
90: 30
99: 35
topic_admin:
50: 19
75: 20
90: 25
99: 30
categories:
50: 40
75: 44
90: 48
99: 76
home:
50: 42
75: 48
90: 51
99: 89
topic:
50: 49
75: 55
90: 58
99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Now that sp_inc attributes are officially provided as inline
functions. Why not use them directly from the vm core, not just
by the compiler. By doing so, it is now possible for us to
optimize stack manipulations. We can now know exactly how many
words of stack space an instruction consumes before it actually
does. This changeset deletes some lines from insns.def because
they are no longer needed. As a result it reduces the size of
vm_exec_core function from 32,400 bytes to 32,352 bytes on my
machine.
It seems it does not affect performance:
-----------------------------------------------------------
benchmark results:
minimum results in each 3 measurements.
Execution time (sec)
name before after
loop_for 1.093 1.061
loop_generator 1.156 1.152
loop_times 0.982 0.974
loop_whileloop 0.549 0.587
loop_whileloop2 0.115 0.121
Speedup ratio: compare with the result of `before' (greater is better)
name after
loop_for 1.030
loop_generator 1.003
loop_times 1.008
loop_whileloop 0.935
loop_whileloop2 0.949
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Instead of using magic numbers, let us define a series of attributes
and use them from the VM core. Proper function declarations makes
these attributes inlined in most modern compilers. On my machine
exact same binary is generated with or without this changeset.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e