The instructions are just for optimization. To clarify the intention,
this change adds the prefix "opt_", like "opt_case_dispatch".
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65600 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
In these expressions `1` is of type `signed int` (cf: ISO/IEC
9899:1990 section 6.1.3.2). The variable (e.g. `num`) is of type
`rb_num_t`, which is in fact `unsigned long`. These two expressions
then exercise the "usual arithmetic conversions" (cf: ISO/IEC
9899:1990 section 6.2.1.5) and both eventually become `unsigned long`.
The two unsigned expressions are then subtracted to generate another
unsigned integer expression (cf: ISO/IEC 9899:1990 section 6.3.6).
This is where integer overflows can occur. OTOH the left-hand sides of
the assignments are `rb_snum_t`, which is `signed long`. The
assignments exercise the "implicit conversion" of "an unsigned integer
is converted to its corresponding signed integer" case (cf: ISO/IEC
9899:1990 section 6.2.1.2), which is "implementation-defined" (read:
not portable).
Casts are the proper way to avoid this problem. Because all
expressions are converted to some integer types before any binary
operations are performed, the assignments now have fully defined
behaviour. These values can never exceed LONG_MAX so the casts must
not lose any information.
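A minimal C sketch of the cast-before-subtract pattern described above. The typedefs mirror the types named in this message; `one_minus_num` is a hypothetical helper for illustration, not a function from insns.def, and it assumes `num` never exceeds LONG_MAX.

```c
typedef unsigned long rb_num_t; /* per this message: rb_num_t is unsigned long */
typedef long rb_snum_t;         /* per this message: rb_snum_t is signed long */

/* Hypothetical helper: computes 1 - num entirely in signed arithmetic.
 * Precondition (assumed): num <= LONG_MAX, so the cast preserves the value
 * and the signed subtraction cannot overflow. */
static rb_snum_t
one_minus_num(rb_num_t num)
{
    return 1 - (rb_snum_t)num;
}
```

With `num == 3` the helper yields `-2`. Without the cast, `1 - num` would be evaluated as `unsigned long` and produce `ULONG_MAX - 1`, whose conversion back to `signed long` is the implementation-defined step this commit avoids.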
See also: https://travis-ci.org/ruby/ruby/jobs/451726874#L4357
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65595 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transient_heap.c, transient_heap.h: implement TransientHeap (theap).
theap is designed for Ruby's object system. theap is like Eden heap
on generational GC terminology. theap allocation is very fast because
it only needs to bump up pointer and deallocation is also fast because
we don't do anything. However we need to evacuate (Copy GC terminology)
if theap memory is long-lived. Evacuation logic is needed for each type.
See [Bug #14858] for details.
* array.c: Now, theap for T_ARRAY is supported.
ary_heap_alloc() tries to allocate memory area from theap. If this trial
succeeds, this array has theap ptr and RARRAY_TRANSIENT_FLAG is turned on.
We don't need to free theap ptr.
* ruby.h: RARRAY_CONST_PTR() returns malloc'ed memory area. It means that
if ary is allocated at theap, force evacuation to malloc'ed memory.
It makes programs slow, but very compatible with current code because
theap memory can be evacuated (theap memory will be recycled).
If you want to get transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT()
instead of RARRAY_CONST_PTR(). If you can't understand when evacuation
will occur, use RARRAY_CONST_PTR().
(re-commit of r65444)
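The allocation scheme described above (bump a pointer to allocate, do nothing to deallocate, copy out what must live longer) can be sketched as a tiny arena. This is only an illustration under stated assumptions: every name below (`struct arena`, `arena_alloc`, `arena_evacuate`, `arena_reset`) and the fixed 1024-byte size are hypothetical, and none of it is taken from transient_heap.c.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { ARENA_SIZE = 1024 }; /* hypothetical fixed capacity */

struct arena {
    size_t used;                                    /* bump pointer, kept as an offset */
    _Alignas(void *) unsigned char buf[ARENA_SIZE]; /* backing storage */
};

/* Allocation only rounds the size up and bumps the offset.
 * Returns NULL when the request does not fit. */
static void *
arena_alloc(struct arena *a, size_t size)
{
    const size_t align = sizeof(void *);
    size_t aligned = (size + align - 1) & ~(align - 1);
    if (aligned < size || aligned > ARENA_SIZE - a->used) return NULL;
    void *p = a->buf + a->used;
    a->used += aligned;
    return p;
}

/* Evacuation: copy a block to malloc'ed memory so it can outlive the arena. */
static void *
arena_evacuate(const void *p, size_t size)
{
    void *copy = malloc(size);
    if (copy != NULL) memcpy(copy, p, size);
    return copy;
}

/* Recycling: every block becomes reusable at once; nothing is freed one by one. */
static void
arena_reset(struct arena *a)
{
    a->used = 0;
}
```

A caller that needs a stable pointer would evacuate first and keep the malloc'ed copy, which is the trade-off this message describes for RARRAY_CONST_PTR(): slower, but safe after the arena memory is recycled.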
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65449 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* transient_heap.c, transient_heap.h: implement TransientHeap (theap).
theap is designed for Ruby's object system. theap is like Eden heap
on generational GC terminology. theap allocation is very fast because
it only needs to bump up pointer and deallocation is also fast because
we don't do anything. However we need to evacuate (Copy GC terminology)
if theap memory is long-lived. Evacuation logic is needed for each type.
See [Bug #14858] for details.
* array.c: Now, theap for T_ARRAY is supported.
ary_heap_alloc() tries to allocate memory area from theap. If this trial
succeeds, this array has theap ptr and RARRAY_TRANSIENT_FLAG is turned on.
We don't need to free theap ptr.
* ruby.h: RARRAY_CONST_PTR() returns malloc'ed memory area. It means that
if ary is allocated at theap, force evacuation to malloc'ed memory.
It makes programs slow, but very compatible with current code because
theap memory can be evacuated (theap memory will be recycled).
If you want to get transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT()
instead of RARRAY_CONST_PTR(). If you can't understand when evacuation
will occur, use RARRAY_CONST_PTR().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65444 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The idea behind this commit is that handles_sp and leaf are two
concepts that are not mutually independent. By making one explicitly
depend on the other, we can reduce the number of lines of code written,
thus making things concise.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65426 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def (newhashfromarray): `rb_hash_bulk_insert()` can call
Ruby methods like #hash so that it should not be a leaf insn.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65345 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The instructions were used only for branch coverage.
Instead, it now uses a trace framework [Feature #14104].
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65225 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def (opt_send_without_block): reorder insn position because
`opt_str_freeze` insn refers to this insn (function) when
OPT_CALL_THREADED_CODE is true.
* vm_opts.h (OPT_THREADED_CODE): introduce new macro to select
threaded code implementation with a compile option (-D...).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64854 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because r64849 seems to fix issues which we were confused about.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64850 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
as a workaround to fix the build pipeline broken by r64824,
because optimizing Ruby should be prioritized higher than supporting unused jokes.
In the current build system, exceeding 200 insns somehow crashes C
extension build on some of MinGW environments like "mingw32-make[1]:
*** No rule to make target 'note'. Stop."
https://ci.appveyor.com/project/ruby/ruby/build/9725/job/co4nu9jugm8qwdrp
and on some of Linux environments like "cannot load such file -- stringio (LoadError)"
```
build_install /home/ko1/ruby/src/trunk_gcc5/lib/rubygems/specification.rb:18:in `require': cannot load such file -- stringio (LoadError)
from /home/ko1/ruby/src/trunk_gcc5/lib/rubygems/specification.rb:18:in `<top (required)>'
from /home/ko1/ruby/src/trunk_gcc5/lib/rubygems.rb:1365:in `require'
from /home/ko1/ruby/src/trunk_gcc5/lib/rubygems.rb:1365:in `<module:Gem>'
from /home/ko1/ruby/src/trunk_gcc5/lib/rubygems.rb:116:in `<top (required)>'
from /home/ko1/ruby/src/trunk_gcc5/tool/rbinstall.rb:24:in `require'
from /home/ko1/ruby/src/trunk_gcc5/tool/rbinstall.rb:24:in `<main>'
make: *** [do-install-nodoc] Error 1
```
http://ci.rvm.jp/results/trunk_gcc5@silicon-docker/1353447
This commit removes "bitblt" and "trace_bitblt" insns, which reduces the
number of insns from 202 to 200 and fixes at least the latter build
failure. I hope this fixes the MinGW build failure as well. Let me
confirm the situation on AppVeyor CI.
Note that this is hard to fix because some MinGW environments (MSP-Greg's
MinGW CI on AppVeyor) don't reproduce this and some Linux environments
(including my local machine) don't reproduce it either. Make sure you
have an environment that reproduces the failure and confirm it's fixed when reverting
this commit.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64839 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This reverts commit r64829. I'll prepare another temporary fix, but I'll
separately commit that to make it easier to revert that later.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64838 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
not optimizing Array#& and Array#| because vm_insnhelper.c can't easily
inline it (large amount of array.c code would be needed in vm_insnhelper.c)
and the method body is a little complicated compared to Integer's ones.
So I thought only Integer#& and Integer#| have a significant impact,
and eliminating unnecessary branches would contribute to JIT's performance.
vm_insnhelper.c: ditto
tool/transform_mjit_header.rb: make sure these instructions are inlined
on JIT.
compile.c: compile vm_opt_and and vm_opt_or.
id.def: define id for them to be used in compile.c and vm*.c
vm.c: track redefinition of Integer#& and Integer#|
vm_core.h: allow detecting redefinition of & and |
test/ruby/test_jit.rb: test new insns
test/ruby/test_optimization.rb: ditto
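The fast path that such instructions take can be sketched as follows: when both operands are tagged small integers and the operator has not been redefined, compute the result inline; otherwise signal that normal method dispatch is needed. This is a sketch under assumptions, not the code in vm_insnhelper.c: the low-bit tagging scheme, `value_t`, `NO_FAST_PATH`, and the `sketch_opt_*` names are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t value_t;

/* Hypothetical marker meaning "fall back to ordinary method dispatch".
 * It has a clear low bit, so it can never collide with a tagged integer. */
#define NO_FAST_PATH ((value_t)0)

/* Assumed tagging: a small integer i is stored as (i << 1) | 1. */
static inline bool is_fixnum(value_t v) { return (v & 1) != 0; }
static inline value_t int_to_fixnum(uintptr_t i) { return (i << 1) | 1; }

/* Both tags are 1, so AND-ing the tagged words yields the tagged result. */
static value_t
sketch_opt_and(value_t recv, value_t obj, bool and_redefined)
{
    if (and_redefined || !is_fixnum(recv) || !is_fixnum(obj)) return NO_FAST_PATH;
    return recv & obj;
}

/* The same holds for OR: the tag bit stays set. */
static value_t
sketch_opt_or(value_t recv, value_t obj, bool or_redefined)
{
    if (or_redefined || !is_fixnum(recv) || !is_fixnum(obj)) return NO_FAST_PATH;
    return recv | obj;
}
```

The redefinition flag is why vm.c and vm_core.h have to track Integer#& and Integer#|: once either is redefined, the inline path must no longer be taken.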
* Optcarrot benchmark
This is a kind of experimental thing but I'm committing this since the
performance impact is significant especially on Optcarrot with JIT.
$ benchmark-driver benchmark.yml --rbenv 'before::before --disable-gems;before+JIT::before --disable-gems --jit;after::after --disable-gems;after+JIT::after --disable-gems --jit' -v --repeat-count 24
before: ruby 2.6.0dev (2018-09-24 trunk 64821) [x86_64-linux]
before+JIT: ruby 2.6.0dev (2018-09-24 trunk 64821) +JIT [x86_64-linux]
after: ruby 2.6.0dev (2018-09-24 opt_and 64821) [x86_64-linux]
last_commit=opt_or
after+JIT: ruby 2.6.0dev (2018-09-24 opt_and 64821) +JIT [x86_64-linux]
last_commit=opt_or
Calculating -------------------------------------
before before+JIT after after+JIT
Optcarrot Lan_Master.nes 51.460 66.315 53.023 71.173 fps
Comparison:
Optcarrot Lan_Master.nes
after+JIT: 71.2 fps
before+JIT: 66.3 fps - 1.07x slower
after: 53.0 fps - 1.34x slower
before: 51.5 fps - 1.38x slower
[close https://github.com/ruby/ruby/pull/1963]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64824 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Now that we can say for sure if an instruction calls a method or
not internally, it is now possible to reroute the bugs that
forced us to revert the "move PC around" optimization.
First try: r62051
Reverted: r63763
See also: r63999
----
trunk: ruby 2.6.0dev (2018-09-13 trunk 64736) [x86_64-darwin15]
ours: ruby 2.6.0dev (2018-09-13 trunk 64736) [x86_64-darwin15]
last_commit=move ADD_PC around (take 2)
Calculating -------------------------------------
trunk ours
so_ackermann 1.884 2.278 i/s - 1.000 times in 0.530926s 0.438935s
so_array 1.178 1.157 i/s - 1.000 times in 0.848786s 0.864467s
so_binary_trees 0.176 0.177 i/s - 1.000 times in 5.683895s 5.657707s
so_concatenate 0.220 0.221 i/s - 1.000 times in 4.546896s 4.518949s
so_count_words 6.729 6.470 i/s - 1.000 times in 0.148602s 0.154561s
so_exception 3.324 3.688 i/s - 1.000 times in 0.300872s 0.271147s
so_fannkuch 0.546 0.968 i/s - 1.000 times in 1.831328s 1.033376s
so_fasta 0.541 0.547 i/s - 1.000 times in 1.849923s 1.827091s
so_k_nucleotide 0.800 0.777 i/s - 1.000 times in 1.250635s 1.286295s
so_lists 2.101 1.848 i/s - 1.000 times in 0.475954s 0.541095s
so_mandelbrot 0.435 0.408 i/s - 1.000 times in 2.299328s 2.450535s
so_matrix 1.946 1.912 i/s - 1.000 times in 0.513872s 0.523076s
so_meteor_contest 0.311 0.317 i/s - 1.000 times in 3.219297s 3.152052s
so_nbody 0.746 0.703 i/s - 1.000 times in 1.339815s 1.423441s
so_nested_loop 0.899 0.901 i/s - 1.000 times in 1.111767s 1.109555s
so_nsieve 0.559 0.579 i/s - 1.000 times in 1.787763s 1.726552s
so_nsieve_bits 0.435 0.428 i/s - 1.000 times in 2.296282s 2.333852s
so_object 1.368 1.442 i/s - 1.000 times in 0.731237s 0.693684s
so_partial_sums 0.616 0.546 i/s - 1.000 times in 1.623592s 1.833097s
so_pidigits 0.831 0.832 i/s - 1.000 times in 1.203117s 1.202334s
so_random 2.934 2.724 i/s - 1.000 times in 0.340791s 0.367150s
so_reverse_complement 0.583 0.866 i/s - 1.000 times in 1.714144s 1.154615s
so_sieve 1.829 2.081 i/s - 1.000 times in 0.546607s 0.480562s
so_spectralnorm 0.524 0.558 i/s - 1.000 times in 1.908716s 1.792382s
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64737 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
String#freeze can be redefined to be destructive. While such
redefinition is definitely weird, it should be possible. Resurrect
the string to prepare for that sort of things.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64691 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Simply use DISPATCH_ORIGINAL_INSN instead of rb_funcall. This is,
when possible, overall performant because method dispatch results are
cached inside of CALL_CACHE. Should also be good for JIT.
----
trunk: ruby 2.6.0dev (2018-09-12 trunk 64689) [x86_64-darwin15]
ours: ruby 2.6.0dev (2018-09-12 leaf-insn 64688) [x86_64-darwin15]
last_commit=make opt_str_freeze leaf
Calculating -------------------------------------
trunk ours
vm2_freezestring 5.440M 31.411M i/s - 6.000M times in 1.102968s 0.191017s
Comparison:
vm2_freezestring
ours: 31410864.5 i/s
trunk: 5439865.4 i/s - 5.77x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64690 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This instruction can be written without rb_funcall. It not only boosts
performance of case statements, but also makes room for future JIT
improvements. Because opt_case_dispatch is about optimization this
should not be a bad thing to have.
----
trunk: ruby 2.6.0dev (2018-09-05 trunk 64634) [x86_64-darwin15]
ours: ruby 2.6.0dev (2018-09-12 leaf-insn 64688) [x86_64-darwin15]
last_commit=make opt_case_dispatch leaf
Calculating -------------------------------------
trunk ours
vm2_case_lit 1.366 2.012 i/s - 1.000 times in 0.731839s 0.497008s
Comparison:
vm2_case_lit
ours: 2.0 i/s
trunk: 1.4 i/s - 1.47x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64689 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
An instruction is leaf if it has no rb_funcall inside. In order to
check this property, we introduce stack canary which is a random
number collected at runtime. Stack top is always filled with this
number and checked for stack smashing operations, when VM_CHECK_MODE.
[GH-1947]
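The canary check can be illustrated with a toy stack machine. The sketch below is an assumption-laden model, not the VM's implementation: `word_t`, `insn_body_t`, `body_is_leaf`, and the two example bodies are hypothetical, and the canary is passed in as a parameter rather than collected at runtime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;

/* An instruction body reads its operands below sp and returns its result. */
typedef word_t (*insn_body_t)(word_t *stack, size_t sp);

/* Leaf body: touches nothing at or above sp. */
static word_t
add_leaf(word_t *stack, size_t sp)
{
    return stack[sp - 2] + stack[sp - 1];
}

/* Non-leaf body: simulates a method call by building a callee frame
 * (here just two argument copies) at and above sp. */
static word_t
add_via_call(word_t *stack, size_t sp)
{
    stack[sp] = stack[sp - 2];
    stack[sp + 1] = stack[sp - 1];
    return stack[sp] + stack[sp + 1];
}

/* Places the canary in the first unused slot, runs the body, and reports
 * whether the canary survived. A clobbered canary means the body used the
 * stack above sp, so it cannot be declared leaf. */
static bool
body_is_leaf(insn_body_t body, word_t *stack, size_t sp, word_t canary, word_t *result)
{
    stack[sp] = canary;
    *result = body(stack, sp);
    return stack[sp] == canary;
}
```

A body that happens to write the canary's own value would slip through, which is one reason a random canary is preferable to a fixed constant.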
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64677 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
_mjit_compile_send.erb: simplify code using the change
insns.def: adapt to the interface change
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64281 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This is just a refactoring.
The receiver of "invokesuper" was a boolean to represent if it is ZSUPER
or not. This was used in vm_search_super_method to prohibit ZSUPER call
in define_method. (It is currently prohibited because of the limitation
of the implementation.)
This change removes the hack by introducing an explicit flag,
VM_CALL_SUPER, to signal the information. Now, the implementation of
"invokesuper" is consistent with "send" instruction.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64268 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
because it's more suitable to describe the current behavior now.
tool/ruby_vm/models/bare_instructions.rb: ditto.
tool/ruby_vm/views/_insn_entry.erb: ditto.
tool/ruby_vm/views/_mjit_compile_insn_body.erb: ditto.
tool/ruby_vm/views/_mjit_compile_pc_and_sp.erb: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64053 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
By the way, the original patch of r63988 was provided by wanabe:
https://github.com/wanabe/ruby/tree/local-stack
but I forgot to add his credit in the previous commit message.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63990 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This optimization was reverted on r63863, but this commit resurrects the
optimization to skip some sp motions on JIT execution.
tool/ruby_vm/views/_mjit_compile_insn_body.erb: ditto
tool/ruby_vm/views/_mjit_compile_insn.erb: ditto
insns.def: resurrect handles_frame as handles_stack, which was deleted
on r63763.
tool/ruby_vm/models/bare_instructions.rb: ditto
vm_insnhelper.c: prevent moving sp outside insns.def to allow modifying
it by JIT.
* Optcarrot benchmark
$ benchmark-driver benchmark.yml --rbenv 'before --jit;after --jit' --repeat-count 12 -v
before --jit: ruby 2.6.0dev (2018-07-17 trunk 63987) +JIT [x86_64-linux]
after --jit: ruby 2.6.0dev (2018-07-17 local-stack 63987) +JIT [x86_64-linux]
last_commit=mjit_compile.c: resurrect local variable stack
Calculating -------------------------------------
before --jit after --jit
Optcarrot Lan_Master.nes 70.518 72.144 fps
Comparison:
Optcarrot Lan_Master.nes
after --jit: 72.1 fps
before --jit: 70.5 fps - 1.02x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63988 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
r63655 was tightly coupled to handle_frames and some assumptions seem
to have been broken by r63763.
To partially resolve Bug#14892, this reverts the optimization for now. I
want to make MJIT CI happy first and then I'll probably retry r63655 by
partially reverting r63763 for sp changes.
The skipped test is not fixed yet.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63863 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
I introduced this mechanism in r62051 to speed things up. Later it
was reported that the change causes problems. I searched for
workarounds but nothing seemed appropriate. I hereby officially
give it up. The idea to move ADD_PC around was a mistake.
Fixes [Bug #14809] and [Bug #14834].
Signed-off-by: Urabe, Shyouhei <shyouhei@ruby-lang.org>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This is a pure refactoring. I see no difference in this change.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63756 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def (checktype): split branchiftype to checktype and
branchif, to make branch condition negation possible.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63225 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
We need to mark default values for kwarg methods. This also fixes
Bootsnap. IBF iseq loading needed to mark iseqs as "having markable
objects".
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62851 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Directly marking iseq operands allows us to eliminate the "mark array"
stored on ISEQ objects, which will reduce the amount of memory ISEQ
objects consume. This patch changes the iseq mark function to:
* Directly marks ISEQ operands
* Iterate over and mark child ISEQs
It also introduces two flags on the ISEQ object. In order to mark
instruction operands, we have to disassemble the instructions and find
the instruction parameters and types. Instructions may also be
translated to jump addresses. Instruction sequences may get marked by
the GC *while* they're mid flight (being compiled). The
`ISEQ_TRANSLATED` flag is used to indicate whether or not the
instructions have been translated to jump addresses so that when we
decode the instructions we know whether or not we need to go from jump
location back to original instruction or not.
Not all ISEQ objects have any markable objects embedded in their
instructions. We can detect whether or not an ISEQ has markable objects
in the instructions at compile time. If the instructions contain
markable objects, we set a flag `ISEQ_MARKABLE_ISEQ` on the ISEQ object.
This means that during the mark phase, we can skip decompilation if the
flag is *not* set. In other words, we can avoid decompilation if we
know in advance there is nothing to mark.
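How the two flags cooperate during marking can be sketched like this. Everything here is a simplified stand-in: the opcodes, `FLAG_TRANSLATED` and `FLAG_MARKABLE`, `union slot`, and `struct iseq` are hypothetical, instructions carry no operands, and "marking" is reduced to counting the instructions that would need it.

```c
#include <stddef.h>

enum insn { INSN_NOP, INSN_PUTOBJECT, INSN_COUNT };          /* hypothetical opcodes */
enum { FLAG_TRANSLATED = 1u << 0, FLAG_MARKABLE = 1u << 1 }; /* stand-ins for the two flags */

typedef void (*handler_t)(void);

/* Distinct bodies so that the two handlers are clearly separate functions. */
static int nop_runs, putobject_runs;
static void do_nop(void) { nop_runs++; }
static void do_putobject(void) { putobject_runs++; }

static const handler_t handlers[INSN_COUNT] = { do_nop, do_putobject };

/* A code slot holds an opcode before translation, a jump address after. */
union slot { enum insn insn; handler_t addr; };

struct iseq { unsigned flags; size_t len; union slot *code; };

/* Recovers the opcode of slot i, going from jump address back to the
 * original instruction when the sequence has been translated. */
static int
slot_to_insn(const struct iseq *iseq, size_t i)
{
    if (!(iseq->flags & FLAG_TRANSLATED)) return (int)iseq->code[i].insn;
    for (int n = 0; n < INSN_COUNT; n++) {
        if (handlers[n] == iseq->code[i].addr) return n;
    }
    return -1; /* unknown address */
}

/* Counts instructions that would need marking; skips decoding entirely
 * when the sequence is known to hold nothing markable. */
static size_t
count_markable(const struct iseq *iseq)
{
    if (!(iseq->flags & FLAG_MARKABLE)) return 0;
    size_t n = 0;
    for (size_t i = 0; i < iseq->len; i++) {
        if (slot_to_insn(iseq, i) == INSN_PUTOBJECT) n++;
    }
    return n;
}
```

The early return in `count_markable` is the saving described above: sequences without markable objects never pay for the address-to-instruction lookup.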
`once` instructions have an operand that contains the result of a
one-time compilation of a regex. Before this patch, that operand was
called an "inline cache", even though the struct was actually an "inline
storage". This patch changes the operand to be an "inline storage" so
that we can differentiate between caches that need marking (the inline
storage) and caches that don't need marking (inline cache).
[ruby-core:84909]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62706 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
if catch_except_p is FALSE. If catch_except_p is TRUE, stack values
should be on VM's stack when exception is thrown and the JIT-ed frame
is re-executed by VM's exception handler. If it's FALSE, the JIT-ed
frame won't be re-executed and don't need to keep values on VM's stack.
Using local variables allows us to reduce cfp->sp motion. Moving cfp->sp
is needed only for insns whose handles_frame? is false. So it improves
performance.
_mjit_compile_insn.erb: Prepare `stack_size` variable for GET_SP,
STACK_ADDR_FROM_TOP, TOPN macros. Share pc and sp motion partial view.
Use cancel handler created in mjit_compile.c.
_mjit_compile_send.erb: ditto. Also, when iseq->body->catch_except_p is
TRUE, this stops calling mjit_exec directly. I described the reason in
vm_insnhelper.h's comment for EXEC_EC_CFP.
_mjit_compile_pc_and_sp.erb: Shared logic for moving sp and pc. As you
can see from this file, when status->local_stack_p is TRUE and
insn.handles_frame? is false, moving sp is skipped. But if
insn.handles_frame? is true, values should be rolled back to VM's stack.
common.mk: add dependency for the file
_mjit_compile_insn_body.erb: Set sp value before canceling JIT on
DISPATCH_ORIGINAL_INSN. Replace GET_SP, STACK_ADDR_FROM_TOP, TOPN macros
for the case local_stack_p is TRUE and insn.handles_frame? is false.
In that case, values are not available on VM's stack and those macros
should be replaced.
mjit_compile.inc.erb: updated comments of macros which are supported by
JIT compiler. All references to `cfp->sp` should be replaced and thus
INC_SP, SET_SV, PUSH are no longer supported for now, because they are
not used now.
vm_exec.h: moved EXEC_EC_CFP definition to vm_insnhelper.h because it's
tightly coupled to CALL_METHOD.
vm_insnhelper.h: Have revised EXEC_EC_CFP definition moved from vm_exec.h.
Now it triggers mjit_exec for VM, and has the guard for catch_except_p
on JIT-ed code. See comments for details. CALL_METHOD delegates
triggering mjit_exec to EXEC_EC_CFP.
insns.def: Stopped using EXEC_EC_CFP for the case we don't want to
trigger mjit_exec. Those insns (defineclass, opt_call_c_function) are
not supported by JIT and it's safe to use RESTORE_REGS(), NEXT_INSN().
expandarray is changed to pass GET_SP() to replace the macro in
_mjit_compile_insn_body.erb.
vm_insnhelper.c: change to take sp for the above reason.
[close https://github.com/ruby/ruby/pull/1828]
This patch resurrects the performance which was attached in
[Feature #14235].
* Benchmark
Optcarrot (with configuration for benchmark_driver.gem)
https://github.com/benchmark-driver/optcarrot
$ benchmark-driver benchmark.yml --verbose 1 --rbenv 'before;before+JIT::before,--jit;after;after+JIT::after,--jit' --repeat-count 10
before: ruby 2.6.0dev (2018-03-04 trunk 62652) [x86_64-linux]
before+JIT: ruby 2.6.0dev (2018-03-04 trunk 62652) +JIT [x86_64-linux]
after: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) [x86_64-linux]
last_commit=mjit_compile.c: use local variables for stack
after+JIT: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) +JIT [x86_64-linux]
last_commit=mjit_compile.c: use local variables for stack
Calculating -------------------------------------
before before+JIT after after+JIT
optcarrot 53.552 59.680 53.697 63.358 fps
Comparison:
optcarrot
after+JIT: 63.4 fps
before+JIT: 59.7 fps - 1.06x slower
after: 53.7 fps - 1.18x slower
before: 53.6 fps - 1.18x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62655 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def (getinlinecache): Qnil is a valid value as a constant.
this can be observable when accessing a deprecated constant
which is nil. non-nil constant is warned just once for each
location, but every time if it is nil.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62350 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
which has been developed by Takashi Kokubun <takashikkbn@gmail.com> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, and even with JIT. Not only is it perfectly safe
with JIT disabled because it does not replace VM instructions unlike
MJIT, but also with JIT enabled it stably runs Ruby applications
including Rails applications.
I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids including some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alias to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than clang. The benchmark may be worse
in macOS. Following benchmark result is provided by gcc w/ Linux.
* Performance is decreased when Google Chrome is running
* JIT can work on MinGW, but it doesn't improve performance at least
in short running benchmark.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark results
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 17
75: 18
90: 22
99: 29
home_admin:
50: 21
75: 21
90: 27
99: 40
topic_admin:
50: 17
75: 18
90: 22
99: 32
categories:
50: 35
75: 41
90: 43
99: 77
home:
50: 39
75: 46
90: 49
99: 95
topic:
50: 46
75: 52
90: 56
99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 19
75: 21
90: 25
99: 33
home_admin:
50: 24
75: 26
90: 30
99: 35
topic_admin:
50: 19
75: 20
90: 25
99: 30
categories:
50: 40
75: 44
90: 48
99: 76
home:
50: 42
75: 48
90: 51
99: 89
topic:
50: 49
75: 55
90: 58
99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Now that sp_inc attributes are officially provided as inline
functions, why not use them directly from the VM core, not just
from the compiler? By doing so, it is now possible for us to
optimize stack manipulations. We can now know exactly how many
words of stack space an instruction consumes before it actually
does. This changeset deletes some lines from insns.def because
they are no longer needed. As a result it reduces the size of
vm_exec_core function from 32,400 bytes to 32,352 bytes on my
machine.
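As an illustration of what knowing the stack effect in advance allows, here is a sketch in which each instruction's stack-pointer change is a pure function of the opcode and its operand, so the deepest stack use can be computed without executing anything. The three opcodes, `sp_inc_of`, and `max_stack_depth` are hypothetical and are not the generated sp_inc functions.

```c
#include <stddef.h>

enum insn { INSN_PUTNIL, INSN_POP, INSN_NEWARRAY }; /* hypothetical subset */

/* Net stack-pointer change of one instruction. `num` is only meaningful
 * for INSN_NEWARRAY, which pops num elements and pushes one array. */
static long
sp_inc_of(enum insn insn, unsigned long num)
{
    switch (insn) {
      case INSN_PUTNIL:   return 1;
      case INSN_POP:      return -1;
      case INSN_NEWARRAY: return 1 - (long)num;
    }
    return 0;
}

/* Deepest stack use of a straight-line sequence, computed statically. */
static long
max_stack_depth(const enum insn *insns, const unsigned long *nums, size_t len)
{
    long depth = 0, max = 0;
    for (size_t i = 0; i < len; i++) {
        depth += sp_inc_of(insns[i], nums[i]);
        if (depth > max) max = depth;
    }
    return max;
}
```

For the sequence putnil, putnil, putnil, newarray 3, pop the depth goes 1, 2, 3, 1, 0, so three words of stack are enough.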
It seems it does not affect performance:
-----------------------------------------------------------
benchmark results:
minimum results in each 3 measurements.
Execution time (sec)
name before after
loop_for 1.093 1.061
loop_generator 1.156 1.152
loop_times 0.982 0.974
loop_whileloop 0.549 0.587
loop_whileloop2 0.115 0.121
Speedup ratio: compare with the result of `before' (greater is better)
name after
loop_for 1.030
loop_generator 1.003
loop_times 1.008
loop_whileloop 0.935
loop_whileloop2 0.949
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62087 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
It says "warning C4146: unary minus operator applied
to unsigned type, result still unsigned"
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61794 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
- Gave up @j comments
- Room for sp_inc to be a proper grammar element
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61782 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
- Gave up @j comments
- Room for sp_inc to be a proper grammar element
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61729 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def (getblockparamproxy): introduce new instruction to return
the `rb_block_param_proxy` object if possible. This object responds
to the `call` method and invokes the given block (completely similar to `yield`).
* method.h (OPTIMIZED_METHOD_TYPE_BLOCK_CALL): add new optimized call type
which is for `rb_block_param_proxy.call`.
* vm_insnhelper.c (vm_call_method_each_type): ditto.
* vm_insnhelper.c (vm_call_opt_block_call): ditto.
* vm_core.h (BOP_CALL, PROC_REDEFINED_OP_FLAG): add check for `Proc#call`
redefinition.
* compile.c (iseq_compile_each0): compile to use new insn
`getblockparamproxy` for method call.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61659 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c (vm_call_opt_call): do same process of `yield` instead of
invoking `Proc`.
* vm_insnhelper.c (vm_invoke_block): invoke given block handler instead of
using a block handler in the current frame.
Also do not check block handler here (caller should check it).
* insns.def (invokeblock): catch up this fix.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61624 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def (checkkeyword): adjust argument type to
vm_check_keyword as lindex_t.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61420 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2.5's line coverage measurement was about two times slower than 2.4
because of two reasons; (1) vm_trace uses rb_iseq_event_flags (which
takes O(n) currently where n is the length of iseq) to get an event
type, and (2) RUBY_EVENT_LINE uses setjmp to call an event hook.
This change adds a special event for line coverage,
RUBY_EVENT_COVERAGE_LINE, and adds `tracecoverage` instructions where
the event occurs in iseq.
`tracecoverage` instruction calls an event hook without vm_trace.
And, RUBY_EVENT_COVERAGE_LINE is an internal event which does not
use setjmp.
This change also cancels the lineno change caused by the deletion of trace
instructions [Feature #14104], which fixes [Bug #14191].
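For reference, line coverage as seen from Ruby code can be sketched with the standard `coverage` library. The helper name `measure_line_coverage` and the sample source are made up for this sketch; it only shows what is being measured, not how `tracecoverage` is implemented.

```ruby
require "coverage"
require "tempfile"

# Hypothetical helper: writes `source` to a temporary file, loads it with
# coverage enabled, and returns the per-line execution counts for that file.
def measure_line_coverage(source)
  Tempfile.create(["cov_target", ".rb"]) do |f|
    f.write(source)
    f.flush
    Coverage.start                 # only files loaded after this are measured
    load f.path
    result = Coverage.result       # also stops the measurement
    _, lines = result.find { |path, _| File.basename(path) == File.basename(f.path) }
    lines
  end
end

sample = <<~RUBY
  x = 1
  if x > 0
    x += 1
  else
    x -= 1
  end
RUBY

line_counts = measure_line_coverage(sample)
```

Each element of `line_counts` is the number of times the corresponding line ran, or `nil` for lines that carry no coverage event.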
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61350 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* tool/instruction.rb: create `trace_` prefix instructions.
* compile.c (ADD_TRACE): do not add `trace` instructions but add
TRACE link elements. TRACE elements will be unified with a next
instruction as instruction information.
* vm_trace.c (update_global_event_hook): modify all ISeqs when
hooks are enabled.
* iseq.c (rb_iseq_trace_set): added to toggle `trace_` instructions.
* vm_insnhelper.c (vm_trace): added.
This function is a body of `trace_` prefix instructions.
* vm_insnhelper.h (JUMP): save PC to a control frame.
* insns.def (trace): removed.
* vm_exec.h (INSN_ENTRY_SIG): add debug output (disabled).
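Seen from Ruby, the toggling corresponds to enabling and disabling a hook such as a `TracePoint`. The sketch below uses a made-up method `greet` and only shows that events are delivered while the hook is enabled; it does not inspect the `trace_` instructions themselves.

```ruby
# Hypothetical method used as the traced target.
def greet(name)
  "hello " + name
end

calls = []
trace = TracePoint.new(:call) { |tp| calls << tp.method_id }

greet("before")                    # hook disabled: nothing is recorded
trace.enable { greet("during") }   # hook enabled only inside the block
greet("after")                     # hook disabled again
```

Only the call made while the hook is enabled should be recorded in `calls`.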
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Fix compile errors for OPT_CALL_THREADED_CODE (in vm_opts.h).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60493 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c (vm_search_super_method): accepts `ec` instead of `th`.
Surprisingly, it doesn't use `th` (now `ec`) so this patch is for
the future extension.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60471 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
to represent execution context [Feature #14038]
* vm_core.h (rb_thread_t): rb_thread_t::ec is now a pointer.
Much code uses `th` to represent the execution context
(such as cfp, VM stack and so on). To access `ec`, such code needs to
use `th->ec->...` (adding one indirection), so we need to
replace it by passing `ec` instead of `th`.
* vm_core.h (GET_EC()): introduced to access current ec. Also
remove `ruby_current_thread` global variable.
* cont.c (rb_context_t): introduce rb_context_t::thread_ptr instead of
rb_context_t::thread_value.
* cont.c (ec_set_vm_stack): added to update vm_stack explicitly.
* cont.c (ec_switch): added to switch ec explicitly.
* cont.c (rb_fiber_close): added to terminate fibers explicitly.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60440 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
[Feature #14045]
* insns.def (getblockparam, setblockparam): add special access
instructions for block parameters.
getblockparam checks VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM and
if it is not set, this instruction creates a Proc object from
a given block and sets VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM.
setblockparam is similar to setlocal, but sets
VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM.
* compile.c: use get/setblockparam instead of get/setlocal instructions.
Note that they are used for method local block parameters (def m(&b)),
not for block local method parameters (iter{|&b|}).
* proc.c (get_local_variable_ptr): creates Proc object for
Binding#local_variable_get/set.
* safe.c (safe_setter): we need to create Proc objects for postponed
block parameters when $SAFE is changed.
* vm_args.c (args_setup_block_parameter): used only for block local block
parameters.
* vm_args.c (vm_caller_setup_arg_block): if called with
VM_CALL_ARGS_BLOCKARG_BLOCKPARAM flag then passed block values should be
a block handler.
* test/ruby/test_optimization.rb: add tests.
* benchmark/bm_vm1_blockparam*: added.
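A small Ruby sketch of the three situations described above: forwarding a block parameter, reading it as a value, and assigning to it. All method names are hypothetical, and the comments about when a Proc is created restate the description above rather than something the code can observe.

```ruby
# Hypothetical iterator that yields the sum of each pair.
def each_pair_sum(pairs)
  pairs.each { |a, b| yield a + b }
end

# `&blk` is only passed along. This is the case where creating a Proc
# object can be postponed.
def forward(pairs, &blk)
  each_pair_sum(pairs, &blk)
end

# Reading the block parameter as a value needs a real Proc object.
def capture(&blk)
  blk
end

# Assigning to the block parameter (the setblockparam case).
def with_default(&blk)
  blk ||= proc { |v| v.to_s }
  blk.call(7)
end

sums = []
forward([[1, 2], [3, 4]]) { |s| sums << s }
captured = capture { :unused }
```

The observable behaviour is the same as before the change; only the point at which the Proc is materialized differs.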
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60397 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def (intern): new instruction to turn string into symbol.
opt_call_c_function can not dump.
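For illustration, a dynamic symbol literal is presumably the kind of Ruby expression that needs a string turned into a symbol at run time. The variable names are made up, and the exact instruction sequence emitted for this code is an assumption.

```ruby
field = "name"

# A symbol literal with interpolation: the string is built first and then
# converted into a symbol.
setter = :"#{field}="
```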
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59951 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* compile.c (iseq_compile_each0): insert to_s method call, so that
refinements activated at the caller take effect.
[Feature #13812]
* insns.def (tostring): fix up converted object to a string,
infect and fallback.
* insns.def (branchiftype): new instruction for conversion.
branches if TOS is an instance of the given type.
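The effect on string interpolation can be sketched as follows, assuming a Ruby version that includes this change (2.5 or later). The class and module names are hypothetical.

```ruby
class Temperature
  def initialize(degrees)
    @degrees = degrees
  end

  def to_s
    "#{@degrees} degrees"
  end
end

# Refinement that overrides to_s for Temperature.
module CelsiusFormat
  refine Temperature do
    def to_s
      "#{@degrees} C"
    end
  end
end

# No refinement active here: interpolation uses the original to_s.
module PlainReport
  def self.render(t)
    "It is #{t}"
  end
end

# Refinement active here: interpolation uses the refined to_s.
module CelsiusReport
  using CelsiusFormat

  def self.render(t)
    "It is #{t}"
  end
end

plain   = PlainReport.render(Temperature.new(20))
refined = CelsiusReport.render(Temperature.new(20))
```

The two `render` methods have identical bodies; only the refinement activated at the call site differs.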
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59950 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This is needed for passing to the hook function the measuring target
type (line/branch/method) and the site of coverage event fired.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59871 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Sometimes, the size of a hash can be calculated a priori. By providing
such info to the constructor we can avoid unnecessary internal
reallocations. This can speed up, for instance, creation of hash literals.
[Bug #13861]
Signed-off-by: Urabe, Shyouhei <shyouhei@ruby-lang.org>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59744 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
is_local argument was introduced on r11639 and removed on r11813.
* insns.def (getinstancevariable, setinstancevariable): Remove a
nonexistent argument.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59600 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_insnhelper.c (vm_stack_consistency_error): extracted from
insns.def for further info in the future.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59149 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* debug_counter.h: add the following counters:
* lvar_get: counter for lvar get.
* lvar_get_dynamic: counter for lvar get from upper frames.
* lvar_set: counter for lvar set.
* lvar_set_dynamic: counter for lvar set from upper frames.
* lvar_set_slowpath: counter for lvar set using slowpath.
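To make the distinction between plain and "upper frame" accesses concrete, here is a Ruby sketch with a made-up method `count_evens`. Which counter each access bumps is an assumption based on the descriptions above.

```ruby
def count_evens(numbers)
  evens = 0                 # local variable set in the method's own frame
  numbers.each do |n|
    evens += 1 if n.even?   # `evens` is read and written from inside the block,
                            # i.e. from one frame below the one that owns it
  end
  evens                     # local variable get in the method's own frame
end

even_count = count_evens([1, 2, 3, 4])
```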
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58977 b2dd03c8-39d4-4d8f-98ff-823fe69b080e