ruby/insns.def

1371 строка
26 KiB
Modula-2
Исходник Обычный вид История

/* -*- mode:c; style:ruby; coding: utf-8 -*-
insns.def - YARV instruction definitions
$Author: $
created at: 04/01/01 01:17:55 JST
Copyright (C) 2004-2007 Koichi Sasada
Massive rewrite by @shyouhei in 2017.
*/
/* Some comments about this file's contents:
- The new format aims to be editable by C editor of your choice;
your mileage might vary of course.
- Each instructions are in following format:
DEFINE_INSN
instruction_name
(type operand, type operand, ..)
(pop_values, ..)
(return values ..)
// attr type name contents..
{
.. // insn body
}
- Unlike the old format which was line-oriented, you can now place
newlines and comments at liberal positions.
- `DEFINE_INSN` is a keyword.
- An instruction name must be a valid C identifier.
- Operands, pop values, return values are series of either variable
declarations, keyword `void`, or keyword `...`. They are much
like C function declarations.
- Attribute pragmas are optional, and can include arbitrary C
expressions. You can write anything there but as of writing,
supported attributes are:
* sp_inc: Used to dynamically calculate sp increase in
`insn_stack_increase`.
* handles_frame: If it is true, VM moves pc before insn body.
- Attributes can access operands, but not stack (push/pop) variables.
- An instruction's body is a pure C block, copied verbatimly into
the generated C source code.
*/
/* nop */
DEFINE_INSN
nop
()
()
()
{
/* none */
}
/**********************************************************/
/* deal with variables */
/**********************************************************/
/* Get local variable (pointed by `idx' and `level').
'level' indicates the nesting depth from the current block.
*/
DEFINE_INSN
getlocal
(lindex_t idx, rb_num_t level)
()
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = *(vm_get_ep(GET_EP(), level) - idx);
RB_DEBUG_COUNTER_INC(lvar_get);
(void)RB_DEBUG_COUNTER_INC_IF(lvar_get_dynamic, level > 0);
}
/* Set a local variable (pointed to by 'idx') as val.
'level' indicates the nesting depth from the current block.
*/
DEFINE_INSN
setlocal
(lindex_t idx, rb_num_t level)
(VALUE val)
()
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
vm_env_write(vm_get_ep(GET_EP(), level), -(int)idx, val);
RB_DEBUG_COUNTER_INC(lvar_set);
(void)RB_DEBUG_COUNTER_INC_IF(lvar_set_dynamic, level > 0);
}
/* Get a block parameter. */
DEFINE_INSN
getblockparam
(lindex_t idx, rb_num_t level)
()
(VALUE val)
{
const VALUE *ep = vm_get_ep(GET_EP(), level);
VM_ASSERT(VM_ENV_LOCAL_P(ep));
if (!VM_ENV_FLAGS(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM)) {
val = rb_vm_bh_to_procval(ec, VM_ENV_BLOCK_HANDLER(ep));
vm_env_write(ep, -(int)idx, val);
VM_ENV_FLAGS_SET(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM);
}
else {
val = *(ep - idx);
RB_DEBUG_COUNTER_INC(lvar_get);
(void)RB_DEBUG_COUNTER_INC_IF(lvar_get_dynamic, level > 0);
}
}
/* Set block parameter. */
DEFINE_INSN
setblockparam
(lindex_t idx, rb_num_t level)
(VALUE val)
()
{
const VALUE *ep = vm_get_ep(GET_EP(), level);
VM_ASSERT(VM_ENV_LOCAL_P(ep));
vm_env_write(ep, -(int)idx, val);
RB_DEBUG_COUNTER_INC(lvar_set);
(void)RB_DEBUG_COUNTER_INC_IF(lvar_set_dynamic, level > 0);
VM_ENV_FLAGS_SET(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM);
}
/* Get special proxy object which only responds to `call` method if the block parameter
represents a iseq/ifunc block. Otherwise, same as `getblockparam`.
*/
DEFINE_INSN
getblockparamproxy
(lindex_t idx, rb_num_t level)
()
(VALUE val)
{
const VALUE *ep = vm_get_ep(GET_EP(), level);
VM_ASSERT(VM_ENV_LOCAL_P(ep));
if (!VM_ENV_FLAGS(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM)) {
VALUE block_handler = VM_ENV_BLOCK_HANDLER(ep);
if (block_handler) {
switch (vm_block_handler_type(block_handler)) {
case block_handler_type_iseq:
case block_handler_type_ifunc:
val = rb_block_param_proxy;
break;
case block_handler_type_symbol:
val = rb_sym_to_proc(VM_BH_TO_SYMBOL(block_handler));
goto INSN_LABEL(set);
case block_handler_type_proc:
val = VM_BH_TO_PROC(block_handler);
goto INSN_LABEL(set);
default:
VM_UNREACHABLE(getblockparamproxy);
}
}
else {
val = Qnil;
INSN_LABEL(set):
vm_env_write(ep, -(int)idx, val);
VM_ENV_FLAGS_SET(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM);
}
}
else {
val = *(ep - idx);
RB_DEBUG_COUNTER_INC(lvar_get);
(void)RB_DEBUG_COUNTER_INC_IF(lvar_get_dynamic, level > 0);
}
}
/* Get value of special local variable ($~, $_, ..). */
DEFINE_INSN
getspecial
(rb_num_t key, rb_num_t type)
()
(VALUE val)
{
val = vm_getspecial(ec, GET_LEP(), key, type);
}
/* Set value of special local variable ($~, $_, ...) to obj. */
DEFINE_INSN
setspecial
(rb_num_t key)
(VALUE obj)
()
{
lep_svar_set(ec, GET_LEP(), key, obj);
}
/* Get value of instance variable id of self. */
DEFINE_INSN
getinstancevariable
(ID id, IC ic)
()
(VALUE val)
{
val = vm_getinstancevariable(GET_SELF(), id, ic);
}
/* Set value of instance variable id of self to val. */
DEFINE_INSN
setinstancevariable
(ID id, IC ic)
(VALUE val)
()
{
vm_setinstancevariable(GET_SELF(), id, val, ic);
}
/* Get value of class variable id of klass as val. */
DEFINE_INSN
getclassvariable
(ID id)
()
(VALUE val)
{
val = rb_cvar_get(vm_get_cvar_base(rb_vm_get_cref(GET_EP()), GET_CFP()), id);
}
/* Set value of class variable id of klass as val. */
DEFINE_INSN
setclassvariable
(ID id)
(VALUE val)
()
{
vm_ensure_not_refinement_module(GET_SELF());
rb_cvar_set(vm_get_cvar_base(rb_vm_get_cref(GET_EP()), GET_CFP()), id, val);
}
/* Get constant variable id. If klass is Qnil, constants
are searched in the current scope. If klass is Qfalse, constants
are searched as top level constants. Otherwise, get constant under klass
class or module.
*/
DEFINE_INSN
getconstant
(ID id)
(VALUE klass)
(VALUE val)
{
val = vm_get_ev_const(ec, klass, id, 0);
}
/* Set constant variable id. If klass is Qfalse, constant
is able to access in this scope. if klass is Qnil, set
top level constant. otherwise, set constant under klass
class or module.
*/
DEFINE_INSN
setconstant
(ID id)
(VALUE val, VALUE cbase)
()
{
vm_check_if_namespace(cbase);
vm_ensure_not_refinement_module(GET_SELF());
rb_const_set(cbase, id, val);
}
/* get global variable id. */
DEFINE_INSN
getglobal
(GENTRY entry)
()
(VALUE val)
{
val = GET_GLOBAL((VALUE)entry);
}
/* set global variable id as val. */
DEFINE_INSN
setglobal
(GENTRY entry)
(VALUE val)
()
{
SET_GLOBAL((VALUE)entry, val);
}
/**********************************************************/
/* deal with values */
/**********************************************************/
/* put nil to stack. */
DEFINE_INSN
putnil
()
()
(VALUE val)
{
val = Qnil;
}
/* put self. */
DEFINE_INSN
putself
()
()
(VALUE val)
{
val = GET_SELF();
}
/* put some object.
i.e. Fixnum, true, false, nil, and so on.
*/
DEFINE_INSN
putobject
(VALUE val)
()
(VALUE val)
{
/* */
}
/* put special object. "value_type" is for expansion. */
DEFINE_INSN
putspecialobject
(rb_num_t value_type)
()
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
enum vm_special_object_type type;
type = (enum vm_special_object_type)value_type;
val = vm_get_special_object(GET_EP(), type);
}
/* put iseq value. */
DEFINE_INSN
putiseq
(ISEQ iseq)
()
(VALUE ret)
{
2015-07-22 01:52:59 +03:00
ret = (VALUE)iseq;
}
/* put string val. string will be copied. */
DEFINE_INSN
putstring
(VALUE str)
()
(VALUE val)
{
val = rb_str_resurrect(str);
}
/* put concatenate strings */
DEFINE_INSN
concatstrings
(rb_num_t num)
(...)
(VALUE val)
// attr rb_snum_t sp_inc = 1 - num;
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = rb_str_concat_literals(num, STACK_ADDR_FROM_TOP(num));
}
/* push the result of to_s. */
DEFINE_INSN
tostring
()
(VALUE val, VALUE str)
(VALUE val)
{
VALUE rb_obj_as_string_result(VALUE str, VALUE obj);
val = rb_obj_as_string_result(str, val);
}
/* Freeze (dynamically) created strings. if debug_info is given, set it. */
DEFINE_INSN
freezestring
(VALUE debug_info)
(VALUE str)
(VALUE str)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
vm_freezestring(str, debug_info);
}
/* compile str to Regexp and push it.
opt is the option for the Regexp.
*/
DEFINE_INSN
toregexp
(rb_num_t opt, rb_num_t cnt)
(...)
(VALUE val)
// attr rb_snum_t sp_inc = 1 - cnt;
{
VALUE rb_reg_new_ary(VALUE ary, int options);
VALUE rb_ary_tmp_new_from_values(VALUE, long, const VALUE *);
const VALUE ary = rb_ary_tmp_new_from_values(0, cnt, STACK_ADDR_FROM_TOP(cnt));
val = rb_reg_new_ary(ary, (int)opt);
rb_ary_clear(ary);
}
/* intern str to Symbol and push it. */
DEFINE_INSN
intern
()
(VALUE str)
(VALUE sym)
{
sym = rb_str_intern(str);
}
/* put new array initialized with num values on the stack. */
DEFINE_INSN
newarray
(rb_num_t num)
(...)
(VALUE val)
// attr rb_snum_t sp_inc = 1 - num;
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = rb_ary_new4(num, STACK_ADDR_FROM_TOP(num));
}
/* dup array */
DEFINE_INSN
duparray
(VALUE ary)
()
(VALUE val)
{
val = rb_ary_resurrect(ary);
}
/* if TOS is an array expand, expand it to num objects.
if the number of the array is less than num, push nils to fill.
if it is greater than num, exceeding elements are dropped.
unless TOS is an array, push num - 1 nils.
if flags is non-zero, push the array of the rest elements.
flag: 0x01 - rest args array
flag: 0x02 - for postarg
flag: 0x04 - reverse?
*/
DEFINE_INSN
expandarray
(rb_num_t num, rb_num_t flag)
(..., VALUE ary)
(...)
// attr rb_snum_t sp_inc = num - 1 + (flag & 1 ? 1 : 0);
{
mjit_compile.c: use local variables for stack if catch_except_p is FALSE. If catch_except_p is TRUE, stack values should be on VM's stack when exception is thrown and the JIT-ed frame is re-executed by VM's exception handler. If it's FALSE, the JIT-ed frame won't be re-executed and don't need to keep values on VM's stack. Using local variables allows us to reduce cfp->sp motion. Moving cfp->sp is needed only for insns whose handles_frame? is false. So it improves performance. _mjit_compile_insn.erb: Prepare `stack_size` variable for GET_SP, STACK_ADDR_FROM_TOP, TOPN macros. Share pc and sp motion partial view. Use cancel handler created in mjit_compile.c. _mjit_compile_send.erb: ditto. Also, when iseq->body->catch_except_p is TRUE, this stops to call mjit_exec directly. I described the reason in vm_insnhelper.h's comment for EXEC_EC_CFP. _mjit_compile_pc_and_sp.erb: Shared logic for moving sp and pc. As you can see from thsi file, when status->local_stack_p is TRUE and insn.handles_frame? is false, moving sp is skipped. But if insn.handles_frame? is true, values should be rolled back to VM's stack. common.mk: add dependency for the file _mjit_compile_insn_body.erb: Set sp value before canceling JIT on DISPATCH_ORIGINAL_INSN. Replace GET_SP, STACK_ADDR_FROM_TOP, TOPN macros for the case ocal_stack_p is TRUE and insn.handles_frame? is false. In that case, values are not available on VM's stack and those macros should be replaced. mjit_compile.inc.erb: updated comments of macros which are supported by JIT compiler. All references to `cfp->sp` should be replaced and thus INC_SP, SET_SV, PUSH are no longer supported for now, because they are not used now. vm_exec.h: moved EXEC_EC_CFP definition to vm_insnhelper.h because it's tighly coupled to CALL_METHOD. vm_insnhelper.h: Have revised EXEC_EC_CFP definition moved from vm_exec.h. Now it triggers mjit_exec for VM, and has the guard for catch_except_p on JIT-ed code. See comments for details. CALL_METHOD delegates triggering mjit_exec to EXEC_EC_CFP. insns.def: Stopped using EXEC_EC_CFP for the case we don't want to trigger mjit_exec. Those insns (defineclass, opt_call_c_function) are not supported by JIT and it's safe to use RESTORE_REGS(), NEXT_INSN(). expandarray is changed to pass GET_SP() to replace the macro in _mjit_compile_insn_body.erb. vm_insnhelper.c: change to take sp for the above reason. [close https://github.com/ruby/ruby/pull/1828] This patch resurrects the performance which was attached in [Feature #14235]. * Benchmark Optcarrot (with configuration for benchmark_driver.gem) https://github.com/benchmark-driver/optcarrot $ benchmark-driver benchmark.yml --verbose 1 --rbenv 'before;before+JIT::before,--jit;after;after+JIT::after,--jit' --repeat-count 10 before: ruby 2.6.0dev (2018-03-04 trunk 62652) [x86_64-linux] before+JIT: ruby 2.6.0dev (2018-03-04 trunk 62652) +JIT [x86_64-linux] after: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) [x86_64-linux] last_commit=mjit_compile.c: use local variables for stack after+JIT: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) +JIT [x86_64-linux] last_commit=mjit_compile.c: use local variables for stack Calculating ------------------------------------- before before+JIT after after+JIT optcarrot 53.552 59.680 53.697 63.358 fps Comparison: optcarrot after+JIT: 63.4 fps before+JIT: 59.7 fps - 1.06x slower after: 53.7 fps - 1.18x slower before: 53.6 fps - 1.18x slower git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62655 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-04 10:04:40 +03:00
vm_expandarray(GET_SP(), ary, num, (int)flag);
}
/* concat two arrays */
DEFINE_INSN
concatarray
()
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
(VALUE ary1, VALUE ary2)
(VALUE ary)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
ary = vm_concat_array(ary1, ary2);
}
/* call to_a on array ary to splat */
DEFINE_INSN
splatarray
(VALUE flag)
(VALUE ary)
(VALUE obj)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
obj = vm_splat_array(flag, ary);
}
/* put new Hash from n elements. n must be an even number. */
DEFINE_INSN
newhash
(rb_num_t num)
(...)
(VALUE val)
// attr rb_snum_t sp_inc = 1 - num;
{
RUBY_DTRACE_CREATE_HOOK(HASH, num);
* probes.d: add DTrace probe declarations. [ruby-core:27448] * array.c (empty_ary_alloc, ary_new): added array create DTrace probe. * compile.c (rb_insns_name): allowing DTrace probes to access instruction sequence name. * Makefile.in: translate probes.d file to appropriate header file. * common.mk: declare dependencies on the DTrace header. * configure.in: add a test for existence of DTrace. * eval.c (setup_exception): add a probe for when an exception is raised. * gc.c: Add DTrace probes for mark begin and end, and sweep begin and end. * hash.c (empty_hash_alloc): Add a probe for hash allocation. * insns.def: Add probes for function entry and return. * internal.h: function declaration for compile.c change. * load.c (rb_f_load): add probes for `load` entry and exit, require entry and exit, and wrapping search_required for load path search. * object.c (rb_obj_alloc): added a probe for general object creation. * parse.y (yycompile0): added a probe around parse and compile phase. * string.c (empty_str_alloc, str_new): DTrace probes for string allocation. * test/dtrace/*: tests for DTrace probes. * vm.c (vm_invoke_proc): add probes for function return on exception raise, hash create, and instruction sequence execution. * vm_core.h: add probe declarations for function entry and exit. * vm_dump.c: add probes header file. * vm_eval.c (vm_call0_cfunc, vm_call0_cfunc_with_frame): add probe on function entry and return. * vm_exec.c: expose instruction number to instruction name function. * vm_insnshelper.c: add function entry and exit probes for cfunc methods. * vm_insnhelper.h: vm usage information is always collected, so uncomment the functions. 12 19:14:50 2012 Akinori MUSHA <knu@iDaemons.org> * configure.in (isinf, isnan): isinf() and isnan() are macros on DragonFly which cannot be found by AC_REPLACE_FUNCS(). This workaround enforces the fact that they exist on DragonFly. 12 15:59:38 2012 Shugo Maeda <shugo@ruby-lang.org> * vm_core.h (rb_call_info_t::refinements), compile.c (new_callinfo), vm_insnhelper.c (vm_search_method): revert r37616 because it's too slow. [ruby-dev:46477] * test/ruby/test_refinement.rb (test_inline_method_cache): skip the test until the bug is fixed efficiently. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@37631 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-11-13 01:52:12 +04:00
val = rb_hash_new_with_size(num / 2);
if (num) {
rb_hash_bulk_insert(num, STACK_ADDR_FROM_TOP(num), val);
}
}
/* put new Range object.(Range.new(low, high, flag)) */
DEFINE_INSN
newrange
(rb_num_t flag)
(VALUE low, VALUE high)
(VALUE val)
{
val = rb_range_new(low, high, (int)flag);
}
/**********************************************************/
/* deal with stack operation */
/**********************************************************/
/* pop from stack. */
DEFINE_INSN
pop
()
(VALUE val)
()
{
(void)val;
/* none */
}
/* duplicate stack top. */
DEFINE_INSN
dup
()
(VALUE val)
(VALUE val1, VALUE val2)
{
val1 = val2 = val;
}
/* duplicate stack top n elements */
DEFINE_INSN
dupn
(rb_num_t n)
(...)
(...)
// attr rb_snum_t sp_inc = n;
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
void *dst = GET_SP();
void *src = STACK_ADDR_FROM_TOP(n);
MEMCPY(dst, src, VALUE, n);
}
/* swap top 2 vals */
DEFINE_INSN
swap
()
(VALUE val, VALUE obj)
(VALUE obj, VALUE val)
{
/* none */
}
/* reverse stack top N order. */
DEFINE_INSN
reverse
(rb_num_t n)
(...)
(...)
// attr rb_snum_t sp_inc = 0;
{
rb_num_t i;
VALUE *sp = STACK_ADDR_FROM_TOP(n);
for (i=0; i<n/2; i++) {
VALUE v0 = sp[i];
VALUE v1 = TOPN(i);
sp[i] = v1;
TOPN(i) = v0;
}
}
/* for stack caching. */
DEFINE_INSN
reput
()
(..., VALUE val)
(VALUE val)
// attr rb_snum_t sp_inc = 0;
{
/* none */
}
/* get nth stack value from stack top */
DEFINE_INSN
topn
(rb_num_t n)
(...)
(VALUE val)
// attr rb_snum_t sp_inc = 1;
{
val = TOPN(n);
}
/* set Nth stack entry to stack top */
DEFINE_INSN
setn
(rb_num_t n)
(..., VALUE val)
(VALUE val)
// attr rb_snum_t sp_inc = 0;
{
TOPN(n) = val;
}
/* empty current stack */
DEFINE_INSN
adjuststack
(rb_num_t n)
(...)
(...)
// attr rb_snum_t sp_inc = -(rb_snum_t)n;
{
/* none */
}
/**********************************************************/
/* deal with setting */
/**********************************************************/
/* defined? */
DEFINE_INSN
defined
(rb_num_t op_type, VALUE obj, VALUE needstr)
(VALUE v)
(VALUE val)
{
val = vm_defined(ec, GET_CFP(), op_type, obj, needstr, v);
}
/* check `target' matches `pattern'.
`flag & VM_CHECKMATCH_TYPE_MASK' describe how to check pattern.
VM_CHECKMATCH_TYPE_WHEN: ignore target and check pattern is truthy.
VM_CHECKMATCH_TYPE_CASE: check `patten === target'.
VM_CHECKMATCH_TYPE_RESCUE: check `pattern.kind_op?(Module) && pattern == target'.
if `flag & VM_CHECKMATCH_ARRAY' is not 0, then `patten' is array of patterns.
*/
DEFINE_INSN
checkmatch
(rb_num_t flag)
(VALUE target, VALUE pattern)
(VALUE result)
{
result = vm_check_match(ec, target, pattern, flag);
}
/* check keywords are specified or not. */
* rewrite method/block parameter fitting logic to optimize keyword arguments/parameters and a splat argument. [Feature #10440] (Details are described in this ticket) Most of complex part is moved to vm_args.c. Now, ISeq#to_a does not catch up new instruction format. * vm_core.h: change iseq data structures. * introduce rb_call_info_kw_arg_t to represent keyword arguments. * add rb_call_info_t::kw_arg. * rename rb_iseq_t::arg_post_len to rb_iseq_t::arg_post_num. * rename rb_iseq_t::arg_keywords to arg_keyword_num. * rename rb_iseq_t::arg_keyword to rb_iseq_t::arg_keyword_bits. to represent keyword bitmap parameter index. This bitmap parameter shows that which keyword parameters are given or not given (0 for given). It is refered by `checkkeyword' instruction described bellow. * rename rb_iseq_t::arg_keyword_check to rb_iseq_t::arg_keyword_rest to represent keyword rest parameter index. * add rb_iseq_t::arg_keyword_default_values to represent default keyword values. * rename VM_CALL_ARGS_SKIP_SETUP to VM_CALL_ARGS_SIMPLE to represent (ci->flag & (SPLAT|BLOCKARG)) && ci->blockiseq == NULL && ci->kw_arg == NULL. * vm_insnhelper.c, vm_args.c: rewrite with refactoring. * rewrite splat argument code. * rewrite keyword arguments/parameters code. * merge method and block parameter fitting code into one code base. * vm.c, vm_eval.c: catch up these changes. * compile.c (new_callinfo): callinfo requires kw_arg parameter. * compile.c (compile_array_): check the last argument Hash object or not. If Hash object and all keys are Symbol literals, they are compiled to keyword arguments. * insns.def (checkkeyword): add new instruction. This instruction check the availability of corresponding keyword. For example, a method "def foo k1: 'v1'; end" is cimpiled to the following instructions. 0000 checkkeyword 2, 0 # check k1 is given. 0003 branchif 9 # if given, jump to address #9 0005 putstring "v1" 0007 setlocal_OP__WC__0 3 # k1 = 'v1' 0009 trace 8 0011 putnil 0012 trace 16 0014 leave * insns.def (opt_send_simple): removed and add new instruction "opt_send_without_block". * parse.y (new_args_tail_gen): reorder variables. Before this patch, a method "def foo(k1: 1, kr1:, k2: 2, **krest, &b)" has parameter variables "k1, kr1, k2, &b, internal_id, krest", but this patch reorders to "kr1, k1, k2, internal_id, krest, &b". (locate a block variable at last) * parse.y (vtable_pop): added. This function remove latest `n' variables from vtable. * iseq.c: catch up iseq data changes. * proc.c: ditto. * class.c (keyword_error): export as rb_keyword_error(). * common.mk: depend vm_args.c for vm.o. * hash.c (rb_hash_has_key): export. * internal.h: ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48239 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-02 21:02:55 +03:00
DEFINE_INSN
checkkeyword
(lindex_t kw_bits_index, lindex_t keyword_index)
* rewrite method/block parameter fitting logic to optimize keyword arguments/parameters and a splat argument. [Feature #10440] (Details are described in this ticket) Most of complex part is moved to vm_args.c. Now, ISeq#to_a does not catch up new instruction format. * vm_core.h: change iseq data structures. * introduce rb_call_info_kw_arg_t to represent keyword arguments. * add rb_call_info_t::kw_arg. * rename rb_iseq_t::arg_post_len to rb_iseq_t::arg_post_num. * rename rb_iseq_t::arg_keywords to arg_keyword_num. * rename rb_iseq_t::arg_keyword to rb_iseq_t::arg_keyword_bits. to represent keyword bitmap parameter index. This bitmap parameter shows that which keyword parameters are given or not given (0 for given). It is refered by `checkkeyword' instruction described bellow. * rename rb_iseq_t::arg_keyword_check to rb_iseq_t::arg_keyword_rest to represent keyword rest parameter index. * add rb_iseq_t::arg_keyword_default_values to represent default keyword values. * rename VM_CALL_ARGS_SKIP_SETUP to VM_CALL_ARGS_SIMPLE to represent (ci->flag & (SPLAT|BLOCKARG)) && ci->blockiseq == NULL && ci->kw_arg == NULL. * vm_insnhelper.c, vm_args.c: rewrite with refactoring. * rewrite splat argument code. * rewrite keyword arguments/parameters code. * merge method and block parameter fitting code into one code base. * vm.c, vm_eval.c: catch up these changes. * compile.c (new_callinfo): callinfo requires kw_arg parameter. * compile.c (compile_array_): check the last argument Hash object or not. If Hash object and all keys are Symbol literals, they are compiled to keyword arguments. * insns.def (checkkeyword): add new instruction. This instruction check the availability of corresponding keyword. For example, a method "def foo k1: 'v1'; end" is cimpiled to the following instructions. 0000 checkkeyword 2, 0 # check k1 is given. 0003 branchif 9 # if given, jump to address #9 0005 putstring "v1" 0007 setlocal_OP__WC__0 3 # k1 = 'v1' 0009 trace 8 0011 putnil 0012 trace 16 0014 leave * insns.def (opt_send_simple): removed and add new instruction "opt_send_without_block". * parse.y (new_args_tail_gen): reorder variables. Before this patch, a method "def foo(k1: 1, kr1:, k2: 2, **krest, &b)" has parameter variables "k1, kr1, k2, &b, internal_id, krest", but this patch reorders to "kr1, k1, k2, internal_id, krest, &b". (locate a block variable at last) * parse.y (vtable_pop): added. This function remove latest `n' variables from vtable. * iseq.c: catch up iseq data changes. * proc.c: ditto. * class.c (keyword_error): export as rb_keyword_error(). * common.mk: depend vm_args.c for vm.o. * hash.c (rb_hash_has_key): export. * internal.h: ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48239 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-02 21:02:55 +03:00
()
(VALUE ret)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
ret = vm_check_keyword(kw_bits_index, keyword_index, GET_EP());
* rewrite method/block parameter fitting logic to optimize keyword arguments/parameters and a splat argument. [Feature #10440] (Details are described in this ticket) Most of complex part is moved to vm_args.c. Now, ISeq#to_a does not catch up new instruction format. * vm_core.h: change iseq data structures. * introduce rb_call_info_kw_arg_t to represent keyword arguments. * add rb_call_info_t::kw_arg. * rename rb_iseq_t::arg_post_len to rb_iseq_t::arg_post_num. * rename rb_iseq_t::arg_keywords to arg_keyword_num. * rename rb_iseq_t::arg_keyword to rb_iseq_t::arg_keyword_bits. to represent keyword bitmap parameter index. This bitmap parameter shows that which keyword parameters are given or not given (0 for given). It is refered by `checkkeyword' instruction described bellow. * rename rb_iseq_t::arg_keyword_check to rb_iseq_t::arg_keyword_rest to represent keyword rest parameter index. * add rb_iseq_t::arg_keyword_default_values to represent default keyword values. * rename VM_CALL_ARGS_SKIP_SETUP to VM_CALL_ARGS_SIMPLE to represent (ci->flag & (SPLAT|BLOCKARG)) && ci->blockiseq == NULL && ci->kw_arg == NULL. * vm_insnhelper.c, vm_args.c: rewrite with refactoring. * rewrite splat argument code. * rewrite keyword arguments/parameters code. * merge method and block parameter fitting code into one code base. * vm.c, vm_eval.c: catch up these changes. * compile.c (new_callinfo): callinfo requires kw_arg parameter. * compile.c (compile_array_): check the last argument Hash object or not. If Hash object and all keys are Symbol literals, they are compiled to keyword arguments. * insns.def (checkkeyword): add new instruction. This instruction check the availability of corresponding keyword. For example, a method "def foo k1: 'v1'; end" is cimpiled to the following instructions. 0000 checkkeyword 2, 0 # check k1 is given. 0003 branchif 9 # if given, jump to address #9 0005 putstring "v1" 0007 setlocal_OP__WC__0 3 # k1 = 'v1' 0009 trace 8 0011 putnil 0012 trace 16 0014 leave * insns.def (opt_send_simple): removed and add new instruction "opt_send_without_block". * parse.y (new_args_tail_gen): reorder variables. Before this patch, a method "def foo(k1: 1, kr1:, k2: 2, **krest, &b)" has parameter variables "k1, kr1, k2, &b, internal_id, krest", but this patch reorders to "kr1, k1, k2, internal_id, krest, &b". (locate a block variable at last) * parse.y (vtable_pop): added. This function remove latest `n' variables from vtable. * iseq.c: catch up iseq data changes. * proc.c: ditto. * class.c (keyword_error): export as rb_keyword_error(). * common.mk: depend vm_args.c for vm.o. * hash.c (rb_hash_has_key): export. * internal.h: ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48239 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-02 21:02:55 +03:00
}
/* fire a coverage event (currently, this is used for line coverage and branch coverage) */
DEFINE_INSN
tracecoverage
(rb_num_t nf, VALUE data)
()
()
{
rb_event_flag_t flag = (rb_event_flag_t)nf;
vm_dtrace(flag, ec);
EXEC_EVENT_HOOK(ec, flag, GET_SELF(), 0, 0, 0 /* id and klass are resolved at callee */, data);
}
/**********************************************************/
/* deal with control flow 1: class/module */
/**********************************************************/
/* enter class definition scope. if super is Qfalse, and class
"klass" is defined, it's redefine. otherwise, define "klass" class.
*/
DEFINE_INSN
defineclass
(ID id, ISEQ class_iseq, rb_num_t flags)
(VALUE cbase, VALUE super)
(VALUE val)
move ADD_PC around to optimize PC manipluiations This commit introduces new attribute handles_flame and if that is _not_ the case, places ADD_PC right after INC_SP. This improves locality of PC manipulations to prevents unnecessary register spill- outs. As a result, it reduces the size of vm_exec_core from 32,688 bytes to 32,384 bytes on my machine. Speedup is very faint, but certain. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.476 0.464 so_array 0.742 0.728 so_binary_trees 5.493 5.466 so_concatenate 3.619 3.395 so_count_words 0.190 0.184 so_exception 0.249 0.239 so_fannkuch 0.994 0.953 so_fasta 1.369 1.374 so_k_nucleotide 1.111 1.111 so_lists 0.470 0.481 so_mandelbrot 2.059 2.050 so_matrix 0.466 0.465 so_meteor_contest 2.712 2.781 so_nbody 1.154 1.204 so_nested_loop 0.852 0.846 so_nsieve 1.636 1.623 so_nsieve_bits 2.073 2.039 so_object 0.616 0.584 so_partial_sums 1.464 1.481 so_pidigits 1.075 1.082 so_random 0.321 0.317 so_reverse_complement 0.555 0.558 so_sieve 0.495 0.490 so_spectralnorm 1.634 1.627 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.025 so_array 1.019 so_binary_trees 1.005 so_concatenate 1.066 so_count_words 1.030 so_exception 1.040 so_fannkuch 1.043 so_fasta 0.996 so_k_nucleotide 1.000 so_lists 0.978 so_mandelbrot 1.004 so_matrix 1.001 so_meteor_contest 0.975 so_nbody 0.959 so_nested_loop 1.007 so_nsieve 1.008 so_nsieve_bits 1.017 so_object 1.056 so_partial_sums 0.989 so_pidigits 0.994 so_random 1.014 so_reverse_complement 0.996 so_sieve 1.010 so_spectralnorm 1.004 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62051 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-26 09:30:58 +03:00
// attr bool handles_frame = true;
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
VALUE klass = vm_find_or_create_class_by_id(id, flags, cbase, super);
* introduce new ISeq binary format serializer/de-serializer and a pre-compilation/runtime loader sample. [Feature #11788] * iseq.c: add new methods: * RubyVM::InstructionSequence#to_binary_format(extra_data = nil) * RubyVM::InstructionSequence.from_binary_format(binary) * RubyVM::InstructionSequence.from_binary_format_extra_data(binary) * compile.c: implement body of this new feature. * load.c (rb_load_internal0), iseq.c (rb_iseq_load_iseq): call RubyVM::InstructionSequence.load_iseq(fname) with loading script name if this method is defined. We can return any ISeq object as a result value. Otherwise loading will be continue as usual. This interface is not matured and is not extensible. So that we don't guarantee the future compatibility of this method. Basically, you should'nt use this method. * iseq.h: move ISEQ_MAJOR/MINOR_VERSION (and some definitions) from iseq.c. * encoding.c (rb_data_is_encoding), internal.h: added. * vm_core.h: add several supports for lazy load. * add USE_LAZY_LOAD macro to specify enable or disable of this feature. * add several fields to rb_iseq_t. * introduce new macro rb_iseq_check(). * insns.def: some check for lazy loading feature. * vm_insnhelper.c: ditto. * proc.c: ditto. * vm.c: ditto. * test/lib/iseq_loader_checker.rb: enabled iff suitable environment variables are provided. * test/runner.rb: enable lib/iseq_loader_checker.rb. * sample/iseq_loader.rb: add sample compiler and loader. $ ruby sample/iseq_loader.rb [dir] will compile all ruby scripts in [dir]. With default setting, this compile creates *.rb.yarb files in same directory of target .rb scripts. $ ruby -r sample/iseq_loader.rb [app] will run with enable to load compiled binary data. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-08 16:58:50 +03:00
rb_iseq_check(class_iseq);
/* enter scope */
vm_push_frame(ec, class_iseq, VM_FRAME_MAGIC_CLASS | VM_ENV_FLAG_LOCAL, klass,
GET_BLOCK_HANDLER(),
(VALUE)vm_cref_push(ec, klass, NULL, FALSE),
2015-07-22 01:52:59 +03:00
class_iseq->body->iseq_encoded, GET_SP(),
class_iseq->body->local_table_size,
* introduce new ISeq binary format serializer/de-serializer and a pre-compilation/runtime loader sample. [Feature #11788] * iseq.c: add new methods: * RubyVM::InstructionSequence#to_binary_format(extra_data = nil) * RubyVM::InstructionSequence.from_binary_format(binary) * RubyVM::InstructionSequence.from_binary_format_extra_data(binary) * compile.c: implement body of this new feature. * load.c (rb_load_internal0), iseq.c (rb_iseq_load_iseq): call RubyVM::InstructionSequence.load_iseq(fname) with loading script name if this method is defined. We can return any ISeq object as a result value. Otherwise loading will be continue as usual. This interface is not matured and is not extensible. So that we don't guarantee the future compatibility of this method. Basically, you should'nt use this method. * iseq.h: move ISEQ_MAJOR/MINOR_VERSION (and some definitions) from iseq.c. * encoding.c (rb_data_is_encoding), internal.h: added. * vm_core.h: add several supports for lazy load. * add USE_LAZY_LOAD macro to specify enable or disable of this feature. * add several fields to rb_iseq_t. * introduce new macro rb_iseq_check(). * insns.def: some check for lazy loading feature. * vm_insnhelper.c: ditto. * proc.c: ditto. * vm.c: ditto. * test/lib/iseq_loader_checker.rb: enabled iff suitable environment variables are provided. * test/runner.rb: enable lib/iseq_loader_checker.rb. * sample/iseq_loader.rb: add sample compiler and loader. $ ruby sample/iseq_loader.rb [dir] will compile all ruby scripts in [dir]. With default setting, this compile creates *.rb.yarb files in same directory of target .rb scripts. $ ruby -r sample/iseq_loader.rb [app] will run with enable to load compiled binary data. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52949 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-08 16:58:50 +03:00
class_iseq->body->stack_max);
mjit_compile.c: use local variables for stack if catch_except_p is FALSE. If catch_except_p is TRUE, stack values should be on VM's stack when exception is thrown and the JIT-ed frame is re-executed by VM's exception handler. If it's FALSE, the JIT-ed frame won't be re-executed and don't need to keep values on VM's stack. Using local variables allows us to reduce cfp->sp motion. Moving cfp->sp is needed only for insns whose handles_frame? is false. So it improves performance. _mjit_compile_insn.erb: Prepare `stack_size` variable for GET_SP, STACK_ADDR_FROM_TOP, TOPN macros. Share pc and sp motion partial view. Use cancel handler created in mjit_compile.c. _mjit_compile_send.erb: ditto. Also, when iseq->body->catch_except_p is TRUE, this stops to call mjit_exec directly. I described the reason in vm_insnhelper.h's comment for EXEC_EC_CFP. _mjit_compile_pc_and_sp.erb: Shared logic for moving sp and pc. As you can see from thsi file, when status->local_stack_p is TRUE and insn.handles_frame? is false, moving sp is skipped. But if insn.handles_frame? is true, values should be rolled back to VM's stack. common.mk: add dependency for the file _mjit_compile_insn_body.erb: Set sp value before canceling JIT on DISPATCH_ORIGINAL_INSN. Replace GET_SP, STACK_ADDR_FROM_TOP, TOPN macros for the case ocal_stack_p is TRUE and insn.handles_frame? is false. In that case, values are not available on VM's stack and those macros should be replaced. mjit_compile.inc.erb: updated comments of macros which are supported by JIT compiler. All references to `cfp->sp` should be replaced and thus INC_SP, SET_SV, PUSH are no longer supported for now, because they are not used now. vm_exec.h: moved EXEC_EC_CFP definition to vm_insnhelper.h because it's tighly coupled to CALL_METHOD. vm_insnhelper.h: Have revised EXEC_EC_CFP definition moved from vm_exec.h. Now it triggers mjit_exec for VM, and has the guard for catch_except_p on JIT-ed code. See comments for details. CALL_METHOD delegates triggering mjit_exec to EXEC_EC_CFP. insns.def: Stopped using EXEC_EC_CFP for the case we don't want to trigger mjit_exec. Those insns (defineclass, opt_call_c_function) are not supported by JIT and it's safe to use RESTORE_REGS(), NEXT_INSN(). expandarray is changed to pass GET_SP() to replace the macro in _mjit_compile_insn_body.erb. vm_insnhelper.c: change to take sp for the above reason. [close https://github.com/ruby/ruby/pull/1828] This patch resurrects the performance which was attached in [Feature #14235]. * Benchmark Optcarrot (with configuration for benchmark_driver.gem) https://github.com/benchmark-driver/optcarrot $ benchmark-driver benchmark.yml --verbose 1 --rbenv 'before;before+JIT::before,--jit;after;after+JIT::after,--jit' --repeat-count 10 before: ruby 2.6.0dev (2018-03-04 trunk 62652) [x86_64-linux] before+JIT: ruby 2.6.0dev (2018-03-04 trunk 62652) +JIT [x86_64-linux] after: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) [x86_64-linux] last_commit=mjit_compile.c: use local variables for stack after+JIT: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) +JIT [x86_64-linux] last_commit=mjit_compile.c: use local variables for stack Calculating ------------------------------------- before before+JIT after after+JIT optcarrot 53.552 59.680 53.697 63.358 fps Comparison: optcarrot after+JIT: 63.4 fps before+JIT: 59.7 fps - 1.06x slower after: 53.7 fps - 1.18x slower before: 53.6 fps - 1.18x slower git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62655 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-04 10:04:40 +03:00
RESTORE_REGS();
NEXT_INSN();
}
/**********************************************************/
/* deal with control flow 2: method/iterator */
/**********************************************************/
/* invoke method. */
DEFINE_INSN
send
(CALL_INFO ci, CALL_CACHE cc, ISEQ blockiseq)
(...)
(VALUE val)
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
// attr bool handles_frame = true;
// attr rb_snum_t sp_inc = - (int)(ci->orig_argc + ((ci->flag & VM_CALL_ARGS_BLOCKARG) ? 1 : 0));
{
struct rb_calling_info calling;
vm_caller_setup_arg_block(ec, reg_cfp, &calling, ci, blockiseq, FALSE);
vm_search_method(ci, cc, calling.recv = TOPN(calling.argc = ci->orig_argc));
CALL_METHOD(&calling, ci, cc);
}
DEFINE_INSN
opt_str_freeze
(VALUE str)
()
(VALUE val)
{
if (BASIC_OP_UNREDEFINED_P(BOP_FREEZE, STRING_REDEFINED_OP_FLAG)) {
val = str;
}
else {
val = rb_funcall(rb_str_resurrect(str), idFreeze, 0);
}
}
DEFINE_INSN
opt_str_uminus
(VALUE str)
()
(VALUE val)
{
if (BASIC_OP_UNREDEFINED_P(BOP_UMINUS, STRING_REDEFINED_OP_FLAG)) {
val = str;
}
else {
val = rb_funcall(rb_str_resurrect(str), idUMinus, 0);
}
}
DEFINE_INSN
opt_newarray_max
(rb_num_t num)
(...)
(VALUE val)
// attr rb_snum_t sp_inc = 1 - num;
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_newarray_max(num, STACK_ADDR_FROM_TOP(num));
}
DEFINE_INSN
opt_newarray_min
(rb_num_t num)
(...)
(VALUE val)
// attr rb_snum_t sp_inc = 1 - num;
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_newarray_min(num, STACK_ADDR_FROM_TOP(num));
}
/* Invoke method without block */
DEFINE_INSN
* rewrite method/block parameter fitting logic to optimize keyword arguments/parameters and a splat argument. [Feature #10440] (Details are described in this ticket) Most of complex part is moved to vm_args.c. Now, ISeq#to_a does not catch up new instruction format. * vm_core.h: change iseq data structures. * introduce rb_call_info_kw_arg_t to represent keyword arguments. * add rb_call_info_t::kw_arg. * rename rb_iseq_t::arg_post_len to rb_iseq_t::arg_post_num. * rename rb_iseq_t::arg_keywords to arg_keyword_num. * rename rb_iseq_t::arg_keyword to rb_iseq_t::arg_keyword_bits. to represent keyword bitmap parameter index. This bitmap parameter shows that which keyword parameters are given or not given (0 for given). It is refered by `checkkeyword' instruction described bellow. * rename rb_iseq_t::arg_keyword_check to rb_iseq_t::arg_keyword_rest to represent keyword rest parameter index. * add rb_iseq_t::arg_keyword_default_values to represent default keyword values. * rename VM_CALL_ARGS_SKIP_SETUP to VM_CALL_ARGS_SIMPLE to represent (ci->flag & (SPLAT|BLOCKARG)) && ci->blockiseq == NULL && ci->kw_arg == NULL. * vm_insnhelper.c, vm_args.c: rewrite with refactoring. * rewrite splat argument code. * rewrite keyword arguments/parameters code. * merge method and block parameter fitting code into one code base. * vm.c, vm_eval.c: catch up these changes. * compile.c (new_callinfo): callinfo requires kw_arg parameter. * compile.c (compile_array_): check the last argument Hash object or not. If Hash object and all keys are Symbol literals, they are compiled to keyword arguments. * insns.def (checkkeyword): add new instruction. This instruction check the availability of corresponding keyword. For example, a method "def foo k1: 'v1'; end" is cimpiled to the following instructions. 0000 checkkeyword 2, 0 # check k1 is given. 0003 branchif 9 # if given, jump to address #9 0005 putstring "v1" 0007 setlocal_OP__WC__0 3 # k1 = 'v1' 0009 trace 8 0011 putnil 0012 trace 16 0014 leave * insns.def (opt_send_simple): removed and add new instruction "opt_send_without_block". * parse.y (new_args_tail_gen): reorder variables. Before this patch, a method "def foo(k1: 1, kr1:, k2: 2, **krest, &b)" has parameter variables "k1, kr1, k2, &b, internal_id, krest", but this patch reorders to "kr1, k1, k2, internal_id, krest, &b". (locate a block variable at last) * parse.y (vtable_pop): added. This function remove latest `n' variables from vtable. * iseq.c: catch up iseq data changes. * proc.c: ditto. * class.c (keyword_error): export as rb_keyword_error(). * common.mk: depend vm_args.c for vm.o. * hash.c (rb_hash_has_key): export. * internal.h: ditto. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48239 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-02 21:02:55 +03:00
opt_send_without_block
(CALL_INFO ci, CALL_CACHE cc)
(...)
(VALUE val)
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
// attr bool handles_frame = true;
// attr rb_snum_t sp_inc = -ci->orig_argc;
{
struct rb_calling_info calling;
calling.block_handler = VM_BLOCK_HANDLER_NONE;
vm_search_method(ci, cc, calling.recv = TOPN(calling.argc = ci->orig_argc));
CALL_METHOD(&calling, ci, cc);
}
/* super(args) # args.size => num */
DEFINE_INSN
invokesuper
(CALL_INFO ci, CALL_CACHE cc, ISEQ blockiseq)
(...)
(VALUE val)
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
// attr bool handles_frame = true;
// attr rb_snum_t sp_inc = - (int)(ci->orig_argc + ((ci->flag & VM_CALL_ARGS_BLOCKARG) ? 1 : 0));
{
struct rb_calling_info calling;
calling.argc = ci->orig_argc;
vm_caller_setup_arg_block(ec, reg_cfp, &calling, ci, blockiseq, TRUE);
calling.recv = GET_SELF();
vm_search_super_method(ec, GET_CFP(), &calling, ci, cc);
CALL_METHOD(&calling, ci, cc);
}
/* yield(args) */
DEFINE_INSN
invokeblock
(CALL_INFO ci)
(...)
(VALUE val)
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
// attr bool handles_frame = true;
// attr rb_snum_t sp_inc = 1 - ci->orig_argc;
{
struct rb_calling_info calling;
VALUE block_handler;
calling.argc = ci->orig_argc;
calling.block_handler = VM_BLOCK_HANDLER_NONE;
calling.recv = Qundef; /* should not be used */
block_handler = VM_CF_BLOCK_HANDLER(GET_CFP());
if (block_handler == VM_BLOCK_HANDLER_NONE) {
rb_vm_localjump_error("no block given (yield)", Qnil, 0);
}
val = vm_invoke_block(ec, GET_CFP(), &calling, ci, block_handler);
mjit_compile.c: use local variables for stack if catch_except_p is FALSE. If catch_except_p is TRUE, stack values should be on VM's stack when exception is thrown and the JIT-ed frame is re-executed by VM's exception handler. If it's FALSE, the JIT-ed frame won't be re-executed and don't need to keep values on VM's stack. Using local variables allows us to reduce cfp->sp motion. Moving cfp->sp is needed only for insns whose handles_frame? is false. So it improves performance. _mjit_compile_insn.erb: Prepare `stack_size` variable for GET_SP, STACK_ADDR_FROM_TOP, TOPN macros. Share pc and sp motion partial view. Use cancel handler created in mjit_compile.c. _mjit_compile_send.erb: ditto. Also, when iseq->body->catch_except_p is TRUE, this stops to call mjit_exec directly. I described the reason in vm_insnhelper.h's comment for EXEC_EC_CFP. _mjit_compile_pc_and_sp.erb: Shared logic for moving sp and pc. As you can see from thsi file, when status->local_stack_p is TRUE and insn.handles_frame? is false, moving sp is skipped. But if insn.handles_frame? is true, values should be rolled back to VM's stack. common.mk: add dependency for the file _mjit_compile_insn_body.erb: Set sp value before canceling JIT on DISPATCH_ORIGINAL_INSN. Replace GET_SP, STACK_ADDR_FROM_TOP, TOPN macros for the case ocal_stack_p is TRUE and insn.handles_frame? is false. In that case, values are not available on VM's stack and those macros should be replaced. mjit_compile.inc.erb: updated comments of macros which are supported by JIT compiler. All references to `cfp->sp` should be replaced and thus INC_SP, SET_SV, PUSH are no longer supported for now, because they are not used now. vm_exec.h: moved EXEC_EC_CFP definition to vm_insnhelper.h because it's tighly coupled to CALL_METHOD. vm_insnhelper.h: Have revised EXEC_EC_CFP definition moved from vm_exec.h. Now it triggers mjit_exec for VM, and has the guard for catch_except_p on JIT-ed code. See comments for details. CALL_METHOD delegates triggering mjit_exec to EXEC_EC_CFP. insns.def: Stopped using EXEC_EC_CFP for the case we don't want to trigger mjit_exec. Those insns (defineclass, opt_call_c_function) are not supported by JIT and it's safe to use RESTORE_REGS(), NEXT_INSN(). expandarray is changed to pass GET_SP() to replace the macro in _mjit_compile_insn_body.erb. vm_insnhelper.c: change to take sp for the above reason. [close https://github.com/ruby/ruby/pull/1828] This patch resurrects the performance which was attached in [Feature #14235]. * Benchmark Optcarrot (with configuration for benchmark_driver.gem) https://github.com/benchmark-driver/optcarrot $ benchmark-driver benchmark.yml --verbose 1 --rbenv 'before;before+JIT::before,--jit;after;after+JIT::after,--jit' --repeat-count 10 before: ruby 2.6.0dev (2018-03-04 trunk 62652) [x86_64-linux] before+JIT: ruby 2.6.0dev (2018-03-04 trunk 62652) +JIT [x86_64-linux] after: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) [x86_64-linux] last_commit=mjit_compile.c: use local variables for stack after+JIT: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) +JIT [x86_64-linux] last_commit=mjit_compile.c: use local variables for stack Calculating ------------------------------------- before before+JIT after after+JIT optcarrot 53.552 59.680 53.697 63.358 fps Comparison: optcarrot after+JIT: 63.4 fps before+JIT: 59.7 fps - 1.06x slower after: 53.7 fps - 1.18x slower before: 53.6 fps - 1.18x slower git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62655 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-04 10:04:40 +03:00
if (val == Qundef) {
EXEC_EC_CFP(val);
}
}
/* return from this scope. */
DEFINE_INSN
leave
()
(VALUE val)
(VALUE val)
move ADD_PC around to optimize PC manipluiations This commit introduces new attribute handles_flame and if that is _not_ the case, places ADD_PC right after INC_SP. This improves locality of PC manipulations to prevents unnecessary register spill- outs. As a result, it reduces the size of vm_exec_core from 32,688 bytes to 32,384 bytes on my machine. Speedup is very faint, but certain. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.476 0.464 so_array 0.742 0.728 so_binary_trees 5.493 5.466 so_concatenate 3.619 3.395 so_count_words 0.190 0.184 so_exception 0.249 0.239 so_fannkuch 0.994 0.953 so_fasta 1.369 1.374 so_k_nucleotide 1.111 1.111 so_lists 0.470 0.481 so_mandelbrot 2.059 2.050 so_matrix 0.466 0.465 so_meteor_contest 2.712 2.781 so_nbody 1.154 1.204 so_nested_loop 0.852 0.846 so_nsieve 1.636 1.623 so_nsieve_bits 2.073 2.039 so_object 0.616 0.584 so_partial_sums 1.464 1.481 so_pidigits 1.075 1.082 so_random 0.321 0.317 so_reverse_complement 0.555 0.558 so_sieve 0.495 0.490 so_spectralnorm 1.634 1.627 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.025 so_array 1.019 so_binary_trees 1.005 so_concatenate 1.066 so_count_words 1.030 so_exception 1.040 so_fannkuch 1.043 so_fasta 0.996 so_k_nucleotide 1.000 so_lists 0.978 so_mandelbrot 1.004 so_matrix 1.001 so_meteor_contest 0.975 so_nbody 0.959 so_nested_loop 1.007 so_nsieve 1.008 so_nsieve_bits 1.017 so_object 1.056 so_partial_sums 0.989 so_pidigits 0.994 so_random 1.014 so_reverse_complement 0.996 so_sieve 1.010 so_spectralnorm 1.004 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62051 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-26 09:30:58 +03:00
// attr bool handles_frame = true;
{
if (OPT_CHECKED_RUN) {
const VALUE *const bp = vm_base_ptr(reg_cfp);
if (reg_cfp->sp != bp) {
vm_stack_consistency_error(ec, reg_cfp, bp);
}
}
RUBY_VM_CHECK_INTS(ec);
if (vm_pop_frame(ec, GET_CFP(), GET_EP())) {
#if OPT_CALL_THREADED_CODE
rb_ec_thread_ptr(ec)->retval = val;
return 0;
#else
* vm_core.h: remove VM_FRAME_MAGIC_FINISH (finish frame type). Before this commit: `finish frame' was place holder which indicates that VM loop needs to return function. If a C method calls a Ruby methods (a method written by Ruby), then VM loop will be (re-)invoked. When the Ruby method returns, then also VM loop should be escaped. `finish frame' has only one instruction `finish', which returns VM loop function. VM loop function executes `finish' instruction, then VM loop function returns itself. With such mechanism, `leave' instruction (which returns one frame from current scope) doesn't need to check that this `leave' should also return from VM loop function. Strictly, one branch can be removed from `leave' instructon. Consideration: However, pushing the `finish frame' needs costs because it needs several memory accesses. The number of pushing `finish frame' is greater than I had assumed. Of course, pushing `finish frame' consumes additional control frame. Moreover, recent processors has good branch prediction, with which we can ignore such trivial checking. After this commit: Finally, I decide to remove `finish frame' and `finish' instruction. Some parts of VM depend on `finish frame', so the new frame flag VM_FRAME_FLAG_FINISH is introduced. If this frame should escape from VM function loop, then the result of VM_FRAME_TYPE_FINISH_P(cfp) is true. `leave' instruction checks this flag every time. I measured performance on it. However on my environments, it improves some benchmarks and slows some benchmarks down. Maybe it is because of C compiler optimization parameters. I'll re-visit here if this cause problems. * insns.def (leave, finish): remove finish instruction. * vm.c, vm_eval.c, vm_exec.c, vm_backtrace.c, vm_dump.c: apply above changes. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36099 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-06-15 14:22:34 +04:00
return val;
#endif
* vm_core.h: remove VM_FRAME_MAGIC_FINISH (finish frame type). Before this commit: `finish frame' was place holder which indicates that VM loop needs to return function. If a C method calls a Ruby methods (a method written by Ruby), then VM loop will be (re-)invoked. When the Ruby method returns, then also VM loop should be escaped. `finish frame' has only one instruction `finish', which returns VM loop function. VM loop function executes `finish' instruction, then VM loop function returns itself. With such mechanism, `leave' instruction (which returns one frame from current scope) doesn't need to check that this `leave' should also return from VM loop function. Strictly, one branch can be removed from `leave' instructon. Consideration: However, pushing the `finish frame' needs costs because it needs several memory accesses. The number of pushing `finish frame' is greater than I had assumed. Of course, pushing `finish frame' consumes additional control frame. Moreover, recent processors has good branch prediction, with which we can ignore such trivial checking. After this commit: Finally, I decide to remove `finish frame' and `finish' instruction. Some parts of VM depend on `finish frame', so the new frame flag VM_FRAME_FLAG_FINISH is introduced. If this frame should escape from VM function loop, then the result of VM_FRAME_TYPE_FINISH_P(cfp) is true. `leave' instruction checks this flag every time. I measured performance on it. However on my environments, it improves some benchmarks and slows some benchmarks down. Maybe it is because of C compiler optimization parameters. I'll re-visit here if this cause problems. * insns.def (leave, finish): remove finish instruction. * vm.c, vm_eval.c, vm_exec.c, vm_backtrace.c, vm_dump.c: apply above changes. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36099 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-06-15 14:22:34 +04:00
}
else {
RESTORE_REGS();
}
}
/**********************************************************/
/* deal with control flow 3: exception */
/**********************************************************/
/* longjump */
DEFINE_INSN
throw
(rb_num_t throw_state)
(VALUE throwobj)
(VALUE val)
{
RUBY_VM_CHECK_INTS(ec);
val = vm_throw(ec, GET_CFP(), throw_state, throwobj);
THROW_EXCEPTION(val);
/* unreachable */
}
/**********************************************************/
/* deal with control flow 4: local jump */
/**********************************************************/
/* set PC to (PC + dst). */
DEFINE_INSN
jump
(OFFSET dst)
()
()
{
RUBY_VM_CHECK_INTS(ec);
JUMP(dst);
}
/* if val is not false or nil, set PC to (PC + dst). */
DEFINE_INSN
branchif
(OFFSET dst)
(VALUE val)
()
{
if (RTEST(val)) {
RUBY_VM_CHECK_INTS(ec);
JUMP(dst);
}
}
/* if val is false or nil, set PC to (PC + dst). */
DEFINE_INSN
branchunless
(OFFSET dst)
(VALUE val)
()
{
if (!RTEST(val)) {
RUBY_VM_CHECK_INTS(ec);
JUMP(dst);
}
}
/* if val is nil, set PC to (PC + dst). */
DEFINE_INSN
branchnil
(OFFSET dst)
(VALUE val)
()
{
if (NIL_P(val)) {
RUBY_VM_CHECK_INTS(ec);
JUMP(dst);
}
}
/* if val is type, set PC to (PC + dst). */
DEFINE_INSN
branchiftype
(rb_num_t type, OFFSET dst)
(VALUE val)
()
{
if (TYPE(val) == (int)type) {
RUBY_VM_CHECK_INTS(ec);
JUMP(dst);
}
}
/**********************************************************/
/* for optimize */
/**********************************************************/
/* push inline-cached value and go to dst if it is valid */
DEFINE_INSN
getinlinecache
(OFFSET dst, IC ic)
()
(VALUE val)
{
if (vm_ic_hit_p(ic, GET_EP())) {
val = ic->ic_value.value;
JUMP(dst);
}
else {
val = Qnil;
}
}
/* set inline cache */
DEFINE_INSN
setinlinecache
(IC ic)
(VALUE val)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
vm_ic_update(ic, val, GET_EP());
}
/* run iseq only once */
DEFINE_INSN
once
Add direct marking on iseq operands Directly marking iseq operands allows us to eliminate the "mark array" stored on ISEQ objects, which will reduce the amount of memory ISEQ objects consume. This patch changes the iseq mark function to: * Directly marks ISEQ operands * Iterate over and mark child ISEQs It also introduces two flags on the ISEQ object. In order to mark instruction operands, we have to disassemble the instructions and find the instruction parameters and types. Instructions may also be translated to jump addresses. Instruction sequences may get marked by the GC *while* they're mid flight (being compiled). The `ISEQ_TRANSLATED` flag is used to indicate whether or not the instructions have been translated to jump addresses so that when we decode the instructions we know whether or not we need to go from jump location back to original instruction or not. Not all ISEQ objects have any markable objects embedded in their instructions. We can detect whether or not an ISEQ has markable objects in the instructions at compile time. If the instructions contain markable objects, we set a flag `ISEQ_MARKABLE_ISEQ` on the ISEQ object. This means that during the mark phase, we can skip decompilation if the flag is *not* set. In other words, we can avoid decompilation of we know in advance there is nothing to mark. `once` instructions have an operand that contains the result of a one-time compilation of a regex. Before this patch, that operand was called an "inline cache", even though the struct was actually an "inline storage". This patch changes the operand to be an "inline storage" so that we can differentiate between caches that need marking (the inline storage) and caches that don't need marking (inline cache). [ruby-core:84909] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62706 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-09 23:11:45 +03:00
(ISEQ iseq, ISE ise)
()
(VALUE val)
{
Add direct marking on iseq operands Directly marking iseq operands allows us to eliminate the "mark array" stored on ISEQ objects, which will reduce the amount of memory ISEQ objects consume. This patch changes the iseq mark function to: * Directly marks ISEQ operands * Iterate over and mark child ISEQs It also introduces two flags on the ISEQ object. In order to mark instruction operands, we have to disassemble the instructions and find the instruction parameters and types. Instructions may also be translated to jump addresses. Instruction sequences may get marked by the GC *while* they're mid flight (being compiled). The `ISEQ_TRANSLATED` flag is used to indicate whether or not the instructions have been translated to jump addresses so that when we decode the instructions we know whether or not we need to go from jump location back to original instruction or not. Not all ISEQ objects have any markable objects embedded in their instructions. We can detect whether or not an ISEQ has markable objects in the instructions at compile time. If the instructions contain markable objects, we set a flag `ISEQ_MARKABLE_ISEQ` on the ISEQ object. This means that during the mark phase, we can skip decompilation if the flag is *not* set. In other words, we can avoid decompilation of we know in advance there is nothing to mark. `once` instructions have an operand that contains the result of a one-time compilation of a regex. Before this patch, that operand was called an "inline cache", even though the struct was actually an "inline storage". This patch changes the operand to be an "inline storage" so that we can differentiate between caches that need marking (the inline storage) and caches that don't need marking (inline cache). [ruby-core:84909] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62706 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-09 23:11:45 +03:00
val = vm_once_dispatch(ec, iseq, ise);
}
/* case dispatcher, jump by table if possible */
DEFINE_INSN
opt_case_dispatch
(CDHASH hash, OFFSET else_offset)
(..., VALUE key)
()
// attr rb_snum_t sp_inc = -1;
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
OFFSET dst = vm_case_dispatch(hash, else_offset, key);
if (dst) {
JUMP(dst);
}
}
/** simple functions */
/* optimized X+Y. */
DEFINE_INSN
opt_plus
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_plus(recv, obj);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized X-Y. */
DEFINE_INSN
opt_minus
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_minus(recv, obj);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized X*Y. */
DEFINE_INSN
opt_mult
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_mult(recv, obj);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized X/Y. */
DEFINE_INSN
opt_div
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_div(recv, obj);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized X%Y. */
DEFINE_INSN
opt_mod
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_mod(recv, obj);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized X==Y. */
DEFINE_INSN
opt_eq
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
val = opt_eq_func(recv, obj, ci, cc);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized X!=Y. */
DEFINE_INSN
opt_neq
eliminate CALL_SIMPLE_METHOD Arrange operands of several opt_something insns so that jumps to opt_send_without_block can be applied to them. This makes it possible to eliminate CALL_SIMPLE_METHOD macro at all. Results in binary size of vm_exec_core to change from 27,008 bytes to 26,016 bytes on my machine. [close GH-1779] Note however that PC can point somewhere non-instruction now. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.450 0.426 so_array 0.789 0.824 so_binary_trees 5.760 5.635 so_concatenate 3.594 3.508 so_count_words 0.211 0.196 so_exception 0.256 0.244 so_fannkuch 1.049 1.044 so_fasta 1.485 1.472 so_k_nucleotide 1.195 1.216 so_lists 0.517 0.513 so_mandelbrot 2.264 2.394 so_matrix 0.501 0.468 so_meteor_contest 2.987 2.912 so_nbody 1.307 1.289 so_nested_loop 0.908 0.925 so_nsieve 1.679 1.614 so_nsieve_bits 2.131 2.092 so_object 0.620 0.625 so_partial_sums 1.623 1.675 so_pidigits 1.135 1.190 so_random 0.357 0.321 so_reverse_complement 0.619 0.583 so_sieve 0.493 0.496 so_spectralnorm 1.749 1.737 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.057 so_array 0.958 so_binary_trees 1.022 so_concatenate 1.024 so_count_words 1.077 so_exception 1.049 so_fannkuch 1.004 so_fasta 1.009 so_k_nucleotide 0.983 so_lists 1.007 so_mandelbrot 0.946 so_matrix 1.072 so_meteor_contest 1.026 so_nbody 1.013 so_nested_loop 0.982 so_nsieve 1.040 so_nsieve_bits 1.018 so_object 0.992 so_partial_sums 0.969 so_pidigits 0.954 so_random 1.111 so_reverse_complement 1.062 so_sieve 0.994 so_spectralnorm 1.007 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62089 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:15:08 +03:00
(CALL_INFO ci_eq, CALL_CACHE cc_eq, CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_neq(ci, cc, ci_eq, cc_eq, recv, obj);
if (val == Qundef) {
mjit_compile.c: merge initial JIT compiler which has been developed by Takashi Kokubun <takashikkbn@gmail> as YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>. This JIT compiler is designed to be a safe migration path to introduce JIT compiler to MRI. So this commit does not include any bytecode changes or dynamic instruction modifications, which are done in original MJIT. This commit even strips off some aggressive optimizations from YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still fairly faster than Ruby 2.5 in some benchmarks (attached below). Note that this JIT compiler passes `make test`, `make test-all`, `make test-spec` without JIT, and even with JIT. Not only it's perfectly safe with JIT disabled because it does not replace VM instructions unlike MJIT, but also with JIT enabled it stably runs Ruby applications including Rails applications. I'm expecting this version as just "initial" JIT compiler. I have many optimization ideas which are skipped for initial merging, and you may easily replace this JIT compiler with a faster one by just replacing mjit_compile.c. `mjit_compile` interface is designed for the purpose. common.mk: update dependencies for mjit_compile.c. internal.h: declare `rb_vm_insn_addr2insn` for MJIT. vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to compiler. This avoids to include some functions which take a long time to compile, e.g. vm_exec_core. Some of the purpose is achieved in transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are manually resolved for now. Load mjit_helper.h for MJIT header. mjit_helper.h: New. This is a file used only by JIT-ed code. I'll refactor `mjit_call_cfunc` later. vm_eval.c: add some #ifdef switches to skip compiling some functions like Init_vm_eval. win32/mkexports.rb: export thread/ec functions, which are used by MJIT. include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alis to clarify that a function is exported only for MJIT. array.c: export a function used by MJIT. bignum.c: ditto. class.c: ditto. compile.c: ditto. error.c: ditto. gc.c: ditto. hash.c: ditto. iseq.c: ditto. numeric.c: ditto. object.c: ditto. proc.c: ditto. re.c: ditto. st.c: ditto. string.c: ditto. thread.c: ditto. variable.c: ditto. vm_backtrace.c: ditto. vm_insnhelper.c: ditto. vm_method.c: ditto. I would like to improve maintainability of function exports, but I believe this way is acceptable as initial merging if we clarify the new exports are for MJIT (so that we can use them as TODO list to fix) and add unit tests to detect unresolved symbols. I'll add unit tests of JIT compilations in succeeding commits. Author: Takashi Kokubun <takashikkbn@gmail.com> Contributor: wanabe <s.wanabe@gmail.com> Part of [Feature #14235] --- * Known issues * Code generated by gcc is faster than clang. The benchmark may be worse in macOS. Following benchmark result is provided by gcc w/ Linux. * Performance is decreased when Google Chrome is running * JIT can work on MinGW, but it doesn't improve performance at least in short running benchmark. * Currently it doesn't perform well with Rails. We'll try to fix this before release. --- * Benchmark reslts Benchmarked with: Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores - 2.0.0-p0: Ruby 2.0.0-p0 - r62186: Ruby trunk (early 2.6.0), before MJIT changes - JIT off: On this commit, but without `--jit` option - JIT on: On this commit, and with `--jit` option ** Optcarrot fps Benchmark: https://github.com/mame/optcarrot | |2.0.0-p0 |r62186 |JIT off |JIT on | |:--------|:--------|:--------|:--------|:--------| |fps |37.32 |51.46 |51.31 |58.88 | |vs 2.0.0 |1.00x |1.38x |1.37x |1.58x | ** MJIT benchmarks Benchmark: https://github.com/benchmark-driver/mjit-benchmarks (Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks) | |2.0.0-p0 |r62186 |JIT off |JIT on | |:----------|:--------|:--------|:--------|:--------| |aread |1.00 |1.09 |1.07 |2.19 | |aref |1.00 |1.13 |1.11 |2.22 | |aset |1.00 |1.50 |1.45 |2.64 | |awrite |1.00 |1.17 |1.13 |2.20 | |call |1.00 |1.29 |1.26 |2.02 | |const2 |1.00 |1.10 |1.10 |2.19 | |const |1.00 |1.11 |1.10 |2.19 | |fannk |1.00 |1.04 |1.02 |1.00 | |fib |1.00 |1.32 |1.31 |1.84 | |ivread |1.00 |1.13 |1.12 |2.43 | |ivwrite |1.00 |1.23 |1.21 |2.40 | |mandelbrot |1.00 |1.13 |1.16 |1.28 | |meteor |1.00 |2.97 |2.92 |3.17 | |nbody |1.00 |1.17 |1.15 |1.49 | |nest-ntimes|1.00 |1.22 |1.20 |1.39 | |nest-while |1.00 |1.10 |1.10 |1.37 | |norm |1.00 |1.18 |1.16 |1.24 | |nsvb |1.00 |1.16 |1.16 |1.17 | |red-black |1.00 |1.02 |0.99 |1.12 | |sieve |1.00 |1.30 |1.28 |1.62 | |trees |1.00 |1.14 |1.13 |1.19 | |while |1.00 |1.12 |1.11 |2.41 | ** Discourse's script/bench.rb Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb NOTE: Rails performance was somehow a little degraded with JIT for now. We should fix this. (At least I know opt_aref is performing badly in JIT and I have an idea to fix it. Please wait for the fix.) *** JIT off Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 17 75: 18 90: 22 99: 29 home_admin: 50: 21 75: 21 90: 27 99: 40 topic_admin: 50: 17 75: 18 90: 22 99: 32 categories: 50: 35 75: 41 90: 43 99: 77 home: 50: 39 75: 46 90: 49 99: 95 topic: 50: 46 75: 52 90: 56 99: 101 *** JIT on Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 19 75: 21 90: 25 99: 33 home_admin: 50: 24 75: 26 90: 30 99: 35 topic_admin: 50: 19 75: 20 90: 25 99: 30 categories: 50: 40 75: 44 90: 48 99: 76 home: 50: 42 75: 48 90: 51 99: 89 topic: 50: 49 75: 55 90: 58 99: 99 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 14:22:28 +03:00
#ifndef MJIT_HEADER
eliminate CALL_SIMPLE_METHOD Arrange operands of several opt_something insns so that jumps to opt_send_without_block can be applied to them. This makes it possible to eliminate CALL_SIMPLE_METHOD macro at all. Results in binary size of vm_exec_core to change from 27,008 bytes to 26,016 bytes on my machine. [close GH-1779] Note however that PC can point somewhere non-instruction now. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.450 0.426 so_array 0.789 0.824 so_binary_trees 5.760 5.635 so_concatenate 3.594 3.508 so_count_words 0.211 0.196 so_exception 0.256 0.244 so_fannkuch 1.049 1.044 so_fasta 1.485 1.472 so_k_nucleotide 1.195 1.216 so_lists 0.517 0.513 so_mandelbrot 2.264 2.394 so_matrix 0.501 0.468 so_meteor_contest 2.987 2.912 so_nbody 1.307 1.289 so_nested_loop 0.908 0.925 so_nsieve 1.679 1.614 so_nsieve_bits 2.131 2.092 so_object 0.620 0.625 so_partial_sums 1.623 1.675 so_pidigits 1.135 1.190 so_random 0.357 0.321 so_reverse_complement 0.619 0.583 so_sieve 0.493 0.496 so_spectralnorm 1.749 1.737 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.057 so_array 0.958 so_binary_trees 1.022 so_concatenate 1.024 so_count_words 1.077 so_exception 1.049 so_fannkuch 1.004 so_fasta 1.009 so_k_nucleotide 0.983 so_lists 1.007 so_mandelbrot 0.946 so_matrix 1.072 so_meteor_contest 1.026 so_nbody 1.013 so_nested_loop 0.982 so_nsieve 1.040 so_nsieve_bits 1.018 so_object 0.992 so_partial_sums 0.969 so_pidigits 0.954 so_random 1.111 so_reverse_complement 1.062 so_sieve 0.994 so_spectralnorm 1.007 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62089 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:15:08 +03:00
ADD_PC(2); /* !!! */
mjit_compile.c: merge initial JIT compiler which has been developed by Takashi Kokubun <takashikkbn@gmail> as YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>. This JIT compiler is designed to be a safe migration path to introduce JIT compiler to MRI. So this commit does not include any bytecode changes or dynamic instruction modifications, which are done in original MJIT. This commit even strips off some aggressive optimizations from YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still fairly faster than Ruby 2.5 in some benchmarks (attached below). Note that this JIT compiler passes `make test`, `make test-all`, `make test-spec` without JIT, and even with JIT. Not only it's perfectly safe with JIT disabled because it does not replace VM instructions unlike MJIT, but also with JIT enabled it stably runs Ruby applications including Rails applications. I'm expecting this version as just "initial" JIT compiler. I have many optimization ideas which are skipped for initial merging, and you may easily replace this JIT compiler with a faster one by just replacing mjit_compile.c. `mjit_compile` interface is designed for the purpose. common.mk: update dependencies for mjit_compile.c. internal.h: declare `rb_vm_insn_addr2insn` for MJIT. vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to compiler. This avoids to include some functions which take a long time to compile, e.g. vm_exec_core. Some of the purpose is achieved in transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are manually resolved for now. Load mjit_helper.h for MJIT header. mjit_helper.h: New. This is a file used only by JIT-ed code. I'll refactor `mjit_call_cfunc` later. vm_eval.c: add some #ifdef switches to skip compiling some functions like Init_vm_eval. win32/mkexports.rb: export thread/ec functions, which are used by MJIT. include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alis to clarify that a function is exported only for MJIT. array.c: export a function used by MJIT. bignum.c: ditto. class.c: ditto. compile.c: ditto. error.c: ditto. gc.c: ditto. hash.c: ditto. iseq.c: ditto. numeric.c: ditto. object.c: ditto. proc.c: ditto. re.c: ditto. st.c: ditto. string.c: ditto. thread.c: ditto. variable.c: ditto. vm_backtrace.c: ditto. vm_insnhelper.c: ditto. vm_method.c: ditto. I would like to improve maintainability of function exports, but I believe this way is acceptable as initial merging if we clarify the new exports are for MJIT (so that we can use them as TODO list to fix) and add unit tests to detect unresolved symbols. I'll add unit tests of JIT compilations in succeeding commits. Author: Takashi Kokubun <takashikkbn@gmail.com> Contributor: wanabe <s.wanabe@gmail.com> Part of [Feature #14235] --- * Known issues * Code generated by gcc is faster than clang. The benchmark may be worse in macOS. Following benchmark result is provided by gcc w/ Linux. * Performance is decreased when Google Chrome is running * JIT can work on MinGW, but it doesn't improve performance at least in short running benchmark. * Currently it doesn't perform well with Rails. We'll try to fix this before release. --- * Benchmark reslts Benchmarked with: Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores - 2.0.0-p0: Ruby 2.0.0-p0 - r62186: Ruby trunk (early 2.6.0), before MJIT changes - JIT off: On this commit, but without `--jit` option - JIT on: On this commit, and with `--jit` option ** Optcarrot fps Benchmark: https://github.com/mame/optcarrot | |2.0.0-p0 |r62186 |JIT off |JIT on | |:--------|:--------|:--------|:--------|:--------| |fps |37.32 |51.46 |51.31 |58.88 | |vs 2.0.0 |1.00x |1.38x |1.37x |1.58x | ** MJIT benchmarks Benchmark: https://github.com/benchmark-driver/mjit-benchmarks (Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks) | |2.0.0-p0 |r62186 |JIT off |JIT on | |:----------|:--------|:--------|:--------|:--------| |aread |1.00 |1.09 |1.07 |2.19 | |aref |1.00 |1.13 |1.11 |2.22 | |aset |1.00 |1.50 |1.45 |2.64 | |awrite |1.00 |1.17 |1.13 |2.20 | |call |1.00 |1.29 |1.26 |2.02 | |const2 |1.00 |1.10 |1.10 |2.19 | |const |1.00 |1.11 |1.10 |2.19 | |fannk |1.00 |1.04 |1.02 |1.00 | |fib |1.00 |1.32 |1.31 |1.84 | |ivread |1.00 |1.13 |1.12 |2.43 | |ivwrite |1.00 |1.23 |1.21 |2.40 | |mandelbrot |1.00 |1.13 |1.16 |1.28 | |meteor |1.00 |2.97 |2.92 |3.17 | |nbody |1.00 |1.17 |1.15 |1.49 | |nest-ntimes|1.00 |1.22 |1.20 |1.39 | |nest-while |1.00 |1.10 |1.10 |1.37 | |norm |1.00 |1.18 |1.16 |1.24 | |nsvb |1.00 |1.16 |1.16 |1.17 | |red-black |1.00 |1.02 |0.99 |1.12 | |sieve |1.00 |1.30 |1.28 |1.62 | |trees |1.00 |1.14 |1.13 |1.19 | |while |1.00 |1.12 |1.11 |2.41 | ** Discourse's script/bench.rb Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb NOTE: Rails performance was somehow a little degraded with JIT for now. We should fix this. (At least I know opt_aref is performing badly in JIT and I have an idea to fix it. Please wait for the fix.) *** JIT off Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 17 75: 18 90: 22 99: 29 home_admin: 50: 21 75: 21 90: 27 99: 40 topic_admin: 50: 17 75: 18 90: 22 99: 32 categories: 50: 35 75: 41 90: 43 99: 77 home: 50: 39 75: 46 90: 49 99: 95 topic: 50: 46 75: 52 90: 56 99: 101 *** JIT on Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 19 75: 21 90: 25 99: 33 home_admin: 50: 24 75: 26 90: 30 99: 35 topic_admin: 50: 19 75: 20 90: 25 99: 30 categories: 50: 40 75: 44 90: 48 99: 76 home: 50: 42 75: 48 90: 51 99: 89 topic: 50: 49 75: 55 90: 58 99: 99 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 14:22:28 +03:00
#endif
eliminate CALL_SIMPLE_METHOD Arrange operands of several opt_something insns so that jumps to opt_send_without_block can be applied to them. This makes it possible to eliminate CALL_SIMPLE_METHOD macro at all. Results in binary size of vm_exec_core to change from 27,008 bytes to 26,016 bytes on my machine. [close GH-1779] Note however that PC can point somewhere non-instruction now. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.450 0.426 so_array 0.789 0.824 so_binary_trees 5.760 5.635 so_concatenate 3.594 3.508 so_count_words 0.211 0.196 so_exception 0.256 0.244 so_fannkuch 1.049 1.044 so_fasta 1.485 1.472 so_k_nucleotide 1.195 1.216 so_lists 0.517 0.513 so_mandelbrot 2.264 2.394 so_matrix 0.501 0.468 so_meteor_contest 2.987 2.912 so_nbody 1.307 1.289 so_nested_loop 0.908 0.925 so_nsieve 1.679 1.614 so_nsieve_bits 2.131 2.092 so_object 0.620 0.625 so_partial_sums 1.623 1.675 so_pidigits 1.135 1.190 so_random 0.357 0.321 so_reverse_complement 0.619 0.583 so_sieve 0.493 0.496 so_spectralnorm 1.749 1.737 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.057 so_array 0.958 so_binary_trees 1.022 so_concatenate 1.024 so_count_words 1.077 so_exception 1.049 so_fannkuch 1.004 so_fasta 1.009 so_k_nucleotide 0.983 so_lists 1.007 so_mandelbrot 0.946 so_matrix 1.072 so_meteor_contest 1.026 so_nbody 1.013 so_nested_loop 0.982 so_nsieve 1.040 so_nsieve_bits 1.018 so_object 0.992 so_partial_sums 0.969 so_pidigits 0.954 so_random 1.111 so_reverse_complement 1.062 so_sieve 0.994 so_spectralnorm 1.007 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62089 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:15:08 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized X<Y. */
DEFINE_INSN
opt_lt
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_lt(recv, obj);
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized X<=Y. */
DEFINE_INSN
opt_le
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_le(recv, obj);
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized X>Y. */
DEFINE_INSN
opt_gt
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_gt(recv, obj);
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized X>=Y. */
DEFINE_INSN
opt_ge
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_ge(recv, obj);
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* << */
DEFINE_INSN
opt_ltlt
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_ltlt(recv, obj);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* [] */
DEFINE_INSN
opt_aref
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_aref(recv, obj);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* recv[obj] = set */
DEFINE_INSN
opt_aset
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj, VALUE set)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_aset(recv, obj, set);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* recv[str] = set */
DEFINE_INSN
opt_aset_with
eliminate CALL_SIMPLE_METHOD Arrange operands of several opt_something insns so that jumps to opt_send_without_block can be applied to them. This makes it possible to eliminate CALL_SIMPLE_METHOD macro at all. Results in binary size of vm_exec_core to change from 27,008 bytes to 26,016 bytes on my machine. [close GH-1779] Note however that PC can point somewhere non-instruction now. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.450 0.426 so_array 0.789 0.824 so_binary_trees 5.760 5.635 so_concatenate 3.594 3.508 so_count_words 0.211 0.196 so_exception 0.256 0.244 so_fannkuch 1.049 1.044 so_fasta 1.485 1.472 so_k_nucleotide 1.195 1.216 so_lists 0.517 0.513 so_mandelbrot 2.264 2.394 so_matrix 0.501 0.468 so_meteor_contest 2.987 2.912 so_nbody 1.307 1.289 so_nested_loop 0.908 0.925 so_nsieve 1.679 1.614 so_nsieve_bits 2.131 2.092 so_object 0.620 0.625 so_partial_sums 1.623 1.675 so_pidigits 1.135 1.190 so_random 0.357 0.321 so_reverse_complement 0.619 0.583 so_sieve 0.493 0.496 so_spectralnorm 1.749 1.737 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.057 so_array 0.958 so_binary_trees 1.022 so_concatenate 1.024 so_count_words 1.077 so_exception 1.049 so_fannkuch 1.004 so_fasta 1.009 so_k_nucleotide 0.983 so_lists 1.007 so_mandelbrot 0.946 so_matrix 1.072 so_meteor_contest 1.026 so_nbody 1.013 so_nested_loop 0.982 so_nsieve 1.040 so_nsieve_bits 1.018 so_object 0.992 so_partial_sums 0.969 so_pidigits 0.954 so_random 1.111 so_reverse_complement 1.062 so_sieve 0.994 so_spectralnorm 1.007 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62089 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:15:08 +03:00
(VALUE key, CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE val)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
VALUE tmp = vm_opt_aset_with(recv, key, val);
if (tmp != Qundef) {
val = tmp;
}
else {
mjit_compile.c: merge initial JIT compiler which has been developed by Takashi Kokubun <takashikkbn@gmail> as YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>. This JIT compiler is designed to be a safe migration path to introduce JIT compiler to MRI. So this commit does not include any bytecode changes or dynamic instruction modifications, which are done in original MJIT. This commit even strips off some aggressive optimizations from YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still fairly faster than Ruby 2.5 in some benchmarks (attached below). Note that this JIT compiler passes `make test`, `make test-all`, `make test-spec` without JIT, and even with JIT. Not only it's perfectly safe with JIT disabled because it does not replace VM instructions unlike MJIT, but also with JIT enabled it stably runs Ruby applications including Rails applications. I'm expecting this version as just "initial" JIT compiler. I have many optimization ideas which are skipped for initial merging, and you may easily replace this JIT compiler with a faster one by just replacing mjit_compile.c. `mjit_compile` interface is designed for the purpose. common.mk: update dependencies for mjit_compile.c. internal.h: declare `rb_vm_insn_addr2insn` for MJIT. vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to compiler. This avoids to include some functions which take a long time to compile, e.g. vm_exec_core. Some of the purpose is achieved in transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are manually resolved for now. Load mjit_helper.h for MJIT header. mjit_helper.h: New. This is a file used only by JIT-ed code. I'll refactor `mjit_call_cfunc` later. vm_eval.c: add some #ifdef switches to skip compiling some functions like Init_vm_eval. win32/mkexports.rb: export thread/ec functions, which are used by MJIT. include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alis to clarify that a function is exported only for MJIT. array.c: export a function used by MJIT. bignum.c: ditto. class.c: ditto. compile.c: ditto. error.c: ditto. gc.c: ditto. hash.c: ditto. iseq.c: ditto. numeric.c: ditto. object.c: ditto. proc.c: ditto. re.c: ditto. st.c: ditto. string.c: ditto. thread.c: ditto. variable.c: ditto. vm_backtrace.c: ditto. vm_insnhelper.c: ditto. vm_method.c: ditto. I would like to improve maintainability of function exports, but I believe this way is acceptable as initial merging if we clarify the new exports are for MJIT (so that we can use them as TODO list to fix) and add unit tests to detect unresolved symbols. I'll add unit tests of JIT compilations in succeeding commits. Author: Takashi Kokubun <takashikkbn@gmail.com> Contributor: wanabe <s.wanabe@gmail.com> Part of [Feature #14235] --- * Known issues * Code generated by gcc is faster than clang. The benchmark may be worse in macOS. Following benchmark result is provided by gcc w/ Linux. * Performance is decreased when Google Chrome is running * JIT can work on MinGW, but it doesn't improve performance at least in short running benchmark. * Currently it doesn't perform well with Rails. We'll try to fix this before release. --- * Benchmark reslts Benchmarked with: Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores - 2.0.0-p0: Ruby 2.0.0-p0 - r62186: Ruby trunk (early 2.6.0), before MJIT changes - JIT off: On this commit, but without `--jit` option - JIT on: On this commit, and with `--jit` option ** Optcarrot fps Benchmark: https://github.com/mame/optcarrot | |2.0.0-p0 |r62186 |JIT off |JIT on | |:--------|:--------|:--------|:--------|:--------| |fps |37.32 |51.46 |51.31 |58.88 | |vs 2.0.0 |1.00x |1.38x |1.37x |1.58x | ** MJIT benchmarks Benchmark: https://github.com/benchmark-driver/mjit-benchmarks (Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks) | |2.0.0-p0 |r62186 |JIT off |JIT on | |:----------|:--------|:--------|:--------|:--------| |aread |1.00 |1.09 |1.07 |2.19 | |aref |1.00 |1.13 |1.11 |2.22 | |aset |1.00 |1.50 |1.45 |2.64 | |awrite |1.00 |1.17 |1.13 |2.20 | |call |1.00 |1.29 |1.26 |2.02 | |const2 |1.00 |1.10 |1.10 |2.19 | |const |1.00 |1.11 |1.10 |2.19 | |fannk |1.00 |1.04 |1.02 |1.00 | |fib |1.00 |1.32 |1.31 |1.84 | |ivread |1.00 |1.13 |1.12 |2.43 | |ivwrite |1.00 |1.23 |1.21 |2.40 | |mandelbrot |1.00 |1.13 |1.16 |1.28 | |meteor |1.00 |2.97 |2.92 |3.17 | |nbody |1.00 |1.17 |1.15 |1.49 | |nest-ntimes|1.00 |1.22 |1.20 |1.39 | |nest-while |1.00 |1.10 |1.10 |1.37 | |norm |1.00 |1.18 |1.16 |1.24 | |nsvb |1.00 |1.16 |1.16 |1.17 | |red-black |1.00 |1.02 |0.99 |1.12 | |sieve |1.00 |1.30 |1.28 |1.62 | |trees |1.00 |1.14 |1.13 |1.19 | |while |1.00 |1.12 |1.11 |2.41 | ** Discourse's script/bench.rb Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb NOTE: Rails performance was somehow a little degraded with JIT for now. We should fix this. (At least I know opt_aref is performing badly in JIT and I have an idea to fix it. Please wait for the fix.) *** JIT off Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 17 75: 18 90: 22 99: 29 home_admin: 50: 21 75: 21 90: 27 99: 40 topic_admin: 50: 17 75: 18 90: 22 99: 32 categories: 50: 35 75: 41 90: 43 99: 77 home: 50: 39 75: 46 90: 49 99: 95 topic: 50: 46 75: 52 90: 56 99: 101 *** JIT on Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 19 75: 21 90: 25 99: 33 home_admin: 50: 24 75: 26 90: 30 99: 35 topic_admin: 50: 19 75: 20 90: 25 99: 30 categories: 50: 40 75: 44 90: 48 99: 76 home: 50: 42 75: 48 90: 51 99: 89 topic: 50: 49 75: 55 90: 58 99: 99 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 14:22:28 +03:00
#ifndef MJIT_HEADER
eliminate CALL_SIMPLE_METHOD Arrange operands of several opt_something insns so that jumps to opt_send_without_block can be applied to them. This makes it possible to eliminate CALL_SIMPLE_METHOD macro at all. Results in binary size of vm_exec_core to change from 27,008 bytes to 26,016 bytes on my machine. [close GH-1779] Note however that PC can point somewhere non-instruction now. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.450 0.426 so_array 0.789 0.824 so_binary_trees 5.760 5.635 so_concatenate 3.594 3.508 so_count_words 0.211 0.196 so_exception 0.256 0.244 so_fannkuch 1.049 1.044 so_fasta 1.485 1.472 so_k_nucleotide 1.195 1.216 so_lists 0.517 0.513 so_mandelbrot 2.264 2.394 so_matrix 0.501 0.468 so_meteor_contest 2.987 2.912 so_nbody 1.307 1.289 so_nested_loop 0.908 0.925 so_nsieve 1.679 1.614 so_nsieve_bits 2.131 2.092 so_object 0.620 0.625 so_partial_sums 1.623 1.675 so_pidigits 1.135 1.190 so_random 0.357 0.321 so_reverse_complement 0.619 0.583 so_sieve 0.493 0.496 so_spectralnorm 1.749 1.737 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.057 so_array 0.958 so_binary_trees 1.022 so_concatenate 1.024 so_count_words 1.077 so_exception 1.049 so_fannkuch 1.004 so_fasta 1.009 so_k_nucleotide 0.983 so_lists 1.007 so_mandelbrot 0.946 so_matrix 1.072 so_meteor_contest 1.026 so_nbody 1.013 so_nested_loop 0.982 so_nsieve 1.040 so_nsieve_bits 1.018 so_object 0.992 so_partial_sums 0.969 so_pidigits 0.954 so_random 1.111 so_reverse_complement 1.062 so_sieve 0.994 so_spectralnorm 1.007 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62089 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:15:08 +03:00
TOPN(0) = rb_str_resurrect(key);
PUSH(val);
eliminate CALL_SIMPLE_METHOD Arrange operands of several opt_something insns so that jumps to opt_send_without_block can be applied to them. This makes it possible to eliminate CALL_SIMPLE_METHOD macro at all. Results in binary size of vm_exec_core to change from 27,008 bytes to 26,016 bytes on my machine. [close GH-1779] Note however that PC can point somewhere non-instruction now. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.450 0.426 so_array 0.789 0.824 so_binary_trees 5.760 5.635 so_concatenate 3.594 3.508 so_count_words 0.211 0.196 so_exception 0.256 0.244 so_fannkuch 1.049 1.044 so_fasta 1.485 1.472 so_k_nucleotide 1.195 1.216 so_lists 0.517 0.513 so_mandelbrot 2.264 2.394 so_matrix 0.501 0.468 so_meteor_contest 2.987 2.912 so_nbody 1.307 1.289 so_nested_loop 0.908 0.925 so_nsieve 1.679 1.614 so_nsieve_bits 2.131 2.092 so_object 0.620 0.625 so_partial_sums 1.623 1.675 so_pidigits 1.135 1.190 so_random 0.357 0.321 so_reverse_complement 0.619 0.583 so_sieve 0.493 0.496 so_spectralnorm 1.749 1.737 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.057 so_array 0.958 so_binary_trees 1.022 so_concatenate 1.024 so_count_words 1.077 so_exception 1.049 so_fannkuch 1.004 so_fasta 1.009 so_k_nucleotide 0.983 so_lists 1.007 so_mandelbrot 0.946 so_matrix 1.072 so_meteor_contest 1.026 so_nbody 1.013 so_nested_loop 0.982 so_nsieve 1.040 so_nsieve_bits 1.018 so_object 0.992 so_partial_sums 0.969 so_pidigits 0.954 so_random 1.111 so_reverse_complement 1.062 so_sieve 0.994 so_spectralnorm 1.007 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62089 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:15:08 +03:00
ADD_PC(1); /* !!! */
mjit_compile.c: merge initial JIT compiler which has been developed by Takashi Kokubun <takashikkbn@gmail> as YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>. This JIT compiler is designed to be a safe migration path to introduce JIT compiler to MRI. So this commit does not include any bytecode changes or dynamic instruction modifications, which are done in original MJIT. This commit even strips off some aggressive optimizations from YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still fairly faster than Ruby 2.5 in some benchmarks (attached below). Note that this JIT compiler passes `make test`, `make test-all`, `make test-spec` without JIT, and even with JIT. Not only it's perfectly safe with JIT disabled because it does not replace VM instructions unlike MJIT, but also with JIT enabled it stably runs Ruby applications including Rails applications. I'm expecting this version as just "initial" JIT compiler. I have many optimization ideas which are skipped for initial merging, and you may easily replace this JIT compiler with a faster one by just replacing mjit_compile.c. `mjit_compile` interface is designed for the purpose. common.mk: update dependencies for mjit_compile.c. internal.h: declare `rb_vm_insn_addr2insn` for MJIT. vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to compiler. This avoids to include some functions which take a long time to compile, e.g. vm_exec_core. Some of the purpose is achieved in transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are manually resolved for now. Load mjit_helper.h for MJIT header. mjit_helper.h: New. This is a file used only by JIT-ed code. I'll refactor `mjit_call_cfunc` later. vm_eval.c: add some #ifdef switches to skip compiling some functions like Init_vm_eval. win32/mkexports.rb: export thread/ec functions, which are used by MJIT. include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alis to clarify that a function is exported only for MJIT. array.c: export a function used by MJIT. bignum.c: ditto. class.c: ditto. compile.c: ditto. error.c: ditto. gc.c: ditto. hash.c: ditto. iseq.c: ditto. numeric.c: ditto. object.c: ditto. proc.c: ditto. re.c: ditto. st.c: ditto. string.c: ditto. thread.c: ditto. variable.c: ditto. vm_backtrace.c: ditto. vm_insnhelper.c: ditto. vm_method.c: ditto. I would like to improve maintainability of function exports, but I believe this way is acceptable as initial merging if we clarify the new exports are for MJIT (so that we can use them as TODO list to fix) and add unit tests to detect unresolved symbols. I'll add unit tests of JIT compilations in succeeding commits. Author: Takashi Kokubun <takashikkbn@gmail.com> Contributor: wanabe <s.wanabe@gmail.com> Part of [Feature #14235] --- * Known issues * Code generated by gcc is faster than clang. The benchmark may be worse in macOS. Following benchmark result is provided by gcc w/ Linux. * Performance is decreased when Google Chrome is running * JIT can work on MinGW, but it doesn't improve performance at least in short running benchmark. * Currently it doesn't perform well with Rails. We'll try to fix this before release. --- * Benchmark reslts Benchmarked with: Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores - 2.0.0-p0: Ruby 2.0.0-p0 - r62186: Ruby trunk (early 2.6.0), before MJIT changes - JIT off: On this commit, but without `--jit` option - JIT on: On this commit, and with `--jit` option ** Optcarrot fps Benchmark: https://github.com/mame/optcarrot | |2.0.0-p0 |r62186 |JIT off |JIT on | |:--------|:--------|:--------|:--------|:--------| |fps |37.32 |51.46 |51.31 |58.88 | |vs 2.0.0 |1.00x |1.38x |1.37x |1.58x | ** MJIT benchmarks Benchmark: https://github.com/benchmark-driver/mjit-benchmarks (Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks) | |2.0.0-p0 |r62186 |JIT off |JIT on | |:----------|:--------|:--------|:--------|:--------| |aread |1.00 |1.09 |1.07 |2.19 | |aref |1.00 |1.13 |1.11 |2.22 | |aset |1.00 |1.50 |1.45 |2.64 | |awrite |1.00 |1.17 |1.13 |2.20 | |call |1.00 |1.29 |1.26 |2.02 | |const2 |1.00 |1.10 |1.10 |2.19 | |const |1.00 |1.11 |1.10 |2.19 | |fannk |1.00 |1.04 |1.02 |1.00 | |fib |1.00 |1.32 |1.31 |1.84 | |ivread |1.00 |1.13 |1.12 |2.43 | |ivwrite |1.00 |1.23 |1.21 |2.40 | |mandelbrot |1.00 |1.13 |1.16 |1.28 | |meteor |1.00 |2.97 |2.92 |3.17 | |nbody |1.00 |1.17 |1.15 |1.49 | |nest-ntimes|1.00 |1.22 |1.20 |1.39 | |nest-while |1.00 |1.10 |1.10 |1.37 | |norm |1.00 |1.18 |1.16 |1.24 | |nsvb |1.00 |1.16 |1.16 |1.17 | |red-black |1.00 |1.02 |0.99 |1.12 | |sieve |1.00 |1.30 |1.28 |1.62 | |trees |1.00 |1.14 |1.13 |1.19 | |while |1.00 |1.12 |1.11 |2.41 | ** Discourse's script/bench.rb Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb NOTE: Rails performance was somehow a little degraded with JIT for now. We should fix this. (At least I know opt_aref is performing badly in JIT and I have an idea to fix it. Please wait for the fix.) *** JIT off Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 17 75: 18 90: 22 99: 29 home_admin: 50: 21 75: 21 90: 27 99: 40 topic_admin: 50: 17 75: 18 90: 22 99: 32 categories: 50: 35 75: 41 90: 43 99: 77 home: 50: 39 75: 46 90: 49 99: 95 topic: 50: 46 75: 52 90: 56 99: 101 *** JIT on Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 19 75: 21 90: 25 99: 33 home_admin: 50: 24 75: 26 90: 30 99: 35 topic_admin: 50: 19 75: 20 90: 25 99: 30 categories: 50: 40 75: 44 90: 48 99: 76 home: 50: 42 75: 48 90: 51 99: 89 topic: 50: 49 75: 55 90: 58 99: 99 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 14:22:28 +03:00
#endif
eliminate CALL_SIMPLE_METHOD Arrange operands of several opt_something insns so that jumps to opt_send_without_block can be applied to them. This makes it possible to eliminate CALL_SIMPLE_METHOD macro at all. Results in binary size of vm_exec_core to change from 27,008 bytes to 26,016 bytes on my machine. [close GH-1779] Note however that PC can point somewhere non-instruction now. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.450 0.426 so_array 0.789 0.824 so_binary_trees 5.760 5.635 so_concatenate 3.594 3.508 so_count_words 0.211 0.196 so_exception 0.256 0.244 so_fannkuch 1.049 1.044 so_fasta 1.485 1.472 so_k_nucleotide 1.195 1.216 so_lists 0.517 0.513 so_mandelbrot 2.264 2.394 so_matrix 0.501 0.468 so_meteor_contest 2.987 2.912 so_nbody 1.307 1.289 so_nested_loop 0.908 0.925 so_nsieve 1.679 1.614 so_nsieve_bits 2.131 2.092 so_object 0.620 0.625 so_partial_sums 1.623 1.675 so_pidigits 1.135 1.190 so_random 0.357 0.321 so_reverse_complement 0.619 0.583 so_sieve 0.493 0.496 so_spectralnorm 1.749 1.737 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.057 so_array 0.958 so_binary_trees 1.022 so_concatenate 1.024 so_count_words 1.077 so_exception 1.049 so_fannkuch 1.004 so_fasta 1.009 so_k_nucleotide 0.983 so_lists 1.007 so_mandelbrot 0.946 so_matrix 1.072 so_meteor_contest 1.026 so_nbody 1.013 so_nested_loop 0.982 so_nsieve 1.040 so_nsieve_bits 1.018 so_object 0.992 so_partial_sums 0.969 so_pidigits 0.954 so_random 1.111 so_reverse_complement 1.062 so_sieve 0.994 so_spectralnorm 1.007 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62089 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:15:08 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* recv[str] */
DEFINE_INSN
opt_aref_with
eliminate CALL_SIMPLE_METHOD Arrange operands of several opt_something insns so that jumps to opt_send_without_block can be applied to them. This makes it possible to eliminate CALL_SIMPLE_METHOD macro at all. Results in binary size of vm_exec_core to change from 27,008 bytes to 26,016 bytes on my machine. [close GH-1779] Note however that PC can point somewhere non-instruction now. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.450 0.426 so_array 0.789 0.824 so_binary_trees 5.760 5.635 so_concatenate 3.594 3.508 so_count_words 0.211 0.196 so_exception 0.256 0.244 so_fannkuch 1.049 1.044 so_fasta 1.485 1.472 so_k_nucleotide 1.195 1.216 so_lists 0.517 0.513 so_mandelbrot 2.264 2.394 so_matrix 0.501 0.468 so_meteor_contest 2.987 2.912 so_nbody 1.307 1.289 so_nested_loop 0.908 0.925 so_nsieve 1.679 1.614 so_nsieve_bits 2.131 2.092 so_object 0.620 0.625 so_partial_sums 1.623 1.675 so_pidigits 1.135 1.190 so_random 0.357 0.321 so_reverse_complement 0.619 0.583 so_sieve 0.493 0.496 so_spectralnorm 1.749 1.737 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.057 so_array 0.958 so_binary_trees 1.022 so_concatenate 1.024 so_count_words 1.077 so_exception 1.049 so_fannkuch 1.004 so_fasta 1.009 so_k_nucleotide 0.983 so_lists 1.007 so_mandelbrot 0.946 so_matrix 1.072 so_meteor_contest 1.026 so_nbody 1.013 so_nested_loop 0.982 so_nsieve 1.040 so_nsieve_bits 1.018 so_object 0.992 so_partial_sums 0.969 so_pidigits 0.954 so_random 1.111 so_reverse_complement 1.062 so_sieve 0.994 so_spectralnorm 1.007 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62089 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:15:08 +03:00
(VALUE key, CALL_INFO ci, CALL_CACHE cc)
(VALUE recv)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_aref_with(recv, key);
if (val == Qundef) {
mjit_compile.c: merge initial JIT compiler which has been developed by Takashi Kokubun <takashikkbn@gmail> as YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>. This JIT compiler is designed to be a safe migration path to introduce JIT compiler to MRI. So this commit does not include any bytecode changes or dynamic instruction modifications, which are done in original MJIT. This commit even strips off some aggressive optimizations from YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still fairly faster than Ruby 2.5 in some benchmarks (attached below). Note that this JIT compiler passes `make test`, `make test-all`, `make test-spec` without JIT, and even with JIT. Not only it's perfectly safe with JIT disabled because it does not replace VM instructions unlike MJIT, but also with JIT enabled it stably runs Ruby applications including Rails applications. I'm expecting this version as just "initial" JIT compiler. I have many optimization ideas which are skipped for initial merging, and you may easily replace this JIT compiler with a faster one by just replacing mjit_compile.c. `mjit_compile` interface is designed for the purpose. common.mk: update dependencies for mjit_compile.c. internal.h: declare `rb_vm_insn_addr2insn` for MJIT. vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to compiler. This avoids to include some functions which take a long time to compile, e.g. vm_exec_core. Some of the purpose is achieved in transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are manually resolved for now. Load mjit_helper.h for MJIT header. mjit_helper.h: New. This is a file used only by JIT-ed code. I'll refactor `mjit_call_cfunc` later. vm_eval.c: add some #ifdef switches to skip compiling some functions like Init_vm_eval. win32/mkexports.rb: export thread/ec functions, which are used by MJIT. include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alis to clarify that a function is exported only for MJIT. array.c: export a function used by MJIT. bignum.c: ditto. class.c: ditto. compile.c: ditto. error.c: ditto. gc.c: ditto. hash.c: ditto. iseq.c: ditto. numeric.c: ditto. object.c: ditto. proc.c: ditto. re.c: ditto. st.c: ditto. string.c: ditto. thread.c: ditto. variable.c: ditto. vm_backtrace.c: ditto. vm_insnhelper.c: ditto. vm_method.c: ditto. I would like to improve maintainability of function exports, but I believe this way is acceptable as initial merging if we clarify the new exports are for MJIT (so that we can use them as TODO list to fix) and add unit tests to detect unresolved symbols. I'll add unit tests of JIT compilations in succeeding commits. Author: Takashi Kokubun <takashikkbn@gmail.com> Contributor: wanabe <s.wanabe@gmail.com> Part of [Feature #14235] --- * Known issues * Code generated by gcc is faster than clang. The benchmark may be worse in macOS. Following benchmark result is provided by gcc w/ Linux. * Performance is decreased when Google Chrome is running * JIT can work on MinGW, but it doesn't improve performance at least in short running benchmark. * Currently it doesn't perform well with Rails. We'll try to fix this before release. --- * Benchmark reslts Benchmarked with: Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores - 2.0.0-p0: Ruby 2.0.0-p0 - r62186: Ruby trunk (early 2.6.0), before MJIT changes - JIT off: On this commit, but without `--jit` option - JIT on: On this commit, and with `--jit` option ** Optcarrot fps Benchmark: https://github.com/mame/optcarrot | |2.0.0-p0 |r62186 |JIT off |JIT on | |:--------|:--------|:--------|:--------|:--------| |fps |37.32 |51.46 |51.31 |58.88 | |vs 2.0.0 |1.00x |1.38x |1.37x |1.58x | ** MJIT benchmarks Benchmark: https://github.com/benchmark-driver/mjit-benchmarks (Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks) | |2.0.0-p0 |r62186 |JIT off |JIT on | |:----------|:--------|:--------|:--------|:--------| |aread |1.00 |1.09 |1.07 |2.19 | |aref |1.00 |1.13 |1.11 |2.22 | |aset |1.00 |1.50 |1.45 |2.64 | |awrite |1.00 |1.17 |1.13 |2.20 | |call |1.00 |1.29 |1.26 |2.02 | |const2 |1.00 |1.10 |1.10 |2.19 | |const |1.00 |1.11 |1.10 |2.19 | |fannk |1.00 |1.04 |1.02 |1.00 | |fib |1.00 |1.32 |1.31 |1.84 | |ivread |1.00 |1.13 |1.12 |2.43 | |ivwrite |1.00 |1.23 |1.21 |2.40 | |mandelbrot |1.00 |1.13 |1.16 |1.28 | |meteor |1.00 |2.97 |2.92 |3.17 | |nbody |1.00 |1.17 |1.15 |1.49 | |nest-ntimes|1.00 |1.22 |1.20 |1.39 | |nest-while |1.00 |1.10 |1.10 |1.37 | |norm |1.00 |1.18 |1.16 |1.24 | |nsvb |1.00 |1.16 |1.16 |1.17 | |red-black |1.00 |1.02 |0.99 |1.12 | |sieve |1.00 |1.30 |1.28 |1.62 | |trees |1.00 |1.14 |1.13 |1.19 | |while |1.00 |1.12 |1.11 |2.41 | ** Discourse's script/bench.rb Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb NOTE: Rails performance was somehow a little degraded with JIT for now. We should fix this. (At least I know opt_aref is performing badly in JIT and I have an idea to fix it. Please wait for the fix.) *** JIT off Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 17 75: 18 90: 22 99: 29 home_admin: 50: 21 75: 21 90: 27 99: 40 topic_admin: 50: 17 75: 18 90: 22 99: 32 categories: 50: 35 75: 41 90: 43 99: 77 home: 50: 39 75: 46 90: 49 99: 95 topic: 50: 46 75: 52 90: 56 99: 101 *** JIT on Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 19 75: 21 90: 25 99: 33 home_admin: 50: 24 75: 26 90: 30 99: 35 topic_admin: 50: 19 75: 20 90: 25 99: 30 categories: 50: 40 75: 44 90: 48 99: 76 home: 50: 42 75: 48 90: 51 99: 89 topic: 50: 49 75: 55 90: 58 99: 99 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 14:22:28 +03:00
#ifndef MJIT_HEADER
PUSH(rb_str_resurrect(key));
eliminate CALL_SIMPLE_METHOD Arrange operands of several opt_something insns so that jumps to opt_send_without_block can be applied to them. This makes it possible to eliminate CALL_SIMPLE_METHOD macro at all. Results in binary size of vm_exec_core to change from 27,008 bytes to 26,016 bytes on my machine. [close GH-1779] Note however that PC can point somewhere non-instruction now. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.450 0.426 so_array 0.789 0.824 so_binary_trees 5.760 5.635 so_concatenate 3.594 3.508 so_count_words 0.211 0.196 so_exception 0.256 0.244 so_fannkuch 1.049 1.044 so_fasta 1.485 1.472 so_k_nucleotide 1.195 1.216 so_lists 0.517 0.513 so_mandelbrot 2.264 2.394 so_matrix 0.501 0.468 so_meteor_contest 2.987 2.912 so_nbody 1.307 1.289 so_nested_loop 0.908 0.925 so_nsieve 1.679 1.614 so_nsieve_bits 2.131 2.092 so_object 0.620 0.625 so_partial_sums 1.623 1.675 so_pidigits 1.135 1.190 so_random 0.357 0.321 so_reverse_complement 0.619 0.583 so_sieve 0.493 0.496 so_spectralnorm 1.749 1.737 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.057 so_array 0.958 so_binary_trees 1.022 so_concatenate 1.024 so_count_words 1.077 so_exception 1.049 so_fannkuch 1.004 so_fasta 1.009 so_k_nucleotide 0.983 so_lists 1.007 so_mandelbrot 0.946 so_matrix 1.072 so_meteor_contest 1.026 so_nbody 1.013 so_nested_loop 0.982 so_nsieve 1.040 so_nsieve_bits 1.018 so_object 0.992 so_partial_sums 0.969 so_pidigits 0.954 so_random 1.111 so_reverse_complement 1.062 so_sieve 0.994 so_spectralnorm 1.007 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62089 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:15:08 +03:00
ADD_PC(1); /* !!! */
mjit_compile.c: merge initial JIT compiler which has been developed by Takashi Kokubun <takashikkbn@gmail> as YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>. This JIT compiler is designed to be a safe migration path to introduce JIT compiler to MRI. So this commit does not include any bytecode changes or dynamic instruction modifications, which are done in original MJIT. This commit even strips off some aggressive optimizations from YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still fairly faster than Ruby 2.5 in some benchmarks (attached below). Note that this JIT compiler passes `make test`, `make test-all`, `make test-spec` without JIT, and even with JIT. Not only it's perfectly safe with JIT disabled because it does not replace VM instructions unlike MJIT, but also with JIT enabled it stably runs Ruby applications including Rails applications. I'm expecting this version as just "initial" JIT compiler. I have many optimization ideas which are skipped for initial merging, and you may easily replace this JIT compiler with a faster one by just replacing mjit_compile.c. `mjit_compile` interface is designed for the purpose. common.mk: update dependencies for mjit_compile.c. internal.h: declare `rb_vm_insn_addr2insn` for MJIT. vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to compiler. This avoids to include some functions which take a long time to compile, e.g. vm_exec_core. Some of the purpose is achieved in transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are manually resolved for now. Load mjit_helper.h for MJIT header. mjit_helper.h: New. This is a file used only by JIT-ed code. I'll refactor `mjit_call_cfunc` later. vm_eval.c: add some #ifdef switches to skip compiling some functions like Init_vm_eval. win32/mkexports.rb: export thread/ec functions, which are used by MJIT. include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alis to clarify that a function is exported only for MJIT. array.c: export a function used by MJIT. bignum.c: ditto. class.c: ditto. compile.c: ditto. error.c: ditto. gc.c: ditto. hash.c: ditto. iseq.c: ditto. numeric.c: ditto. object.c: ditto. proc.c: ditto. re.c: ditto. st.c: ditto. string.c: ditto. thread.c: ditto. variable.c: ditto. vm_backtrace.c: ditto. vm_insnhelper.c: ditto. vm_method.c: ditto. I would like to improve maintainability of function exports, but I believe this way is acceptable as initial merging if we clarify the new exports are for MJIT (so that we can use them as TODO list to fix) and add unit tests to detect unresolved symbols. I'll add unit tests of JIT compilations in succeeding commits. Author: Takashi Kokubun <takashikkbn@gmail.com> Contributor: wanabe <s.wanabe@gmail.com> Part of [Feature #14235] --- * Known issues * Code generated by gcc is faster than clang. The benchmark may be worse in macOS. Following benchmark result is provided by gcc w/ Linux. * Performance is decreased when Google Chrome is running * JIT can work on MinGW, but it doesn't improve performance at least in short running benchmark. * Currently it doesn't perform well with Rails. We'll try to fix this before release. --- * Benchmark reslts Benchmarked with: Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores - 2.0.0-p0: Ruby 2.0.0-p0 - r62186: Ruby trunk (early 2.6.0), before MJIT changes - JIT off: On this commit, but without `--jit` option - JIT on: On this commit, and with `--jit` option ** Optcarrot fps Benchmark: https://github.com/mame/optcarrot | |2.0.0-p0 |r62186 |JIT off |JIT on | |:--------|:--------|:--------|:--------|:--------| |fps |37.32 |51.46 |51.31 |58.88 | |vs 2.0.0 |1.00x |1.38x |1.37x |1.58x | ** MJIT benchmarks Benchmark: https://github.com/benchmark-driver/mjit-benchmarks (Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks) | |2.0.0-p0 |r62186 |JIT off |JIT on | |:----------|:--------|:--------|:--------|:--------| |aread |1.00 |1.09 |1.07 |2.19 | |aref |1.00 |1.13 |1.11 |2.22 | |aset |1.00 |1.50 |1.45 |2.64 | |awrite |1.00 |1.17 |1.13 |2.20 | |call |1.00 |1.29 |1.26 |2.02 | |const2 |1.00 |1.10 |1.10 |2.19 | |const |1.00 |1.11 |1.10 |2.19 | |fannk |1.00 |1.04 |1.02 |1.00 | |fib |1.00 |1.32 |1.31 |1.84 | |ivread |1.00 |1.13 |1.12 |2.43 | |ivwrite |1.00 |1.23 |1.21 |2.40 | |mandelbrot |1.00 |1.13 |1.16 |1.28 | |meteor |1.00 |2.97 |2.92 |3.17 | |nbody |1.00 |1.17 |1.15 |1.49 | |nest-ntimes|1.00 |1.22 |1.20 |1.39 | |nest-while |1.00 |1.10 |1.10 |1.37 | |norm |1.00 |1.18 |1.16 |1.24 | |nsvb |1.00 |1.16 |1.16 |1.17 | |red-black |1.00 |1.02 |0.99 |1.12 | |sieve |1.00 |1.30 |1.28 |1.62 | |trees |1.00 |1.14 |1.13 |1.19 | |while |1.00 |1.12 |1.11 |2.41 | ** Discourse's script/bench.rb Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb NOTE: Rails performance was somehow a little degraded with JIT for now. We should fix this. (At least I know opt_aref is performing badly in JIT and I have an idea to fix it. Please wait for the fix.) *** JIT off Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 17 75: 18 90: 22 99: 29 home_admin: 50: 21 75: 21 90: 27 99: 40 topic_admin: 50: 17 75: 18 90: 22 99: 32 categories: 50: 35 75: 41 90: 43 99: 77 home: 50: 39 75: 46 90: 49 99: 95 topic: 50: 46 75: 52 90: 56 99: 101 *** JIT on Your Results: (note for timings- percentile is first, duration is second in millisecs) categories_admin: 50: 19 75: 21 90: 25 99: 33 home_admin: 50: 24 75: 26 90: 30 99: 35 topic_admin: 50: 19 75: 20 90: 25 99: 30 categories: 50: 40 75: 44 90: 48 99: 76 home: 50: 42 75: 48 90: 51 99: 89 topic: 50: 49 75: 55 90: 58 99: 99 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 14:22:28 +03:00
#endif
eliminate CALL_SIMPLE_METHOD Arrange operands of several opt_something insns so that jumps to opt_send_without_block can be applied to them. This makes it possible to eliminate CALL_SIMPLE_METHOD macro at all. Results in binary size of vm_exec_core to change from 27,008 bytes to 26,016 bytes on my machine. [close GH-1779] Note however that PC can point somewhere non-instruction now. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.450 0.426 so_array 0.789 0.824 so_binary_trees 5.760 5.635 so_concatenate 3.594 3.508 so_count_words 0.211 0.196 so_exception 0.256 0.244 so_fannkuch 1.049 1.044 so_fasta 1.485 1.472 so_k_nucleotide 1.195 1.216 so_lists 0.517 0.513 so_mandelbrot 2.264 2.394 so_matrix 0.501 0.468 so_meteor_contest 2.987 2.912 so_nbody 1.307 1.289 so_nested_loop 0.908 0.925 so_nsieve 1.679 1.614 so_nsieve_bits 2.131 2.092 so_object 0.620 0.625 so_partial_sums 1.623 1.675 so_pidigits 1.135 1.190 so_random 0.357 0.321 so_reverse_complement 0.619 0.583 so_sieve 0.493 0.496 so_spectralnorm 1.749 1.737 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.057 so_array 0.958 so_binary_trees 1.022 so_concatenate 1.024 so_count_words 1.077 so_exception 1.049 so_fannkuch 1.004 so_fasta 1.009 so_k_nucleotide 0.983 so_lists 1.007 so_mandelbrot 0.946 so_matrix 1.072 so_meteor_contest 1.026 so_nbody 1.013 so_nested_loop 0.982 so_nsieve 1.040 so_nsieve_bits 1.018 so_object 0.992 so_partial_sums 0.969 so_pidigits 0.954 so_random 1.111 so_reverse_complement 1.062 so_sieve 0.994 so_spectralnorm 1.007 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62089 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:15:08 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized length */
DEFINE_INSN
opt_length
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_length(recv, BOP_LENGTH);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized size */
DEFINE_INSN
opt_size
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_length(recv, BOP_SIZE);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized empty? */
DEFINE_INSN
opt_empty_p
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_empty_p(recv);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized succ */
DEFINE_INSN
opt_succ
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_succ(recv);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized not */
DEFINE_INSN
opt_not
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_not(ci, cc, recv);
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* optimized regexp match */
DEFINE_INSN
opt_regexpmatch1
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
(VALUE recv)
(VALUE obj)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_regexpmatch1(recv, obj);
}
/* optimized regexp match 2 */
DEFINE_INSN
opt_regexpmatch2
(CALL_INFO ci, CALL_CACHE cc)
(VALUE obj2, VALUE obj1)
(VALUE val)
{
split insns.def into functions Contemporary C compilers are good at function inlining. They fold multiple functions into one. However they are not yet smart enough to unfold a function into several ones. So generally speaking, it is wiser for a C programmer to manually split C functions whenever possible. That should make rooms for compilers to optimize at will. Before this changeset insns.def was converted into single HUGE function called vm_exec_core(). By moving each instruction's core into individual functions, generated C source code is reduced from 3,428 lines to 2,847 lines. Looking at the generated assembly however, it seems my compiler (gcc 6.2) is extraordinary smart so that it inlines almost all functions I introduced in this changeset back into that vm_exec_core. On my machine compiled machine binary of the function does not shrink very much in size (28,432 bytes to 26,816 bytes, according to nm(1)). I believe this change is zero-cost. Several benchmarks I exercised showed no significant difference beyond error mergin. For instance 3 repeated runs of optcarrot benchmark on my machine resulted in: before this: 28.330329285707490, 27.513378371065920, 29.40420215754537 after this: 27.107195867280414, 25.549324021385907, 30.31581919050884 in fps (greater==faster). ---- * internal.h (rb_obj_not_equal): used from vm_insnhelper.c * insns.def: move vast majority of lines into vm_insnhelper.c * vm_insnhelper.c: moved here. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 13:58:49 +03:00
val = vm_opt_regexpmatch2(obj2, obj1);
if (val == Qundef) {
s/CALL_SIMPLE_METHOD/DISPATCH_ORIGINAL_INSN/ Now that DISPATCH_ORIGINAL_INSN is introduced, we can replace CALL_SIMPLE_METHOD with DISPATCH_ORIGINAL_INSN. These two macros differ in size very much and results in this big difference in compiled binary size. This changeset reduces the size of vm_exec_core from 32,352 bytes to 27,008 bytes on my machine. As a result it yields slightly better performance. Closes [GH-1779]. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.484 0.454 so_array 0.837 0.779 so_binary_trees 5.928 5.801 so_concatenate 3.473 3.543 so_count_words 0.201 0.222 so_exception 0.255 0.252 so_fannkuch 1.080 1.019 so_fasta 1.459 1.463 so_k_nucleotide 1.218 1.180 so_lists 0.499 0.484 so_mandelbrot 2.189 2.324 so_matrix 0.510 0.496 so_meteor_contest 3.025 2.925 so_nbody 1.319 1.273 so_nested_loop 0.941 0.932 so_nsieve 1.806 1.647 so_nsieve_bits 2.151 2.078 so_object 0.632 0.621 so_partial_sums 1.560 1.632 so_pidigits 1.190 1.183 so_random 0.333 0.353 so_reverse_complement 0.604 0.586 so_sieve 0.521 0.481 so_spectralnorm 1.774 1.722 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.065 so_array 1.075 so_binary_trees 1.022 so_concatenate 0.980 so_count_words 0.903 so_exception 1.009 so_fannkuch 1.059 so_fasta 0.997 so_k_nucleotide 1.032 so_lists 1.032 so_mandelbrot 0.942 so_matrix 1.028 so_meteor_contest 1.034 so_nbody 1.036 so_nested_loop 1.009 so_nsieve 1.097 so_nsieve_bits 1.035 so_object 1.018 so_partial_sums 0.956 so_pidigits 1.006 so_random 0.943 so_reverse_complement 1.032 so_sieve 1.083 so_spectralnorm 1.030 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-29 10:04:50 +03:00
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
}
}
/* call native compiled method */
DEFINE_INSN
opt_call_c_function
(rb_insn_func_t funcptr)
()
()
move ADD_PC around to optimize PC manipluiations This commit introduces new attribute handles_flame and if that is _not_ the case, places ADD_PC right after INC_SP. This improves locality of PC manipulations to prevents unnecessary register spill- outs. As a result, it reduces the size of vm_exec_core from 32,688 bytes to 32,384 bytes on my machine. Speedup is very faint, but certain. ----------------------------------------------------------- benchmark results: minimum results in each 3 measurements. Execution time (sec) name before after so_ackermann 0.476 0.464 so_array 0.742 0.728 so_binary_trees 5.493 5.466 so_concatenate 3.619 3.395 so_count_words 0.190 0.184 so_exception 0.249 0.239 so_fannkuch 0.994 0.953 so_fasta 1.369 1.374 so_k_nucleotide 1.111 1.111 so_lists 0.470 0.481 so_mandelbrot 2.059 2.050 so_matrix 0.466 0.465 so_meteor_contest 2.712 2.781 so_nbody 1.154 1.204 so_nested_loop 0.852 0.846 so_nsieve 1.636 1.623 so_nsieve_bits 2.073 2.039 so_object 0.616 0.584 so_partial_sums 1.464 1.481 so_pidigits 1.075 1.082 so_random 0.321 0.317 so_reverse_complement 0.555 0.558 so_sieve 0.495 0.490 so_spectralnorm 1.634 1.627 Speedup ratio: compare with the result of `before' (greater is better) name after so_ackermann 1.025 so_array 1.019 so_binary_trees 1.005 so_concatenate 1.066 so_count_words 1.030 so_exception 1.040 so_fannkuch 1.043 so_fasta 0.996 so_k_nucleotide 1.000 so_lists 0.978 so_mandelbrot 1.004 so_matrix 1.001 so_meteor_contest 0.975 so_nbody 0.959 so_nested_loop 1.007 so_nsieve 1.008 so_nsieve_bits 1.017 so_object 1.056 so_partial_sums 0.989 so_pidigits 0.994 so_random 1.014 so_reverse_complement 0.996 so_sieve 1.010 so_spectralnorm 1.004 git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62051 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-01-26 09:30:58 +03:00
// attr bool handles_frame = true;
{
reg_cfp = (funcptr)(ec, reg_cfp);
if (reg_cfp == 0) {
VALUE err = ec->errinfo;
ec->errinfo = Qnil;
THROW_EXCEPTION(err);
}
mjit_compile.c: use local variables for stack if catch_except_p is FALSE. If catch_except_p is TRUE, stack values should be on VM's stack when exception is thrown and the JIT-ed frame is re-executed by VM's exception handler. If it's FALSE, the JIT-ed frame won't be re-executed and don't need to keep values on VM's stack. Using local variables allows us to reduce cfp->sp motion. Moving cfp->sp is needed only for insns whose handles_frame? is false. So it improves performance. _mjit_compile_insn.erb: Prepare `stack_size` variable for GET_SP, STACK_ADDR_FROM_TOP, TOPN macros. Share pc and sp motion partial view. Use cancel handler created in mjit_compile.c. _mjit_compile_send.erb: ditto. Also, when iseq->body->catch_except_p is TRUE, this stops to call mjit_exec directly. I described the reason in vm_insnhelper.h's comment for EXEC_EC_CFP. _mjit_compile_pc_and_sp.erb: Shared logic for moving sp and pc. As you can see from thsi file, when status->local_stack_p is TRUE and insn.handles_frame? is false, moving sp is skipped. But if insn.handles_frame? is true, values should be rolled back to VM's stack. common.mk: add dependency for the file _mjit_compile_insn_body.erb: Set sp value before canceling JIT on DISPATCH_ORIGINAL_INSN. Replace GET_SP, STACK_ADDR_FROM_TOP, TOPN macros for the case ocal_stack_p is TRUE and insn.handles_frame? is false. In that case, values are not available on VM's stack and those macros should be replaced. mjit_compile.inc.erb: updated comments of macros which are supported by JIT compiler. All references to `cfp->sp` should be replaced and thus INC_SP, SET_SV, PUSH are no longer supported for now, because they are not used now. vm_exec.h: moved EXEC_EC_CFP definition to vm_insnhelper.h because it's tighly coupled to CALL_METHOD. vm_insnhelper.h: Have revised EXEC_EC_CFP definition moved from vm_exec.h. Now it triggers mjit_exec for VM, and has the guard for catch_except_p on JIT-ed code. See comments for details. CALL_METHOD delegates triggering mjit_exec to EXEC_EC_CFP. insns.def: Stopped using EXEC_EC_CFP for the case we don't want to trigger mjit_exec. Those insns (defineclass, opt_call_c_function) are not supported by JIT and it's safe to use RESTORE_REGS(), NEXT_INSN(). expandarray is changed to pass GET_SP() to replace the macro in _mjit_compile_insn_body.erb. vm_insnhelper.c: change to take sp for the above reason. [close https://github.com/ruby/ruby/pull/1828] This patch resurrects the performance which was attached in [Feature #14235]. * Benchmark Optcarrot (with configuration for benchmark_driver.gem) https://github.com/benchmark-driver/optcarrot $ benchmark-driver benchmark.yml --verbose 1 --rbenv 'before;before+JIT::before,--jit;after;after+JIT::after,--jit' --repeat-count 10 before: ruby 2.6.0dev (2018-03-04 trunk 62652) [x86_64-linux] before+JIT: ruby 2.6.0dev (2018-03-04 trunk 62652) +JIT [x86_64-linux] after: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) [x86_64-linux] last_commit=mjit_compile.c: use local variables for stack after+JIT: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) +JIT [x86_64-linux] last_commit=mjit_compile.c: use local variables for stack Calculating ------------------------------------- before before+JIT after after+JIT optcarrot 53.552 59.680 53.697 63.358 fps Comparison: optcarrot after+JIT: 63.4 fps before+JIT: 59.7 fps - 1.06x slower after: 53.7 fps - 1.18x slower before: 53.6 fps - 1.18x slower git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62655 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-04 10:04:40 +03:00
RESTORE_REGS();
NEXT_INSN();
}
/* BLT */
DEFINE_INSN
bitblt
()
()
(VALUE ret)
{
ret = rb_str_new2("a bit of bacon, lettuce and tomato");
}
/* The Answer to Life, the Universe, and Everything */
DEFINE_INSN
answer
()
()
(VALUE ret)
{
ret = INT2FIX(42);
}