Commit Graph

844 Commits

Author SHA1 Message Date
Jeremy Evans d2c41b1bff Reduce allocations for keyword argument hashes
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side.  Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.

This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side.  Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side.  On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash.  If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it.  So with this commit,
at most a single hash is allocated during method calls.

To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable.  If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.

In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.

On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled.  Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not.  Then, unless the callinfo flag is set,
the hash needs to be duplicated.  The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again.  If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can include duplicating the
positional splat array if one was passed.  CALLER_SETUP_ARG
and a couple of other places need to be modified to handle
similar issues for other types of calls.

This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.
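
A minimal sketch of the property those tests check (illustrative,
not the test code itself): a keyword splat reaching a keyword rest
parameter yields an equal but distinct hash, so the callee cannot
mutate the caller's hash.

```
def kws(**kw) kw end

h = {a: 1}
r = kws(**h)
r == h        # => true   (equal contents)
r.equal?(h)   # => false  (distinct object)
r[:b] = 2
h             # => {a: 1} (caller's hash unaffected)
```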

Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:

  def kw(a: 1) a end
  def kws(**kw) kw end
  h = {a: 1}

  kw(a: 1)       # about same
  kw(**h)        # 2.37x faster
  kws(a: 1)      # 1.30x faster
  kws(**h)       # 2.19x faster
  kw(a: 1, **h)  # 1.03x slower
  kw(**h, **h)   # about same
  kws(a: 1, **h) # 1.16x faster
  kws(**h, **h)  # 1.14x faster
2020-03-17 12:09:43 -07:00
卜部昌平 f12b9a3338 %p is for void *
See also
35eb12c063
6f5eb28507
687308cf0d
b6a2d63eb3
2020-03-04 12:30:42 +09:00
Koichi Sasada 91de0daaa2 method_missing_reason should be set.
send() has a special method launcher in the VM, and it has a special
method_missing caller. This path doesn't set
ec->method_missing_reason, which is used at exception creation,
so set up this information. Without this setting, the NoMethodError
exception becomes a NameError.
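
A minimal sketch of the visible symptom (the method name here is
illustrative):

```
begin
  Object.new.send(:no_such_method)
rescue NoMethodError => e
  e.class  # => NoMethodError; on the broken path it was a NameError
end
```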

This patch will fix:
http://ci.rvm.jp/results/trunk-random1@phosphorus-docker/2761643
2020-03-03 02:44:02 +09:00
Koichi Sasada 18674aef0d check imemo_type
check imemo_type to debug
http://ci.rvm.jp/results/trunk-vm-asserts@silicon-docker/2744755
2020-02-27 10:50:20 +09:00
Koichi Sasada b9007b6c54 Introduce disposable call-cache.
This patch contains several ideas:

(1) Disposable inline method cache (IMC) for race-free inline method cache
    * Make the call-cache (CC) an RVALUE (GC target object) and allocate a
      new CC on cache miss.
    * This technique allows race-free access from parallel processing
      elements like RCU.
(2) Introduce per-Class method cache (pCMC)
    * Instead of a fixed-size global method cache (GMC), the pCMC allows a
      flexible cache size.
    * Caching CCs reduces CC allocation and allows sharing a CC's fast path
      between call sites with the same call-info (CI).
(3) Invalidate an inline method cache by invalidating corresponding method
    entries (MEs)
    * Instead of using class serials, we set an "invalidated" flag on the
      method entry itself to represent cache invalidation.
    * Compared with using class serials, the impact of method modification
      (add/overwrite/delete) is small.
    * Updating class serials invalidates all method caches of the class and
      its subclasses.
    * The proposed approach invalidates the method cache of only the one
      affected ME.

See [Feature #16614] for more details.
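
A toy model of idea (3), with invented names: invalidation flags the
method entry itself, so only call sites that cached that ME miss,
instead of every cache keyed by the class serial.

```
MethodEntry = Struct.new(:mid, :invalidated)

foo = MethodEntry.new(:foo, false)
bar = MethodEntry.new(:bar, false)
site_a = { me: foo }  # call site that cached :foo
site_b = { me: bar }  # call site that cached :bar

# Redefining :foo flags only its ME; site_b's cache stays valid.
foo.invalidated = true
site_a[:me].invalidated  # => true  (must re-search)
site_b[:me].invalidated  # => false (still a hit)
```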
2020-02-22 09:58:59 +09:00
Koichi Sasada f2286925f0 VALUE size packed callinfo (ci).
Now, rb_call_info describes how to call the method with a tuple of
(mid, orig_argc, flags, kwarg). In most cases, kwarg == NULL and
mid+argc+flags only require 64 bits, so this patch packs
rb_call_info into a VALUE (1 word) in such cases. If it cannot be
represented in a VALUE, an imemo_callinfo is used, which contains the
conventional callinfo (rb_callinfo, renamed from rb_call_info).

iseq->body->ci_kw_size is removed because every callinfo is now VALUE
size (a packed ci or a pointer to imemo_callinfo).

To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).

struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.

rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
are temporarily removed because cd->ci should be marked.
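
A toy sketch of the packing idea in Ruby, with invented field widths:
small (mid, argc, flags) tuples fit in one tagged word; anything
larger falls back to a heap-allocated (imemo) representation.

```
def pack_ci(mid, argc, flags)
  return nil if mid >= 1 << 32 || argc >= 1 << 15 || flags >= 1 << 16
  (mid << 32) | (argc << 17) | (flags << 1) | 1  # low bit tags "packed"
end

ci = pack_ci(42, 2, 0b100)
ci & 1               # => 1 (a packed word, not a pointer)
ci >> 32             # => 42 (vm_ci_mid analogue)
(ci >> 17) & 0x7fff  # => 2  (vm_ci_argc analogue)
```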
2020-02-22 09:58:59 +09:00
Nobuyoshi Nakada 0b4500d982
Adjusted indent [ci skip] 2020-02-22 00:17:31 +09:00
Koichi Sasada 99a8742067 should be compared with called_id
me->called_id and me->def->original_id can sometimes be different,
so we should compare with called_id, which is the mtbl key.
(fix GH-PR #2869)
2020-02-13 03:30:22 +09:00
John Hawthorn ed7b46b66b Use inline cache for super calls 2020-02-13 00:14:55 +09:00
Koichi Sasada a635c93fde support MJIT with debug option.
VM_CHECK_MODE > 0 with optflags=-O0 cannot run JIT tests because
of link problems. This patch fixes them.
2020-02-03 16:57:41 +09:00
Jeremy Evans beae6cbf0f Fully separate positional arguments and keyword arguments
This removes the warnings added in 2.7, and changes the behavior
so that a final positional hash is not treated as keywords or
vice-versa.

To handle the arg_setup_block splat case correctly with keyword
arguments, we need to check if we are taking a keyword hash.
That case didn't have a test, but it affects real-world code,
so add a test for it.

This removes rb_empty_keyword_given_p() and related code, as
that is not needed in Ruby 3.  The empty keyword case is the
same as the no keyword case in Ruby 3.

This changes rb_scan_args to implement keyword argument
separation for C functions when the : character is used.
For backwards compatibility, it returns a duped hash.
This is a bad idea for performance, but not duping the hash
breaks at least Enumerator::ArithmeticSequence#inspect.

Instead of having RB_PASS_CALLED_KEYWORDS be a number,
simplify the code by just making it be rb_keyword_given_p().
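
A minimal sketch of the separated semantics (illustrative method
names, Ruby 3 behavior):

```
def kw(**opts) opts end
def pos(h) h end

kw(a: 1)      # => {:a=>1}  keywords go to the keyword rest
pos(a: 1)     # => {:a=>1}  pos accepts no keywords, so this is a
              #             positional hash
begin
  kw({a: 1})  # a final positional hash is no longer keywords
rescue ArgumentError => e
  e.message   # => "wrong number of arguments (given 1, expected 0)"
end
```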
2020-01-02 18:40:45 -08:00
卜部昌平 5e22f873ed decouple internal.h headers
Eases committers' daily life by avoiding #include-ing everything from
internal.h; each file now includes what it needs instead. This should
significantly speed up incremental builds.

We take the following inclusion order in this changeset:

1.  "ruby/config.h", where _GNU_SOURCE is defined (must be the very
    first thing among everything).
2.  RUBY_EXTCONF_H if any.
3.  Standard C headers, sorted alphabetically.
4.  Other system headers, maybe guarded by #ifdef
5.  Everything else, sorted alphabetically.

Exceptions are the win32-related headers, which tend not to be self-
contained (the headers have inclusion-order dependencies).
2019-12-26 20:45:12 +09:00
卜部昌平 0958e19ffb add several __has_something macro
With these macros implemented we can write code as if the compiler
were clang.  MSC_VERSION_SINCE is defined to implement those macros,
but it turned out to be handy in other places as well.  The -fdeclspec
compiler flag is necessary for clang to properly handle __has_declspec().
2019-12-26 20:45:12 +09:00
Nobuyoshi Nakada db16629008
Fixed misspellings
Fixed misspellings reported at [Bug #16437], only in ruby and rubyspec.
2019-12-20 09:32:42 +09:00
卜部昌平 f054f11a38 per-method serial number
Methods and their definitions can be allocated/deallocated on the fly.
One pathological situation is when a method is deallocated and then
another one is allocated immediately after that.  The addresses of those
old/new method entries/definitions can then be the same, depending on the
underlying malloc/free implementation.

So pointer comparison is insufficient.  We have to check the contents.
To do so we introduce def->method_serial, which is an integer unique to
that specific method definition.

PS: Note that method_serial being uintptr_t rather than rb_serial_t is
intentional.  This is because rb_serial_t can be bigger than a pointer
on a 32bit system (rb_serial_t is at least 64bit).  In order to preserve
old packing of struct rb_call_cache, rb_serial_t is inappropriate.
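
A toy model of the fix, with invented names: compare a monotonically
increasing serial instead of an address that malloc may recycle.

```
Definition = Struct.new(:method_serial)

serial = 0
old_def = Definition.new(serial += 1)  # what a call cache remembered
new_def = Definition.new(serial += 1)  # may reuse old_def's address in C

# Address comparison could falsely match; serial comparison cannot.
old_def.method_serial == new_def.method_serial  # => false
```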
2019-12-18 12:52:28 +09:00
Koichi Sasada fbe229906b add debug counter to count `call` reusing cases. 2019-12-17 13:15:38 +09:00
卜部昌平 ba11a74745 ensure cc->def == cc->me->def
The invariant shall hold for every call cache.  However, prior to this
changeset cc->me could be updated without also updating cc->def.  Let's
ensure it by introducing a new macro named CC_SET_ME which sets cc->me
and cc->def at once.
2019-12-16 17:52:18 +09:00
Jeremy Evans 55b7ba3686 Make super in instance_eval in method in module raise TypeError
This makes behavior the same as super in instance_eval in method
in class.  The reason this wasn't implemented before is that
there is a check to determine if the self in the current context
is of the expected class, and a module itself can be included
in multiple classes, so it doesn't have an expected class.

Implementing this requires giving iclasses knowledge of which
class created them, so that a super call in the module method
knows the expected class for super calls.  This reference
is called includer, and should only be set for iclasses.

Note that the approach Ruby uses in this check is not robust. If
you instance_eval another object of the same class and call super,
instead of a TypeError, you get super called with the
instance_eval receiver instead of the method receiver.  Truly
fixing super would require keeping a reference to the super object
(method receiver) in each frame where scope has changed, and using
that instead of current self when calling super.

Fixes [Bug #11636]
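
A minimal sketch of the new behavior, with illustrative class names:

```
class A
  def foo; :from_a; end
end

module M
  def foo
    "str".instance_eval { super() }  # self here is a String, not a B
  end
end

class B < A
  include M
end

begin
  B.new.foo
rescue TypeError => e
  e.class  # => TypeError, matching the existing in-class behavior
end
```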
2019-12-12 15:50:19 +09:00
Aaron Patterson 2c8d186c6e
Introduce an "Inline IVAR cache" struct
This commit introduces an "inline ivar cache" struct.  The reason we
need this is so compaction can differentiate between an ivar cache and a
regular inline cache.  Regular inline caches contain references to
`VALUE` and ivar caches just contain references to the ivar index.  With
this new struct we can easily update references for inline caches (but
not inline var caches as they just contain an int)
2019-12-05 13:37:02 -08:00
Koichi Sasada 36da0b3da1 check interrupts at each frame pop timing.
Asynchronous events such as signal traps, finalization timing,
thread switching and so on are managed by "interrupt_flag".
Ruby's threads check this flag periodically; if a thread
does not check it, the above events don't happen.

This checking is done by the CHECK_INTS() (and related) macros, which
are placed in certain spots (the leave instruction and so on).  However,
at the end of C methods, C blocks (IMEMO_IFUNC), etc., there is no
checking, which can produce an uninterruptible thread.

To remedy this situation, we decided to place CHECK_INTS() at
vm_pop_frame(). This increases the number of interrupt checking points.
[Bug #16366]

This patch can introduce unexpected events...
2019-11-29 17:47:02 +09:00
Yusuke Endoh 191ce5344e Reduce duplicated warnings for the change of Ruby 3 keyword arguments
By this change, the following code prints only one warning.

```
def foo(**opt); end
100.times { foo({kw:1}) }
```

A global variable `st_table *caller_to_callees` is a map from caller to
a set of callee methods.  It remembers whether a warning has already been
printed for each pair of caller and callee.

[Feature #16289]
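
A toy model of the dedup table, with invented names:

```
warned = {}  # remembers (caller, callee) pairs already reported

warn_once = ->(caller_site, callee) {
  key = [caller_site, callee]
  next if warned[key]
  warned[key] = true
  warn "keyword argument change: #{caller_site} -> #{callee}"
}

100.times { warn_once.call("t.rb:2", :foo) }  # warns only once
```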
2019-11-29 17:32:27 +09:00
Koichi Sasada f38b6d197f Revert "export for MJIT"
This reverts commit 2e6f1cf8b2.
2019-11-29 03:22:24 +09:00
Koichi Sasada e4e41840ad Revert "* remove trailing spaces. [ci skip]"
This reverts commit 27d0d7c0d3.
2019-11-29 03:22:13 +09:00
git 27d0d7c0d3 * remove trailing spaces. [ci skip] 2019-11-29 03:18:19 +09:00
Koichi Sasada 2e6f1cf8b2 export for MJIT 2019-11-29 03:17:52 +09:00
Koichi Sasada dd723771c1 fastpath for ivar read of FL_EXIVAR objects.
vm_getivar() provides a fast path for T_OBJECT by caching the index
of an ivar. This patch also provides a fast path for FL_EXIVAR objects.
Each FL_EXIVAR object has its own ivar array, and the index can be cached
just as for T_OBJECT. To access this ivar array, generic_iv_tbl is
exposed by rb_ivar_generic_ivtbl() (declared in the newly introduced
variable.h).

Benchmark script:

Benchmark.driver(repeat_count: 3){|x|
  x.executable name: 'clean', command: %w'../clean/miniruby'
  x.executable name: 'trunk', command: %w'./miniruby'

  objs = [Object.new, 'str', {a: 1, b: 2}, [1, 2]]

  objs.each.with_index{|obj, i|
    rep = obj.inspect
    rep = 'Object.new' if /\#/ =~ rep
    x.prelude str = %Q{
      v#{i} = #{rep}
      def v#{i}.foo
        @iv # ivar access method (attr_reader)
      end
      v#{i}.instance_variable_set(:@iv, :iv)
    }
    puts str
    x.report %Q{
      v#{i}.foo
    }
  }
}

Result:

      v0.foo # T_OBJECT

               clean:  85387141.8 i/s
               trunk:  85249373.6 i/s - 1.00x  slower

      v1.foo # T_STRING

               trunk:  57894407.5 i/s
               clean:  39957178.6 i/s - 1.45x  slower

      v2.foo # T_HASH

               trunk:  56629413.2 i/s
               clean:  39227088.9 i/s - 1.44x  slower

      v3.foo # T_ARRAY

               trunk:  55797530.2 i/s
               clean:  38263572.9 i/s - 1.46x  slower
2019-11-29 03:11:04 +09:00
Kazuhiro NISHIYAMA 09e76e9828
Improve consistency of bool/true/false 2019-11-25 15:09:09 +09:00
Koichi Sasada e27acb6148 add fast path for argc==0.
If builtin functions are called with no arguments, we don't need to
calculate the argv location.
2019-11-25 14:04:21 +09:00
卜部昌平 f6239ce0fc peep-hole optimize VM instructions
Some minor optimizations.

Calculating -------------------------------------
                           ours       trunk
          vm2_regexp     8.479M      8.346M i/s -      6.000M times in 0.707612s 0.718916s
   vm2_regexp_invert     8.605M      8.350M i/s -      6.000M times in 0.697298s 0.718576s

Comparison:
                       vm2_regexp
                ours:   8479223.3 i/s
               trunk:   8345893.8 i/s - 1.02x  slower

                vm2_regexp_invert
                ours:   8604647.4 i/s
               trunk:   8349852.8 i/s - 1.03x  slower

Calculating -------------------------------------
                           ours+jit   trunk+jit
Optcarrot Lan_Master.nes     68.603      64.167 fps

Comparison:
             Optcarrot Lan_Master.nes
                ours+jit:        68.6 fps
               trunk+jit:        64.2 fps - 1.07x  slower
2019-11-19 13:56:13 +09:00
Koichi Sasada 57cd4623cf should not use __func__ 2019-11-18 13:53:32 +09:00
Koichi Sasada 5e34ab5406 add casts.
add casts to avoid a compile error.
http://ci.rvm.jp/results/trunk_clang_39@silicon-docker/2402215
2019-11-18 10:36:48 +09:00
Koichi Sasada 71fee9bc72 vm_invoke_builtin_delegate with start index.
opt_invokebuiltin_delegate and opt_invokebuiltin_delegate_leave
invoke builtin functions with the same parameters as the method.
This technique eliminates stack push operations. However, the delegated
parameters must be exactly the same as the given parameters.
(e.g. `def foo(a, b, c) __builtin_foo(a, b, c) end` is okay, but
`__builtin_foo(b, c)` is not allowed)

This patch relaxes this restriction. An ISeq has a local variable
table which includes the parameters. For example, for the method defined
as `def foo(a, b, c) x=y=nil end`, the local variable table contains
[a, b, c, x, y]. If a builtin function is called with arguments that
are a sub-array of the lvar table, the opt_invokebuiltin_delegate
instruction is used with a start index. For example, `__builtin_foo(b, c)`
and `__builtin_bar(c, x, y)` are okay, and so on.
2019-11-18 10:16:11 +09:00
Koichi Sasada 179062dd80 move rb_vm_lvar_exposed() correctly.
rb_vm_lvar_exposed() is prepared for __builtin_inline!(), and is needed
by both mini_builtin.c and builtin.c. However, it was only in builtin.c,
so move it to make it part of the VM.
2019-11-14 04:21:24 +09:00
Dylan Thacker-Smith ac112f2b5d Avoid top-level search for nested constant reference from nil in defined?
Fixes [Bug #16332]

Constant access was changed to no longer allow top-level constant access
through `nil`, but `defined?` wasn't changed at the same time to stay
consistent.

Use a separate defined type to distinguish between a constant
referenced from the current lexical scope and one referenced from
another namespace.
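
A minimal sketch of the inconsistency being fixed (the method name is
illustrative):

```
def no_namespace; nil; end

begin
  no_namespace::Integer  # constant access through nil is an error
rescue TypeError
  :nil_is_not_a_namespace
end

defined?(no_namespace::Integer)  # => nil now (previously "constant")
```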
2019-11-13 15:36:58 +09:00
Koichi Sasada 9142f802f1 rewrite comment.
Pointed out by nagachika-san.
https://ruby-trunk-changes.hatenablog.com/entry/ruby_trunk_changes_20191109
2019-11-11 16:47:50 +09:00
Koichi Sasada 43ceedecc0 use STACK_ADDR_FROM_TOP()
vm_invoke_builtin() accesses the VM stack via cfp->sp. However, MJIT
can use its own stack. To access it appropriately, we need to
use STACK_ADDR_FROM_TOP().
2019-11-09 16:18:58 +09:00
Koichi Sasada 21f7cca2c6 initialize kw special local var.
A method which has keyword parameters has an implicit local variable
that records which keywords are (un)specified.

vm_call_iseq_setup_kwparm_nokwarg() is a special function to invoke
an ISeq method without any keyword arguments. However, it should
also initialize that special local variable. Without this initialization,
the implicit lvar can point to a freed (T_NONE) object.
2019-11-09 10:04:04 +09:00
卜部昌平 90fc555258 name the result of calccall
This is a pure refactoring for better understanding of what is
happening here.  Should change nothing but readability.
2019-11-08 12:09:01 +09:00
卜部昌平 a1a08ac9aa describe vm_cache_check_for_class_serial [ci skip]
Added comments describing what it is.  Requested by ko1.
2019-11-08 10:31:06 +09:00
Koichi Sasada 46acd0075d support builtin features with Ruby and C.
Support loading builtin features written in Ruby, which are implemented
with C builtin functions.
[Feature #16254]

Several features:

(1) Load .rb file at boottime with native binary.

Now, prelude.rb is loaded at boot time. However, this file is embedded
in the interpreter in text format and we need to compile it.
This patch adds a feature to load it from a binary format.

(2) __builtin_func() in Ruby calls func() written in C.

In a Ruby file, we can write `__builtin_func()` like a method call.
However, this is not a method call but special syntax to call
a function `func()` written in C. C functions should be defined
in the file (same compile unit) which loads this .rb file.

Functions (`func` in the above example) should be defined with
  (a) 1st parameter: rb_execution_context_t *ec
  (b) the rest of the parameters (0 to 15).
  (c) VALUE return type.
These are very similar to the requirements for functions used by
rb_define_method(); however, `rb_execution_context_t *ec`
is a new requirement.

(3) automatic C code generation from .rb files.

tool/mk_builtin_loader.rb creates C code to load the .rb files
needed by miniruby and the ruby command. This script is run by
BASERUBY, so *.rb files should be written in BASERUBY-compatible
syntax. The script loads a .rb file, finds all method calls with the
__builtin_ prefix, and generates the part of the C code that exports
the functions.

tool/mk_builtin_binary.rb creates C code which contains the
binary-compiled Ruby files needed by the ruby command.
2019-11-08 09:09:29 +09:00
卜部昌平 d45a013a1a extend rb_call_cache
Prior to this changeset, the majority of inline cache misses resolved
to the same method entry when rb_callable_method_entry() performed
the method search.  Let's not call the function in the first place in
such situations.

In doing so we extend the struct rb_call_cache from 44 bytes (in
case of 64 bit machine) to 64 bytes, and fill the gap with
secondary class serial(s).  A call cache's class serials now behave
as an LRU cache.

Calculating -------------------------------------
                           ours         2.7         2.6
vm2_poly_same_method     2.339M      1.744M      1.369M i/s - 6.000M times in 2.565086s 3.441329s 4.381386s

Comparison:
             vm2_poly_same_method
                ours:   2339103.0 i/s
                 2.7:   1743512.3 i/s - 1.34x  slower
                 2.6:   1369429.8 i/s - 1.71x  slower
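
A toy model of the LRU behavior, with invented names: keeping two
serials per cache lets a call site alternate between two receiver
classes without missing.

```
class SerialLRU
  def initialize(capacity)
    @capacity = capacity
    @serials  = []
  end

  # Returns true on a cache hit; records the serial either way.
  def hit?(serial)
    hit = !@serials.delete(serial).nil?
    @serials.unshift(serial)
    @serials.pop if @serials.size > @capacity
    hit
  end
end

cc = SerialLRU.new(2)
cc.hit?(1)  # => false (first sighting)
cc.hit?(2)  # => false
cc.hit?(1)  # => true  (a single-serial cache would have missed)
```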
2019-11-07 17:41:30 +09:00
卜部昌平 6ff1250739 rb_method_basic_definition_p with CC
Noticed that rb_method_basic_definition_p is frequently called.
Its callers include vm_caller_setup_args_block(),
rb_hash_default_value(), rb_num_negative_int_p(), and a lot more.

It seems worth caching the method resolution part.  The majority of
rb_method_basic_definition_p() usages take fixed class and fixed
method id combinations.

Calculating -------------------------------------
                           ours       trunk
           so_matrix      2.379       2.115 i/s -       1.000 times in 0.420409s 0.472879s

Comparison:
                        so_matrix
                ours:         2.4 i/s
               trunk:         2.1 i/s - 1.12x  slower
2019-11-05 11:39:35 +09:00
卜部昌平 cc5580f175 fix bug in keyword + protected combination
A test is included for the situation that formerly was not working.
2019-10-28 14:38:05 +09:00
卜部昌平 356e203a3a more on struct rb_call_data
Replace adjacent struct rb_call_info and struct rb_call_cache
pairs with a single struct rb_call_data.
2019-10-25 12:24:22 +09:00
wanabe 4ff2c58f91 retry tailcall optimization (#2529)
Sorry, f62f90367f was pushed by mistake.
2019-10-25 04:40:39 +09:00
Jeremy Evans d6a2507e49 Duplicate hash when converting keyword hash to keywords
This mirrors the behavior when manually splatting a hash, and matches
the changes made in setup_parameters_complex in
6081ddd6e6, so that splatting to a
non-iseq method works the same as splatting to an iseq method.
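
A minimal sketch of the mirrored behavior: the callee's keyword-rest
hash is a copy, so mutating it leaves the caller's hash intact.

```
def m(**kw)
  kw[:added] = true  # mutate the callee-side hash
  kw
end

h = {a: 1}
m(**h)
h  # => {a: 1}, unchanged because the hash was duplicated
```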
2019-10-24 12:35:04 -07:00
Alan Wu 89e7997622 Combine call info and cache to speed up method invocation
To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.

Combining the structures also saves 8 bytes per call site as the current
layout uses two separate pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.

This change improves the Optcarrot benchmark at least 3%. For more
details, see attached bugs.ruby-lang.org ticket.

Complications:
 - A new instruction attribute `comptime_sp_inc` is introduced to
 calculate SP increase at compile time without using call caches. At
 compile time, a `TS_CALLDATA` operand points to a call info struct, but
 at runtime, the same operand points to a call data struct. Instructions
 that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
 - MJIT code for copying call cache becomes slightly more complicated.
 - This changes the bytecode format, which might break existing tools.

[Misc #16258]
2019-10-24 18:03:42 +09:00
Nobuyoshi Nakada 42edb05626
extracted declare_under 2019-10-10 01:08:42 +09:00
Koichi Sasada ddf5020e4f Revert "tailcall optimization again (#2528)"
This reverts commit f62f90367f.
2019-10-06 17:01:00 +09:00
wanabe f62f90367f tailcall optimization again (#2528)
This is a follow-up of r67315.
2019-10-06 16:52:09 +09:00