Граф коммитов

55 Коммитов

Автор SHA1 Сообщение Дата
Peter Zhu 330830dd1a Add IMEMO_NEW
Rather than exposing that an imemo has a flag and four fields, this
changes the implementation to only expose one field (the klass) and
fills the rest with 0. The type will have to fill in the values themselves.
2024-02-21 11:33:05 -05:00
John Hawthorn 1c97abaaba De-dup identical callinfo objects
Previously every call to vm_ci_new (when the CI was not packable) would
result in a different callinfo being returned this meant that every
kwarg callsite had its own CI.

When calling, different CIs result in different CCs. These CIs and CCs
both end up persisted on the T_CLASS inside cc_tbl. So in an eval loop
this resulted in a memory leak of both types of object. This also likely
resulted in extra memory used, and extra time searching, in non-eval
cases.

For simplicity in this commit I always allocate a CI object inside
rb_vm_ci_lookup, but ideally we would lazily allocate it only when
needed. I hope to do that as a follow up in the future.
2024-02-20 18:55:00 -08:00
Jeremy Evans 22e488464a Add VM_CALL_ARGS_SPLAT_MUT callinfo flag
This flag is set when the caller has already created a new array to
handle a splat, such as for `f(*a, b)` and `f(*a, *b)`.  Previously,
if `f` was defined as `def f(*a)`, these calls would create an extra
array on the callee side, instead of using the new array created
by the caller.

This modifies `setup_args_core` to set the flag whenver it would add
a `splatarray true` instruction.  However, when `splatarray true` is
changed to `splatarray false` in the peephole optimizer, to avoid
unnecessary allocations on the caller side, the flag must be removed.
Add `optimize_args_splat_no_copy` and have the peephole optimizer call
that.  This significantly simplifies the related peephole optimizer
code.

On the callee side, in `setup_parameters_complex`, set
`args->rest_dupped` to true if the flag is set.

This takes a similar approach for optimizing regular splats that was
previiously used for keyword splats in
d2c41b1bff (via VM_CALL_KW_SPLAT_MUT).
2024-01-24 18:25:55 -08:00
Jeremy Evans 3081c83169 Support tracing of struct member accessor methods
This follows the same approach used for attr_reader/attr_writer in
2d98593bf5, skipping the checking for
tracing after the first call using the call cache, and clearing the
call cache when tracing is turned on/off.

Fixes [Bug #18886]
2023-12-07 10:29:33 -08:00
HParker c74dc8b4af Use reference counting to avoid memory leak in kwargs
Tracks other callinfo that references the same kwargs and frees them when all references are cleared.

[bug #19906]

Co-authored-by: Peter Zhu <peter@peterzhu.ca>
2023-10-01 10:55:19 -04:00
Koichi Sasada cfd7729ce7 use inline cache for refinements
From Ruby 3.0, refined method invocations are slow because
resolved methods are not cached by inline cache because of
conservertive strategy. However, `using` clears all caches
so that it seems safe to cache resolved method entries.

This patch caches resolved method entries in inline cache
and clear all of inline method caches when `using` is called.

fix [Bug #18572]

```ruby
 # without refinements

class C
  def foo = :C
end

N = 1_000_000

obj = C.new
require 'benchmark'
Benchmark.bm{|x|
  x.report{N.times{
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
  }}
}

_END__
              user     system      total        real
master    0.362859   0.002544   0.365403 (  0.365424)
modified  0.357251   0.000000   0.357251 (  0.357258)
```

```ruby
 # with refinment but without using

class C
  def foo = :C
end

module R
  refine C do
    def foo = :R
  end
end

N = 1_000_000

obj = C.new
require 'benchmark'
Benchmark.bm{|x|
  x.report{N.times{
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
  }}
}
__END__
               user     system      total        real
master     0.957182   0.000000   0.957182 (  0.957212)
modified   0.359228   0.000000   0.359228 (  0.359238)
```

```ruby
 # with using

class C
  def foo = :C
end

module R
  refine C do
    def foo = :R
  end
end

N = 1_000_000

using R

obj = C.new
require 'benchmark'
Benchmark.bm{|x|
  x.report{N.times{
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
    obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
  }}
}
2023-07-31 17:13:43 +09:00
Koichi Sasada 36023d5cb7 mark `cc->cme_` if it is for `super`
`vm_search_super_method()` makes orphan CCs (they are not connected
from ccs) and `cc->cme_` can be collected before without marking.
2023-07-31 14:04:31 +09:00
Ruby 6391132c03 fix typo (CACH_ -> CACHE_) 2023-07-28 18:01:42 +09:00
Nobuyoshi Nakada 469e644c93
Compile code for non-embedded CI always 2023-06-30 23:59:04 +09:00
Takashi Kokubun df1b007fbd Remove unused VM_CALL_BLOCKISEQ flag 2023-04-01 10:22:47 -07:00
Takashi Kokubun 175538e433 Improve explanation of FCALL and VCALL 2023-04-01 10:17:59 -07:00
Koichi Sasada c9fd81b860 `vm_call_single_noarg_inline_builtin`
If the iseq only contains `opt_invokebuiltin_delegate_leave` insn and
the builtin-function (bf) is inline-able, the caller doesn't need to
build a method frame.

`vm_call_single_noarg_inline_builtin` is fast path for such cases.
2023-03-23 14:03:12 +09:00
Takashi Kokubun 2e875549a9 s/MJIT/RJIT/ 2023-03-06 23:44:01 -08:00
Yusuke Endoh 2cc3963a00 Prevent wrong integer expansion
`(attr_index + 1)` leads to wrong integer expansion on 32-bit machines
(including Solaris 10 CI) because `attr_index_t` is uint16_t.

http://rubyci.s3.amazonaws.com/solaris10-gcc/ruby-master/log/20221013T080004Z.fail.html.gz
```
  1) Failure:
TestRDocClassModule#test_marshal_load_version_2 [/export/home/users/chkbuild/cb-gcc/tmp/build/20221013T080004Z/ruby/test/rdoc/test_rdoc_class_module.rb:493]:
<[doc: [doc (file.rb): [para: "this is a comment"]]]> expected but was
<[doc: [doc (file.rb): [para: "this is a comment"]]]>.

  2) Failure:
TestRDocStats#test_report_method_line [/export/home/users/chkbuild/cb-gcc/tmp/build/20221013T080004Z/ruby/test/rdoc/test_rdoc_stats.rb:460]:
Expected /\#\ in\ file\ file\.rb:4/ to match "The following items are not documented:\n" +
"\n" +
"  class C # is documented\n" +
"\n" +
"    # in file file.rb\n" +
"    def m1; end\n" +
"\n" +
"  end\n" +
"\n" +
"\n".

  3) Failure:
TestRDocStats#test_report_attr_line [/export/home/users/chkbuild/cb-gcc/tmp/build/20221013T080004Z/ruby/test/rdoc/test_rdoc_stats.rb:91]:
Expected /\#\ in\ file\ file\.rb:3/ to match "The following items are not documented:\n" +
"\n" +
"  class C # is documented\n" +
"\n" +
"    attr_accessor :a # in file file.rb\n" +
"\n" +
"  end\n" +
"\n" +
"\n".
```
2022-10-13 08:14:04 -07:00
Nobuyoshi Nakada b55e3b842a Initialize shape attr index also in non-markable CC 2022-10-12 09:14:55 -07:00
Yusuke Endoh 7a9f865a1d Do not read cached_id from callcache on stack
The inline cache is initialized by vm_cc_attr_index_set only when
vm_cc_markable(cc). However, vm_getivar attempted to read the cache
even if the cc is not vm_cc_markable.

This caused a condition that depends on uninitialized value.
Here is an output of valgrind:

```
==10483== Conditional jump or move depends on uninitialised value(s)
==10483==    at 0x4C1D60: vm_getivar (vm_insnhelper.c:1171)
==10483==    by 0x4C1D60: vm_call_ivar (vm_insnhelper.c:3257)
==10483==    by 0x4E8E48: vm_call_symbol (vm_insnhelper.c:3481)
==10483==    by 0x4EAD8C: vm_sendish (vm_insnhelper.c:5035)
==10483==    by 0x4C62B2: vm_exec_core (insns.def:820)
==10483==    by 0x4DD519: rb_vm_exec (vm.c:0)
==10483==    by 0x4F00B3: invoke_block (vm.c:1417)
==10483==    by 0x4F00B3: invoke_iseq_block_from_c (vm.c:1473)
==10483==    by 0x4F00B3: invoke_block_from_c_bh (vm.c:1491)
==10483==    by 0x4D42B6: rb_yield (vm_eval.c:0)
==10483==    by 0x259128: rb_ary_each (array.c:2733)
==10483==    by 0x4E8730: vm_call_cfunc_with_frame (vm_insnhelper.c:3227)
==10483==    by 0x4EAD8C: vm_sendish (vm_insnhelper.c:5035)
==10483==    by 0x4C6254: vm_exec_core (insns.def:801)
==10483==    by 0x4DD519: rb_vm_exec (vm.c:0)
==10483==
```

In fact, the CI on FreeBSD 12 started failing since ad63b668e2.

```
gmake[1]: Entering directory '/usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby'
/usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:924:in `complete': undefined method `complete' for nil:NilClass (NoMethodError)
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1816:in `block in visit'
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1815:in `reverse_each'
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1815:in `visit'
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1847:in `block in complete'
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1846:in `catch'
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1846:in `complete'
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1640:in `block in parse_in_order'
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1632:in `catch'
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1632:in `parse_in_order'
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1626:in `order!'
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1732:in `permute!'
	from /usr/home/chkbuild/chkbuild/tmp/build/20221011T163003Z/ruby/lib/optparse.rb:1757:in `parse!'
	from ./ext/extmk.rb:359:in `parse_args'
	from ./ext/extmk.rb:396:in `<main>'
```

This change adds a guard to read the cache only when vm_cc_markable(cc).
It might be better to initialize the cache as INVALID_SHAPE_ID when the
cc is not vm_cc_markable.
2022-10-12 20:28:24 +09:00
Jemma Issroff 913979bede
Make inline cache reads / writes atomic with object shapes
Prior to this commit, we were reading and writing ivar index and
shape ID in inline caches in two separate instructions when
getting and setting ivars. This meant there was a race condition
with ractors and these caches where one ractor could change
a value in the cache while another was still reading from it.

This commit instead reads and writes shape ID and ivar index to
inline caches atomically so there is no longer a race condition.

Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
2022-10-11 08:40:56 -07:00
Jemma Issroff ad63b668e2
Revert "Revert "This commit implements the Object Shapes technique in CRuby.""
This reverts commit 9a6803c90b.
2022-10-11 08:40:56 -07:00
Aaron Patterson 9a6803c90b
Revert "This commit implements the Object Shapes technique in CRuby."
This reverts commit 68bc9e2e97d12f80df0d113e284864e225f771c2.
2022-09-30 16:01:50 -07:00
Jemma Issroff d594a5a8bd
This commit implements the Object Shapes technique in CRuby.
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects.  Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness").  Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree.  Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.

For example:

```ruby
class Foo
  def initialize
    # Starts with shape id 0
    @a = 1 # transitions to shape id 1
    @b = 1 # transitions to shape id 2
  end
end

class Bar
  def initialize
    # Starts with shape id 0
    @a = 1 # transitions to shape id 1
    @b = 1 # transitions to shape id 2
  end
end

foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```

Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.

This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.

This commit also adds some methods for debugging shapes on objects.  See
`RubyVM::Shape` for more details.

For more context on Object Shapes, see [Feature: #18776]

Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
2022-09-28 08:26:21 -07:00
Aaron Patterson 06abfa5be6
Revert this until we can figure out WB issues or remove shapes from GC
Revert "* expand tabs. [ci skip]"

This reverts commit 830b5b5c35.

Revert "This commit implements the Object Shapes technique in CRuby."

This reverts commit 9ddfd2ca00.
2022-09-26 16:10:11 -07:00
Jemma Issroff 9ddfd2ca00 This commit implements the Object Shapes technique in CRuby.
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects.  Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness").  Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree.  Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.

For example:

```ruby
class Foo
  def initialize
    # Starts with shape id 0
    @a = 1 # transitions to shape id 1
    @b = 1 # transitions to shape id 2
  end
end

class Bar
  def initialize
    # Starts with shape id 0
    @a = 1 # transitions to shape id 1
    @b = 1 # transitions to shape id 2
  end
end

foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```

Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.

This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.

This commit also adds some methods for debugging shapes on objects.  See
`RubyVM::Shape` for more details.

For more context on Object Shapes, see [Feature: #18776]

Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
2022-09-26 09:21:30 -07:00
Jemma Issroff ecff334995 Extract vm_ic_entry API to mimic vm_cc behavior 2022-07-18 12:44:01 -07:00
Nobuyoshi Nakada 90a8b1c543
Remove a typo hash [ci skip] 2022-01-29 15:33:13 +09:00
Jemma Issroff 1a180b7e18 Streamline cached attr reader / writer indexes
This commit removes the need to increment and decrement the indexes
used by vm_cc_attr_index getters and setters. It also introduces a
vm_cc_attr_index_p predicate function, and a vm_cc_attr_index_initalize
function.
2022-01-26 09:02:59 -08:00
Koichi Sasada df48db987d `mandatory_only_cme` should not be in `def`
`def` (`rb_method_definition_t`) is shared by multiple callable
method entries (cme, `rb_callable_method_entry_t`).

There are two issues:

* old -> young reference: `cme1->def->mandatory_only_cme = monly_cme`
  if `cme1` is young and `monly_cme` is young, there is no problem.
  Howevr, another old `cme2` can refer `def`, in this case, old `cme2`
  points young `monly_cme` and it violates gengc assumption.
* cme can have different `defined_class` but `monly_cme` only has
  one `defined_class`. It does not make sense and `monly_cme`
  should be created for a cme (not `def`).

To solve these issues, this patch allocates `monly_cme` per `cme`.
`cme` does not have another room to store a pointer to the `monly_cme`,
so this patch introduces `overloaded_cme_table`, which is weak key map
`[cme] -> [monly_cme]`.

`def::body::iseqptr::monly_cme` is deleted.

The first issue is reported by Alan Wu.
2021-12-21 11:03:09 +09:00
Koichi Sasada ca21eed6eb fix assertion on `gc_cc_cme()`
`cc->cme_` can be NULL when it is not initialized yet.
It can be observed on `GC.stress == true` running.
2021-11-25 13:57:49 +09:00
Koichi Sasada 7ec1fc37f4 add `VM_CALLCACHE_ON_STACK`
check if iseq refers to on stack CC (it shouldn't).
2021-11-17 22:21:42 +09:00
Koichi Sasada 8d7116552d assert `cc->cme_ != NULL`
when `vm_cc_markable(cc)`.
2021-11-17 22:21:42 +09:00
Koichi Sasada b2255153cf `vm_empty_cc_for_super`
Same as `vm_empty_cc`, introduce a global variable which has
`.call_ = vm_call_super_method`. Use it if the `cme == NULL` on
`vm_search_super_method`.
2021-11-17 22:21:42 +09:00
Koichi Sasada 84aba25031 assert `cc->call_ != NULL` 2021-11-17 22:21:42 +09:00
Koichi Sasada b1b73936c1 `Primitive.mandatory_only?` for fast path
Compare with the C methods, A built-in methods written in Ruby is
slower if only mandatory parameters are given because it needs to
check the argumens and fill default values for optional and keyword
parameters (C methods can check the number of parameters with `argc`,
so there are no overhead). Passing mandatory arguments are common
(optional arguments are exceptional, in many cases) so it is important
to provide the fast path for such common cases.

`Primitive.mandatory_only?` is a special builtin function used with
`if` expression like that:

```ruby
  def self.at(time, subsec = false, unit = :microsecond, in: nil)
    if Primitive.mandatory_only?
      Primitive.time_s_at1(time)
    else
      Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
    end
  end
```

and it makes two ISeq,

```
  def self.at(time, subsec = false, unit = :microsecond, in: nil)
    Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
  end

  def self.at(time)
    Primitive.time_s_at1(time)
  end
```

and (2) is pointed by (1). Note that `Primitive.mandatory_only?`
should be used only in a condition of an `if` statement and the
`if` statement should be equal to the methdo body (you can not
put any expression before and after the `if` statement).

A method entry with `mandatory_only?` (`Time.at` on the above case)
is marked as `iseq_overload`. When the method will be dispatch only
with mandatory arguments (`Time.at(0)` for example), make another
method entry with ISeq (2) as mandatory only method entry and it
will be cached in an inline method cache.

The idea is similar discussed in https://bugs.ruby-lang.org/issues/16254
but it only checks mandatory parameters or more, because many cases
only mandatory parameters are given. If we find other cases (optional
or keyword parameters are used frequently and it hurts performance),
we can extend the feature.
2021-11-15 15:58:56 +09:00
Aaron Patterson 0d63600e4f Partial revert of ceebc7fc98
I'm looking through the places where YJIT needs notifications.  It looks
like these changes to gc.c and vm_callinfo.h have become unnecessary
since 84ab77ba59.  This commit just makes the diff against upstream
smaller, but otherwise shouldn't change any behavior.
2021-10-20 18:19:36 -04:00
Alan Wu c378c7a7cb MicroJIT: generate less code for CFUNCs
Added UJIT_CHECK_MODE. Set to 1 to double check method dispatch in
generated code.

It's surprising to me that we need to watch both cc and cme. There might
be opportunities to simplify there.
2021-10-20 18:19:26 -04:00
Nobuyoshi Nakada cd829bb078 Remove printf family from the mjit header
Linking printf family functions makes mjit objects to link
unnecessary code.
2021-09-11 08:41:32 +09:00
卜部昌平 daf0c04a47 internal/*.h: skip doxygen
These contents are purely implementation details, not worth appearing in
CAPI documents. [ci skip]
2021-09-10 20:00:06 +09:00
Koichi Sasada 1ecda21366 global call-cache cache table for rb_funcall*
rb_funcall* (rb_funcall(), rb_funcallv(), ...) functions invokes
Ruby's method with given receiver. Ruby 2.7 introduced inline method
cache with static memory area. However, Ruby 3.0 reimplemented the
method cache data structures and the inline cache was removed.

Without inline cache, rb_funcall* searched methods everytime.
Most of cases per-Class Method Cache (pCMC) will be helped but
pCMC requires VM-wide locking and it hurts performance on
multi-Ractor execution, especially all Ractors calls methods
with rb_funcall*.

This patch introduced Global Call-Cache Cache Table (gccct) for
rb_funcall*. Call-Cache was introduced from Ruby 3.0 to manage
method cache entry atomically and gccct enables method-caching
without VM-wide locking. This table solves the performance issue
on multi-ractor execution.
[Bug #17497]

Ruby-level method invocation does not use gccct because it has
inline-method-cache and the table size is limited. Basically
rb_funcall* is not used frequently, so 1023 entries can be enough.
We will revisit the table size if it is not enough.
2021-01-29 16:22:12 +09:00
Koichi Sasada a7d933e502 fix conditon of vm_cc_invalidated_p()
vm_cc_invalidated_p() returns false when the cme is *NOT*
invalidated.
2021-01-19 14:16:37 +09:00
Koichi Sasada 954d6c7432 remove invalidated cc
if cc is invalidated, cc should be released from iseq.
2021-01-06 14:57:48 +09:00
Koichi Sasada aa6287cd26 fix inline method cache sync bug
`cd` is passed to method call functions to method invocation
functions, but `cd` can be manipulated by other ractors simultaneously
so it contains thread-safety issue.

To solve this issue, this patch stores `ci` and found `cc` to `calling`
and stops to pass `cd`.
2020-12-15 13:29:30 +09:00
Alan Wu 196eada8c6 mutete -> mutate 2020-10-22 18:20:35 -04:00
卜部昌平 e1e84fbb4f VM_CI_NEW_ID: USE_EMBED_CI could be false
It was a wrong idea to assume CIs are always embedded.
2020-06-09 09:52:46 +09:00
卜部昌平 46728557c1 rb_vm_call0: on-stack call info
This changeset reduces the generated binary of rb_vm_call0 from 281
bytes to 211 bytes on my machine.  Should reduce GC pressure as well.
2020-06-09 09:52:46 +09:00
卜部昌平 8f3d4090f0 rb_equal_opt: fully static call data
This changeset reduces the generated binary of rb_equal_opt from 129 bytes
to 17 bytes on my machine, according to nm(1).
2020-06-09 09:52:46 +09:00
卜部昌平 77293cef91 vm_ci_markable: added
CIs are created on-the-fly, which increases GC pressure.  However they
include no references to other objects, and those on-the-fly CIs tend to
be short lived.  Why not skip allocation of them.  In doing so we need
to add a flag denotes the CI object does not reside inside of objspace.
2020-06-09 09:52:46 +09:00
Nobuyoshi Nakada a9e0046c26
Moved vm_empty_cc to local in vm.c [Bug #16934]
Missed to commit a staged change.
2020-06-04 17:13:56 +09:00
卜部昌平 4ff3f20540 add #include guard hack
According to MSVC manual (*1), cl.exe can skip including a header file
when that:

- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.

GCC has a similar trick (*2), but it acts more stricter (e. g. there
must be _no tokens_ outside of #ifndef...#endif).

Sun C lacked #pragma once for a looong time.  Oracle Developer Studio
12.5 finally implemented it, but we cannot assume such recent version.

This changeset modifies header files so that each of them include
strictly one #ifndef...#endif.  I believe this is the most portable way
to trigger compiler optimizations. [Bug #16770]

*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
2020-04-13 16:06:00 +09:00
Alan Wu 82fdffc5ec
Avoid UB with flexible array member
Accessing past the end of an array is technically UB. Use C99 flexible
array member instead to avoid the UB and simplify allocation size
calculation.

See also: DCL38-C in the SEI CERT C Coding Standard
2020-04-12 15:19:06 -04:00
卜部昌平 9e6e39c351
Merge pull request #2991 from shyouhei/ruby.h
Split ruby.h
2020-04-08 13:28:13 +09:00
Jeremy Evans d2c41b1bff Reduce allocations for keyword argument hashes
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side.  Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.

This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side.  Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side.  On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash.  If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it.  So this commit
should make it so that a maximum of a single hash is allocated
during method calls.

To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable.  If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.

In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.

On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled.  Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not.  Then, unless the callinfo flag is set,
the hash needs to be duplicated.  The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again.  If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can including duplicating the
positional splat array if one was passed.  CALLER_SETUP_ARG
and a couple other places needs to be modified to handle
similar issues for other types of calls.

This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.

Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:

  def kw(a: 1) a end
  def kws(**kw) kw end
  h = {a: 1}

  kw(a: 1)       # about same
  kw(**h)        # 2.37x faster
  kws(a: 1)      # 1.30x faster
  kws(**h)       # 2.19x faster
  kw(a: 1, **h)  # 1.03x slower
  kw(**h, **h)   # about same
  kws(a: 1, **h) # 1.16x faster
  kws(**h, **h)  # 1.14x faster
2020-03-17 12:09:43 -07:00