This updates the trace instructions to directly dispatch to
opt_send_without_block. So this should cause no slowdown in
non-trace mode.
To enable the tracing of the optimized methods, RUBY_EVENT_C_CALL
and RUBY_EVENT_C_RETURN are added as events to the specialized
instructions.
Fixes [Bug #14870]
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
[0] => [0, *, a]
#=> [0] length mismatch (given 1, expected 2+) (NoMatchingPatternError)
Ignore test failures of typeprof caused by this change for now.
On -DUSE_EMBED_CI=0, there are more GC allocations and the old code
didn't keep old_operands[0] reachable while allocating. On a Debian
based system, I get a crash requiring erb under GC stress mode. On
macOS, tool/transcode-tblgen.rb runs incorrectly if I put GC.stress=true
as the first line.
Pin matching for local variables and constants is already supported,
and it is fairly simple to add support for these variable types.
Note that pin matching for method calls is still not supported
without wrapping in parentheses (pin expressions). I think that's
for the best as method calls are far more complex (arguments/blocks).
Implements [Feature #17724]
Since b2fc592c30 nothing was holding a reference to the dup'd CDHASH
during IBF loading. If a GC happened to run during IBF load then the
copied hash wouldn't have anything to keep it alive. We don't really
want to keep the originally loaded CDHASH hash, so this patch just
overwrites the original hash with the copied / modified hash.
[Bug #17984] [ruby-core:104259]
If the type is ADJUST we don't want to treat it like an INSN so we have
to check the type before reading from `insn_info.events`.
[Bug #18001] [ruby-core:104371]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
RubyVM::AST.of(Thread::Backtrace::Location) returns a node that
corresponds to the location. Typically, the node is a method call, but
not always.
This change also includes iseq's dump/load support of node_ids for each
instructions.
by merging `rb_ast_body_t#line_count` and `#script_lines`.
Fortunately `line_count == RARRAY_LEN(script_lines)` was always
satisfied. When script_lines is saved, it has an array of lines, and
when not saved, it has a Fixnum that represents the old line_count.
This option makes the parser keep the original source as an array of
the original code lines. This feature exploits the mechanism of
`SCRIPT_LINES__` but records only the specified code that is passed to
RubyVM::AST.of or .parse, instead of recording all parsed program texts.
The checkmatch instruction with VM_CHECKMATCH_TYPE_CASE calls
=== without a call cache. Emit a send instruction to make the call
instead. It includes a call cache.
The call cache improves throughput of using when statements to check the
class of a given object. This is useful for say, JSON serialization.
Use of a regular send instead of checkmatch also avoids taking the VM
lock every time, which is good for multi-ractor workloads.
Calculating -------------------------------------
master post
vm_case_classes 11.013M 16.172M i/s - 6.000M times in 0.544795s 0.371009s
vm_case_lit 2.296 2.263 i/s - 1.000 times in 0.435606s 0.441826s
vm_case 74.098M 64.338M i/s - 6.000M times in 0.080974s 0.093257s
Comparison:
vm_case_classes
post: 16172114.4 i/s
master: 11013316.9 i/s - 1.47x slower
vm_case_lit
master: 2.3 i/s
post: 2.3 i/s - 1.01x slower
vm_case
master: 74097858.6 i/s
post: 64338333.9 i/s - 1.15x slower
The vm_case benchmark is a bit slower post patch, possibily due to the
larger instruction sequence. The benchmark dispatches using
opt_case_dispatch so was not running checkmatch and does not make the
=== call post patch.
It looks for "checkmatch", when it could be applied to anything that has
"newrange".
Making the optimization target more ranges might only be fair play when
all ranges are frozen. So I'm putting a reference to the ticket that
froze all ranges.
[Feature #15504]
Before this change, CDHASH operands were built as plain hashes when
loaded from binary. Without setting up the hash with the correct
st_table type, the hash can sometimes be an ar_table. When the hash is
an ar_table, lookups can call the `eql?` method on keys of the hash,
which makes the `opt_case_dispatch` instruction not "leaf" as it
implicitly declares.
The following script trips the stack canary for checking the leaf
attribute for `opt_case_dispatch` on VM_CHECK_MODE > 0 (enabled by
default with RUBY_DEBUG).
rb_vm_iseq = RubyVM::InstructionSequence
iseq = rb_vm_iseq.compile(<<-EOF)
case Class.new(String).new("foo")
when "foo"
42
end
EOF
puts rb_vm_iseq.load_from_binary(iseq.to_binary).eval
This commit changes the binary loading logic to build CDHASH with the
right st_table type. The dumping logic and the dump format stays the
same
609de71f04 fixes the issue by using
`throw` insn if `ensure` is used. However, that patch introduce
additional `throw` even if it is not needed. This patch solves
the issue.
This issue is pointed by @mame.
Rational literals are those integers suffixed with `r`. They tend to
be a part of more complex expressions like `123/456r`, but in theory
they can live alone. When such "bare" rational literals are passed to
case-when branch, we have to take care of them. Fixes [Bug #17854]
Instead of on read. Once it's in the inline cache we never have to make
one again. We want to eventually put the value into the cache, and the
best opportunity to do that is when you write the value.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
... then, new_insn_core extracts nd_line(node).
Also, if a macro "EXPERIMENTAL_ISEQ_NODE_ID" is defined, this changeset
keeps nd_node_id(node) for each instruction. This is intended for
TypeProf to identify what AST::Node corresponds to each instruction.
This patch is originally authored by @yui-knk for showing which column a
NoMethodError occurred.
https://github.com/ruby/ruby/compare/master...yui-knk:feature/node_id
Co-Authored-By: Yuichiro Kaneko <yui-knk@ruby-lang.org>
add_ensure_iseq() adds ensure block to the end of
jump such as next/redo/return. However, if the rescue
cause are in the body, this rescue catches the exception
in ensure clause.
iter do
next
rescue
R
ensure
raise
end
In this case, R should not be executed, but executed without this patch.
Fixes [Bug #13930]
Fixes [Bug #16618]
A part of tests are written by @jeremyevans https://github.com/ruby/ruby/pull/4291
In regular assignment, Ruby evaluates the left hand side before
the right hand side. For example:
```ruby
foo[0] = bar
```
Calls `foo`, then `bar`, then `[]=` on the result of `foo`.
Previously, multiple assignment didn't work this way. If you did:
```ruby
abc.def, foo[0] = bar, baz
```
Ruby would previously call `bar`, then `baz`, then `abc`, then
`def=` on the result of `abc`, then `foo`, then `[]=` on the
result of `foo`.
This change makes multiple assignment similar to single assignment,
changing the evaluation order of the above multiple assignment code
to calling `abc`, then `foo`, then `bar`, then `baz`, then `def=` on
the result of `abc`, then `[]=` on the result of `foo`.
Implementing this is challenging with the stack-based virtual machine.
We need to keep track of all of the left hand side attribute setter
receivers and setter arguments, and then keep track of the stack level
while handling the assignment processing, so we can issue the
appropriate topn instructions to get the receiver. Here's an example
of how the multiple assignment is executed, showing the stack and
instructions:
```
self # putself
abc # send
abc, self # putself
abc, foo # send
abc, foo, 0 # putobject 0
abc, foo, 0, [bar, baz] # evaluate RHS
abc, foo, 0, [bar, baz], baz, bar # expandarray
abc, foo, 0, [bar, baz], baz, bar, abc # topn 5
abc, foo, 0, [bar, baz], baz, abc, bar # swap
abc, foo, 0, [bar, baz], baz, def= # send
abc, foo, 0, [bar, baz], baz # pop
abc, foo, 0, [bar, baz], baz, foo # topn 3
abc, foo, 0, [bar, baz], baz, foo, 0 # topn 3
abc, foo, 0, [bar, baz], baz, foo, 0, baz # topn 2
abc, foo, 0, [bar, baz], baz, []= # send
abc, foo, 0, [bar, baz], baz # pop
abc, foo, 0, [bar, baz] # pop
[bar, baz], foo, 0, [bar, baz] # setn 3
[bar, baz], foo, 0 # pop
[bar, baz], foo # pop
[bar, baz] # pop
```
As multiple assignment must deal with splats, post args, and any level
of nesting, it gets quite a bit more complex than this in non-trivial
cases. To handle this, struct masgn_state is added to keep
track of the overall state of the mass assignment, which stores a linked
list of struct masgn_attrasgn, one for each assigned attribute.
This adds a new optimization that replaces a topn 1/pop instruction
combination with a single swap instruction for multiple assignment
to non-aref attributes.
This new approach isn't compatible with one of the optimizations
previously used, in the case where the multiple assignment return value
was not needed, there was no lhs splat, and one of the left hand side
used an attribute setter. This removes that optimization. Removing
the optimization allowed for removing the POP_ELEMENT and adjust_stack
functions.
This adds a benchmark to measure how much slower multiple
assignment is with the correct evaluation order.
This benchmark shows:
* 4-9% decrease for attribute sets
* 14-23% decrease for array member sets
* Basically same speed for local variable sets
Importantly, it shows no significant difference between the popped
(where return value of the multiple assignment is not needed) and
!popped (where return value of the multiple assignment is needed)
cases for attribute and array member sets. This indicates the
previous optimization, which was dropped in the evaluation
order fix and only affected the popped case, is not important to
performance.
Fixes [Bug #4443]
Previously, defined? could result in many more method calls than
the code it was checking. `defined? a.b.c.d.e.f` generated 15 calls,
with `a` called 5 times, `b` called 4 times, etc.. This was due to
the fact that defined works in a recursive manner, but it previously
did not cache results. So for `defined? a.b.c.d.e.f`, the logic was
similar to
```ruby
return nil unless defined? a
return nil unless defined? a.b
return nil unless defined? a.b.c
return nil unless defined? a.b.c.d
return nil unless defined? a.b.c.d.e
return nil unless defined? a.b.c.d.e.f
"method"
```
With this change, the logic is similar to the following, without
the creation of a local variable:
```ruby
return nil unless defined? a
_ = a
return nil unless defined? _.b
_ = _.b
return nil unless defined? _.c
_ = _.c
return nil unless defined? _.d
_ = _.d
return nil unless defined? _.e
_ = _.e
return nil unless defined? _.f
"method"
```
In addition to eliminating redundant method calls for defined
statements, this greatly simplifies the instruction sequences by
eliminating duplication. Previously:
```
0000 putnil ( 1)[Li]
0001 putself
0002 defined func, :a, false
0006 branchunless 73
0008 putself
0009 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0011 defined method, :b, false
0015 branchunless 73
0017 putself
0018 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0020 opt_send_without_block <calldata!mid:b, argc:0, ARGS_SIMPLE>
0022 defined method, :c, false
0026 branchunless 73
0028 putself
0029 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0031 opt_send_without_block <calldata!mid:b, argc:0, ARGS_SIMPLE>
0033 opt_send_without_block <calldata!mid:c, argc:0, ARGS_SIMPLE>
0035 defined method, :d, false
0039 branchunless 73
0041 putself
0042 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0044 opt_send_without_block <calldata!mid:b, argc:0, ARGS_SIMPLE>
0046 opt_send_without_block <calldata!mid:c, argc:0, ARGS_SIMPLE>
0048 opt_send_without_block <calldata!mid:d, argc:0, ARGS_SIMPLE>
0050 defined method, :e, false
0054 branchunless 73
0056 putself
0057 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0059 opt_send_without_block <calldata!mid:b, argc:0, ARGS_SIMPLE>
0061 opt_send_without_block <calldata!mid:c, argc:0, ARGS_SIMPLE>
0063 opt_send_without_block <calldata!mid:d, argc:0, ARGS_SIMPLE>
0065 opt_send_without_block <calldata!mid:e, argc:0, ARGS_SIMPLE>
0067 defined method, :f, true
0071 swap
0072 pop
0073 leave
```
After change:
```
0000 putnil ( 1)[Li]
0001 putself
0002 dup
0003 defined func, :a, false
0007 branchunless 52
0009 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0011 dup
0012 defined method, :b, false
0016 branchunless 52
0018 opt_send_without_block <calldata!mid:b, argc:0, ARGS_SIMPLE>
0020 dup
0021 defined method, :c, false
0025 branchunless 52
0027 opt_send_without_block <calldata!mid:c, argc:0, ARGS_SIMPLE>
0029 dup
0030 defined method, :d, false
0034 branchunless 52
0036 opt_send_without_block <calldata!mid:d, argc:0, ARGS_SIMPLE>
0038 dup
0039 defined method, :e, false
0043 branchunless 52
0045 opt_send_without_block <calldata!mid:e, argc:0, ARGS_SIMPLE>
0047 defined method, :f, true
0051 swap
0052 pop
0053 leave
```
This fixes issues where for pathological small examples, Ruby would generate
huge instruction sequences.
Unfortunately, implementing this support is kind of a hack. This adds another
parameter to compile_call for whether we should assume the receiver is already
present on the stack, and has defined? set that parameter for the specific
case where it is compiling a method call where the receiver is also a method
call.
defined_expr0 also takes an additional parameter for whether it should leave
the results of the method call on the stack. If that argument is true, in
the case where the method isn't defined, we jump to the pop before the leave,
so the extra result is not left on the stack. This requires space for an
additional label, so lfinish now needs to be able to hold 3 labels.
Fixes [Bug #17649]
Fixes [Bug #13708]
Peephole optimization doesn't play well with find pattern at
least. The only case when a pattern matching could have
unreachable patterns is when we have lasgn/dasgn node, which
shouldn't happen in real-life.
Fixes https://bugs.ruby-lang.org/issues/17534
Callinfo was being written in to an array and the GC would not see the
reference on the stack. `new_insn_send` creates a new callinfo object,
then it calls `new_insn_core`. `new_insn_core` allocates a new INSN
linked list item, which can end up calling `xmalloc` which will trigger
a GC:
70cd351c7c/compile.c (L968-L969)
Since the callinfo object isn't on the stack, the GC won't see it, and
it can get collected. This patch just refactors `new_insn_send` to keep
the object on the stack
Co-authored-by: John Hawthorn <john@hawthorn.email>
We don't need nop padding when the catch tables are only for break /
next / redo, so lets avoid them. This eliminates nop padding in
many lambdas.
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
constant cache `IC` is accessed by non-atomic manner and there are
thread-safety issues, so Ruby 3.0 disables to use const cache on
non-main ractors.
This patch enables it by introducing `imemo_constcache` and allocates
it by every re-fill of const cache like `imemo_callcache`.
[Bug #17510]
Now `IC` only has one entry `IC::entry` and it points to
`iseq_inline_constant_cache_entry`, managed by T_IMEMO object.
`IC` is atomic data structure so `rb_mjit_before_vm_ic_update()` and
`rb_mjit_after_vm_ic_update()` is not needed.
Previously we would add code to check if an ivar was defined when using
`@foo ||= 123`, which was slower than `@foo || (@foo = 123)` when `@foo`
was already defined.
Recently 01b7d5acc7 made it so that
accessing an undefined variable no longer generates a warning, making
the defined check unnecessary and both statements exactly equal.
This commit avoids emitting the defined instruction when compiling
NODE_OP_ASGN_OR with a NODE_IVAR.
Before:
$ ruby --dump=insn -e '@foo ||= 123'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 putnil ( 1)[Li]
0001 defined instance-variable, :@foo, false
0005 branchunless 14
0007 getinstancevariable :@foo, <is:0>
0010 dup
0011 branchif 20
0013 pop
0014 putobject 123
0016 dup
0017 setinstancevariable :@foo, <is:0>
0020 leave
After:
$ ./ruby --dump=insn -e '@foo ||= 123'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 getinstancevariable :@foo, <is:0> ( 1)[Li]
0003 dup
0004 branchif 13
0006 pop
0007 putobject 123
0009 dup
0010 setinstancevariable :@foo, <is:0>
0013 leave
This seems to be about 50% faster in this benchmark:
require "benchmark/ips"
class Foo
def initialize
@foo = nil
end
def test1
@foo ||= 123
end
def test2
@foo || (@foo = 123)
end
end
FOO = Foo.new
Benchmark.ips do |x|
x.report("test1", "FOO.test1")
x.report("test2", "FOO.test2")
end
Before:
$ ruby benchmark_ivar.rb
Warming up --------------------------------------
test1 1.957M i/100ms
test2 3.125M i/100ms
Calculating -------------------------------------
test1 20.030M (± 1.7%) i/s - 101.780M in 5.083040s
test2 31.227M (± 4.5%) i/s - 156.262M in 5.015936s
After:
$ ./ruby benchmark_ivar.rb
Warming up --------------------------------------
test1 3.205M i/100ms
test2 3.197M i/100ms
Calculating -------------------------------------
test1 32.066M (± 1.1%) i/s - 163.440M in 5.097581s
test2 31.438M (± 4.9%) i/s - 159.860M in 5.098961s
* `GC.auto_compact=`, `GC.auto_compact` can be used to control when
compaction runs. Setting `auto_compact=` to true will cause
compaction to occurr duing major collections. At the moment,
compaction adds significant overhead to major collections, so please
test first!
[Feature #17176]
iv_index_tbl manages instance variable indexes (ID -> index).
This data structure should be synchronized with other ractors
so introduce some VM locks.
This patch also introduced atomic ivar cache used by
set/getinlinecache instructions. To make updating ivar cache (IVC),
we changed iv_index_tbl data structure to manage (ID -> entry)
and an entry points serial and index. IVC points to this entry so
that cache update becomes atomically.
The write barrier wasn't being called for this object, so add the
missing WB. Automatic compaction moved the reference because it didn't
know about the relationship (that's how I found the missing WB).
This commit introduces Ractor mechanism to run Ruby program in
parallel. See doc/ractor.md for more details about Ractor.
See ticket [Feature #17100] to see the implementation details
and discussions.
[Feature #17100]
This commit does not complete the implementation. You can find
many bugs on using Ractor. Also the specification will be changed
so that this feature is experimental. You will see a warning when
you make the first Ractor with `Ractor.new`.
I hope this feature can help programmers from thread-safety issues.
This reverts commit 3a4be429b5.
To fix following warning:
```
compiling ../compile.c
../compile.c:6336:20: warning: variable 'line' is uninitialized when used here [-Wuninitialized]
ADD_INSN(head, line, putnil); /* allocate stack for cached #deconstruct value */
^~~~
../compile.c:220:57: note: expanded from macro 'ADD_INSN'
ADD_ELEM((seq), (LINK_ELEMENT *) new_insn_body(iseq, (line), BIN(insn), 0))
^~~~
../compile.c:6327:13: note: initialize the variable 'line' to silence this warning
int line;
^
= 0
1 warning generated.
```
Use ID instead of GENTRY for gvars.
Global variables are compiled into GENTRY (a pointer to struct
rb_global_entry). This patch replace this GENTRY to ID and
make the code simple.
We need to search GENTRY from ID every time (st_lookup), so
additional overhead will be introduced.
However, the performance of accessing global variables is not
important now a day and this simplicity helps Ractor development.
Moved this hack mark to an argument to `compile_hash`.
> Bad Hack: temporarily mark hash node with flag so
> compile_hash can compile call differently.
Formerly, branch coverage measurement counters are generated for each
compilation traverse of the AST. However, ensure clause node is
traversed twice; one is for normal-exit case (the resulted bytecode is
embedded in its outer scope), and the other is for exceptional case (the
resulted bytecode is used in catch table). Two branch coverage counters
are generated for the two cases, but it is not desired.
This changeset revamps the internal representation of branch coverage
measurement. Branch coverage counters are generated only at the first
visit of a branch node. Visiting the same node reuses the
already-generated counter, so double counting is avoided.
This makes:
```ruby
args = [1, 2, -> {}]; foo(*args, &args.pop)
```
call `foo` with 1, 2, and the lambda, in addition to passing the
lambda as a block. This is different from the previous behavior,
which passed the lambda as a block but not as a regular argument,
which goes against the expected left-to-right evaluation order.
This is how Ruby already compiled arguments if using leading
arguments, trailing arguments, or keywords in the same call.
This works by disabling the optimization that skipped duplicating
the array during the splat (splatarray instruction argument
switches from false to true). In the above example, the splat
call duplicates the array. I've tested and cases where a
local variable or symbol are used do not duplicate the array,
so I don't expect this to decrease the performance of most Ruby
programs. However, programs such as:
```ruby
foo(*args, &bar)
```
could see a decrease in performance, if `bar` is a method call
and not a local variable.
This is not a perfect solution, there are ways to get around
this:
```ruby
args = Struct.new(:a).new([:x, :y])
def args.to_a; a; end
def args.to_proc; a.pop; ->{}; end
foo(*args, &args)
# calls foo with 1 argument (:x)
# not 2 arguments (:x and :y)
```
A perfect solution would require completely disabling the
optimization.
Fixes [Bug #16504]
Fixes [Bug #16500]
These crashes are due to alignment issues, casting ADJUST to INSN
and then accessing after the end of the ADJUST. These patches
come from Stefan Sperling <stsp@apache.org>, who reported the
issue.
The GC will not disassemble incomplete instruction sequences. So it is
important that when instructions are being assembled, any objects the
instructions point at should not be moved.
This patch implements a fixed width array that pins its references.
When the instructions are done being assembled, the pinning array goes
away and the objects inside the iseqs are allowed to move.
With compiling `CPDEBUG >= 2`, `rb_iseq_disasm` segfaults if this
table has not been created. Also `ibf_load_iseq_each` calls
`rb_iseq_insns_info_encode_positions`.
Accessing past the end of an array is technically UB. Use C99 flexible
array member instead to avoid the UB and simplify allocation size
calculation.
See also: DCL38-C in the SEI CERT C Coding Standard
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side. Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.
This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side. Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side. On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash. If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it. So this commit
should make it so that a maximum of a single hash is allocated
during method calls.
To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable. If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.
In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.
On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled. Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not. Then, unless the callinfo flag is set,
the hash needs to be duplicated. The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again. If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can including duplicating the
positional splat array if one was passed. CALLER_SETUP_ARG
and a couple other places needs to be modified to handle
similar issues for other types of calls.
This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.
Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:
def kw(a: 1) a end
def kws(**kw) kw end
h = {a: 1}
kw(a: 1) # about same
kw(**h) # 2.37x faster
kws(a: 1) # 1.30x faster
kws(**h) # 2.19x faster
kw(a: 1, **h) # 1.03x slower
kw(**h, **h) # about same
kws(a: 1, **h) # 1.16x faster
kws(**h, **h) # 1.14x faster
Previously, method call keyword splats and hash keyword splats
were compiled exactly the same. This is because parse-wise, they
operate on indentical nodes when it comes to compiling the **{}.
Fix this by using an ugly hack of temporarily modifying the
nd_brace flag in the method call keyword splat case. Inside
compile_hash, only optimize the **{} case for hashes where the
nd_brace flag has been modified to reflect we are in the method
call keyword splat case and it is safe to do so.
Since compile_keyword_args is only called in one place, move the
keyword_node_p call out of that method to the single caller to
avoid duplicating the code.
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
Now, rb_call_info contains how to call the method with tuple of
(mid, orig_argc, flags, kwarg). Most of cases, kwarg == NULL and
mid+argc+flags only requires 64bits. So this patch packed
rb_call_info to VALUE (1 word) on such cases. If we can not
represent it in VALUE, then use imemo_callinfo which contains
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because all of callinfo is VALUE
size (packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
is temporary removed because cd->ci should be marked.
This behavior was deprecated in 2.7 and scheduled to be removed
in 3.0.
Calling yield in a class definition outside a method is now a
SyntaxError instead of a LocalJumpError, as well.
We were inefficient in cases where there are a lot of duplicates due to
the use of linear search. Use a hash table instead.
These cases are not that rare in the wild.
[Feature #16505]
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead. This would significantly
speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
On USE_LAZY_LOAD=1, the iseq should be loaded. So rb_iseq_check()
is needed. Furthermore, now lazy loading with builtin_function_table
is not supported, so it should cancel lazy loading.
`foo(*rest, post, **empty_kw)` is compiled like
`foo(*rest + [post, **empty_kw])`, and `**empty_kw` is removed by
"newarraykwsplat" instruction.
However, the method call still has a flag of KW_SPLAT, so "post" is
considered as a keyword hash, which caused a segfault.
Note that the flag cannot be removed if "empty_kw" is not always empty.
This change fixes the issue by compiling arguments with "newarray"
instead of "newarraykwsplat".
[Bug #16442]
Now, C functions written by __builtin_cexpr!(code) and others are
named as "__builtin_inline#{n}". However, it is difficult to know
what the function is. This patch rename them into
"__builtin_foo_#{lineno}" when cexpr! is in 'foo' method.
%p is for void *. Becuase fprintf is a function with variadic arguments
automatic cast from any pointer to void * does not work. We have to be
explicit.
(This is the second try of 036bc1da6c6c9b0fa9b7f5968d897a9554dd770e.)
If iseq is GC'ed, the pointer of iseq may be reused, which may hide a
deprecation warning of keyword argument change.
http://ci.rvm.jp/results/trunk-test1@phosphorus-docker/2474221
```
1) Failure:
TestKeywordArguments#test_explicit_super_kwsplat [/tmp/ruby/v2/src/trunk-test1/test/ruby/test_keyword.rb:549]:
--- expected
+++ actual
@@ -1 +1 @@
-/The keyword argument is passed as the last hash parameter.* for `m'/m
+""
```
This change ad-hocly adds iseq_unique_id for each iseq, and use it
instead of iseq pointer. This covers the case where caller is GC'ed.
Still, the case where callee is GC'ed, is not covered.
But anyway, it is very rare that iseq is GC'ed. Even when it occurs, it
just hides some warnings. It's no big deal.
This commit introduces an "inline ivar cache" struct. The reason we
need this is so compaction can differentiate from an ivar cache and a
regular inline cache. Regular inline caches contain references to
`VALUE` and ivar caches just contain references to the ivar index. With
this new struct we can easily update references for inline caches (but
not inline var caches as they just contain an int)
jump-jump optimization ignores the event flags of the jump instruction
being skipped, which leads to overlook of line events.
This changeset stops the wrong optimization when coverage measurement is
neabled and when the jump instruction has any event flag.
Note that this issue is not only for coverage but also for TracePoint,
and this change does not fix TracePoint.
However, fixing it fundamentally is tough (which requires revamp of
the compiler). This issue is critical in terms of coverage measurement,
but minor for TracePoint (ko1 said), so we here choose a stopgap
measurement.
[Bug #15980] [Bug #16397]
Note for backporters: this changeset can be viewed by `git diff -w`.
rename __builtin_inline!(code) to __builtin_cstmt(code).
Also this commit introduce the following inlining C code features.
* __builtin_cstmt!(STMT)
(renamed from __builtin_inline!)
Define a function which run STMT implicitly and call this function at
evatuation time. Note that you need to return some value in STMT.
If there is a local variables (includes method parameters), you can
read these values.
static VALUE func(ec, self) {
VALUE x = ...;
STMT
}
Usage:
def double a
# a is readable from C code.
__builtin_cstmt! 'return INT2FIX(FIX2INT(a) * 2);'
end
* __builtin_cexpr!(EXPR)
Define a function which invoke EXPR implicitly like `__builtin_cstmt!`.
Different from cstmt!, which compiled with `return EXPR;`.
(`return` and `;` are added implicitly)
static VALUE func(ec, self) {
VALUE x = ...;
return EXPPR;
}
Usage:
def double a
__builtin_cexpr! 'INT2FIX(FIX2INT(a) * 2)'
end
* __builtin_cconst!(EXPR)
Define a function which invoke EXPR implicitly like cexpr!.
However, the function is called once at compile time, not evaluated time.
Any local variables are not accessible (because there is no local variable
at compile time).
Usage:
GCC = __builtin_cconst! '__GNUC__'
* __builtin_cinit!(STMT)
STMT are writtein in auto-generated code.
This code does not return any value.
Usage:
__builtin_cinit! '#include <zlib.h>'
def no_compression?
__builtin_cconst! 'Z_NO_COMPRESSION ? Qtrue : Qfalse'
end
These functions are used from within a compilation unit so we can
make them static, for better binary size. This changeset reduces
the size of generated ruby binary from 26,590,128 bytes to
26,584,472 bytes on my macihne.
opt_invokebuiltin_delegate and opt_invokebuiltin_delegate_leave
invokes builtin functions with same parameters of the method.
This technique eliminate stack push operations. However, delegation
parameters should be completely same as given parameters.
(e.g. `def foo(a, b, c) __builtin_foo(a, b, c)` is okay, but
__builtin_foo(b, c) is not allowed)
This patch relaxes this restriction. ISeq has a local variables
table which includes parameters. For example, the method defined
as `def foo(a, b, c) x=y=nil`, then local variables table contains
[a, b, c, x, y]. If calling builtin-function with arguments which
are sub-array of the lvar table, use opt_invokebuiltin_delegate
instruction with start index. For example, `__builtin_foo(b, c)`,
`__builtin_bar(c, x, y)` is okay, and so on.
Fixes [Bug #16332]
Constant access was changed to no longer allow top-level constant access
through `nil`, but `defined?` wasn't changed at the same time to stay
consistent.
Use a separate defined type to distinguish between a constant
referenced from the current lexical scope and one referenced from
another namespace.
Add an experimental `__builtin_inline!(c_expression)` special intrinsic
which run a C code snippet.
In `c_expression`, you can access the following variables:
* ec (rb_execution_context_t *)
* self (const VALUE)
* local variables (const VALUE)
Not that you can read these variables, but you can not write them.
You need to return from this expression and return value will be a
result of __builtin_inline!().
Examples:
`def foo(x) __builtin_inline!('return rb_p(x);'); end` calls `p(x)`.
`def double(x) __builtin_inline!('return INT2NUM(NUM2INT(x) * 2);')`
returns x*2.
Support loading builtin features written in Ruby, which implement
with C builtin functions.
[Feature #16254]
Several features:
(1) Load .rb file at boottime with native binary.
Now, prelude.rb is loaded at boottime. However, this file is contained
into the interpreter as a text format and we need to compile it.
This patch contains a feature to load from binary format.
(2) __builtin_func() in Ruby call func() written in C.
In Ruby file, we can write `__builtin_func()` like method call.
However this is not a method call, but special syntax to call
a function `func()` written in C. C functions should be defined
in a file (same compile unit) which load this .rb file.
Functions (`func` in above example) should be defined with
(a) 1st parameter: rb_execution_context_t *ec
(b) rest parameters (0 to 15).
(c) VALUE return type.
This is very similar requirements for functions used by
rb_define_method(), however `rb_execution_context_t *ec`
is new requirement.
(3) automatic C code generation from .rb files.
tool/mk_builtin_loader.rb creates a C code to load .rb files
needed by miniruby and ruby command. This script is run by
BASERUBY, so *.rb should be written in BASERUBY compatbile
syntax. This script load a .rb file and find all of __builtin_
prefix method calls, and generate a part of C code to export
functions.
tool/mk_builtin_binary.rb creates a C code which contains
binary compiled Ruby files needed by ruby command.
Get rid of these redundant and useless warnings.
```
$ ruby -e 'def bar(a) a; end; def foo(...) bar(...) end; foo({})'
-e:1: warning: The last argument is used as the keyword parameter
-e:1: warning: for `foo' defined here
-e:1: warning: The keyword argument is passed as the last hash parameter
-e:1: warning: for `bar' defined here
```
To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.
Combining the structures also saves 8 bytes per call site as the current
layout uses separate two pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.
This change improves the Optcarrot benchmark at least 3%. For more
details, see attached bugs.ruby-lang.org ticket.
Complications:
- A new instruction attribute `comptime_sp_inc` is introduced to
calculate SP increase at compile time without using call caches. At
compile time, a `TS_CALLDATA` operand points to a call info struct, but
at runtime, the same operand points to a call data struct. Instruction
that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.
[Misc #16258]
This changeset basically replaces `ruby_xmalloc(x * y)` into
`ruby_xmalloc2(x, y)`. Some convenient functions are also
provided for instance `rb_xmalloc_mul_add(x, y, z)` which allocates
x * y + z byes.
The parser needs to determine whether a local varaiable is defined or
not in outer scope. For the sake, "base_block" field has kept the outer
block.
However, the whole block was actually unneeded; the parser used only
base_block->iseq.
So, this change lets parser_params have the iseq directly, instead of
the whole block.
`freeze_string` essentially called iseq_add_mark_object_compile_time. I
need to know where all writes occur on the `rb_iseq_t`, so this commit
separates the function calls so we can add write barriers in the right
place.
Move the "add mark object" function to the location where we should be
calling RB_OBJ_WRITTEN. I'm going to add verification code next so we
can make sure the objects we're adding to the array are also reachable
from the mark function.
This makes it consistent with calling private attribute assignment
methods, which currently is allowed (e.g. `self.value =`).
Calling a private method in this way can be useful when trying to
assign the return value to a local variable with the same name.
[Feature #11297] [Feature #16123]
The output of RubyVM::InstructionSequence#to_binary is extremely large.
We have reduced the output of #to_binary by more than 70%.
The execution speed of RubyVM::InstructionSequence.load_from_binary is about 7% slower, but when reading a binary from a file, it may be faster than the master.
Since Bootsnap gem uses #to_binary, this proposal reduces the compilation cache size of Rails projects to about 1/4.
See details: [Feature #16163]
RubyVM::InstructionSequence.to_binary generates a bytecode binary
representation. To check compatibility with binary and loading
MRI we prepared major/minor version and compare them at loading
time. However, development version of MRI can change this format
but we can not increment minor version to make them consistent
with Ruby's major/minor versions.
To solve this issue, we introduce new minor version scheme
(binary's minor_version = ruby's minor * 10000 + dev ver)
and we can check incompatibility with older dev version.
The original code looks unnecessarily complicated (to me).
Also, it creates a pre-allocated array only for the prefix of the array.
The new code optimizes not only the prefix but also the subsequence that
is longer than 0x40 elements.
# not optimized
10000000.times { [1+1, 1,2,3,4,...,63] } # 2.12 sec.
# (1+1; push 1; push 2; ...; puts 63; newarray 64; concatarray)
# optimized
10000000.times { [1+1, 1,2,3,4,...,63,64] } # 1.46 sec.
# (1+1; newarray 1; putobject [1,2,3,...,64]; concatarray)
"popped" case can be so simple, so this change moves the branch to the
first, instead of scattering `if (popped)` branches to the main part.
Also, the return value "len" is not used. So it returns just 0 or 1.
compile_list was for the compilation of Array literal and Hash literal.
I guess it was originally reasonable to handle them in one function, but
now, compilation of Array is very different from Hash. So the function
was complicated by many branches for Array and Hash.
This change separates the function to two ones for Array and Hash.
An array literal [1,2,...,301] was compiled to the following iseq:
duparray [1,2,...,300]
putobject [301]
concatarray
The Array literal optimization took every two elements maybe because it
must handle not only Array but also Hash.
Now the optimization takes each element if it is an Array literal. So
the new iseq is: duparray [1,2,...,301].
`[{}, {}, {}, ..., {}, *{}]` is wrongly created.
A big array literal is created and concatenated for every 256 elements.
The newarraykwsplat must be emitted only at the last chunk.
and NODE_ZARRAY to NODE_ZLIST.
NODE_ARRAY is used not only by an Array literal, but also the contents
of Hash literals, method call arguments, dynamic string literals, etc.
In addition, the structure of NODE_ARRAY is a linked list, not an array.
This is very confusing, so I believe `NODE_LIST` is a better name.
Previously, **{} was removed by the parser:
```
$ ruby --dump=parse -e '{**{}}'
@ NODE_SCOPE (line: 1, location: (1,0)-(1,6))
+- nd_tbl: (empty)
+- nd_args:
| (null node)
+- nd_body:
@ NODE_HASH (line: 1, location: (1,0)-(1,6))*
+- nd_brace: 1 (hash literal)
+- nd_head:
(null node)
```
Since it was removed by the parser, the compiler did not know
about it, and `m(**{})` was therefore treated as `m()`.
This modifies the parser to not remove the `**{}`. A simple
approach for this is fairly simple by just removing a few
lines from the parser, but that would cause two hash
allocations every time it was used. The approach taken here
modifies both the parser and the compiler, and results in `**{}`
not allocating any hashes in the usual case.
The basic idea is we use a literal node in the parser containing
a frozen empty hash literal. In the compiler, we recognize when
that is used, and if it is the only keyword present, we just
push it onto the VM stack (no creation of a new hash or merging
of keywords). If it is the first keyword present, we push a
new empty hash onto the VM stack, so that later keywords can
merge into it. If it is not the first keyword present, we can
ignore it, since the there is no reason to merge an empty hash
into the existing hash.
Example instructions for `m(**{})`
Before (note ARGS_SIMPLE):
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,7)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 opt_send_without_block <callinfo!mid:m, argc:0, FCALL|ARGS_SIMPLE>, <callcache>
0004 leave
```
After (note putobject and KW_SPLAT):
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,7)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 putobject {}
0003 opt_send_without_block <callinfo!mid:m, argc:1, FCALL|KW_SPLAT>, <callcache>
0006 leave
```
Example instructions for `m(**h, **{})`
Before and After (no change):
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 putspecialobject 1
0003 newhash 0
0005 putself
0006 opt_send_without_block <callinfo!mid:h, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0009 opt_send_without_block <callinfo!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>, <callcache>
0012 opt_send_without_block <callinfo!mid:m, argc:1, FCALL|KW_SPLAT>, <callcache>
0015 leave
```
Example instructions for `m(**{}, **h)`
Before:
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 putspecialobject 1
0003 newhash 0
0005 putself
0006 opt_send_without_block <callinfo!mid:h, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0009 opt_send_without_block <callinfo!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>, <callcache>
0012 opt_send_without_block <callinfo!mid:m, argc:1, FCALL|KW_SPLAT>, <callcache>
0015 leave
```
After (basically the same except for the addition of swap):
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 putself ( 1)[Li]
0001 newhash 0
0003 putspecialobject 1
0005 swap
0006 putself
0007 opt_send_without_block <callinfo!mid:h, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0010 opt_send_without_block <callinfo!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>, <callcache>
0013 opt_send_without_block <callinfo!mid:m, argc:1, FCALL|KW_SPLAT>, <callcache>
0016 leave
```
for simplicity and consistency.
Now SUPPORT_JOKE needs to be prefixed with OPT_ to make the config
visible in `RubyVM::VmOptsH`, and the inconsistency was introduced.
As it has never been available for override in configure (no #ifndef
guard), it should be fine to rename the config.
This syntax means the method should be treated as a method that
uses keyword arguments, but no specific keyword arguments are
supported, and therefore calling the method with keyword arguments
will raise an ArgumentError. It is still allowed to double splat
an empty hash when calling the method, as that does not pass
any keyword arguments.
After 5e86b005c0, I now think ANYARGS is
dangerous and should be extinct. This commit adds function prototypes
for rb_hash_foreach / st_foreach_safe. Also fixes some prototype
mismatches.
After 5e86b005c0, I now think ANYARGS is
dangerous and should be extinct. This commit deletes ANYARGS from
struct vm_ifunc, but in doing so we also have to decouple the usage
of this struct in compile.c, which (I think) is an abuse of ANYARGS.
Some tooling depends on the current bytecode, and adding an operand
changes the bytecode. While tooling can be updated for new bytecode,
this support doesn't warrant such a change.
This was an intentional bug added in 1.9.
The approach taken here is to add a second operand to the
getconstant instruction for whether nil should be allowed and
treated as current scope.
Fixes [Bug #11718]
Binary dumping the iseq for `case foo in []; end` used to crash as
there was no handling for these exception classes.
Pattern matching generates these classes as operands to `putobject`.
[Bug #16088]
Closes: https://github.com/ruby/ruby/pull/2325
`(ID)1` was assigned to NODE_ARGS#rest_arg for `{|x,| }`.
This change removes the magic number by introducing an explicit macro
variable for it: NODE_SPECIAL_EXCESSED_COMMA.
* internal.h (UNALIGNED_MEMBER_ACCESS, UNALIGNED_MEMBER_PTR):
moved from eval_intern.h.
* compile.c iseq.c, vm.c: use UNALIGNED_MEMBER_PTR for `entries`
in `struct iseq_catch_table`.
* vm_eval.c, vm_insnhelper.c: use UNALIGNED_MEMBER_PTR for `body`
in `rb_method_definition_t`.
Because hard to specify commits related to r67479 only.
So please commit again.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67499 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
ISeq pins references in the mark array during compile, so it manually
marks references in the mark_ary. This was causing write barrier
misses, so we need to add a write barrier when pushing on the mark
array.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67489 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* insns.def: add definemethod and definesmethod (singleton method)
instructions. Old YARV contains these instructions, but it is moved
to methods of FrozenCore class because remove number of instructions
can improve performance for some techniques (static stack caching
and so on). However, we don't employ these technique and it is hard
to optimize/analysis definition sequence. So I decide to introduce
them (and remove definition methods). `putiseq` insn is also removed.
* vm_method.c (rb_scope_visibility_get): renamed to
`vm_scope_visibility_get()` and make it accept `ec`.
Same for `vm_scope_module_func_check()`.
These fixes are result of refactoring `vm_define_method`.
* vm_insnhelper.c (rb_vm_get_cref): renamed to `vm_get_cref`
because of consistency with other functions.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67442 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
NODE_HASH#nd_brace is a flag that is 1 for `foo({ k: 1 })` and 0 for
`foo(k: 1)`.
nd_alen had been abused for the flag (and the implementation is
completely the same), but an explicit name is better to read.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67266 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
For unknown reason, setup_args processed the arguments from the last to
the first. This is not only difficult to read, but also inefficient in
some cases. For example, the arguments of `foo(*a1, *a2, *a3)` was
compiled like `a1.dup << (a2.dup << a3)`. The second dup (`a2.dup`) is
not needed.
This change refactors the function so that it processes the arguments
forward: `foo(*a1, *a2, *a3)` is compiled as `a1.dup << a2 << a3`, and
in my opinion, the source code is now much more readable.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67255 b2dd03c8-39d4-4d8f-98ff-823fe69b080e