There's no point in making a copy of an array just to expand it. This saves
an unnecessary array allocation in the multiple assignment case, with
a 35-84% improvement in affected cases in benchmark/masgn.yml.
This optimizes unbalanced multiple assignment cases such as:
```ruby
a.b, c.d = e, f, g
a.b, c.d, e.f = g, h
```
Previously, this would use:
```
newarray(3)
expandarray(2, 0)
newarray(2)
expandarray(3, 0)
```
These would both allocate arrays. This switches to opt_reverse
with either pop or putnil:
```
pop
opt_reverse(2)
putnil
opt_reverse(3)
```
This avoids an unnecessary array allocation, and results in a 35-76%
performance increase in these types of unbalanced cases (tested with
benchmark/masgn.yml).
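As a rough way to check this locally, the generated instructions can be
inspected with RubyVM::InstructionSequence; this is only a sketch, the
method names (a.b, c.d, e, f, g) are placeholders from the example above,
and the trailing nil keeps the assignment in a popped context:
```ruby
# Sketch: disassemble an unbalanced, popped multiple assignment.
# After this change the listing should contain opt_reverse instead of
# a newarray/expandarray pair (exact output depends on the Ruby version).
puts RubyVM::InstructionSequence.compile("a.b, c.d = e, f, g; nil").disasm
```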
This renames the reverse instruction to opt_reverse, since it is now
only added by the optimizer. It is then used as a more general
form of swap. This optimizes multiple assignment in the popped
case with more than two elements.
An optimization for multiple assignment in the popped case to avoid
array allocation was lost in my fix to make multiple assignment follow
left-to-right evaluation (50c54d40a8).
Before, in the two-element case, swap was used. Afterward, newarray(2)
and expandarray(2, 0) were used, which behaves the same as swap, with the
addition of an unnecessary allocation.
Because this issue is not specific to multiple assignment, and the
multiple assignment code is complex enough as it is, this updates
the peephole optimizer to do the newarray(2)/expandarray(2, 0) -> swap
conversion.
A more general optimization pass for
newarray(X)/expandarray(X, 0) -> reverse(X) will follow, but that
requires re-adding the reverse instruction.
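For illustration, a minimal sketch of the two-element popped case handled
by the peephole optimizer (receiver and attribute names are placeholders):
```ruby
# Sketch: a two-element multiple assignment in a popped context.
# With the peephole optimization, the newarray(2)/expandarray(2, 0) pair
# should appear as a single swap in the disassembly (version-dependent).
puts RubyVM::InstructionSequence.compile("a.x, a.y = 1, 2; nil").disasm
```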
rb_ary_tmp_new suggests that the array is temporary in some way, but
that's not true: it just creates an array that's hidden and not on the
transient heap. This commit renames it to rb_ary_hidden_new.
The RARRAY_LITERAL_FLAG was added in commit
5871ecf956 to improve CoW performance for
array literals by not keeping track of reference counts.
This commit reverts that commit and adds an alternate implementation that
is more general and applies to all frozen arrays. Since frozen arrays cannot
be modified, we don't need to set the RARRAY_SHARED_ROOT_FLAG and we don't
need to do reference counting.
Arrays created as literals during iseq compilation don't need a
reference count since they can never be modified. The previous
implementation would mutate the hidden array's reference count,
causing copy-on-write invalidation.
This commit adds a RARRAY_LITERAL_FLAG for arrays created through
rb_ary_literal_new. Arrays created with this flag do not store a reference
count and are assumed to have an infinite number of references.
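A small sketch of where such literals show up; the sharing claim follows
from the copy-on-write behaviour described above, and the exact
disassembly varies by version:
```ruby
# Sketch: an array literal of static values is stored as a hidden array
# and copied at runtime with duparray; the copy shares its buffer
# (copy-on-write) until it is mutated.
puts RubyVM::InstructionSequence.compile("def foo = [1, 2, 3]").disasm
```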
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
In that commit, I fixed a wrong conditional expression that was always
true. However, that seems to have caused a regression. [Bug #18906]
This change removes the condition so that the code is always enabled.
It had been enabled until that commit, albeit unintentionally, and even
if it is enabled it only consumes a tiny bit of memory, so I believe it
is harmless. [Bug #18906]
We need to dump relative offsets for inline storage entries so that
loading iseqs as an array works as well. This commit also has some
minor refactoring to make computing relative ISE information easier.
This should fix the iseq dump / load as array tests we're seeing fail in
CI.
Co-Authored-By: John Hawthorn <john@hawthorn.email>
ISeqs loaded from binary were breaking because the storage partition
calculation had bugs in it. Specifically, it couldn't take into account
the case where inline storage was overallocated (for example, when we
allocate inline storage for an instruction but peephole optimization
eliminates that instruction).
`RUBY_ISEQ_DUMP_DEBUG=to_binary make test-all` would break, and this
patch fixes that.
This commit adds a bitfield to the iseq body that stores offsets inside
the iseq buffer that contain values we need to mark. We can use this
bitfield to mark objects instead of disassembling the instructions.
This commit also groups inline storage entries and adds a counter for
each entry. This allows us to iterate over and mark each entry without
disassembling instructions.
Since we have a bitfield and grouped inline caches, we can mark all
VALUE objects associated with instructions without actually
disassembling the instructions at mark time.
[Feature #18875] [ruby-core:109042]
... because insns_info_index cannot be zero here. This also adds an
invariant check for that.
This change prevents the following GCC 12.1 warning:
http://rubyci.s3.amazonaws.com/arch/ruby-master/log/20220613T000004Z.log.html.gz
```
compile.c:2230:39: warning: array subscript 2147483647 is outside array bounds of ‘struct iseq_insn_info_entry[2147483647]’ [-Warray-bounds]
2230 | insns_info[insns_info_index-1].line_no != adjust->line_no) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
```
Use ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.
Previously, the right hand side was always evaluated before the
left hand side for constant assignments. For the following:
```ruby
lhs::C = rhs
```
rhs was evaluated before lhs, which is inconsistent with attribute
assignment (lhs.m = rhs), and apparently also does not conform to
JIS 3017:2013 11.4.2.2.3.
Fix this by changing evaluation order. Previously, the above
compiled to:
```
0000 putself ( 1)[Li]
0001 opt_send_without_block <calldata!mid:rhs, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 dup
0004 putself
0005 opt_send_without_block <calldata!mid:lhs, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0007 setconstant :C
0009 leave
```
After this change:
```
0000 putself ( 1)[Li]
0001 opt_send_without_block <calldata!mid:lhs, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 putself
0004 opt_send_without_block <calldata!mid:rhs, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0006 swap
0007 topn 1
0009 swap
0010 setconstant :C
0012 leave
```
Note that if expr is not a module/class, then a TypeError is not
raised until after the evaluation of rhs. This is because that
error is raised by setconstant. If we wanted to raise TypeError
before evaluation of rhs, we would have to add a VM instruction
for calling vm_check_if_namespace.
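A small script (with hypothetical lhs/rhs methods, not from the original
change) that makes the new evaluation order observable:
```ruby
# Hypothetical methods used only to observe evaluation order.
def lhs
  puts "lhs evaluated"
  Object
end

def rhs
  puts "rhs evaluated"
  1
end

lhs::C = rhs
# Before this change: "rhs evaluated" is printed first.
# After this change:  "lhs evaluated" is printed first.
```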
Changing assignment order for single assignments caused problems
in the multiple assignment code, revealing that the issue also
affected multiple assignment. Fix the multiple assignment code
so left-to-right evaluation also works for constant assignments.
Do some refactoring of the multiple assignment code to reduce
duplication after adding support for constants. Rename struct
masgn_attrasgn to masgn_lhs_node, since it now handles both
constants and attributes. Add add_masgn_lhs_node static function
for adding data for lhs attribute and constant setting.
Fixes [Bug #15928]
This `NODE` type was used in the pre-YARV implementation to improve
the performance of assignment to a dynamic local variable defined at
the innermost scope. It no longer has any actual difference from
`NODE_DASGN`, except in the node dump.
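For reference, a hedged sketch of where the difference used to be visible;
the exact dump output is an assumption and varies by version:
```ruby
# Sketch: a block-local assignment. Before this change its node dumped as
# DASGN_CURR; afterwards it dumps as DASGN (exact output may differ).
pp RubyVM::AbstractSyntaxTree.parse("1.times { x = 1 }")
```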
* Use duparray when possible for argspush
ARGSPUSH is the node we see with a single value pushed to the end of a
splatted array. ARGSCAT is similar, but is used when multiple values are
being concatenated to the list.
Previously, only ARGSCAT had an optimization where, when all the values
were static, it would use duparray instead of newarray to create the
intermediate array.
This commit adds similar behaviour for ARGSPUSH, using duparray instead
of putobject/newarray.
* Replace duparray with putobject before concatarray
When performing duparray/concatarray we know we'll never use the
intermediate array being created by duparray, so we should be able to
use it as a temporary object.
This avoids an extra array allocation for NODE_ARGSPUSH (ex. [*foo, 1])
and NODE_ARGSCAT (ex. [*foo, 1, 2]).
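A rough way to compare the generated instructions for the two node types
(the local variable `foo` is a placeholder); after these changes,
`[*foo, 1]` should use putobject followed by concatarray rather than
allocating an intermediate array:
```ruby
# Sketch: ARGSPUSH ([*foo, 1]) and ARGSCAT ([*foo, 1, 2]) cases.
# Exact disassembly depends on the Ruby version.
puts RubyVM::InstructionSequence.compile("foo = []; [*foo, 1]").disasm
puts RubyVM::InstructionSequence.compile("foo = []; [*foo, 1, 2]").disasm
```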
A dumped iseq binary cannot contain unnamed symbols/IDs, so ID 0 is
stored instead. As `struct rb_id_table` disallows ID 0, and also for
the sake of distinction, re-assign a new temporary ID based on the local
variable table index when loading from the binary, just as the parser
does.
Local variable tables were represented as `ID*`, but the implementation
was very hacky: the first element is not an ID but the size of the table,
and the last element is (sometimes) a link to the next local table, but
only when the ID tables form a linked list.
This change converts the hacky implementation to a normal struct.
This provides a significant speedup for Symbol, true, false,
nil, Integers 0-9, and Class/Module, and a small speedup in most other cases.
Speedups (using included benchmarks):
:symbol :: 60%
0-9 :: 50%
Class/Module :: 50%
nil/true/false :: 20%
integer :: 10%
[] :: 10%
"" :: 3%
One reason this approach is faster is that it reduces the number of
VM instructions for each interpolated value.
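As a sketch (not from the original change), the effect can be seen in the
instructions generated for interpolation of a non-String value; the exact
listing is version-dependent:
```ruby
# Sketch: interpolating a Symbol. With this change, the objtostring insn
# can return the string directly for common classes (Symbol, Integer 0-9,
# nil, true, false, Class/Module) without a full to_s dispatch.
puts RubyVM::InstructionSequence.compile('"value: #{:sym}"').disasm
```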
Initial idea, approach, and benchmarks from Eric Wong. I applied
the same approach against the master branch, updating it to handle
the significant internal changes since this was first proposed 4
years ago (such as CALL_INFO/CALL_CACHE -> CALL_DATA). I also
expanded it to optimize true/false/nil/0-9/class/module, and added
handling of missing methods, refined methods, and RUBY_DEBUG.
This renames the tostring insn to anytostring, and adds an
objtostring insn that implements the optimization. This requires
making a few functions non-static, and adding some non-static
functions.
This disables 4 YJIT tests. Those tests should be reenabled after
YJIT optimizes the new objtostring insn.
Implements [Feature #13715]
Co-authored-by: Eric Wong <e@80x24.org>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Yusuke Endoh <mame@ruby-lang.org>
Co-authored-by: Koichi Sasada <ko1@atdot.net>
Compared with C methods, a built-in method written in Ruby is
slower when only mandatory parameters are given, because it needs to
check the arguments and fill in default values for optional and keyword
parameters (C methods can check the number of parameters with `argc`,
so there is no overhead). Passing only mandatory arguments is common
(optional arguments are the exception in many cases), so it is important
to provide a fast path for such common cases.
`Primitive.mandatory_only?` is a special builtin function used with
an `if` expression like this:
```ruby
def self.at(time, subsec = false, unit = :microsecond, in: nil)
if Primitive.mandatory_only?
Primitive.time_s_at1(time)
else
Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
end
end
```
and it makes two ISeqs:
```
def self.at(time, subsec = false, unit = :microsecond, in: nil)
Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
end
def self.at(time)
Primitive.time_s_at1(time)
end
```
and (2) is pointed to by (1). Note that `Primitive.mandatory_only?`
should be used only in the condition of an `if` statement, and the
`if` statement should be the entire method body (you cannot
put any expression before or after the `if` statement).
A method entry with `mandatory_only?` (`Time.at` in the above case)
is marked as `iseq_overload`. When the method is dispatched with only
mandatory arguments (`Time.at(0)`, for example), another
method entry with ISeq (2) is created as the mandatory-only method entry
and cached in an inline method cache.
The idea is similar to the one discussed in
https://bugs.ruby-lang.org/issues/16254, but it only checks for the
mandatory-parameters-only case, because in many cases only mandatory
parameters are given. If we find other cases (optional or keyword
parameters used frequently in a way that hurts performance), we can
extend the feature.
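A hedged sketch of the two dispatch paths, based on the Time.at example
above:
```ruby
# Only mandatory argument given: the mandatory-only ISeq (2) is dispatched
# and cached in the inline method cache.
Time.at(0)

# Optional arguments given: the full ISeq (1) is dispatched.
Time.at(0, 500, :microsecond)
```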
`RubyVM.keep_script_lines` enables keeping the script lines
for each ISeq and AST. This feature is for debugger/REPL
support.
```ruby
RubyVM.keep_script_lines = true
eval("def foo = nil\ndef bar = nil")
pp RubyVM::InstructionSequence.of(method(:foo)).script_lines
```
Since opt_getinlinecache and opt_setinlinecache point to the same cache
struct, there is no need to track the index of the get instruction and
then store it on the cache struct later when processing the set
instruction. Setting it when processing the get instruction works just
as well.
This change reduces our diff.
Make sure `opt_getinlinecache` is in a block all on its own, and
invalidate it from the interpreter when `opt_setinlinecache` runs.
It will recompile with a filled cache the second time around.
This lets YJIT run well when the IC for a constant is cold.
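For context, a sketch of the instructions involved; the expectation in the
comment is an assumption about this version's constant caching, and the
listing differs on later Rubies:
```ruby
# Sketch: a constant reference compiles with an inline cache around the
# lookup; expect opt_getinlinecache ... getconstant ... opt_setinlinecache
# in the disassembly on this version.
puts RubyVM::InstructionSequence.compile("Foo::Bar").disasm
```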