Before this commit, we were mixing a lot of concerns with the prism
compile between RubyVM::InstructionSequence and the general entry
points to the prism parser/compiler.
This commit makes all of the various prism-related APIs mirror
their corresponding APIs in the existing parser/compiler. This means
we now have the correct frame naming, and it's much easier to follow
where the logic actually flows. Furthermore this consolidates a lot
of the prism initialization, making it easier to see where we could
potentially be raising errors.
Previously, this would use newarray followed by concattoarray.
This now uses pushtoarray instead, avoiding the unnecessary
array allocation.
This is implemented by making compile_array take a first_chunk
argument, passing in 1 in the normal array case, and 0 in the
ARGSCAT with LIST body case.
newarray, duparray, concatarray, and splatarray always leave an
array at the top of the stack. expandarray does not; it takes
an array from the top of the stack as input, and leaves individual
elements on the stack. I assume no Ruby code generates the
expandarray/splatarray sequence, or it could break. The only
use of expandarray outside the peephole optimizer is in the
masgn code, and it does not appear to generate splatarray
directly after expandarray.
The splatarray/splatarray peephole optimization is probably
also wrong in the following case:
```
putobject [1,2]
splatarray false
splatarray true
```
This instruction sequence should result in a duplicate of
[1,2] at the top of the stack, but the peephole optimizer would
remove the `splatarray true`, resulting in the original
[1,2] on top of the stack. I'm not sure Ruby code can generate
`splatarray false` followed by `splatarray true` (I could get it
to generate chains of `splatarray true`), so maybe this has no
effect.
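To make the concern concrete, here is a rough pure-Ruby model of the
object identity involved. The `splatarray` method below is a
hypothetical stand-in for the VM instruction (the real one is C code
and converts via to_a); it only models whether a duplicate is made.

```ruby
# Hypothetical model of `splatarray <flag>`: returns the array itself when
# dup is false, and a fresh copy when dup is true.
def splatarray(obj, dup)
  ary = obj.is_a?(Array) ? obj : [obj] # simplification of the to_a conversion
  dup ? ary.dup : ary
end

original = [1, 2]

# splatarray false followed by splatarray true: a duplicate ends up on top.
unoptimized = splatarray(splatarray(original, false), true)

# If the peephole optimizer dropped the `splatarray true`, only this remains.
optimized = splatarray(original, false)

p unoptimized.equal?(original) # false
p optimized.equal?(original)   # true
```

The two sequences produce equal contents but different objects, which
is only observable if something later mutates the result.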
newarray, duparray, and concatarray all result in newly allocated
arrays at the top of the stack, so they shouldn't have an issue
with removing either `splatarray true` or `splatarray false`.
Given code such as:
```ruby
h[*a, :a], h[*b] = v
```
Ruby would previously allocate 5 arrays for the mass assignment:
* splatarray true for a
* newarray for v[0]
* concatarray for [*a, :a] and v[0]
* newarray for v[1]
* concatarray for b and v[1]
This optimizes it to only allocate 2 arrays:
* splatarray true for a
* splatarray true for b
Instead of the newarray/concatarray combination, pushtoarray is used.
Note above that there was no splatarray true for b originally. The
normal compilation uses splatarray false for b. Instead of trying
to find and modify the splatarray false to splatarray true, this
adds splatarray true for b, which requires a couple of swap
instructions, before the pushtoarray. This could be further
optimized to remove the need for those three instructions, but I'm
not sure if the complexity is worth it.
Additionally, this sets VM_CALL_ARGS_SPLAT_MUT on the call to
[]= in the h[*b] case, so that if []= has a splat parameter, the
new array can be used directly, without additional duplication.
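The behavior this optimization has to preserve can be checked from
Ruby. This is a small sketch, not part of the commit; `IndexRecorder`
is a hypothetical helper class that records the arguments passed to
[]=.

```ruby
# Hypothetical helper: records every []= call it receives.
class IndexRecorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  def []=(*args)
    @calls << args.dup # snapshot the arguments of this call
  end
end

h = IndexRecorder.new
a = [1, 2]
b = [3]
v = [10, 20]

h[*a, :a], h[*b] = v

p h.calls # [[1, 2, :a, 10], [3, 20]]
p a       # [1, 2]
p b       # [3]
```

However many arrays are allocated internally, `a` and `b` must not be
modified by the assignment.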
Given code such as:
```ruby
h[*a, 1] += 1
h[*b] += 2
```
Ruby would previously allocate 5 arrays:
* splatarray true for a
* newarray for 1
* concatarray for [*a, 1] and [1]
* newarray for 2
* concatarray for b and [2]
This optimizes it to only allocate 2 arrays:
* splatarray true for a
* splatarray true for b
Instead of the newarray/concatarray combination, pushtoarray is used.
Note above that there was no splatarray true for b originally. The
normal compilation uses splatarray false for b. Instead of trying
to find and modify the splatarray false to splatarray true, this
adds splatarray true for b, which requires a couple of swap
instructions, before the pushtoarray. This could be further
optimized to remove the need for those three instructions, but I'm
not sure if the complexity is worth it.
Additionally, this sets VM_CALL_ARGS_SPLAT_MUT on the call to
[]= in the h[*b] case, so that if []= has a splat parameter, the
new array can be used directly, without additional duplication.
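As a sketch of the call sequence involved (not code from this commit),
the following records the [] and []= calls made by the two statements.
`OpRecorder` is a hypothetical helper, and its [] always returns 10.

```ruby
# Hypothetical helper: records [] and []= calls; [] always returns 10.
class OpRecorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  def [](*args)
    @calls << [:[], args.dup]
    10
  end

  def []=(*args)
    @calls << [:[]=, args.dup]
  end
end

h = OpRecorder.new
a = [0]
b = [5]

h[*a, 1] += 1
h[*b] += 2

h.calls.each { |call| p call }
# [:[], [0, 1]]
# [:[]=, [0, 1, 11]]
# [:[], [5]]
# [:[]=, [5, 12]]
```

Each statement calls [] with the index arguments and then []= with the
same arguments plus the new value, and neither `a` nor `b` may be
changed along the way.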
Previously, a literal array with a splat and any other args resulted in
more than one array allocation:
```ruby
[1, *a]
[*a, 1]
[*a, *a]
[*a, 1, 2]
[*a, a]
[*a, 1, *a]
[*a, 1, a]
[*a, a, a]
[*a, a, *a]
[*a, 1, *a, 1]
[*a, 1, *a, *a]
[*a, a, *a, a]
```
This is because previously Ruby would use newarray and concatarray
to create the array, each of which allocates an array internally.
This changes the compilation to use concattoarray and pushtoarray,
which do not allocate arrays. It also updates the peephole optimizer
to optimize the duparray/concattoarray sequence to
putobject/concattoarray, mirroring the existing duparray/concatarray
optimization.
These changes reduce the array allocations for the above examples to
a single array allocation, except for:
```
[*a, 1, a]
[*a, a, a]
```
This is because optimizing this case to only allocate
1 array requires changes to compile_array, which would currently
conflict with an unmerged pull request (#9721). After that pull
request is merged, it should be possible to refactor things to only
allocate 1 array for all literal arrays (or 2 for arrays with
keyword splats).
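One way to observe these allocation counts is to compare
ObjectSpace.count_objects before and after evaluating a literal with
GC disabled. This is a rough sketch that assumes CRuby;
`array_allocations` is a hypothetical helper name, and the numbers it
prints depend on the Ruby version in use.

```ruby
# Hypothetical helper: number of T_ARRAY objects allocated while the block
# runs. Relies on CRuby's ObjectSpace.count_objects.
def array_allocations
  counts = {}
  GC.start                        # finish any pending GC work first
  was_disabled = GC.disable       # true if GC was already disabled
  ObjectSpace.count_objects(counts)
  before = counts[:T_ARRAY]
  yield
  ObjectSpace.count_objects(counts)
  counts[:T_ARRAY] - before
ensure
  GC.enable unless was_disabled
end

a = [1, 2]
puts "[*a, 1]:     #{array_allocations { [*a, 1] }}"
puts "[*a, 1, *a]: #{array_allocations { [*a, 1, *a] }}"
puts "[*a, a, a]:  #{array_allocations { [*a, a, a] }}"
```

The counts are only meaningful when compared between two builds, for
example before and after this change.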
To avoid stack overflow, Ruby splits compilation of large arrays
into smaller arrays, and concatenates the small arrays together.
It previously used newarray/concatarray for this, which is
inefficient. This switches the compilation to use pushtoarray,
which is much faster. This makes almost all literal arrays only
allocate a single array.
For cases where there is a large number of static values in the
array, Ruby will statically compile subarrays, and previously
added them using concatarray. This switches to concattoarray,
avoiding an array allocation for the append.
Keyword splats are also supported in arrays, and ignored if the
keyword splat is empty. Previously, this used newarraykwsplat and
concatarray. This still uses newarraykwsplat, but switches to
concattoarray to save an allocation. As a result, large arrays with
keyword splats allocate 2 arrays, where other large arrays allocate 1.
Previously, for the following array sizes (assuming local variable
access for each element), Ruby allocated the following number of
arrays:
1000 elements: 7 arrays
10000 elements: 79 arrays
100000 elements: 781 arrays
With these changes, only a single array is allocated (or 2 for a
large array with a keyword splat).
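A large literal of this shape can be generated and inspected as
follows. This is only a sketch: `large_array_source` is a hypothetical
helper, the element count is arbitrary, and the instruction tally
printed at the end depends on the Ruby version (it is skipped when
RubyVM::InstructionSequence is not available).

```ruby
# Hypothetical helper: source for an n-element array literal in which every
# element is a local variable access.
def large_array_source(n)
  "a = 1; [" + (["a"] * n).join(", ") + "]"
end

src = large_array_source(1000)
ary = eval(src)
puts ary.size # 1000

if defined?(RubyVM::InstructionSequence)
  disasm = RubyVM::InstructionSequence.compile(src).disasm
  names = /\b(newarray|concatarray|pushtoarray|concattoarray)\b/
  # How often each array-building instruction appears in the compiled code.
  p disasm.scan(names).flatten.tally
end
```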
Results using the included benchmark:
```
array_1000
miniruby: 34770.0 i/s
./miniruby-before: 10511.7 i/s - 3.31x slower
array_10000
miniruby: 4938.8 i/s
./miniruby-before: 483.8 i/s - 10.21x slower
array_100000
miniruby: 727.2 i/s
./miniruby-before: 4.1 i/s - 176.98x slower
```
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
`__ENCODING__` was managed by `NODE_LIT` with an Encoding object.
Introduce `NODE_ENCODING` so that
1. `__ENCODING__` is detectable from AST Node.
2. Reduce dependency on Ruby objects in parse.y
For zsuper calls with a keyword splat but no actual keywords, the
keyword splat is passed directly, so it cannot be mutable, because
if the callee accepts a keyword splat, changes to the keyword splat
by the callee would be reflected in the caller.
While here, simplify the logic when the method supports
literal keywords. I don't think it is possible for
a method with has_kw param flags to not have keywords, so add an
assertion for that, and set VM_CALL_KW_SPLAT_MUT in a single place.
Ruby makes it easy to delegate all arguments from one method to another:
```ruby
def f(*args, **kw)
g(*args, **kw)
end
```
Unfortunately, this indirection decreases performance. One reason it
decreases performance is that this allocates an array and a hash per
call to `f`, even if `args` and `kw` are not modified.
Due to Ruby's ability to modify almost anything at runtime, it's
difficult to avoid the array allocation in the general case. For
example, it's not safe to avoid the allocation in a case like this:
```ruby
def f(*args, **kw)
foo(bar)
g(*args, **kw)
end
```
Because `foo` may be `eval` and `bar` may be a string referencing `args`
or `kw`.
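A runnable sketch of that hazard, with `eval` standing in for `foo`
and the hypothetical constant `SNIPPET` standing in for `bar`:

```ruby
# Hypothetical snippet: code evaluated inside f that refers to args by name.
SNIPPET = "args << :injected"

def g(*args, **kw)
  [args, kw]
end

def f(*args, **kw)
  eval(SNIPPET) # can reach the named local variable args
  g(*args, **kw)
end

caller_args = [1]
result = f(*caller_args)

p result      # [[1, :injected], {}]
p caller_args # [1]
```

Because `args` in `f` is a copy, the mutation is visible to `g` but
not to the caller. Skipping the copy here would leak `:injected` into
`caller_args`.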
To fix this correctly, you need to perform something similar to escape
analysis on the variables. However, there is a case where you can
avoid the allocation without doing escape analysis, and that is when
the splat variables are anonymous:
```ruby
def f(*, **)
g(*, **)
end
```
When splat variables are anonymous, it is not possible to reference
them directly; it is only possible to use them as splats to other
methods. Since that is the case, if `f` is called with a regular
splat and a keyword splat, it can pass the arguments directly to
`g` without copying them, avoiding allocation. For example:
```ruby
def g(a, b:)
a + b
end
def f(*, **)
g(*, **)
end
a = [1]
kw = {b: 2}
f(*a, **kw)
```
I call this technique: Allocationless Anonymous Splat Forwarding.
This is implemented using a couple of additional iseq param flags,
anon_rest and anon_kwrest. If anon_rest is set, and an array splat
that can be used without modification is passed when calling the
method, `setup_parameters_complex` does not duplicate it. Similarly,
if anon_kwrest is set, and a keyword splat is passed when calling the
method, `setup_parameters_complex` does not duplicate it.
This instruction is similar to concattoarray, but it takes the
number of arguments to push to the array, removes that number
of arguments from the stack, and adds them to the array now at
the top of the stack.
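The stack effect can be modeled in plain Ruby. This is a hypothetical
model for illustration only: the real instruction is implemented in C
inside the VM, and the stack here is just a Ruby array.

```ruby
# Hypothetical pure-Ruby model of `pushtoarray num`.
def pushtoarray(stack, num)
  values = stack.pop(num) # remove the top num entries, keeping their order
  target = stack.last     # the array that is now at the top of the stack
  raise TypeError, "expected an Array below the pushed values" unless target.is_a?(Array)
  target.concat(values)   # append in place; no new array is created
  stack
end

args = [10, 20]           # stands in for the result of `splatarray true`
stack = [:self, args, 1]
pushtoarray(stack, 1)

p stack                   # [:self, [10, 20, 1]]
p stack.last.equal?(args) # true
```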
This allows `f(*a, 1)` to allocate only a single array on the
caller side (which can be reused on the callee side in the case of
`def f(*a)`). Prior to this commit, `f(*a, 1)` would generate
3 arrays:
* a dupped by splatarray true
* 1 wrapped in array by newarray
* a dupped again by concatarray
Instructions Before for `a = []; f(*a, 1)`:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 putself
0005 getlocal_WC_0 a@0
0007 splatarray true
0009 putobject_INT2FIX_1_
0010 newarray 1
0012 concatarray
0013 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|FCALL>
0015 leave
```
Instructions After for `a = []; f(*a, 1)`:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 putself
0005 getlocal_WC_0 a@0
0007 splatarray true
0009 putobject_INT2FIX_1_
0010 pushtoarray 1
0012 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL>
0014 leave
```
With these changes, method calls to Ruby methods should
implicitly allocate at most one array.
Ignore typeprof bundled gem failure due to unrecognized instruction.
This instruction is similar to concatarray, but assumes the first
object is already an array, and appends to it directly. This is
different than concatarray, which will create a new array instead
of appending to an existing array.
Additionally, for both concatarray and concattoarray, if the second
argument cannot be converted to an array, then just push it onto
the array, instead of creating a new array to wrap it, and then
using concat array. This saves an array allocation in that case.
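The difference between the two instructions can be modeled in plain
Ruby. This is a hypothetical sketch: the real instructions are C code,
`splat_to_a` only approximates the VM's to_a conversion, and the model
assumes the first operand is already an array.

```ruby
# Approximation of the VM's conversion: an Array, or nil if not convertible.
def splat_to_a(obj)
  return obj if obj.is_a?(Array)
  obj.respond_to?(:to_a) ? obj.to_a : nil
end

# Model of concatarray: always produces a new array.
def concatarray(stack)
  second = stack.pop
  first = stack.pop
  result = first.dup
  converted = splat_to_a(second)
  converted ? result.concat(converted) : result.push(second)
  stack.push(result)
end

# Model of concattoarray: appends to the array already on the stack.
def concattoarray(stack)
  second = stack.pop
  target = stack.last
  converted = splat_to_a(second)
  converted ? target.concat(converted) : target.push(second)
  stack
end

a = [1, 2]
copy = a.dup              # stands in for `splatarray true`
stack = [copy, a]
concattoarray(stack)      # *a
stack.push(1)
concattoarray(stack)      # *1, which cannot be converted and is pushed

p stack.last              # [1, 2, 1, 2, 1]
p stack.last.equal?(copy) # true
p a                       # [1, 2]
```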
This allows `f(*a, *a, *1)` to allocate only a single array on the
caller side (which can be reused on the callee side in the case of
`def f(*a)`). Prior to this commit, `f(*a, *a, *1)` would generate
4 arrays:
* a dupped by splatarray true
* a dupped again by first concatarray
* 1 wrapped in array by third splatarray
* result of [*a, *a] dupped by second concatarray
Instructions Before for `a = []; f(*a, *a, *1)`:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 putself
0005 getlocal_WC_0 a@0
0007 splatarray true
0009 getlocal_WC_0 a@0
0011 splatarray false
0013 concatarray
0014 putobject_INT2FIX_1_
0015 splatarray false
0017 concatarray
0018 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL>
0020 leave
```
Instructions After for `a = []; f(*a, *a, *1)`:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 putself
0005 getlocal_WC_0 a@0
0007 splatarray true
0009 getlocal_WC_0 a@0
0011 concattoarray
0012 putobject_INT2FIX_1_
0013 concattoarray
0014 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL>
0016 leave
```
This flag is set when the caller has already created a new array to
handle a splat, such as for `f(*a, b)` and `f(*a, *b)`. Previously,
if `f` was defined as `def f(*a)`, these calls would create an extra
array on the callee side, instead of using the new array created
by the caller.
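Reusing the caller's array is only valid because that array is not
visible to the caller's code. A short sketch of the behavior that must
hold regardless of how many arrays are allocated:

```ruby
def f(*a)
  a << :callee # the callee is free to mutate its rest array
  a
end

a = [1]
b = [2]

r1 = f(*a, 3)
r2 = f(*a, *b)

p r1 # [1, 3, :callee]
p r2 # [1, 2, :callee]
p a  # [1]
p b  # [2]
```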
This modifies `setup_args_core` to set the flag whenever it would add
a `splatarray true` instruction. However, when `splatarray true` is
changed to `splatarray false` in the peephole optimizer, to avoid
unnecessary allocations on the caller side, the flag must be removed.
Add `optimize_args_splat_no_copy` and have the peephole optimizer call
that. This significantly simplifies the related peephole optimizer
code.
On the callee side, in `setup_parameters_complex`, set
`args->rest_dupped` to true if the flag is set.
This takes a similar approach for optimizing regular splats to the one
previously used for keyword splats in
d2c41b1bff (via VM_CALL_KW_SPLAT_MUT).
The order of iseq may differ from the order of tokens; typically,
`while`/`until` conditions are put after the body.
These orders can be matched by using line numbers as builtin-indexes,
but this introduces the restriction that multiple `cexpr!` and
`cstmt!` cannot appear on the same line.
Another possible idea is to use `RubyVM::AbstractSyntaxTree` and
`node_id` instead of ripper, which would require BASERUBY 3.1 or later.
to BUILTIN_ATTR_SINGLE_NOARG_LEAF
The attribute was created when the other attribute was called BUILTIN_ATTR_INLINE.
Now that the original attribute is renamed to BUILTIN_ATTR_LEAF, it's
only confusing that we call it "_INLINE".
nil is treated similarly to the empty hash in this case, passing
no keywords and not calling any conversion methods.
Fixes [Bug #20064]
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Previously, it used "expression", as that was the default. However,
op asgn expressions to constants use the NODE_OP_CDECL, so recognize
that node type as assignment.
Fixes [Bug #20111]
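A quick way to check this from Ruby is shown below. This is a sketch:
`MY_UNDEFINED_LIMIT` is a hypothetical constant name, and the value
printed for the constant case depends on whether the running Ruby
includes this fix.

```ruby
# defined? does not evaluate its operand, so nothing is assigned here.
local_kind = defined?(x = 1)
ivar_kind  = defined?(@counter ||= 1)
const_kind = defined?(MY_UNDEFINED_LIMIT ||= 1)

p local_kind # "assignment"
p ivar_kind  # "assignment"
p const_kind # "assignment" with this change, "expression" before it
```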
`:sym` was managed by `NODE_LIT` with `Symbol` object.
This commit introduces `NODE_SYM` so that
1. Symbol literal is detectable from AST Node
2. Reduce dependency on ruby object
parse.y converted NODE_STR when the string was a hash key, like
```
h1 = {"str1" => 1}
m1("str2" => 2)
m2({"str3" => 3})
```
This commit stops the conversion.
`static_literal_node_p` needs to know whether the node is a hash key
for the optimization.
When hash keys are duplicated, e.g. `h = {k: 1, l: 2, k: 3}`,
the parser changes the node structure for correct compilation.
This generates a tricky AST. This commit removes the AST manipulation
from the parser to keep the AST structure simple.
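The runtime result that has to stay the same can be checked with a
small sketch. The literal is evaluated through eval with warnings
silenced, only to keep the duplicated-key warning out of the output.

```ruby
verbose = $VERBOSE
begin
  $VERBOSE = nil # silence the duplicated-key warning for this example
  h = eval("{k: 1, l: 2, k: 3}")
ensure
  $VERBOSE = verbose
end

p h[:k]  # 3, the last value given for a duplicated key wins
p h[:l]  # 2
p h.size # 2
```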
`__FILE__` was managed by `NODE_STR` with `String` object.
This commit introduces `NODE_FILE` and `struct rb_parser_string` so that
1. `__FILE__` is detectable from AST Node
2. Reduce dependency on Ruby objects
`__LINE__` was managed by `NODE_LIT` with `Integer` object.
This commit introduces `NODE_LINE` so that
1. `__LINE__` is detectable from AST Node
2. Reduce dependency on Ruby objects
When passing the keyword splat to [], it cannot be mutable, because
mutating the keyword splat inside [] would result in changes to the
keyword splat passed to []=.
Examples of such calls:
```ruby
obj[kw: 1] += foo
obj[**kw] &&= bar
```
Before this patch, literal keywords would segfault in the compiler,
and keyword splat usage would result in TypeError.
This handles all cases I can think of:
* literal keywords
* keyword splats
* combined with positional arguments
* combined with regular splats
* both with and without blocks
* both popped and non-popped cases
This also makes sure that to_hash is only called once on the keyword
splat argument, instead of twice, and makes sure it is called before
calling to_proc on a passed block.
Fixes [Bug #20051]
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Previously, block.to_proc was called first, by vm_caller_setup_arg_block.
kw.to_hash was called later inside CALLER_SETUP_ARG or setup_parameters_complex.
This adds a splatkw instruction that is inserted before sends with
ARGS_BLOCKARG and KW_SPLAT and without KW_SPLAT_MUT. This is not needed in the
KW_SPLAT_MUT case, because then you know the value is a hash, and you don't
need to call to_hash on it.
The splatkw instruction checks whether the second-to-top stack entry is
a hash, and if not, replaces it with the value of calling to_hash on it
(using rb_to_hash_type). As it is always before a send with ARGS_BLOCKARG
and KW_SPLAT, second to top is the keyword splat, and top is the passed
block.
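The conversion order can be observed from Ruby with objects that
record when they are converted. This is a sketch; `KwLike`,
`BlockLike` and `ORDER` are hypothetical names. On a Ruby that
includes this change, to_hash is expected to be recorded first; older
versions record to_proc first.

```ruby
ORDER = []

# Hypothetical object usable as a keyword splat via to_hash.
class KwLike
  def to_hash
    ORDER << :to_hash
    { a: 1 }
  end
end

# Hypothetical object usable as a block argument via to_proc.
class BlockLike
  def to_proc
    ORDER << :to_proc
    proc { :called }
  end
end

def f(**kw, &blk)
  [kw, blk.call]
end

result = f(**KwLike.new, &BlockLike.new)

p result # [{:a=>1}, :called] (hash formatting varies by Ruby version)
p ORDER  # order of the two conversions on this Ruby
```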
These are similar to the f(1, *a, &lvar), f(*a, **kw, &lvar) and
f(*a, kw: 1, &lvar) optimizations, but they use the getblockparamproxy
instruction instead of getlocal.
This also fixes the else style to be more similar to the surrounding
code.
In cases where the compiler can detect the hash is static, it would
use duphash for the hash part. As the hash is static, there is no need
to allocate an array.
The compiler already eliminates the array allocation for
f(*a, &lvar) and f(*a, &@iv). If that is safe, then eliminating
it for f(*a, **lvar) and f(*a, **@iv) as the last commit did is
as safe, and eliminating it for f(*a, **lvar, &lvar) and
f(*a, **@iv, &@iv) is also as safe.
The compiler already eliminates the array allocation for
f(*a, &lvar) and f(*a, &@iv), and eliminating the array allocation
for keyword splat is as safe as eliminating it for block passes.
Due to how the compiler works, f(*a, &lvar) and f(*a, &@iv)
do not allocate an array, but f(1, *a, &lvar) and f(1, *a, &@iv)
do. It's probably possible to fix this in the compiler, but
it seems easiest to fix this in the peephole optimizer.
Eliminating this array allocation is as safe as the current
elimination of the array allocation for f(*a, &lvar) and
f(*a, &@iv).
Due to how the compiler works, while f(*a) does not allocate an
array, f(1, *a) does. This is possible to fix in the compiler, but
the change is much more complex. This attempts to fix the issue
in a simpler way using the peephole optimizer.
Eliminating this array allocation is safe, since just as in the
f(*a) case, nothing else on the caller side can modify the array.
The operands in each instruction need to be pinned because if
auto-compaction runs in iseq_set_sequence, then the objects could exist
on the generated_iseq buffer, where their references would not be
updated, which can lead to T_MOVED (and subsequently T_NONE) objects
on the iseq.
The function iseq_set_exception_table allocates memory which can cause
a GC compaction to run. Since catch_table_ary is not on the stack, it
can be moved, which would make tptr incorrect.
- Unless `sizeof(BDIGIT) == 4`, (8-byte integer not available), the
size to be loaded was wrong.
- Since `BDIGIT`s are dumped as raw binary, the loaded byte order was
inverted unless little-endian.
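The byte-order problem can be illustrated in Ruby with Array#pack,
which has explicit little-endian and big-endian directives. This is
only an analogy for the C code in question, using a 32-bit value
chosen for the example.

```ruby
value = 0x01020304

little = [value].pack("L<") # 32-bit unsigned, little-endian
big    = [value].pack("L>") # 32-bit unsigned, big-endian

p little.bytes # [4, 3, 2, 1]
p big.bytes    # [1, 2, 3, 4]

# Reading the bytes back with the wrong byte order inverts the value.
printf("%#010x\n", little.unpack1("L>")) # 0x04030201
printf("%#010x\n", little.unpack1("L<")) # 0x01020304
```

Dumping with a fixed byte order and loading with the same one gives
the same value on every platform.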