Suppose YJIT runs a rb_vm_opt_send_without_block()
fallback and the control frame stack looks like:
```
will_tailcall_bar [FINISH]
caller_that_used_fallback
```
will_tailcall_bar() runs in the interpreter and sets up a tailcall.
Right before JIT_EXEC() in the `send` instruction, the stack will look like:
```
bar [FINISH]
caller_that_used_fallback
```
Previously, JIT_EXEC() ran bar() in JIT code. When bar() returned, the
`FINISH` flag sent control back to the interpreter instead of to the JIT
code running caller_that_used_fallback(), causing code to run twice and
probably crash. Recent flaky failures on CI about "each stub expects a
particular iseq" are probably due to leaving methods twice in
`test_optimizations.rb`.
Only run JIT code from the interpreter if a new frame is pushed.
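For readers unfamiliar with how a tailcall reshapes the control frame stack, here is a minimal sketch in plain CRuby. It assumes the `tailcall_optimization` compile option of `RubyVM::InstructionSequence`, which is off by default. The method names mirror the scenario above, while `TAILCALL_SRC` and `frames_seen_by_bar` are names made up for illustration. It does not exercise YJIT or the fallback path, only the frame replacement.

```ruby
# Sketch: compile two methods with tailcall optimization enabled so that
# will_tailcall_bar's frame is replaced by bar's frame instead of stacking.
TAILCALL_SRC = <<~RUBY
  def bar
    # Labels of the frames below bar at the time it runs.
    caller_locations.map(&:label)
  end

  def will_tailcall_bar
    bar() # last expression, so it can be compiled as a tailcall
  end
RUBY

RubyVM::InstructionSequence.compile(
  TAILCALL_SRC, nil, nil, 1,
  tailcall_optimization: true, trace_instruction: false
).eval

# Hypothetical helper, compiled normally, that plays the role of the caller.
def frames_seen_by_bar
  will_tailcall_bar
end
```

Calling `frames_seen_by_bar` should return labels that mention `frames_seen_by_bar` but not `will_tailcall_bar`, because the tailcall popped that frame before bar's frame was pushed.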
Fix [Bug #20207]
Fix [Bug #20212]
Handling of consecutive lookarounds in init_cache_opcodes was buggy and
caused the invalid memory accesses reported in [Bug #20207] and
[Bug #20212]. This fixes it by using recursive functions to detect
lookaround nesting correctly.
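The idea of tracking lookaround nesting with recursion can be sketched on a toy opcode stream. This is not the Onigmo code: `:look_start` and `:look_end` are hypothetical markers, and `scan_lookarounds` and `lookaround_depths` are illustrative names. The sketch only shows why recursion handles both consecutive and nested lookarounds uniformly.

```ruby
# Toy model only: :look_start / :look_end are hypothetical markers,
# not real Onigmo opcodes.
# Walks ops from index, recording [op, depth] for each ordinary opcode.
# Returns the index just past the lookaround that was being scanned.
def scan_lookarounds(ops, index, depth, out)
  while index < ops.length
    op = ops[index]
    case op
    when :look_start
      # Recurse for the lookaround body, then resume after its end marker.
      index = scan_lookarounds(ops, index + 1, depth + 1, out)
    when :look_end
      raise ArgumentError, "unbalanced :look_end" if depth.zero?
      return index + 1
    else
      out << [op, depth]
      index += 1
    end
  end
  raise ArgumentError, "unterminated lookaround" unless depth.zero?
  index
end

def lookaround_depths(ops)
  out = []
  scan_lookarounds(ops, 0, 0, out)
  out
end
```

With two consecutive lookarounds, the second one nested, each opcode is tagged with the depth at which it appears, and unbalanced streams are rejected instead of being read past their end.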
In the WebAssembly C ABI, the linear stack pointer must always be
aligned to 16 bytes, as on other architectures.
A misaligned stack pointer causes some weird memory corruption, since
the compiler assumes the stack pointer is aligned.
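The alignment rule comes down to simple arithmetic, sketched here in Ruby rather than in the C or assembly where it would actually live. The helper names are made up for illustration, and the sketch assumes a stack that grows toward lower addresses.

```ruby
STACK_ALIGNMENT = 16

# Round an address down to the nearest multiple of the alignment.
# The alignment is assumed to be a power of two.
def align_down(address, alignment = STACK_ALIGNMENT)
  address & ~(alignment - 1)
end

def aligned?(address, alignment = STACK_ALIGNMENT)
  (address % alignment).zero?
end

# Reserve size bytes on a downward-growing stack (an assumption of this
# sketch), keeping the resulting stack pointer aligned.
def reserve(stack_pointer, size)
  align_down(stack_pointer - size)
end
```

For example, reserving 20 bytes from a stack pointer of 1024 gives 992 rather than the misaligned 1004.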
```
ruby: YJIT has panicked. More info to follow...
thread '<unnamed>' panicked at src/core.rs:2751:9:
assertion `left == right` failed: each stub expects a particular iseq
left: 0x7fc8d8e09850
right: 0x7fc8d2c2f3a0
stack backtrace:
0: rust_begin_unwind
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
1: core::panicking::panic_fmt
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
2: core::panicking::assert_failed_inner
3: core::panicking::assert_failed
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:279:5
4: yjit::core::branch_stub_hit_body
at /home/runner/work/ruby/ruby/src/yjit/src/core.rs:2751:9
5: yjit::core::branch_stub_hit::{{closure}}::{{closure}}
at /home/runner/work/ruby/ruby/src/yjit/src/core.rs:2696:36
6: yjit::stats::with_compile_time
at /home/runner/work/ruby/ruby/src/yjit/src/stats.rs:979:15
7: yjit::core::branch_stub_hit::{{closure}}
at /home/runner/work/ruby/ruby/src/yjit/src/core.rs:2696:13
8: std::panicking::try::do_call
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
9: __rust_try
10: std::panicking::try
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
11: std::panic::catch_unwind
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
12: yjit::cruby::with_vm_lock
at /home/runner/work/ruby/ruby/src/yjit/src/cruby.rs:647:21
13: yjit::core::branch_stub_hit
at /home/runner/work/ruby/ruby/src/yjit/src/core.rs:2695:9
14: <unknown>
```
newarray, duparray, concatarray, and splatarray always leave an
array at the top of the stack. expandarray does not; it takes
an array from the top of the stack as input, and leaves individual
elements on the stack. I assume no Ruby code generates the
expandarray/splatarray sequence, or it could break. The only
use of expandarray outside the peephole optimizer is in the
masgn code, and it does not appear to generate splatarray
directly after expandarray.
The splatarray/splatarray peephole optimization is probably
also wrong in the following case:
```
putobject [1,2]
splatarray false
splatarray true
```
This instruction sequence should result in a duplicate of
[1,2] at the top of the stack, but the peephole optimizer would
remove the `splatarray true`, leaving the original, non-duplicated
[1,2] on top of the stack. I'm not sure Ruby code can generate
`splatarray false` followed by `splatarray true` (I could get it
to generate chains of `splatarray true`), so maybe this has no
effect.
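The difference can be seen in a toy stack machine. This is a simplified model, not the VM: it assumes `splatarray true` duplicates the array and `splatarray false` passes it through, and it assumes the operand is already an array. `run_toy` and the constants are illustrative names.

```ruby
# Toy model: operands are assumed to already be arrays, so the conversion
# step of the real splatarray instruction is left out.
def run_toy(insns)
  stack = []
  insns.each do |name, operand|
    case name
    when :putobject
      stack.push(operand)
    when :splatarray
      array = stack.pop
      # operand true => push a duplicate; false => push the same object
      stack.push(operand ? array.dup : array)
    else
      raise ArgumentError, "unknown instruction: #{name}"
    end
  end
  stack.last
end

LITERAL = [1, 2].freeze

FULL_SEQUENCE  = [[:putobject, LITERAL], [:splatarray, false], [:splatarray, true]]
AFTER_PEEPHOLE = [[:putobject, LITERAL], [:splatarray, false]]
```

In this model, `run_toy(FULL_SEQUENCE)` yields a fresh copy, while `run_toy(AFTER_PEEPHOLE)` yields the literal object itself, which is the behavior change described above.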
newarray, duparray, and concatarray all result in newly allocated
arrays at the top of the stack, so they shouldn't have an issue
with removing either `splatarray true` or `splatarray false`.
Given code such as:
```ruby
h[*a, :a], h[*b] = v
```
Ruby would previously allocate 5 arrays for the mass assignment:
* splatarray true for a
* newarray for v[0]
* concatarray for [*a, :a] and v[0]
* newarray for v[1]
* concatarray for b and v[1]
This optimizes it to only allocate 2 arrays:
* splatarray true for a
* splatarray true for b
Instead of the newarray/concatarray combination, pushtoarray is used.
Note above that there was no splatarray true for b originally. The
normal compilation uses splatarray false for b. Instead of trying
to find and modify the splatarray false to splatarray true, this
adds splatarray true for b, which requires a couple of swap
instructions, before the pushtoarray. This could be further
optimized to remove the need for those three instructions, but I'm
not sure if the complexity is worth it.
Additionally, this sets VM_CALL_ARGS_SPLAT_MUT on the call to
[]= in the h[*b] case, so that if []= has a splat parameter, the
new array can be used directly, without additional duplication.
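Whatever arrays the VM allocates internally, the observable behavior must stay the same. The following sketch records the arguments that reach `[]=` and checks that `a` and `b` are left untouched. `SplatRecorder` and `masgn_demo` are names made up for illustration, and the sketch does not measure allocations.

```ruby
# Records every []= call so the arguments can be inspected afterwards.
class SplatRecorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  # The last element of args is the assigned value; the rest are indexes.
  def []=(*args)
    @calls << args
  end
end

def masgn_demo
  h = SplatRecorder.new
  a = [1, 2]
  b = [3]
  v = [:x, :y]
  h[*a, :a], h[*b] = v
  { calls: h.calls, a: a, b: b }
end
```

`masgn_demo[:calls]` is expected to be `[[1, 2, :a, :x], [3, :y]]`, with `a` and `b` still equal to `[1, 2]` and `[3]`.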
Given code such as:
```ruby
h[*a, 1] += 1
h[*b] += 2
```
Ruby would previously allocate 5 arrays:
* splatarray true for a
* newarray for 1
* concatarray for [*a, 1] and [1]
* newarray for 2
* concatarray for b and [2]
This optimizes it to only allocate 2 arrays:
* splatarray true for a
* splatarray true for b
Instead of the newarray/concatarray combination, pushtoarray is used.
Note above that there was no splatarray true for b originally. The
normal compilation uses splatarray false for b. Instead of trying
to find and modify the splatarray false to splatarray true, this
adds splatarray true for b, which requires a couple of swap
instructions, before the pushtoarray. This could be further
optimized to remove the need for those three instructions, but I'm
not sure if the complexity is worth it.
Additionally, this sets VM_CALL_ARGS_SPLAT_MUT on the call to
[]= in the h[*b] case, so that if []= has a splat parameter, the
new array can be used directly, without additional duplication.
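The same kind of check works for the operator-assignment form. In this sketch, `CounterTable` and `op_asgn_demo` are hypothetical names. Both `[]` and `[]=` take a splat parameter so that the arrays built for the call are what the methods receive. It checks behavior only, not allocation counts.

```ruby
# A small table keyed by the full list of index arguments.
class CounterTable
  attr_reader :writes

  def initialize
    @data = Hash.new(0)
    @writes = []
  end

  def [](*keys)
    @data[keys]
  end

  def []=(*keys, value)
    @writes << [keys, value]
    @data[keys] = value
  end
end

def op_asgn_demo
  h = CounterTable.new
  a = [1, 2]
  b = [3]
  h[*a, 1] += 1
  h[*b] += 2
  { writes: h.writes, a: a, b: b, value_for_b: h[3] }
end
```

`op_asgn_demo[:writes]` is expected to be `[[[1, 2, 1], 1], [[3], 2]]`, and neither `a` nor `b` should be modified by the calls.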