Previously, PosMarker callbacks ran even when the assembler failed to
assemble its contents due to insufficient space. This was problematic
because when Assembler::compile() failed, the callbacks were given
positions that have no valid code, contrary to general expectation.
For example, we use a PosMarker callback to record VM instruction
boundaries and patch in jumps to exits in case the guest program starts
tracing, however, previously, we could record a location near the end of
the code block, where there is no space to patch in jumps. I suspect
this is the cause of the recent occurrences of rare random failures on
GitHub Actions with the invariants.rs:529 "can rewrite existing code"
message. `--yjit-perf` also uses PosMarker and had a similar issue.
Buffer the list of callbacks to fire, and only fire them when all code
in the assembler are written out successfully. It's more intuitive this
way.
Right now the `rb_shape_get_next` shape caller need to
first check if there is capacity left, and if not call
`rb_shape_transition_shape_capa` before it can call `rb_shape_get_next`.
And on each of these it needs to checks if we got a TOO_COMPLEX
back.
All this logic is duplicated in the interpreter, YJIT and RJIT.
Instead we can have `rb_shape_get_next` do the capacity transition
when needed. The caller can compare the old and new shapes capacity
to know if resizing is needed. It also can check for TOO_COMPLEX
only once.
We still need to do `jit.record_boundary_patch_point = false`
when gen_outlined_exit() returns `None` and we return with `?`.
Previously, we tripped the assert at codegen.rs:1042.
Found with `--yjit-exec-mem-size=3` on the lobsters benchmark.
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
We've long had a size restriction on the code memory region such that a
u32 could refer to everything. This commit capitalizes on this
restriction by shrinking the size of `CodePtr` to be 4 bytes from 8.
To derive a full raw pointer from a `CodePtr`, one needs a base pointer.
Both `CodeBlock` and `VirtualMemory` can be used for this purpose. The
base pointer is readily available everywhere, except for in the case of
the `jit_return` "branch". Generalize lea_label() to lea_jump_target()
in the IR to delay deriving the `jit_return` address until `compile()`,
when the base pointer is available.
On railsbench, this yields roughly a 1% reduction to `yjit_alloc_size`
(58,397,765 to 57,742,248).
If the VM ran out of shape, `rb_shape_transition_shape_capa` might
return `OBJ_TOO_COMPLEX_SHAPE`.
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
* YJIT: implement two-step call threshold
Automatically switch call threshold to a larger value for
larger, production-sized apps, while still allowing smaller apps
and command-line programs to start with a lower threshold.
* Update yjit/src/options.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Make the new variables constants
* Check that a custom call threshold was not specified
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
So that we get a reminder to check CodeBlock::has_dropped_bytes().
Internally, asm.compile() already checks it, and this patch just
propagates it out to the caller with a `#[must_use]`.
Code GC logic moved out one level in entry_stub_hit(), so the body
can freely use `?`
It's an estimator for application size and could be used as a
compilation heuristic later.
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Previously, the version-controlled `cruby_bindings.inc.rs` file
contained the build-time artifact `id.h`, which nobu mentioned hinders
the goal of having fewer magic numbers in the repository.
Lookup the IDs YJIT needs on boot. It costs cycles, but it's fine since
YJIT only uses a handful of IDs at the moment. No perceptible
degradation to boot time found in my testing.
Previously, for block argument callsites with some specific argument
count and callee local variable count combinations, YJIT ended up
writing over arguments that are supposed to be collected into a rest
parameter array unmodified.
Detect when clobbering would happen and avoid it. Also, place the block
handler after the stack overflow check, since it writes to new stack
space.
Reported-by: Takashi Kokubun <takashikkbn@gmail.com>
* Port call threshold logic from Rust to C for performance
* Prefix global/field names with yjit_
* Fix linker error
* Fix preprocessor condition for rb_yjit_threshold_hit
* Fix third linker issue
* Exclude yjit_calls_at_interv from RJIT bindgen
---------
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Given `SHAPE_MAX_NUM_IVS 80`, we transition to TOO_COMPLEX
way before we could overflow a 8bit counter.
This reduce the size of `rb_shape_t` from 32B to 24B.
If we decide to raise `SHAPE_MAX_NUM_IVS` we can always increase
that type again.
Previously, at the end of `leave` we did
`*caller_cfp->sp = return_value`, like the interpreter.
With future changes that leaves the SP field uninitialized for C frames,
this will become problematic. For cases like returning from
`rb_funcall()`, the return value was written above the stack and
never read anyway (callers use the copy in the return register).
Leave the return value in a register at the end of `leave` and have the
code at `cfp->jit_return` decide what to do with it. This avoids the
unnecessary memory write mentioned above. For JIT-to-JIT returns, it goes
through `asm.stack_push()` and benefits from register allocation for
stack temporaries.
Mostly flat on benchmarks, with maybe some marginal speed improvements.
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* YJIT: Avoid creating a vector in get_temp_regs()
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
* Remove unused import
---------
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Since the compile-time iseq used in the guard was not marked and updated
during compaction, a runtime value reusing the address could falsely pass
the guard.
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* YJIT: Skip Insn::Comment and format!
if disasm is disabled
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
* YJIT: Get rid of asm.comment
---------
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Previously, TestStack#test_machine_stack_size failed pretty consistently
on ARM64 macOS, with Rust code and part of the interpreter used for
per-instruction fallback (rb_vm_invokeblock() and friends) touching the
stack guard page and crashing with SEGV. I've also seen the same test
fail on x64 Linux, though with a different symptom.
Previously, Kernel#lambda returned a non-lambda proc when given a
non-literal block and issued a warning under the `:deprecated` category.
With this change, Kernel#lambda will always return a lambda proc, if it
returns without raising.
Due to interactions with block passing optimizations, we previously had
two separate code paths for detecting whether Kernel#lambda got a
literal block. This change allows us to remove one path, the hack done
with rb_control_frame_t::block_code introduced in 85a337f for supporting
situations where Kernel#lambda returned a non-lambda proc.
[Feature #19777]
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* Add getbyte JIT implementation
Adds an implementation for String#getbyte for YJIT, along with a
bootstrap test. This should be helpful for pure Ruby implementations
and to avoid unneeded allocations.
Co-authored-by: John Hawthorn <jhawthorn@github.com>
* Skip the getbyte test for RJIT for now
---------
Co-authored-by: John Hawthorn <jhawthorn@github.com>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* Remove function call for String#bytesize
String size is stored in a consistent location, so we can eliminate the
function call.
* Update yjit/src/codegen.rs
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* YJIT: Make compiled_* stats available by default
* Update comment about default counters [ci skip]
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
New Clippy lint in 1.72.0 is breaking our build as GitHub has updated
their image. No point hearing about lints from generated code we don't
manually write.
These types are essentially claims about what `RBASIC_CLASS(obj)`
returns. The field changes with singleton class creation, but we didn't
consider so previously and elided guards where we actually needed them.
Found running ruby/spec with --yjit-verify-ctx. The assertion interface
makes extensive use of singleton classes.
Rack uses this. Speculate that the `obj` in `the_call(&obj)`
will be a proc when the compile-time sample is a proc.
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* YJIT: implement fast path for integer multiplication in opt_mult
* Update yjit/src/codegen.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Implement mul with overflow checking on arm64
* Fix missing semicolon
* Add arm splitting for lshift, rshift, urshift
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
We should return false for this type of special methods but wasn't
previously. Was reproducible with:
make test-all TESTS=../test/-ext-/test_notimplement.rb RUN_OPTS='--yjit-call-threshold=1'
* YJIT: Fix splatting empty array with rest param
* YJIT: Rework optional parameter handling to fix corner case
The old code had a few unintuitive parts. The starting PC of the callee
was set in different places; `num_param`, which one would assume to be
static for a particular callee seemingly tallied to different amounts
depending on the what the caller passed; `opts_filled_with_splat` was
greater than zero even when the opts were not filled by items in the
splat array. Functionally, the bits that lets the callee know which
keyword parameters are unspecified were not passed properly when there
are optional parameters and a rest parameter, and then optional
parameters are all filled.
Make `num_param` non-mut and use parameter information in the callee
iseq as-is. Move local variable nil fill and placing of the rest array
out of `gen_push_frame()` as they are only ever relevant for iseq calls.
Always place the rest array at `lead_num + opt_num` to fix the
previously buggy situation.
* YJIT: Compile splat calls to iseqs with rest params
Test interactions with optional parameters.
* YJIT: handle expandarray_rhs_too_small case
YJIT: fix csel bug in x86 backend, add test
Remove commented out lines
Refactor expandarray to use chain guards
Propagate Type::Nil when known
Update yjit/src/codegen.rs
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* Add missing counter, use get_array_ptr() in expandarray
* Make change suggested by Kokubun to reuse loop
---------
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Followup: https://github.com/ruby/ruby/pull/8152
If the receiver is a T_MODULE or T_CLASS and has a lot of
ivars, `get_next_shape_internal` will return `NULL`.
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
YJIT: implement missing jg instruction in backend
While trying to implement a specialize integer left shift, I ran
into a problem where we have no way to do a greater-than comparison
at the moment. Surprising we went this far without ever needing it.
The rest of the counters are prefixed with `gbpp_` and that's what
`yjit.rb` uses when printing the summary. This counter wasn't included
in the summary.
Remove rb_control_frame_t::__bp__ and optimize bmethod calls
This commit removes the __bp__ field from rb_control_frame_t. It was
introduced to help MJIT, but since MJIT was replaced by RJIT, we can use
vm_base_ptr() to compute it from the SP of the previous control frame
instead. Removing the field avoids needing to set it up when pushing new
frames.
Simply removing __bp__ would cause crashes since RJIT and YJIT used a
slightly different stack layout for bmethod calls than the interpreter.
At the moment of the call, the two layouts looked as follows:
┌────────────┐ ┌────────────┐
│ frame_base │ │ frame_base │
├────────────┤ ├────────────┤
│ ... │ │ ... │
├────────────┤ ├────────────┤
│ args │ │ args │
├────────────┤ └────────────┘<─prev_frame_sp
│ receiver │
prev_frame_sp─>└────────────┘
RJIT & YJIT interpreter
Essentially, vm_base_ptr() needs to compute the address to frame_base
given prev_frame_sp in the diagrams. The presence of the receiver
created an off-by-one situation.
Make the interpreter use the layout the JITs use for iseq-to-iseq
bmethod calls. Doing so removes unnecessary argument shifting and
vm_exec_core() re-entry from the interpreter, yielding a speed
improvement visible through `benchmark/vm_defined_method.yml`:
patched: 7578743.1 i/s
master: 4796596.3 i/s - 1.58x slower
C-to-iseq bmethod calls now store one more VALUE than before, but that
should have negligible impact on overall performance.
Note that re-entering vm_exec_core() used to be necessary for firing
TracePoint events, but that's no longer the case since
9121e57a5f.
Closesruby/ruby#6428
Closes [Feature #19729]
Previously 2 bits of the flags on each RVALUE are reserved to store the
number of GC cycles that each object has survived. This commit
introduces a new bit array on the heap page, called age_bits, to store
that information instead.
This patch still reserves one of the age bits in the flags (the old
FL_PROMOTED0 bit, now renamed FL_PROMOTED).
This is set to 0 for young objects and 1 for old objects, and is used as
a performance optimisation for the write barrier. Fetching the age_bits
from the heap page and doing the required math to calculate if the
object was old or not would slow down the write barrier. So we keep this
bit synced in the flags for fast access.
* Revert "Revert "YJIT: Break register cycles for C arguments (#7918)""
This reverts commit 78ca085785.
* Use shfited_live_ranges for the last-insn check
* YJIT: Fix autosplat miscomp for blocks with optionals
When passing an array as the sole argument to `yield`, and the yieldee
takes more than 1 optional parameter, the array is expanded similar
to `*array` splat calls. This is called "autosplat" in
`setup_parameters_complex()`.
Previously, YJIT did not detect this autosplat condition. It passed the
array without expanding it, deviating from interpreter behavior.
Detect this conditon and refuse to compile it.
Fixes: Shopify/yjit#313
* RJIT: Fix autosplat miscomp for blocks with optionals
This is mirrors the same issue as YJIT. See previous commit.
* Make TAINT and UNTRUSTED flags zero
These flags do nothing already, and should break nothing.
* Remove TAINT and UNTRUSTED macros same as functions
These macros had been defined to use with `#ifdef`, but should not be
used anymore.
`IO#reopen` is very special in that it is able to change the class and
singleton class of IO instances. In its presence, it is not correct to
assume that IO instances has a stable class/singleton class and guard
by comparing identity.
* Unify length field for embedded and heap strings
The length field is of the same type and position in RString for both
embedded and heap allocated strings, so we can unify it.
* Remove RSTRING_EMBED_LEN
This reverts commit 10621f7cb9.
This was reverted because the gc integrity build started failing. We
have figured out a fix so I'm reopening the PR.
Original commit message:
Fix cvar caching when class is cloned
The class variable cache that was added in
ruby#4544 changed the behavior of class
variables on cloned classes. As reported when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
The class variable cache that was added in
https://github.com/ruby/ruby/pull/4544 changed the behavior of class
variables on cloned classes. As reported when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Originally, when 2e7bceb34e fixed cfuncs to no
longer use the VM stack for large array splats, it was thought to have fully
fixed Bug #4040, since the issue was fixed for methods defined in Ruby (iseqs)
back in Ruby 2.2.
After additional research, I determined that same issue affects almost all
types of method calls, not just iseq and cfunc calls. There were two main
types of remaining issues, important cases (where large array splat should
work) and pedantic cases (where large array splat raised SystemStackError
instead of ArgumentError).
Important cases:
```ruby
define_method(:a){|*a|}
a(*1380888.times)
def b(*a); end
send(:b, *1380888.times)
:b.to_proc.call(self, *1380888.times)
def d; yield(*1380888.times) end
d(&method(:b))
def self.method_missing(*a); end
not_a_method(*1380888.times)
```
Pedantic cases:
```ruby
def a; end
a(*1380888.times)
def b(_); end
b(*1380888.times)
def c(_=nil); end
c(*1380888.times)
c = Class.new do
attr_accessor :a
alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)
c = Struct.new(:a) do
alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)
```
This patch fixes all usage of CALLER_SETUP_ARG with splatting a large
number of arguments, and required similar fixes to use a temporary
hidden array in three other cases where the VM would use the VM stack
for handling a large number of arguments. However, it is possible
there may be additional cases where splatting a large number
of arguments still causes a SystemStackError.
This has a measurable performance impact, as it requires additional
checks for a large number of arguments in many additional cases.
This change is fairly invasive, as there were many different VM
functions that needed to be modified to support this. To avoid
too much API change, I modified struct rb_calling_info to add a
heap_argv member for storing the array, so I would not have to
thread it through many functions. This struct is always stack
allocated, which helps ensure sure GC doesn't collect it early.
Because of how invasive the changes are, and how rarely large
arrays are actually splatted in Ruby code, the existing test/spec
suites are not great at testing for correct behavior. To try to
find and fix all issues, I tested this in CI with
VM_ARGC_STACK_MAX to -1, ensuring that a temporary array is used
for all array splat method calls. This was very helpful in
finding breaking cases, especially ones involving flagged keyword
hashes.
Fixes [Bug #4040]
Co-authored-by: Jimmy Miller <jimmy.miller@shopify.com>
This check was introduced to match an assertion in the C YJIT when this
was originally introduced. I don't believe it's necessary for
correctness of the generated code.
Co-authored-by: Adam Hess <HParker@github.com>
Co-authored-by: Daniel Colson <danieljamescolson@gmail.com>
Co-authored-by: Luan Vieira <luanzeba@github.com>
* YJIT: Avoid splitting mov for small values on arm64
* Fix a comment
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* YJIT: Test the 0xffff boundary
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
yjit-trace-exits appends a synthetic sample for the instruction being
exited, but we didn't increment the size of the stack. Fixing this count
correctly lets us successfully generate a flamegraph from the exits.
I also replaced the line number for instructions with 0, as I don't
think the previous value had meaning.
Co-authored-by: Adam Hess <HParker@github.com>
Previously, setinstancevariable could generate code that calls
`rb_ensure_iv_list_size()` without first updating `cfp->sp`. This means
in the event that a GC start from within said routine the top few
objects would not be marked, causing them to be falsly collected.
Call `jit_prepare_routine_call()` first.
[Bug #19601]
Add a sampling option to trace exits
Running YJIT with trace exits enabled can make very large metrics files.
This allows us to configure a sample rate to make tracing exits possible
on larger tests. This also updates the documented YJIT options.
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: John Hawthorn <john@hawthorn.email>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Previously we were missing a compile-time check that the known cfuncs
receive the correct number of arguments.
We noticied this because in particular when using ARGS_SPLAT, which also
wasn't checked, YJIT would crash on code which was otherwise correct
(didn't raise exceptions in the VM).
This still supports vararg (argc == -1) cfuncs. I added an additional
assertion that when we use the specialized codegen for one of these
known functions that the argc are popped off the stack correctly, which
should help ensure they're implemented correctly (previously the crash
was usually observed on a future `leave` insn).
[Bug #19595]
* YJIT: Let Assembler own Context
* Update a comment
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
* YJIT: Stack temp register allocation for arm64
* Update a comment
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
* Update comments about assertion
* Update a comment
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
The socket extensions rubysocket.h pulls in the "private" include/gc.h,
which now depends on vm_core.h. vm_core.h pulls in id.h
when tool/update-deps generates the dependencies for the makefiles, it
generates the line for id.h to be based on VPATH, which is configured in
the extconf.rb for each of the extensions. By default VPATH does not
include the actual source directory of the current Ruby so the
dependency fails to resolve and linking fails.
We need to append the topdir and top_srcdir to VPATH to have the
dependancy picked up correctly (and I believe we need both of these to
cope with in-tree and out-of-tree builds).
I copied this from the approach taken in
https://github.com/ruby/ruby/blob/master/ext/objspace/extconf.rb#L3
Remove !USE_RVARGC code
[Feature #19579]
The Variable Width Allocation feature was turned on by default in Ruby
3.2. Since then, we haven't received bug reports or backports to the
non-Variable Width Allocation code paths, so we assume that nobody is
using it. We also don't plan on maintaining the non-Variable Width
Allocation code, so we are going to remove it.
C function frames don't need to use the VM-specific pc field to run
properly. When pushing a control frame from output code, save one
instruction by leaving the field uninitialized.
Fix-up rb_vm_svar_lep(), which is used while setting local variables via
Regexp#=~. Use cfp->iseq as a secondary signal so it can stop assuming
that all CFUNC frames always have zero pc's.
Making overlapping `&mut`s triggers Undefined Bahavior. This function
previously had them through `cb` and `ocb` aliasing with `self` or live
references in the caller.
To fix the overlap, take `ocb` as a parameter and don't use `get_inline_cb()`
in the body of the function.
* YJIT: Add --yjit-pause and RubyVM::YJIT.resume
This allows booting YJIT in a suspended state. We chose to add a new
command line option as opposed to simply allowing YJIT.resume to work
without any command line option because it allows for combining with
YJIT tuning command line options. It also simpifies implementation.
Paired with Kokubun and Maxime.
* Update yjit.rb
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
If the iseq only contains `opt_invokebuiltin_delegate_leave` insn and
the builtin-function (bf) is inline-able, the caller doesn't need to
build a method frame.
`vm_call_single_noarg_inline_builtin` is fast path for such cases.
We crashed in some edge cases due to the recent change to not compile
encoded iseqs that are larger than `u16::MAX`.
- Match the C signature of rb_yjit_constant_ic_update() and clamp down
to `IseqIdx` size
- Return failure instead of panicking with `unwrap()` in codegen when
the iseq is too large
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Noah Gibbs <noah.gibbs@shopify.com>
So by itself, this shouldn't have been a correctness issue, but we
also pop the stack for block_args. Doing stack manipulation like that
and then side-exiting causes issues. So, while this fixes the
immediate failure, we have a bigger issue with block_args popping and
then exiting that we need to deal with.
This reverts commit 5d0a1ffafa.
This commit is causing sequel in yjit-bench to raise with this stack trace:
```
sequel-5.64.0/lib/sequel/dataset/sql.rb:266:in `literal': wrong argument type Array (expected Proc) (TypeError)
from sequel-5.64.0/lib/sequel/database/misc.rb:269:in `literal'
from sequel-5.64.0/lib/sequel/adapters/shared/sqlite.rb:314:in `column_definition_default_sql'
from sequel-5.64.0/lib/sequel/database/schema_methods.rb:564:in `block in column_definition_sql'
from sequel-5.64.0/lib/sequel/database/schema_methods.rb:564:in `each'
from sequel-5.64.0/lib/sequel/database/schema_methods.rb:564:in `column_definition_sql'
from sequel-5.64.0/lib/sequel/database/schema_methods.rb:634:in `block in column_list_sql'
from sequel-5.64.0/lib/sequel/database/schema_methods.rb:634:in `map'
from sequel-5.64.0/lib/sequel/database/schema_methods.rb:634:in `column_list_sql'
from sequel-5.64.0/lib/sequel/database/schema_methods.rb:753:in `create_table_sql'
from sequel-5.64.0/lib/sequel/adapters/shared/sqlite.rb:348:in `create_table_sql'
from sequel-5.64.0/lib/sequel/database/schema_methods.rb:702:in `create_table_from_generator'
from sequel-5.64.0/lib/sequel/database/schema_methods.rb:203:in `create_table'
from benchmarks/sequel/benchmark.rb:19:in `<main>'
```
This assert would've caught a bug I wrote while developing
ruby/ruby#7443 so I figured it would be good to commit it
as it could be helpful in the future.
* YJIT: Rest and block_arg support
* Update bootstraptest/test_yjit.rb
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
`Rc` and `RefCell` both incur runtime space costs.
In addition, `RefCell` has given us some headaches with the
non obvious borrow panics it likes to throw out. The latest
one started with 7fd53eeb46
and is yet to be resolved.
Since we already rely on the GC to properly reclaim memory for `Block`
and `Branch`, we might as well stop paying the overhead of `Rc` and
`RefCell`. The `RefCell` panics go away with this change, too.
On 25 iterations of `railsbench` with a stats build I got
`yjit_alloc_size: 8,386,129 => 7,348,637`, with the new memory size 87.6%
of the status quo. This makes the metadata and machine code size roughly
line up one-to-one.
The general idea here is to use `&` shared references with
[interior mutability][1] with `Cell`, which doesn't take any extra
space. The `noalias` requirement that `&mut` imposes is way too hard to
meet and verify. Imagine replacing places where we would've gotten
`BorrowError` from `RefCell` with Rust/LLVM miscompiling us due to aliasing
violations. With shared references, we don't have to think about subtle
cases like the GC _sometimes_ calling the mark callback while codegen
has an aliasing reference in a stack frame below. We mostly only need to
worry about liveness, with which the GC already helps.
There is now a clean split between blocks and branches that are not yet
fully constructed and ones that are "in-service", so to speak. Working
with `PendingBranch` and `JITState` don't really involve `unsafe` stuff.
This change allows `Branch` and `Block` to not have as many optional
fields as many of them are only optional during compilation. Fields that
change post-compilation are wrapped in `Cell` to facilitate mutation
through shared references.
I do some `unsafe` dances here. I've included just a couple tests to run
with Miri (`cargo +nightly miri test miri`). We can add more Miri tests
if desired.
[1]: https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html
* Make EC required on JIT state
Lets make EC required on the JITState object so we don't need to
`unwrap` it.
* Minor nitpicks
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>