to BUILTIN_ATTR_SINGLE_NOARG_LEAF
The attribute was created when the other attribute was called BUILTIN_ATTR_INLINE.
Now that the original attribute is renamed to BUILTIN_ATTR_LEAF, it's
only confusing that we call it "_INLINE".
* YJIT: expandarray for non-arrays
Co-authored-by: John Hawthorn <john@hawthorn.email>
* Skip the new test on RJIT
* Increment counter for to_ary exit
---------
Co-authored-by: John Hawthorn <john@hawthorn.email>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
```
warning: unused import: `condition::Condition`
--> src/asm/arm64/arg/mod.rs:13:9
|
13 | pub use condition::Condition;
| ^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused import: `rb_yjit_fix_mul_fix as rb_fix_mul_fix`
--> src/cruby.rs:188:9
|
188 | pub use rb_yjit_fix_mul_fix as rb_fix_mul_fix;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
warning: unused import: `rb_insn_len as raw_insn_len`
--> src/cruby.rs:142:9
|
142 | pub use rb_insn_len as raw_insn_len;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
```
Make asm public so it stops warning about unused public stuff in there.
Previously, if the method ID argument happens to be on one below the top
of the stack, we didn't overwrite the type of the stack slot, which
leaves an incorrect type for the stack slot. The included script tripped
asserts both with and without --yjit-verify-ctx.
Previously, for splat callsites that land in methods with optional
parameters, we didn't reject the case where the caller supplies too many
arguments. Accepting those calls previously caused YJIT to construct
corrupted control frames, which leads to crashes if the callee uses
certain stack walking methods such as Kernel#raise and String#gsub (for
setting up the frame-local `$~`).
Example crash in a debug build:
Assertion Failed: ../vm_core.h:1375:VM_ENV_FLAGS:FIXNUM_P(flags)
Counter::guard_send_iseq_has_rest_and_splat_not_equal was using
jump-if-lesser-than, so wasn't checking for equality. Rename function
because moving is destructive in Rust, which is confusing for this function
which doesn't modify the array.
Previously, block.to_proc was called first, by vm_caller_setup_arg_block.
kw.to_hash was called later inside CALLER_SETUP_ARG or setup_parameters_complex.
This adds a splatkw instruction that is inserted before sends with
ARGS_BLOCKARG and KW_SPLAT and without KW_SPLAT_MUT. This is not needed in the
KW_SPLAT_MUT case, because then you know the value is a hash, and you don't
need to call to_hash on it.
The splatkw instruction checks whether the second to top block is a hash,
and if not, replaces it with the value of calling to_hash on it (using
rb_to_hash_type). As it is always before a send with ARGS_BLOCKARG and
KW_SPLAT, second to top is the keyword splat, and top is the passed block.
We've seen quite a few compaction bugs lately, and these assertions
should give clearer symptoms. We only call class_of() on
objects that the Ruby code can see.
We have received a report of `assert!( !cb.has_dropped_bytes())` in
set_page() failing. The only explanation for this seems to be memory
allocation failing in write_byte(). The if condition implies that
`current_write_pos < dst_pos < mem_size`, which rules out failing to
encode the relative jump. The has_capacity() assert above not tripping
implies that we were in a place in the page where write_byte() did
attempt to write the byte and potentially made a syscall in the process.
Remove the assert, since memory allocation could fail. Also, return
failure if the destination is outside of the code region to detect that
out-of-memory situation quicker.
Like in the example given in delayed_deallocation(), stubs can be hit
even if the block housing it is invalidated. Mark them so we don't
work with invalidate ISeqs when hitting these stubs.
* YJIT: Cancel on-stack jit_return on invalidation
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
* Use RUBY_VM_CONTROL_FRAME_STACK_OVERFLOW_P
---------
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
* YJIT: record num_send_cfunc stat
Also report num_send_known_cfunc as percentage of num_send_cfunc
* Rename num_send_known_cfunc => num_send_cfunc_inline
Name seems more descriptive of what we do with out custom codegen
Previously, PosMarker callbacks ran even when the assembler failed to
assemble its contents due to insufficient space. This was problematic
because when Assembler::compile() failed, the callbacks were given
positions that have no valid code, contrary to general expectation.
For example, we use a PosMarker callback to record VM instruction
boundaries and patch in jumps to exits in case the guest program starts
tracing, however, previously, we could record a location near the end of
the code block, where there is no space to patch in jumps. I suspect
this is the cause of the recent occurrences of rare random failures on
GitHub Actions with the invariants.rs:529 "can rewrite existing code"
message. `--yjit-perf` also uses PosMarker and had a similar issue.
Buffer the list of callbacks to fire, and only fire them when all code
in the assembler are written out successfully. It's more intuitive this
way.
Right now the `rb_shape_get_next` shape caller need to
first check if there is capacity left, and if not call
`rb_shape_transition_shape_capa` before it can call `rb_shape_get_next`.
And on each of these it needs to checks if we got a TOO_COMPLEX
back.
All this logic is duplicated in the interpreter, YJIT and RJIT.
Instead we can have `rb_shape_get_next` do the capacity transition
when needed. The caller can compare the old and new shapes capacity
to know if resizing is needed. It also can check for TOO_COMPLEX
only once.
We still need to do `jit.record_boundary_patch_point = false`
when gen_outlined_exit() returns `None` and we return with `?`.
Previously, we tripped the assert at codegen.rs:1042.
Found with `--yjit-exec-mem-size=3` on the lobsters benchmark.
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
We've long had a size restriction on the code memory region such that a
u32 could refer to everything. This commit capitalizes on this
restriction by shrinking the size of `CodePtr` to be 4 bytes from 8.
To derive a full raw pointer from a `CodePtr`, one needs a base pointer.
Both `CodeBlock` and `VirtualMemory` can be used for this purpose. The
base pointer is readily available everywhere, except for in the case of
the `jit_return` "branch". Generalize lea_label() to lea_jump_target()
in the IR to delay deriving the `jit_return` address until `compile()`,
when the base pointer is available.
On railsbench, this yields roughly a 1% reduction to `yjit_alloc_size`
(58,397,765 to 57,742,248).
If the VM ran out of shape, `rb_shape_transition_shape_capa` might
return `OBJ_TOO_COMPLEX_SHAPE`.
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
* YJIT: implement two-step call threshold
Automatically switch call threshold to a larger value for
larger, production-sized apps, while still allowing smaller apps
and command-line programs to start with a lower threshold.
* Update yjit/src/options.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Make the new variables constants
* Check that a custom call threshold was not specified
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
So that we get a reminder to check CodeBlock::has_dropped_bytes().
Internally, asm.compile() already checks it, and this patch just
propagates it out to the caller with a `#[must_use]`.
Code GC logic moved out one level in entry_stub_hit(), so the body
can freely use `?`
It's an estimator for application size and could be used as a
compilation heuristic later.
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Previously, the version-controlled `cruby_bindings.inc.rs` file
contained the build-time artifact `id.h`, which nobu mentioned hinders
the goal of having fewer magic numbers in the repository.
Lookup the IDs YJIT needs on boot. It costs cycles, but it's fine since
YJIT only uses a handful of IDs at the moment. No perceptible
degradation to boot time found in my testing.
Previously, for block argument callsites with some specific argument
count and callee local variable count combinations, YJIT ended up
writing over arguments that are supposed to be collected into a rest
parameter array unmodified.
Detect when clobbering would happen and avoid it. Also, place the block
handler after the stack overflow check, since it writes to new stack
space.
Reported-by: Takashi Kokubun <takashikkbn@gmail.com>
* Port call threshold logic from Rust to C for performance
* Prefix global/field names with yjit_
* Fix linker error
* Fix preprocessor condition for rb_yjit_threshold_hit
* Fix third linker issue
* Exclude yjit_calls_at_interv from RJIT bindgen
---------
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Given `SHAPE_MAX_NUM_IVS 80`, we transition to TOO_COMPLEX
way before we could overflow a 8bit counter.
This reduce the size of `rb_shape_t` from 32B to 24B.
If we decide to raise `SHAPE_MAX_NUM_IVS` we can always increase
that type again.
Previously, at the end of `leave` we did
`*caller_cfp->sp = return_value`, like the interpreter.
With future changes that leaves the SP field uninitialized for C frames,
this will become problematic. For cases like returning from
`rb_funcall()`, the return value was written above the stack and
never read anyway (callers use the copy in the return register).
Leave the return value in a register at the end of `leave` and have the
code at `cfp->jit_return` decide what to do with it. This avoids the
unnecessary memory write mentioned above. For JIT-to-JIT returns, it goes
through `asm.stack_push()` and benefits from register allocation for
stack temporaries.
Mostly flat on benchmarks, with maybe some marginal speed improvements.
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* YJIT: Avoid creating a vector in get_temp_regs()
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
* Remove unused import
---------
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Since the compile-time iseq used in the guard was not marked and updated
during compaction, a runtime value reusing the address could falsely pass
the guard.
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* YJIT: Skip Insn::Comment and format!
if disasm is disabled
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
* YJIT: Get rid of asm.comment
---------
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Previously, TestStack#test_machine_stack_size failed pretty consistently
on ARM64 macOS, with Rust code and part of the interpreter used for
per-instruction fallback (rb_vm_invokeblock() and friends) touching the
stack guard page and crashing with SEGV. I've also seen the same test
fail on x64 Linux, though with a different symptom.
Previously, Kernel#lambda returned a non-lambda proc when given a
non-literal block and issued a warning under the `:deprecated` category.
With this change, Kernel#lambda will always return a lambda proc, if it
returns without raising.
Due to interactions with block passing optimizations, we previously had
two separate code paths for detecting whether Kernel#lambda got a
literal block. This change allows us to remove one path, the hack done
with rb_control_frame_t::block_code introduced in 85a337f for supporting
situations where Kernel#lambda returned a non-lambda proc.
[Feature #19777]
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* Add getbyte JIT implementation
Adds an implementation for String#getbyte for YJIT, along with a
bootstrap test. This should be helpful for pure Ruby implementations
and to avoid unneeded allocations.
Co-authored-by: John Hawthorn <jhawthorn@github.com>
* Skip the getbyte test for RJIT for now
---------
Co-authored-by: John Hawthorn <jhawthorn@github.com>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* Remove function call for String#bytesize
String size is stored in a consistent location, so we can eliminate the
function call.
* Update yjit/src/codegen.rs
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* YJIT: Make compiled_* stats available by default
* Update comment about default counters [ci skip]
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
New Clippy lint in 1.72.0 is breaking our build as GitHub has updated
their image. No point hearing about lints from generated code we don't
manually write.
These types are essentially claims about what `RBASIC_CLASS(obj)`
returns. The field changes with singleton class creation, but we didn't
consider so previously and elided guards where we actually needed them.
Found running ruby/spec with --yjit-verify-ctx. The assertion interface
makes extensive use of singleton classes.
Rack uses this. Speculate that the `obj` in `the_call(&obj)`
will be a proc when the compile-time sample is a proc.
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* YJIT: implement fast path for integer multiplication in opt_mult
* Update yjit/src/codegen.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Implement mul with overflow checking on arm64
* Fix missing semicolon
* Add arm splitting for lshift, rshift, urshift
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
We should return false for this type of special methods but wasn't
previously. Was reproducible with:
make test-all TESTS=../test/-ext-/test_notimplement.rb RUN_OPTS='--yjit-call-threshold=1'
* YJIT: Fix splatting empty array with rest param
* YJIT: Rework optional parameter handling to fix corner case
The old code had a few unintuitive parts. The starting PC of the callee
was set in different places; `num_param`, which one would assume to be
static for a particular callee seemingly tallied to different amounts
depending on the what the caller passed; `opts_filled_with_splat` was
greater than zero even when the opts were not filled by items in the
splat array. Functionally, the bits that lets the callee know which
keyword parameters are unspecified were not passed properly when there
are optional parameters and a rest parameter, and then optional
parameters are all filled.
Make `num_param` non-mut and use parameter information in the callee
iseq as-is. Move local variable nil fill and placing of the rest array
out of `gen_push_frame()` as they are only ever relevant for iseq calls.
Always place the rest array at `lead_num + opt_num` to fix the
previously buggy situation.
* YJIT: Compile splat calls to iseqs with rest params
Test interactions with optional parameters.