Граф коммитов

975 Коммитов

Автор SHA1 Сообщение Дата
Takashi Kokubun 8df74deab1 YJIT: Tweak a comment a little [ci skip] 2024-07-18 13:03:17 -07:00
Takashi Kokubun 2de8b5b805
YJIT: Allow dev_nodebug to disasm release-mode code (#11198)
* YJIT: Allow dev_nodebug to disasm release-mode code

* Revert "YJIT: Squash canary before falling back"

This reverts commit f05ad373d8.
The stray canary issue should have been solved by
def7023ee4, alleviating this codegen
accommodation.

* s/runtime_assertions/runtime_checks/

---------

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2024-07-18 13:01:47 -07:00
Maxime Chevalier-Boisvert d989bc54e2
YJIT: split chain_depth and flag booleans in context (#11169)
Split these values to avoid using a bit mask in the context
Use variable length encoding to save a few bits on chain depth
2024-07-15 14:45:18 -04:00
Takashi Kokubun ec773e15f4
YJIT: Local variable register allocation (#11157)
* YJIT: Local variable register allocation

* locals are not stack temps

* Rename RegTemps to RegMappings

* Rename RegMapping to RegOpnd

* Rename local_size to num_locals

* s/stack value/operand/

* Rename spill_temps() to spill_regs()

* Clarify when num_locals becomes None

* Mention that InsnOut uses different registers

* Rename get_reg_mapping to get_reg_opnd

* Resurrect --yjit-temp-regs capability

* Use MAX_CTX_TEMPS and MAX_CTX_LOCALS
2024-07-15 10:56:57 -04:00
Maxime Chevalier-Boisvert 3fbf9df39a
YJIT: increase context cache size to 1024 redux (#11140)
* YJIT: increase context cache size to 1024 redux

* Move context hashing code outside of unsafe block

* Avoid allocating large table on the stack, which would cause a stack overflow

Co-authored by Alan Wu @XrXr
2024-07-11 19:01:05 +00:00
Maxime Chevalier-Boisvert 48e7112baa
YJIT: increase context cache size to 1024 (#10983)
* YJIT: increase context cache size to 1024

The other day I ran into a mysterious bug while increasing the
cache size to 1024. I was not able to reproduce this locally.
Opening this PR for testing/debugging.

* Add extra debug assertions

* Add more comments to context code

* Update yjit/src/core.rs

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* Update yjit/src/core.rs

* Comment out potentially problematic assertion

* Revert cache size to 512 so we can merge other changes

---------

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2024-07-10 19:45:23 +00:00
Alan Wu 3be9ce3cf6
YJIT: `dump-disasm`: Print comments and bytes in release builds
This change implements a fallback mode for the `--yjit-dump-disasm`
development command-line option to make it usable in release builds.
Previously, using the option with release builds of YJIT yielded only
a warning asking the user to build with `--enable-yjit=dev`.

While builds that use the `disasm` feature still give the best output,
just having the comments is useful enough for many kinds of debugging.
Having it usable in release builds is nice for new hackers, too, since
this allows for tinkering without having to learn how to build YJIT in
development mode.

Sample output on A64:

```
  # regenerate_branch
  # Insn: 0001 opt_send_without_block (stack_size: 1)
  # guard known object with singleton class
  0x11f7e0034: 4b 00 00 58 03 00 00 14 08 ce 9c 04 01 00 00
  0x11f7e0043: 00 3f 00 0b eb 81 06 01 54 1f 20 03 d5
  # RUBY_VM_CHECK_INTS(ec)
  0x11f7e0050: 8b 02 42 b8 cb 07 01 35
  # stack overflow check
  0x11f7e0058: ab 62 02 91 7f 02 0b eb 69 07 01 54
  # save PC to CFP
  0x11f7e0064: 0b 3b 9a d2 2b 2f a0 f2 0b 00 cc f2 6b 02 00
  0x11f7e0073: f8 ab 82 00 91
```

To ensure this feature doesn't incur too much cost when running without
the `--yjit-dump-disasm` option, I checked that there is no significant
impact to compile time and memory usage with the `compile_time_ns` and
`yjit_alloc_size` entry in `RubyVM::YJIT.runtime_stats`. For each
sample, I ran 3 iterations of the `lobsters` YJIT benchmark. The
statistics summary and done with the `summary` function in R.

Compile time, sample size of 60, lower is better:

```
       Before              After
 Min.   :2.054e+09   Min.   :2.028e+09
 1st Qu.:2.069e+09   1st Qu.:2.044e+09
 Median :2.081e+09   Median :2.060e+09
 Mean   :2.089e+09   Mean   :2.066e+09
 3rd Qu.:2.109e+09   3rd Qu.:2.085e+09
 Max.   :2.146e+09   Max.   :2.144e+09
```

Allocation size, sample size of 20, lower is better:

```
       Before             After
 Min.   :21804742   Min.   :21794082
 1st Qu.:21826682   1st Qu.:21816282
 Median :21844042   Median :21826814
 Mean   :21960664   Mean   :22026291
 3rd Qu.:21861228   3rd Qu.:22040439
 Max.   :22587426   Max.   :22930614
```

The `yjit_alloc_size` samples are noisy, but since the average increased
by only 0.3%, and the median is lower, I feel safe saying that there is
no significant change.
2024-07-08 20:02:30 +00:00
Alan Wu b160a78d6b YJIT: Remove done TODO, fix indent
Type check now done in rb_iseqw_to_iseq().
2024-07-03 19:10:57 -04:00
Kevin Menard 3407565d2f
YJIT: Use a special breakpoint address if one isn't explicitly supplied in order to support natural line stepping. (#11083)
Use a special breakpoint address if one isn't explicitly supplied in order to support natural line stepping.

ARM64 will not increment the program counter (PC) upon hitting a breakpoint instruction. Consequently, stepping through code with a debugger ends up looping back to the breakpoint instruction. LLDB has a special breakpoint address of 0xf000 that will increment the PC and allow the debugger to work as expected. This change makes it possible to debug YJIT generated code on ARM64.

More details at: https://discourse.llvm.org/t/stepping-over-a-brk-instruction-on-arm64/69766/8

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2024-07-02 15:55:17 -04:00
Gabriel Lacroix 4d94d28a4a
YJIT: Inline simple ISEQs with unused keyword parameters
This commit expands inlining for simple ISeqs to accept
callees that have unused keyword parameters and callers
that specify unused keywords. The following shows 2 new
callsites that will be inlined:

```ruby
def let(a, checked: true) = a

let(1)
let(1, checked: false)
```

Co-authored-by: Kaan Ozkan <kaan.ozkan@shopify.com>
2024-07-02 18:34:48 +00:00
Aaron Patterson a2c27bae96 [YJIT] Don't expand kwargs on forwarding
Similarly to splat arrays, we shouldn't expand splat kwargs.

[ruby-core:118401]
2024-06-29 11:25:59 -06:00
Alan Wu 3e14fe7c21 YJIT: Fix `cargo doc --document-private-items` warnings [ci skip]
Mostly putting angle brackets around links to follow markdown syntax.
2024-06-28 13:44:35 -04:00
Alan Wu bc91e8ff1d YJIT: Move `ocb` parameters into `JITState`
Many functions take an outlined code block but do nothing more than
passing it along; only a couple of functions actually make use of it.
So, in most cases the `ocb` parameter is just boilerplate.

Most functions that take `ocb` already also take a `JITState` and this
commit moves `ocb` into `JITState` to remove the visual noise of the
`ocb` parameter.
2024-06-28 11:01:05 -04:00
Aaron Patterson 4cbc41d5e5 [YJIT] Fix block and splat handling when forwarding
This commit fixes splat and block handling when calling in to a
forwarding iseq.  In the case of a splat we need to avoid expanding the
array to the stack.  We need to also ensure the CI write is flushed to
the SP, otherwise it's possible for a block handler to clobber the CI

[ruby-core:118360]
2024-06-26 16:01:26 -04:00
Aaron Patterson cc97a27008 Add two new instructions for forwarding calls
This commit adds `sendforward` and `invokesuperforward` for forwarding
parameters to calls

Co-authored-by: Matt Valentine-House <matt@eightbitraptor.com>
2024-06-18 09:28:25 -07:00
Aaron Patterson cdf33ed5f3 Optimized forwarding callers and callees
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.

Calls it optimizes look like this:

```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```

```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```

```ruby
def bar(*a) = a

def foo(...)
  list = [1, 2]
  bar(*list, ...) # optimized
end
foo(123)
```

All variants of the above but using `super` are also optimized, including a bare super like this:

```ruby
def foo(...)
  super
end
```

This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:

```ruby
def m
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

def bar(a) = a
def foo(...) = bar(...)

def test
  m { foo(123) }
end

test
p test # allocates 1 object on master, but 0 objects with this patch
```

```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)

def test
  m { foo(1, b: 2) }
end

test
p test # allocates 2 objects on master, but 0 objects with this patch
```

How does it work?
-----------------

This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.

I think this description is kind of confusing, so let's walk through an example with code.

```ruby
def delegatee(a, b) = a + b

def delegator(...)
  delegatee(...)  # CI2 (FORWARDING)
end

def caller
  delegator(1, 2) # CI1 (argc: 2)
end
```

Before we call the delegator method, the stack looks like this:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   |
              5|   delegatee(...)  # CI2 (FORWARDING)  |
              6| end                                   |
              7|                                       |
              8| def caller                            |
          ->  9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method.  The `delegator` method has a special local called `...`
that holds the caller's CI object.

Here is the ISeq disasm fo `delegator`:

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

The local called `...` will contain the caller's CI: CI1.

Here is the stack when we enter `delegator`:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
           -> 4|   #                                   | CI1 (argc: 2)
              5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            |
              9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`.  In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`.  It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).

Before executing the `send` instruction, we push `...` on the stack.  The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

Instruction 001 puts the caller's CI on the stack.  `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   | CI1 (argc: 2)
           -> 5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            | self
              9|   delegator(1, 2) # CI1 (argc: 2)     | 1
             10| end                                   | 2
```

The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.

Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.

I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.

I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.

For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:

```ruby
SomeObject.new(foo: 1)
```

If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.

Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
2024-06-18 09:28:25 -07:00
Kevin Menard 91bbb78313 YJIT: Fix an unused field warning in `DumpDisasm`. 2024-06-17 17:14:29 -04:00
Alan Wu 657c8db8de YJIT: `--yjit-dump-disasm=dir`: Hold descriptor for dump file
This mainly aims to make `--yjit-dump-disasm=<relative_path>` more
usable. Previously, it crashed if the program did chdir(2), since it
opened the dump file every time when appending.

Tested with:

    ./miniruby --yjit-dump-disasm=. --yjit-call-threshold=1 -e 'Dir.chdir("/") {}'

And the `lobsters` benchmark.
2024-06-17 15:09:32 -04:00
Alan Wu ffd895156f YJIT: Delete otherwise-empty defer_compilation() blocks
Calls to defer_compilation() leave behind a stub and a `struct Block`
that we retain. If the block is empty, it only exits to hold the
`struct Branch` that the stub needs.

This patch transplants the branch out of the empty block into the newly
generated block when the defer_compilation() stub is hit, and deletes
the empty block to save memory.

To assist the transplantation, `Block::outgoing` is now a
`MutableBranchList`, and `Branch::Block` now in a `Cell`. These types
don't incur a size cost.

On the `lobsters` benchmark, `yjit_alloc_size` is roughly 98% of what
it was before the change.

Co-authored-by: Kevin Menard <kevin.menard@shopify.com>
Co-authored-by: Randy Stauner <randy@r4s6.net>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
2024-06-13 13:00:46 -04:00
Maxime Chevalier-Boisvert ce06924a17
YJIT: add context cache hits stat (#10979)
* YJIT: add context cache hits stat

This stat should make more sense when it comes to interpreting
the effectiveness of the cache on large deployed apps.
2024-06-12 13:33:27 -04:00
Takashi Kokubun ec1ea2c5b9 YJIT: Make num_contexts_encoded a default counter 2024-06-11 10:17:41 -07:00
Maxime Chevalier-Boisvert 39019b6a63
YJIT: add context cache size stat, lazily allocate cache
* YJIT: add context cache size stat
* Allocate the context cache in a box so CRuby doesn't pay overhead
* Add an extra debug assertion
2024-06-11 12:46:11 -04:00
Maxime Chevalier-Boisvert 0d91887c6a
YJIT: implement cache for recently encoded/decoded contexts (#10938)
* YJIT: implement cache for recently encoded/decoded contexts

* Increase cache size to 512
2024-06-07 21:59:59 +00:00
Maxime Chevalier-Boisvert 425e630ce7
YJIT: implement variable-length context encoding scheme (#10888)
* Implement BitVector data structure for variable-length context encoding

* Rename method to make intent clearer

* Rename write_uint => push_uint to make intent clearer

* Implement debug trait for BitVector

* Fix bug in BitVector::read_uint_at(), enable more tests

* Add one more test for good measure

* Start sketching Context::encode()

* Progress on variable length context encoding

* Add tests. Fix bug.

* Encode stack state

* Add comments. Try to estimate context encoding size.

* More compact encoding for stack size

* Commit before rebase

* Change Context::encode() to take a BitVector as input

* Refactor BitVector::read_uint(), add helper read functions

* Implement Context::decode() function. Add test.

* Fix bug, add tests

* Rename methods

* Add Context::encode() and decode() methods using global data

* Make encode and decode methods use u32 indices

* Refactor YJIT to use variable-length context encoding

* Tag functions as allow unused

* Add a simple caching mechanism and stats for bytes per context etc

* Add comments, fix formatting

* Grow vector of bytes by 1.2x instead of 2x

* Add debug assert to check round-trip encoding-decoding

* Take some rustfmt formatting

* Add decoded_from field to Context to reuse previous encodings

* Remove olde context stats

* Re-add stack_size assert

* Disable decoded_from optimization for now
2024-06-07 16:26:14 -04:00
Jean Boussier 33f92b3c88 Don't add `+YJIT` to `RUBY_DESCRIPTION` until it's actually enabled
If you start Ruby with `--yjit-disable`, the `+YJIT` shouldn't be
added until `RubyVM::YJIT.enable` is actually called. Otherwise
it's confusing in crash reports etc.
2024-06-05 20:53:49 +02:00
Jean Boussier f7b53a75b6 Do not emit shape transition warnings when YJIT is compiling
[Bug #20522]

If `Warning.warn` is redefined in Ruby, emitting a warning would invoke
Ruby code, which can't safely be done when YJIT is compiling.
2024-06-04 19:21:01 +02:00
Takashi Kokubun a2147eb694
YJIT: Fix getconstant exits after opt_ltlt fusion (#10903)
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2024-06-04 10:17:40 -04:00
Alan Wu 6c8ae44a38 YJIT: Fix out of bounds access when splatting empty array
Previously, we read the last element array even when the array was
empty, doing an out-of-bounds access. This sometimes caused a SEGV.

[Bug #20496]
2024-05-31 18:37:13 -04:00
Alan Wu 4a9ef9e23c YJIT: Fix a warning from nightly rust
No plan about migrating to the 2024 edition yet (it's not even
available yet), but this is a simple enough suggestion so we can just
take it.

```
warning: this method call resolves to `<&Box<[T]> as IntoIterator>::into_iter` (due to backwards compatibility), but will resolve to `<Box<[T]> as IntoIterator>::into_iter` in Rust 2024
    --> ../yjit/src/core.rs:1003:49
     |
1003 |         formatter.debug_list().entries(branches.into_iter()).finish()
     |                                                 ^^^^^^^^^
     |
     = warning: this changes meaning in Rust 2024
     = note: `#[warn(boxed_slice_into_iter)]` on by default
help: use `.iter()` instead of `.into_iter()` to avoid ambiguity
     |
1003 |         formatter.debug_list().entries(branches.iter()).finish()
     |                                                 ~~~~
help: or use `IntoIterator::into_iter(..)` instead of `.into_iter()` to explicitly iterate by value
     |
1003 |         formatter.debug_list().entries(IntoIterator::into_iter(branches)).finish()
     |                                        ++++++++++++++++++++++++        ~
```
2024-05-29 15:58:35 -04:00
Maxime Chevalier-Boisvert 1eff5a98f1
YJIT: limit size of call count stats dict (#10858)
* YJIT: limit size of call count stats dict

Someone reported that logs were getting bloated because the
ISEQ and C call count dicts were huge, since they include all
of the call sites. I wrote code on the Rust size to limit the
size of the dict to avoid this problem. The size limit is
hardcoded at 20, but I figure this is probably fine?

* Fix bug reported by Kokubun.
2024-05-28 13:23:01 -04:00
Étienne Barrié 1376881e9a Stop marking chilled strings as frozen
They were initially made frozen to avoid false positives for cases such
as:

    str = str.dup if str.frozen?

But this may cause bugs and is generally confusing for users.

[Feature #20205]

Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
2024-05-28 07:32:33 +02:00
Nobuyoshi Nakada 49fcd33e13 Introduce a specialize instruction for Array#pack
Instructions for this code:

```ruby
  # frozen_string_literal: true

[a].pack("C")
```

Before this commit:

```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself                                                          (   3)[Li]
0001 opt_send_without_block                 <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 newarray                               1
0005 putobject                              "C"
0007 opt_send_without_block                 <calldata!mid:pack, argc:1, ARGS_SIMPLE>
0009 leave
```

After this commit:

```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself                                                          (   3)[Li]
0001 opt_send_without_block                 <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 putobject                              "C"
0005 opt_newarray_send                      2, :pack
0008 leave
```

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2024-05-23 12:11:50 -07:00
Alan Wu 1df1edc080
YJIT: Fix comment and counter in rb_yjit_invalidate_ep_is_bp() (#10722)
`mem::take` substitutes an empty instance which makes `jit.ep_is_bp()`
return false.
2024-05-06 10:28:36 -04:00
Alan Wu 2a978ee047
YJIT: Fix `Struct` accessors not firing tracing events (#10690)
* YJIT: Fix `Struct` accessors not firing tracing events

Reading and writing to structs should fire `c_call` and `c_return`, but
YJIT wasn't correctly dropping those calls when tracing.
This has been missing since this functionality was added in 3081c83169,
but the added test only fails when ran in isolation with
`--yjit-call-threshold=1`. The test sometimes failed on CI.

* RJIT: YJIT: Fix `Struct` readers not firing tracing events

Same issue as YJIT, but it looks like RJIT doesn't support writing to
structs, so only reading needs changing.
2024-05-01 10:22:41 -04:00
Alan Wu 470eceff8f YJIT: Remove CString allocation when using `src_loc!()`
Since we often take the VM lock as the first thing we do when entering
YJIT, and that needs a `src_loc!()`, this removes a allocation from
that. The main trick here is `concat!(file!(), '\0')` to get a C string
statically baked into the binary.
2024-04-29 16:36:27 -04:00
Alan Wu de0ad3be8e
YJIT: Take VM lock when invalidating
We need the lock to patch code safely.
This might fix some Ractor related crashes seen on CI.
2024-04-29 16:08:06 -04:00
Randy Stauner adae813c5f
YJIT: Expand codegen for `TrueClass#===` to `FalseClass` and `NilClass` (#10679) 2024-04-29 19:25:22 +00:00
Randy Stauner 845f2db136
YJIT: Add specialized codegen function for `TrueClass#===` (#10640)
* YJIT: Add specialized codegen function for `TrueClass#===`

TrueClass#=== is currently number 10 in the most frequent C calls list of the lobsters benchmark.

```
require "benchmark/ips"

def wrap
  true === true
  true === false
  true === :x
end

Benchmark.ips do |x|
  x.report(:wrap) do
    wrap
  end
end
```

```
before
Warming up --------------------------------------
                wrap     1.791M i/100ms
Calculating -------------------------------------
                wrap     17.806M (± 1.0%) i/s -     89.544M in   5.029363s

after
Warming up --------------------------------------
                wrap     4.024M i/100ms
Calculating -------------------------------------
                wrap     40.149M (± 1.1%) i/s -    201.223M in   5.012527s
```

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
Co-authored-by: Kevin Menard <kevin.menard@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* Fix the new test for RJIT

---------

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
Co-authored-by: Kevin Menard <kevin.menard@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2024-04-29 14:32:07 -04:00
Takashi Kokubun 8089faee45 Revert "YJIT: Try splitting getlocal/setlocal blocks (#10648)"
This reverts commit ab228bd084.
2024-04-26 20:43:22 -07:00
Alan Wu 2ba7c1b142 YJIT: Correct signature of rb_yjit_root_mark()
Even though unused, it's supposed to take a pointer like the C side
expects.
2024-04-26 18:03:26 -07:00
Alan Wu 73eeb8643b YJIT: Fix reference update for `Invariants::no_ep_escape_iseqs`
Previously, the update was done in the ISEQ callback. That effectively
never updated anything because the callback itself is given an intact
reference, so it could update its content, and `rb_gc_location(iseq)`
never returned a new address. Update the whole table once in the YJIT
root instead.
2024-04-26 18:03:26 -07:00
Takashi Kokubun ab228bd084
YJIT: Try splitting getlocal/setlocal blocks (#10648) 2024-04-26 13:02:22 -07:00
Alan Wu 9b5bc8e6ea YJIT: Relax `--yjit-verify-ctx` after singleton class creation
Types like `Type::CString` really only assert that at one point the object had
its class field equal to `String`. Once a singleton class is created for any
strings, the type makes no assertion about any class field anymore, and becomes
the same as `Type::TString`.

Previously, the `--yjit-verify-ctx` option wasn't allowing objects of these
kind that have have singleton classes to pass verification even though the code
generators handle it just fine.

Found through `ruby/spec`.
2024-04-25 18:38:14 -04:00
Takashi Kokubun 7ab1a608e7
YJIT: Optimize local variables when EP == BP (take 2) (#10607)
* Revert "Revert "YJIT: Optimize local variables when EP == BP" (#10584)"

This reverts commit c878344195.

* YJIT: Take care of GC references in ISEQ invariants

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

---------

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2024-04-25 10:04:53 -04:00
Kevin Menard afc7799c32
YJIT: Add a specialized codegen function for `Class#superclass`. (#10613)
Add a specialized codegen function for `Class#superclass`.

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
Co-authored-by: Randy Stauner <randy.stauner@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2024-04-24 10:31:35 -04:00
Alan Wu 1bb7638e7a
YJIT: Fix shrinking block with assumption too much (#10585)
* YJIT: Fix shrinking block with assumption too much

Under the very specific circumstances, discovered by a test case in
`ruby/spec`, an `expandarray` block can contain just a branch and carry
a method lookup assumption. Previously, when we regenerated the branch,
we allowed it to shrink to empty, since we put the code at the jump
target immediately after it. That was incorrect and caused a crash while
the block is invalidated, since that left no room to patch in an exit.

When regenerating a branch that makes up a block entirely, and the block
could be invalidated, we need to ensure there is room for invalidation.
When there is code before the branch, they should act as padding, so we
don't need to worry about those cases.

* skip on RJIT
2024-04-22 11:16:46 -04:00
Alan Wu c878344195
Revert "YJIT: Optimize local variables when EP == BP" (#10584)
This reverts commit 4cc58ea0b8.

Since the change landed call-threshold=1 CI runs have been timing out.
There has also been `verify-ctx` violations. Revert for now while we debug.
2024-04-19 16:47:25 +00:00
careworry 8e08556fa7
chore: remove repetitive words (#10573)
Signed-off-by: careworry <worrycare@outlook.com>
2024-04-18 15:32:34 +00:00
Alan Wu 28efc0c924
YJIT: Fix canary crash with Array#<< (#10568)
Previously, we got "We are killing the stack canary set by opt_ltlt"
from `$./miniruby --yjit-call-threshold=1 -e 'a = [].freeze; a << 1'`

Found by running ruby-spec with yjit-call-threshold=1.
2024-04-18 10:04:23 -04:00
Alan Wu 8b81301536
YJIT: A64: Use CBZ/CBNZ to check for zero
* YJIT: A64: Add CBZ and CBNZ encoding functions

* YJIT: A64: Use CBZ/CBNZ to check for zero

Instead of emitting `cmp x0, #0` plus `b.z #target`, A64 offers Compare
and Branch on Zero for us to just do `cbz x0, #target`. This commit
utilizes that and the related CBNZ instruction when appropriate.

We check for zero most commonly in interrupt checks:

```diff
  # Insn: 0003 leave (stack_size: 1)
  # RUBY_VM_CHECK_INTS(ec)
  ldur w11, [x20, #0x20]
  -tst w11, w11
  -b.ne #0x109002164
  +cbnz w11, #0x1049021d0
```

* fix copy paste error

Co-authored-by: Randy Stauner <randy@r4s6.net>

---------

Co-authored-by: Randy Stauner <randy@r4s6.net>
2024-04-17 21:48:38 +00:00