Commit graph

388 Commits

Author SHA1 Message Date
Takashi Kokubun 24043031be
YJIT: Split send_iseq_complex_callee exit reasons (#6895) 2022-12-09 16:45:38 -08:00
Maxime Chevalier-Boisvert daa893db41
YJIT: implement `getconstant` YARV instruction (#6884)
* YJIT: implement getconstant YARV instruction

* Constant id is not a pointer

* Stack operands must be read after jit_prepare_routine_call

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2022-12-09 14:12:15 -08:00
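As a rough illustration (the constant name and the disasm call below are illustrative, not from the commit): constant lookup through a dynamically computed scope is what compiles to `getconstant`, while a plain literal path like `Foo::BAR` takes the cached `opt_getconstant_path` instead.

```ruby
# A constant reference on a dynamic scope (an expression before ::)
# compiles to the getconstant YARV instruction; disasm makes it visible.
puts RubyVM::InstructionSequence.compile("self.class::VERSION").disasm
```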
Alan Wu e714907d82 YJIT: Upgrade bindgen to stabilize and reduce output
The new version has an option to merge everything into a big
`extern "C"` block and it's nicer.

More importantly, this upgrade fixes an issue where Ubuntu with Clang 12
and macOS with Clang 14 produced a one-line diff for `rb_shape_t`. It was
slightly annoying because we use macOS locally.
2022-12-08 17:35:18 -05:00
Takashi Kokubun 51ef991d8d
YJIT: Drop Copy trait from Context (#6889) 2022-12-08 17:33:18 -05:00
Maxime Chevalier-Boisvert b26c9ce5e9
YJIT: implement opt_newarray_min YARV instruction (#6888) 2022-12-08 17:31:33 -05:00
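A quick sketch of the pattern this instruction targets (the values are illustrative):

```ruby
# [x, y].min compiles to the opt_newarray_min YARV instruction, so the
# minimum can be computed without materializing the array as long as
# Array#min keeps its basic definition.
x, y = 1, 2
[x, y].min  # => 1
```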
Samuel Williams 6fd5d2dc00
Introduce `IO.new(..., path:)` and promote `File#path` to `IO#path`. (#6867) 2022-12-08 18:19:53 +13:00
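A rough usage sketch of the new API, assuming the keyword works as the title suggests (the file name is illustrative):

```ruby
# IO.new now accepts a path: keyword, and #path is promoted from File to IO.
fd = IO.sysopen("/tmp/example.txt", "w")
io = IO.new(fd, "w", path: "/tmp/example.txt")
io.path  # => "/tmp/example.txt"
io.close
```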
Jemma Issroff 40a9964b89 Set max_iv_count (used for object shapes) based on inline caches
With this change, we're storing the iv name in an inline cache on
setinstancevariable instructions. This lets us count the instance
variables set in initialize by checking those caches, giving us an
estimate of the iv capacity an object needs.

For the purpose of estimating the number of instance variables required
for an object, we're assuming that all initialize methods will call
`super`.

This change allows us to estimate the number of instance variables
required without disassembling instruction sequences.

Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
2022-12-06 13:43:42 -08:00
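A minimal sketch of the kind of code the estimate is drawn from (class and ivar names are illustrative):

```ruby
# Each setinstancevariable in initialize carries an inline cache holding
# the iv name, so the VM can estimate that Point instances need capacity
# for two ivars without disassembling the instruction sequence.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end
end
```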
Daniel Colson e69b91fae4 Introduce BOP_CMP for optimized comparison
Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup
to determine whether `<=>` was overridden. The result of the lookup was
cached, but only for the duration of the specific method that
initialized the cmp_opt_data cache structure.

With this method lookup, `[x,y].max` is slower than doing `x > y ? x : y`
even though there's an optimized instruction for "new array max".
(John noticed somebody proposed a micro-optimization based on this fact
in https://github.com/mastodon/mastodon/pull/19903.)

```rb
a, b = 1, 2
Benchmark.ips do |bm|
  bm.report('conditional') { a > b ? a : b }
  bm.report('method') { [a, b].max }
  bm.compare!
end
```

Before:

```
Comparison:
         conditional: 22603733.2 i/s
              method: 19820412.7 i/s - 1.14x  (± 0.00) slower
```

This commit replaces the method lookup with a new CMP basic op, which
gives the examples above equivalent performance.

After:

```
Comparison:
              method: 24022466.5 i/s
         conditional: 23851094.2 i/s - same-ish: difference falls within error
```

Relevant benchmarks show an improvement to Array#max and Array#min when
not using the optimized newarray_max instruction as well. They are
noticeably faster for small arrays with the relevant types, and the same
or maybe a touch faster on larger arrays.

```
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_min
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_max
```

The benchmarks added in this commit also look generally improved.

Co-authored-by: John Hawthorn <jhawthorn@github.com>
2022-12-06 12:37:23 -08:00
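Like other basic ops, BOP_CMP has to be invalidated when `<=>` is redefined; a sketch of that behavior (assuming the usual basic-op deoptimization):

```ruby
# While Integer#<=> keeps its basic definition, [a, b].max can take the
# BOP_CMP fast path. Redefining <=> clears the basic op, so the VM falls
# back to real method dispatch for correctness.
class Integer
  def <=>(other)
    puts "custom <=>"
    super
  end
end

[1, 2].max  # prints "custom <=>" and returns 2
```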
Daniel Colson c43951e60e Move BOP macros to separate file
This commit moves ruby_basic_operators and the unredefined macros out of
vm_core.h and into basic_operators.h so that we can use them more
broadly in places where we currently use a method lookup via
`rb_method_basic_definition_p` (e.g. object.c, numeric.c, complex.c,
enum.c, but also in internal/compar.h after introducing BOP_CMP and
elsewhere if we introduce more BOPs)

The most controversial part of this change is probably moving
redefined_flag out of rb_vm_t. [vm_opt_method_def_table and
vm_opt_mid_table](9da2a5204f/vm.c)
are not part of rb_vm_t either, and I think this fits well with those.
But more significantly it seems to result in one fewer instruction. For
example:

Before:

```
(lldb) disassemble -n vm_opt_str_freeze
miniruby`vm_exec_core:
miniruby[0x10028233e] <+14558>: movq   0x11a86b(%rip), %rax      ; ruby_current_vm_ptr
miniruby[0x100282345] <+14565>: testb  $0x4, 0x242c(%rax)
```

After:

```
(lldb) disassemble -n vm_opt_str_freeze
ruby`vm_exec_core:
ruby[0x100280ebe] <+14510>: testb  $0x4, 0x120147(%rip)      ; ruby_vm_redefined_flag + 43
```

Co-authored-by: John Hawthorn <jhawthorn@github.com>
2022-12-06 12:37:23 -08:00
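The Ruby-visible behavior behind the flag checked by that `testb`, as a sketch: redefining a basic operation sets the corresponding bit in ruby_vm_redefined_flag, deoptimizing the fast path.

```ruby
# "abc".freeze normally hits the opt_str_freeze fast path guarded by the
# redefined-flag check shown above. Redefining String#freeze sets the
# flag, so the instruction falls back to a normal method call.
class String
  def freeze
    puts "custom freeze"
    super
  end
end

"abc".freeze  # now dispatches to the redefined method
```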
Alan Wu 235fc50447
YJIT: Remove --yjit-code-page-size (#6865)
Certain code page sizes don't work and can cause crashes, so having this
value available as a command-line option is a bit dangerous. Remove it
and turn it into a constant instead.
2022-12-05 17:43:17 -05:00
Jemma Issroff e7642d8095
YJIT: Extract SHAPE_ID_NUM_BITS into a constant (#6863) 2022-12-05 13:20:11 -08:00
Jemma Issroff 41bacd9b0d Remove unused rb_shape_flag_shift and rb_shape_flag_mask 2022-12-02 12:53:51 -08:00
Jemma Issroff ebd4c7bb01 Fixed yjit bindings rb_gc_write_barrier 2022-12-02 12:53:51 -08:00
Jemma Issroff 4c5e89791b Extracted rb_shape_id_offset 2022-12-02 12:53:51 -08:00
Maxime Chevalier-Boisvert 606653e43a Update yjit/src/codegen.rs 2022-12-02 12:53:51 -08:00
Aaron Patterson be40af284a make flag clearing better 2022-12-02 12:53:51 -08:00
Aaron Patterson 07fe3d37c5 only generate wb when we really need to 2022-12-02 12:53:51 -08:00
Aaron Patterson 744b0527ea bail on compilation if the comptime receiver is frozen 2022-12-02 12:53:51 -08:00
Aaron Patterson 7b5ee9a8a6 do not fire the wb when writing immediates 2022-12-02 12:53:51 -08:00
Aaron Patterson 17f9bcd7d7 implement IV writes 2022-12-02 12:53:51 -08:00
Alan Wu eb2b717a8b
YJIT: Make case-when optimization respect === redefinition (#6846)
* YJIT: Make case-when optimization respect === redefinition

Even when a fixnum key is in the dispatch hash, if there is a case whose
basic operation for === is redefined, we need to fall back to checking
each case like the interpreter does. Semantically we're always checking
each case by calling === in order; it's just that this is not observable
when the basic operations are intact.

When all the keys are fixnums, though, we can do the optimization we're
doing right now. Check for this condition.

* Update yjit/src/cruby_bindings.inc.rs

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2022-12-02 11:40:16 -05:00
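A sketch of the semantics being preserved (the redefinition is illustrative):

```ruby
# Case dispatch on fixnum keys may use a hash lookup, but only while
# Integer#=== keeps its basic definition; once it is redefined, each
# `when` clause must call === in order, as the interpreter does.
class Integer
  def ===(other)
    puts "custom ==="
    super
  end
end

case 1
when 0 then :zero
when 1 then :one
end
# prints "custom ===" once per clause checked, then returns :one
```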
Takashi Kokubun fa77bcf722
YJIT: Change the default --yjit-call-threshold to 30 (#6850) 2022-12-02 11:32:49 -05:00
Takashi Kokubun dcbea7671b
YJIT: Respect destination num_bits on STUR (#6848) 2022-12-01 16:13:38 -08:00
Takashi Kokubun 2c939458ca
YJIT: Reorder branches for Fixnum opt_case_dispatch (#6841)
* YJIT: Reorder branches for Fixnum opt_case_dispatch

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

* YJIT: Don't support too large values

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-12-01 10:59:56 -05:00
Jemma Issroff 06a0c58016
YJIT: fix 32 and 16 bit register store (#6840)
* Fix 32 and 16 bit register store in YJIT

Co-Authored-By: Takashi Kokubun <takashikkbn@gmail.com>

* Remove an unnecessary diff

* Reuse an rm_num_bits result

* Use u16::MAX instead

* Update the link

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* Just use sturh for 16 bits

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2022-12-01 10:53:50 -05:00
Takashi Kokubun 0d3fc08ff4
YJIT: Optimize rb_int_equal (#6838) 2022-11-30 16:16:11 -05:00
Maxime Chevalier-Boisvert d98d84b75d
YJIT: add new counters for deferred compilation and queued blocks (#6837) 2022-11-30 14:09:10 -05:00
Alan Wu a0b0365e90 YJIT: Deallocate `struct Block` to plug memory leaks
Previously we essentially never freed blocks even after invalidation.
Their reference count never reached zero for a couple of reasons:
1. `Branch::block` formed a cycle with the block holding the branch
2. The strong count on a branch that has ever contained a stub never
   reached 0 because the increment from the `.clone()` call for
   `BranchRef::into_raw()` didn't have a matching decrement.

It's not safe to immediately deallocate blocks during
invalidation since `branch_stub_hit()` can end up
running with a branch pointer from an invalidated branch.
To plug the leaks, we wait until code GC or global invalidation and
deallocate the blocks for iseqs that are definitely not running.
2022-11-30 12:23:50 -05:00
Alan Wu b30248f74a YJIT: Deallocate when assumptions tables are empty
When we run global invalidation for TracePoints or code GC, we clear out
all blocks in our assumptions table but we don't deallocate the backing
buffers. Let's reclaim some memory during these rare events.
2022-11-30 12:23:50 -05:00
Alan Wu 03f1e6a2aa YJIT: Fix IseqPayload::pages memory bloat
HashSet::clear() doesn't deallocate the backing buffer or shrink the
capacity. Replace it with a 0-capacity set instead so we reclaim some
memory on each code GC.
2022-11-30 12:23:50 -05:00
Takashi Kokubun 3e4d1a1dd1
YJIT: Skip checking interrupt_mask (#6825) 2022-11-29 10:09:32 -05:00
Takashi Kokubun 6844bcc6b4 MJIT: Use a String buffer in builtin compilers
instead of FILE*.

Using C.fprintf is slower than String manipulation in memory. I'm going
to change the way MJIT writes files, and this is a prerequisite for it.
2022-11-27 21:11:33 -08:00
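A rough sketch of the idea (buffer contents and file name are illustrative): build the output in an in-memory String and write once, instead of issuing many fprintf calls through a FILE*.

```ruby
# Accumulate generated C code in a mutable String buffer...
buf = +""  # unary + yields an unfrozen string
buf << "#include <ruby.h>\n"
buf << "/* generated */\n"
# ...then write it out with a single call.
File.write("mjit_generated.c", buf)
```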
Maxime Chevalier-Boisvert d2fa67de81
YJIT: rename `InsnOpnd` => `YARVOpnd` (#6801)
Rename InsnOpnd => YARVOpnd

Make it clearer that this refers to YARV insn/VM operands rather
than backend IR, x86, or ARM insn operands.
2022-11-24 10:30:28 -05:00
Alan Wu d92054e371 YJIT: Use a Box for branch targets to save memory
We frequently make branches that only have one target, but we used to
always allocate space for two branch targets. This patch moves all the
information a branch target has into a struct and refers to it through an
Option<Box<BranchTarget>>; this way, when the second branch target is not
present, it only takes 8 bytes.

Retained heap size on railsbench went from 16.17 MiB to 14.57 MiB, a
ratio of about 1.1.
2022-11-23 18:00:12 -05:00
Takashi Kokubun a50aabde9c
YJIT: Simplify Insn::CCall to obviate Target::FunPtr (#6793) 2022-11-23 12:14:43 -05:00
Takashi Kokubun d88adaad7e
YJIT: Use NonNull pointer for CodePtr (#6792) 2022-11-23 12:02:05 -05:00
Takashi Kokubun 9c36de3c48 YJIT: Stop passing target1 to gen_return_branch 2022-11-23 11:59:50 -05:00
Takashi Kokubun fe2bed6778
YJIT: Simplify code for RB_SPECIAL_CONST_P (#6795) 2022-11-23 11:59:02 -05:00
Jemma Issroff e82b15b660
Fix YJIT backend to account for unsigned int immediates (#6789)
YJIT: x86_64: Fix cmp with number where sign bit is set

Before this commit, we were unconditionally treating unsigned ints as
signed ints when counting the number of bits required for representing
the immediate in machine code. When the size of the immediate matches
the size of the other operand, no sign extension happens, so this was
incorrect. `asm.cmp(opnd64, 0x8000_0000)` panicked even though it's
encodable as `CMP r/m32, imm32`. Large shape ids were impacted by this
issue.

Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Alan Wu <alanwu@ruby-lang.org>

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Co-authored-by: Alan Wu <alanwu@ruby-lang.org>
2022-11-23 10:48:17 -05:00
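The bit-counting rule at the heart of the fix, sketched in Ruby (the helper names are made up for illustration):

```ruby
# 0x8000_0000 fits in 32 bits as an unsigned value but needs 33 bits once
# a sign bit is reserved, so treating it as signed wrongly rejected the
# `CMP r/m32, imm32` encoding.
def unsigned_bits(imm) = imm.bit_length
def signed_bits(imm)   = imm.bit_length + 1

unsigned_bits(0x8000_0000)  # => 32, encodable as imm32 with no sign extension
signed_bits(0x8000_0000)    # => 33, not encodable as a signed imm32
```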
Takashi Kokubun 63f4a7a1ec
YJIT: Skip padding jumps to side exits on Arm (#6790)
YJIT: Skip padding jumps to side exits

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-11-22 15:57:17 -05:00
Takashi Kokubun 6dcb7b9216
YJIT: Improve the failure message on enlarging a branch (#6769) 2022-11-18 17:27:07 -08:00
Aaron Patterson 9e067df76b 32 bit comparison on shape id
This commit changes the shape id comparisons to use a 32-bit comparison
rather than 64-bit. That means we don't need to load the shape id into a
register on x86 machines.

Given the following program:

```ruby
class Foo
  def initialize
    @foo = 1
    @bar = 1
  end

  def read
    [@foo, @bar]
  end
end

foo = Foo.new
foo.read
foo.read
foo.read
foo.read
foo.read

puts RubyVM::YJIT.disasm(Foo.instance_method(:read))
```

The machine code we generated _before_ this change is like this:

```
== BLOCK 1/4, ISEQ RANGE [0,3), 65 bytes ======================
  # getinstancevariable
  0x559a18623023: mov rax, qword ptr [r13 + 0x18]
  # guard object is heap
  0x559a18623027: test al, 7
  0x559a1862302a: jne 0x559a1862502d
  0x559a18623030: cmp rax, 4
  0x559a18623034: jbe 0x559a1862502d
  # guard shape, embedded, and T_OBJECT
  0x559a1862303a: mov rcx, qword ptr [rax]
  0x559a1862303d: movabs r11, 0xffff00000000201f
  0x559a18623047: and rcx, r11
  0x559a1862304a: movabs r11, 0xb000000002001
  0x559a18623054: cmp rcx, r11
  0x559a18623057: jne 0x559a18625046
  0x559a1862305d: mov rax, qword ptr [rax + 0x18]
  0x559a18623061: mov qword ptr [rbx], rax

== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 47 bytes ======================
  # gen_direct_jmp: fallthrough
  # getinstancevariable
  # regenerate_branch
  # getinstancevariable
  # regenerate_branch
  0x559a18623064: mov rax, qword ptr [r13 + 0x18]
  # guard shape, embedded, and T_OBJECT
  0x559a18623068: mov rcx, qword ptr [rax]
  0x559a1862306b: movabs r11, 0xffff00000000201f
  0x559a18623075: and rcx, r11
  0x559a18623078: movabs r11, 0xb000000002001
  0x559a18623082: cmp rcx, r11
  0x559a18623085: jne 0x559a18625099
  0x559a1862308b: mov rax, qword ptr [rax + 0x20]
  0x559a1862308f: mov qword ptr [rbx + 8], rax
```

After this change, it's like this:

```
== BLOCK 1/4, ISEQ RANGE [0,3), 41 bytes ======================
  # getinstancevariable
  0x5560c986d023: mov rax, qword ptr [r13 + 0x18]
  # guard object is heap
  0x5560c986d027: test al, 7
  0x5560c986d02a: jne 0x5560c986f02d
  0x5560c986d030: cmp rax, 4
  0x5560c986d034: jbe 0x5560c986f02d
  # guard shape
  0x5560c986d03a: cmp word ptr [rax + 6], 0x19
  0x5560c986d03f: jne 0x5560c986f046
  0x5560c986d045: mov rax, qword ptr [rax + 0x10]
  0x5560c986d049: mov qword ptr [rbx], rax

== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 23 bytes ======================
  # gen_direct_jmp: fallthrough
  # getinstancevariable
  # regenerate_branch
  # getinstancevariable
  # regenerate_branch
  0x5560c986d04c: mov rax, qword ptr [r13 + 0x18]
  # guard shape
  0x5560c986d050: cmp word ptr [rax + 6], 0x19
  0x5560c986d055: jne 0x5560c986f099
  0x5560c986d05b: mov rax, qword ptr [rax + 0x18]
  0x5560c986d05f: mov qword ptr [rbx + 8], rax
```

The first ivar read is a bit more complex, but the second ivar read is
much simpler.  I think eventually we could teach the context about the
shape, then emit only one shape guard.
2022-11-18 12:04:10 -08:00
Jimmy Miller 98e9165b0a
Fix bug involving .send and overwritten methods. (#6752)
@casperisfine reported a bug in this gist: https://gist.github.com/casperisfine/d59e297fba38eb3905a3d7152b9e9350

After investigating I found it was caused by a combination of send and a c_func that we have overwritten in the JIT. For send calls, we need to do some stack manipulation before making the call. Because of the way exits work, we need to do that stack manipulation at the last possible moment. In this case, we weren't doing that stack manipulation at all. Unfortunately, with how the code is structured, there isn't a great place to do that stack manipulation for our overridden C funcs.

Each overridden C func can return a boolean stating that it shouldn't be used. We would need to do the stack manipulation after all of those checks are done. We could pass a lambda(?) or separate out the logic for "can I run this override" from "now generate the code for it". Since we are coming up on a release, I went with the path of least resistance and just decided not to use these overrides if we are in a send call.

We definitely should revisit this in the future.
2022-11-17 23:17:40 -05:00
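A hypothetical repro shape (receiver and method are illustrative, not taken from the gist): the problem shows up when one of the substituted C funcs is reached through `send`, which leaves the method name on the stack.

```ruby
# Calling an optimized C method through #send needs extra stack shuffling
# before the substituted code runs; this change simply skips the C func
# overrides for send calls instead.
def eq_via_send(a, b)
  a.send(:==, b)  # may reach a C func that YJIT substitutes
end

eq_via_send(1, 1)  # => true
```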
Takashi Kokubun a777ec0d85
YJIT: Shrink version lists after mutation (#6749) 2022-11-16 16:30:39 -08:00
Takashi Kokubun 3259aceb35
YJIT: Pack BlockId and CodePtr (#6748) 2022-11-16 15:48:46 -08:00
Takashi Kokubun 1b8236acc2
YJIT: Add compiled_branch_count stats (#6746) 2022-11-16 15:31:13 -08:00
Takashi Kokubun 6de4032e40
YJIT: Stop wrapping CmePtr with CmeDependency (#6747)
* YJIT: Stop wrapping CmePtr with CmeDependency

* YJIT: Fix an outdated comment [ci skip]
2022-11-16 15:30:29 -08:00
Takashi Kokubun 3eb7a6521c
YJIT: Shrink the vectors of Block after mutation (#6739) 2022-11-16 10:09:15 -08:00
Takashi Kokubun 41b0f641ef
YJIT: Always encode Opnd::Value in 64 bits on x86_64 for GC offsets (#6733)
* YJIT: Always encode Opnd::Value in 64 bits on x86_64

for GC offsets

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

* Introduce heap_object_p

* Leave original mov intact

* Remove unneeded branches

* Add a test for movabs

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-11-15 15:23:20 -08:00
Takashi Kokubun 0d384ce6e6
YJIT: Include actual memory region size in stats (#6736) 2022-11-15 15:20:02 -08:00