Граф коммитов

369 Коммитов

Автор SHA1 Сообщение Дата
Takashi Kokubun 6844bcc6b4 MJIT: Use a String buffer in builtin compilers
instead of FILE*.

Using C.fprintf is slower than String manipulation on memory. I'm going
to change the way MJIT writes files, and this is a prerequisite for it.
2022-11-27 21:11:33 -08:00
Maxime Chevalier-Boisvert d2fa67de81
YJIT: rename `InsnOpnd` => `YARVOpnd` (#6801)
Rename InsnOpnd => YARVOpnd

Make it more clear this refers to YARV insn/vm operands rather
than backend IR, x86 or ARM insn operands.
2022-11-24 10:30:28 -05:00
Alan Wu d92054e371 YJIT: Use a Box for branch targets to save memory
We frequently make branches that only have one target but we used to
always allocate space for two branch targets. This patch moves all the
information a branch target has into a struct and refer to them using
Option<Box<BranchTarget>>, this way when the second branch target is not
present it only takes 8 bytes.

Retained heap size on railsbench went from 16.17 MiB to 14.57 MiB, a
ratio of about 1.1.
2022-11-23 18:00:12 -05:00
Takashi Kokubun a50aabde9c
YJIT: Simplify Insn::CCall to obviate Target::FunPtr (#6793) 2022-11-23 12:14:43 -05:00
Takashi Kokubun d88adaad7e
YJIT: Use NonNull pointer for CodePtr (#6792) 2022-11-23 12:02:05 -05:00
Takashi Kokubun 9c36de3c48 YJIT: Stop passing target1 to gen_return_branch 2022-11-23 11:59:50 -05:00
Takashi Kokubun fe2bed6778
YJIT: Simplify code for RB_SPECIAL_CONST_P (#6795) 2022-11-23 11:59:02 -05:00
Jemma Issroff e82b15b660
Fix YJIT backend to account for unsigned int immediates (#6789)
YJIT: x86_64: Fix cmp with number where sign bit is set

Before this commit, we were unconditionally treating unsigned ints as
signed ints when counting the number of bits required for representing
the immediate in machine code. When the size of the immediate matches
the size of the other operand, no sign extension happens, so this was
incorrect. `asm.cmp(opnd64, 0x8000_0000)` panicked even though it's
encodable as `CMP r/m32, imm32`. Large shape ids were impacted by this
issue.

Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Alan Wu <alanwu@ruby-lang.org>

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Co-authored-by: Alan Wu <alanwu@ruby-lang.org>
2022-11-23 10:48:17 -05:00
Takashi Kokubun 63f4a7a1ec
YJIT: Skip padding jumps to side exits on Arm (#6790)
YJIT: Skip padding jumps to side exits

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-11-22 15:57:17 -05:00
Takashi Kokubun 607fb49dbc
YJIT: Lower the required Rust version from 1.58.1 to 1.58.0 (#6780) 2022-11-21 10:27:39 -08:00
Takashi Kokubun 6dcb7b9216
YJIT: Improve the failure message on enlarging a branch (#6769) 2022-11-18 17:27:07 -08:00
Aaron Patterson 9e067df76b 32 bit comparison on shape id
This commit changes the shape id comparisons to use a 32 bit comparison
rather than 64 bit.  That means we don't need to load the shape id to a
register on x86 machines.

Given the following program:

```ruby
class Foo
  def initialize
    @foo = 1
    @bar = 1
  end

  def read
    [@foo, @bar]
  end
end

foo = Foo.new
foo.read
foo.read
foo.read
foo.read
foo.read

puts RubyVM::YJIT.disasm(Foo.instance_method(:read))
```

The machine code we generated _before_ this change is like this:

```
== BLOCK 1/4, ISEQ RANGE [0,3), 65 bytes ======================
  # getinstancevariable
  0x559a18623023: mov rax, qword ptr [r13 + 0x18]
  # guard object is heap
  0x559a18623027: test al, 7
  0x559a1862302a: jne 0x559a1862502d
  0x559a18623030: cmp rax, 4
  0x559a18623034: jbe 0x559a1862502d
  # guard shape, embedded, and T_OBJECT
  0x559a1862303a: mov rcx, qword ptr [rax]
  0x559a1862303d: movabs r11, 0xffff00000000201f
  0x559a18623047: and rcx, r11
  0x559a1862304a: movabs r11, 0xb000000002001
  0x559a18623054: cmp rcx, r11
  0x559a18623057: jne 0x559a18625046
  0x559a1862305d: mov rax, qword ptr [rax + 0x18]
  0x559a18623061: mov qword ptr [rbx], rax

== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 47 bytes ======================
  # gen_direct_jmp: fallthrough
  # getinstancevariable
  # regenerate_branch
  # getinstancevariable
  # regenerate_branch
  0x559a18623064: mov rax, qword ptr [r13 + 0x18]
  # guard shape, embedded, and T_OBJECT
  0x559a18623068: mov rcx, qword ptr [rax]
  0x559a1862306b: movabs r11, 0xffff00000000201f
  0x559a18623075: and rcx, r11
  0x559a18623078: movabs r11, 0xb000000002001
  0x559a18623082: cmp rcx, r11
  0x559a18623085: jne 0x559a18625099
  0x559a1862308b: mov rax, qword ptr [rax + 0x20]
  0x559a1862308f: mov qword ptr [rbx + 8], rax
```

After this change, it's like this:

```
== BLOCK 1/4, ISEQ RANGE [0,3), 41 bytes ======================
  # getinstancevariable
  0x5560c986d023: mov rax, qword ptr [r13 + 0x18]
  # guard object is heap
  0x5560c986d027: test al, 7
  0x5560c986d02a: jne 0x5560c986f02d
  0x5560c986d030: cmp rax, 4
  0x5560c986d034: jbe 0x5560c986f02d
  # guard shape
  0x5560c986d03a: cmp word ptr [rax + 6], 0x19
  0x5560c986d03f: jne 0x5560c986f046
  0x5560c986d045: mov rax, qword ptr [rax + 0x10]
  0x5560c986d049: mov qword ptr [rbx], rax

== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 23 bytes ======================
  # gen_direct_jmp: fallthrough
  # getinstancevariable
  # regenerate_branch
  # getinstancevariable
  # regenerate_branch
  0x5560c986d04c: mov rax, qword ptr [r13 + 0x18]
  # guard shape
  0x5560c986d050: cmp word ptr [rax + 6], 0x19
  0x5560c986d055: jne 0x5560c986f099
  0x5560c986d05b: mov rax, qword ptr [rax + 0x18]
  0x5560c986d05f: mov qword ptr [rbx + 8], rax
```

The first ivar read is a bit more complex, but the second ivar read is
much simpler.  I think eventually we could teach the context about the
shape, then emit only one shape guard.
2022-11-18 12:04:10 -08:00
Jimmy Miller 98e9165b0a
Fix bug involving .send and overwritten methods. (#6752)
@casperisfine reporting a bug in this gist https://gist.github.com/casperisfine/d59e297fba38eb3905a3d7152b9e9350

After investigating I found it was caused by a combination of send and a c_func that we have overwritten in the JIT. For send calls, we need to do some stack manipulation before making the call. Because of the way exits works, we need to do that stack manipulation at the last possible moment. In this case, we weren't doing that stack manipulation at all. Unfortunately, with how the code is structured there isn't a great place to do that stack manipulation for our overridden C funcs.

Each overridden C func can return a boolean stating that it shouldn't be used. We would need to do the stack manipulation after all of those checks are done. We could pass a lambda(?) or separate out the logic for "can I run this override" from "now generate the code for it". Since we are coming up on a release, I went with the path of least resistence and just decided to not use these overrides if we are in a send call.

We definitely should revist this in the future.
2022-11-17 23:17:40 -05:00
Takashi Kokubun a777ec0d85
YJIT: Shrink version lists after mutation (#6749) 2022-11-16 16:30:39 -08:00
Takashi Kokubun 3259aceb35
YJIT: Pack BlockId and CodePtr (#6748) 2022-11-16 15:48:46 -08:00
Takashi Kokubun 1b8236acc2
YJIT: Add compiled_branch_count stats (#6746) 2022-11-16 15:31:13 -08:00
Takashi Kokubun 6de4032e40
YJIT: Stop wrapping CmePtr with CmeDependency (#6747)
* YJIT: Stop wrapping CmePtr with CmeDependency

* YJIT: Fix an outdated comment [ci skip]
2022-11-16 15:30:29 -08:00
Takashi Kokubun 3eb7a6521c
YJIT: Shrink the vectors of Block after mutation (#6739) 2022-11-16 10:09:15 -08:00
Takashi Kokubun 41b0f641ef
YJIT: Always encode Opnd::Value in 64 bits on x86_64 for GC offsets (#6733)
* YJIT: Always encode Opnd::Value in 64 bits on x86_64

for GC offsets

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

* Introduce heap_object_p

* Leave original mov intact

* Remove unneeded branches

* Add a test for movabs

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-11-15 15:23:20 -08:00
Takashi Kokubun 0d384ce6e6
YJIT: Include actual memory region size in stats (#6736) 2022-11-15 15:20:02 -08:00
Takashi Kokubun d1fb659547
YJIT: Count getivar side exits by receiver flag changes (#6735) 2022-11-15 14:50:12 -08:00
Takashi Kokubun 1125274c4e
YJIT: Invalidate redefined methods only through cme (#6734)
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-11-15 12:57:43 -08:00
Aaron Patterson e74d82f674 Implement LDURH on Aarch64
When RUBY_DEBUG is enabled, shape ids are 16 bits.  I would like to do
16 bit comparisons, so I need to load halfwords sometimes.  This commit
adds LDURH so that I can load halfwords.

  https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/LDURH--Load-Register-Halfword--unscaled--?lang=en

I verified the bytes using clang:

```
$ cat asmthing.s
.global _start
.align 2

_start:
  ldurh w10, [x1]
  ldurh w10, [x1, #123]
$ as asmthing.s -o asmthing.o && objdump --disassemble asmthing.o

asmthing.o:	file format mach-o arm64

Disassembly of section __TEXT,__text:

0000000000000000 <ltmp0>:
       0: 2a 00 40 78  	ldurh	w10, [x1]
       4: 2a b0 47 78  	ldurh	w10, [x1, #123]
```
2022-11-14 17:04:50 -08:00
Aaron Patterson b7d591643a Remove USE_RVARGC code
We don't need this constant to be exposed anymore, so remove it
2022-11-14 15:42:11 -08:00
Takashi Kokubun 6246788bc4
YJIT: Instrument global allocations on stats build (#6712)
* YJIT: Instrument global allocations on stats build

* Just use GLOVAL_ALLOCATOR.stats()
2022-11-13 12:54:41 -05:00
Takashi Kokubun d5e1b82f5c
YJIT: Remove unused src_ctx from Block (#6714) 2022-11-13 12:33:23 -05:00
Alan Wu 04c5adf806 YJIT: Fix staying in invalidated code after proc calls
Previously, there is no instruction boundary patch point after
the call to a non-leaf C function we generate for
OPTIMIZED_METHOD_TYPE_CALL. This meant that if code GC is triggered
while inside the C function, we would keep running invalidated code when
we return from the C function. This had the effect of running
stale branch stubs, jumping to bad code, etc.

Use jit_prepare_routine_call() to make sure we exit from the invalidated
region as soon as possible after the C call in case of invalidation.
2022-11-11 11:13:07 -05:00
Jimmy Miller 8b3347950e
Enable --yjit-stats for release builds (#6694)
* Enable --yjit-stats for release builds

In order for people in the real world to report information about how their application runs with YJIT, we want to expose stats without requiring rebuilding ruby. We can do this without overhead, with the exception of count ratio in yjit, since this relies on the interpreter also counting instructions.

This change exposes those stats, while not showing ratio in yjit if we are not in a stats build.

* Update yjit.rb

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-11-10 12:56:22 -05:00
Jemma Issroff c726c48a3d Remove numiv from RObject
Since object shapes store the capacity of an object, we no longer
need the numiv field on RObjects. This gives us one extra slot which
we can use to give embedded objects one more instance variable (for a
total of 3 ivs). This commit removes the concept of numiv from RObject.
2022-11-10 10:11:34 -05:00
Jemma Issroff 5246f4027e Transition shape when object's capacity changes
This commit adds a `capacity` field to shapes, and adds shape
transitions whenever an object's capacity changes. Objects which are
allocated out of a bigger size pool will also make a transition from the
root shape to the shape with the correct capacity for their size pool
when they are allocated.

This commit will allow us to remove numiv from objects completely, and
will also mean we can guarantee that if two objects share shapes, their
IVs are in the same positions (an embedded and extended object cannot
share shapes). This will enable us to implement ivar sets in YJIT using
object shapes.

Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
2022-11-10 10:11:34 -05:00
Alan Wu 5d95cd99f4 YJIT: Reset dropped_bytes when patching code
We switch to a new page when we detect dropped_bytes flipping from false
to true. Previously, when we patch code for invalidation during code gc,
we start with the flag being set to true, so we failed to apply patches
that straddle pages. We would write out jumps half way and then stop,
which left the code corrupted.

Reset the flag before patching so we patch across pages properly.
2022-11-08 16:09:02 -05:00
Jimmy Miller 1a65ab20cb
Implement optimize call (#6691)
This dispatches to a c func for doing the dynamic lookup. I experimented with chain on the proc but wasn't able to detect which call sites would be monomorphic vs polymorphic. There is definitely room for optimization here, but it does reduce exits.
2022-11-08 15:28:28 -05:00
Takashi Kokubun 7442cb461b
YJIT: Free pages after ObjectSpace API usages (#6676) 2022-11-07 10:48:26 -05:00
Takashi Kokubun ea77aa2fd0
YJIT: Make Code GC metrics available for non-stats builds (#6665) 2022-11-03 13:41:35 -04:00
Takashi Kokubun 124f10f56b
YJIT: Fix a wrong type reference (#6661)
* YJIT: Fix a wrong type reference

* YJIT: Just remove CapturedSelfOpnd for now
2022-11-03 13:33:49 -04:00
Takashi Kokubun 68ef97d788
YJIT: Stop incrementing write_pos if cb.has_dropped_bytes (#6664)
Co-Authored-By: Alan Wu <alansi.xingwu@shopify.com>

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-11-03 11:42:28 -04:00
Takashi Kokubun 81e84e0a4d
YJIT: Support invokeblock (#6640)
* YJIT: Support invokeblock

* Update yjit/src/backend/arm64/mod.rs

* Update yjit/src/codegen.rs

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-11-02 12:30:48 -04:00
Takashi Kokubun 946bb34fb5
YJIT: Avoid accumulating freed pages in the payload (#6657)
Co-Authored-By: Alan Wu <alansi.xingwu@shopify.com>
Co-Authored-By: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
2022-11-02 11:14:31 -04:00
Takashi Kokubun 0d1e1987d1
YJIT: Visualize live ranges on register spill (#6651) 2022-11-01 15:05:36 -04:00
Alan Wu cbf15e5cbe YJIT: Add an assert to help with Context changes
While experimenting I found that it's easy to change Context and forget
to also change the copying operation in limit_block_versions(). Add an
assert to make sure we substitute a compatible generic context when
limiting the number of versions.
2022-11-01 14:16:32 -04:00
Alan Wu a70f90e1a9 YJIT: Delete redundant ways to make Context
Context::new() is the same as Context::default() and
Context::new_with_stack_size() was only used in tests.
2022-11-01 14:16:32 -04:00
Takashi Kokubun 2b39640b0b
YJIT: Add RubyVM::YJIT.code_gc (#6644)
* YJIT: Add RubyVM::YJIT.code_gc

* Rename compiled_page_count to live_page_count
2022-10-31 14:29:45 -04:00
Maxime Chevalier-Boisvert 5e6633fcf9
YJIT: reduce default `--yjit-exec-mem-size` to 128MiB instead of 256 (#6649)
Reduce default --yjit-exec-mem-size to 128MiB instead of 256
2022-10-31 14:29:11 -04:00
Alan Wu 9cf027f83a
YJIT: Use guard_known_class() for opt_aref on Arrays (#6643)
This code used to roll its own heap object check before we made a better
version in guard_known_class(). The improved version uses one fewer
comparison, so let's use that.
2022-10-27 18:52:58 -04:00
Matthew Draper c746f380f2
YJIT: Support nil and blockparamproxy as blockarg in send (#6492)
Co-authored-by: John Hawthorn <john@hawthorn.email>

Co-authored-by: John Hawthorn <john@hawthorn.email>
2022-10-26 15:27:59 -04:00
Takashi Kokubun fa0adbad92
YJIT: Invalidate i-cache for the other cb on next_page (#6631)
* YJIT: Invalidate i-cache for the other cb on next_page

* YJIT: Invalidate only what's written by jmp_ptr

* YJIT: Move the code to the arm64 backend
2022-10-26 11:29:12 -04:00
Takashi Kokubun b7644a2311
YJIT: GC and recompile all code pages (#6406)
when it fails to allocate a new page.

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-10-25 09:07:10 -07:00
Jemma Issroff a11952dac1 Rename `iv_count` on shapes to `next_iv_index`
`iv_count` is a misleading name because when IVs are unset, the new
shape doesn't decrement this value. `next_iv_count` is an accurate, and
more descriptive name.
2022-10-21 14:57:34 -07:00
Alan Wu 87bb0bee6b
YJIT: Fix page rounding for icache busting
Previously, we found the current page by rounding the current pointer to
the closest smaller page size. This is incorrect because pages are
relative to the start of the address we reserve. For example, if the
starting address is 12KiB modulo the 16KiB page size, once we have more
than 4KiB of code, calculating with the address would incorrectly give
us page 1 when we're actually still on page 0.

Previously, I can reproduce crashes with:

    make btest RUN_OPTS=--yjit-code-page-size=32

on ARM64 macOS, where system page sizes are 16KiB.
2022-10-21 17:06:34 -04:00
Alan Wu 8bbcb75377 YJIT: Read rb_num_t as usize early
This patch makes sure that we're not accidentally reading rb_num_t
instruction arguments as VALUE and accidentally baking them into
code and marking them. Some of these are simply moving the cast earlier,
but some of these avoid potential problems for flag and ID arguments.

Follow-up for 39f7eddec4.
2022-10-21 16:25:41 -04:00