Commit graph

272 commits

Author SHA1 Message Date
Jimmy Miller 2387fbfb34
Fix splat args (#6385)
* Fix splat args

Cfuncs were not working properly, so I have disabled them for now.

There were also some checks above that were preventing calls with splat args from being compiled at all.

Finally, I did some basic code cleanup after realizing I didn't need to mutate argc so much.

* Add can't compile for direct cfunc splat call (see the sketch below)

* Fix typo

* Update yjit/src/codegen.rs
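
A hedged sketch of the guard described above; aside from the VM_CALL_ARGS_SPLAT flag name, every type and function here is illustrative rather than YJIT's actual codegen API:

    // Hedged sketch: refuse to compile a splat call that targets a cfunc
    // directly. VM_CALL_ARGS_SPLAT mirrors the real call flag name; the
    // bit position and everything else here are illustrative.
    const VM_CALL_ARGS_SPLAT: u32 = 1 << 0;

    enum MethodKind { Iseq, Cfunc }

    enum CodegenStatus { KeepCompiling, CantCompile }

    fn guard_splat_call(ci_flags: u32, kind: MethodKind) -> CodegenStatus {
        if ci_flags & VM_CALL_ARGS_SPLAT != 0 {
            if let MethodKind::Cfunc = kind {
                // Cfuncs with splat were miscompiling, so side-exit instead.
                return CodegenStatus::CantCompile;
            }
        }
        CodegenStatus::KeepCompiling
    }

    fn main() {
        let status = guard_splat_call(VM_CALL_ARGS_SPLAT, MethodKind::Cfunc);
        assert!(matches!(status, CodegenStatus::CantCompile));
    }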

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-09-16 13:37:15 +09:00
Maxime Chevalier-Boisvert 64a020324d
Add asm comments to make disasm more readable (#6377) 2022-09-15 10:12:27 -04:00
John Hawthorn f98d6d3f38
YJIT: Implement specialized respond_to? (#6363)
* Add rb_callable_method_entry_or_negative

* YJIT: Implement specialized respond_to?

This implements a specialized respond_to? in YJIT.

* Update yjit/src/codegen.rs

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-09-14 16:15:55 -04:00
Jimmy Miller 758a1d7302
Initial support for VM_CALL_ARGS_SPLAT (#6341)
* Initial support for VM_CALL_ARGS_SPLAT

This implements support for calls with splat (*) for some methods. In
benchmarks this made very little difference for most benchmarks, but a
large difference for binarytrees. Looking at side exits, many
benchmarks now don't exit for splat, but exit for some other
reason. Binarytrees however had a number of calls that used splat args
that are now much faster. In my non-scientific benchmarking this made
splat args performance on par with not using splat args at all.

* Fix wording and whitespace

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>

* Get rid of side_effect reassignment

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-09-14 10:32:22 -04:00
Takashi Kokubun 8f37e9c918
YJIT: Add Opnd#with_num_bits to use only 8 bits (#6359)
* YJIT: Add Opnd#sub_opnd to use only 8 bits

* Add with_num_bits and let arm64_split use it

* Add another assertion to with_num_bits

* Use only with_num_bits
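
A rough standalone sketch of what an operand-narrowing helper along these lines looks like, simplified from the real Opnd type (which also covers immediates and memory operands):

    // Simplified sketch of an operand that can be viewed at a narrower
    // width, e.g. to emit an 8-bit `test` on the low byte of a register.
    #[derive(Clone, Copy, Debug)]
    enum Opnd {
        Reg { reg_no: u8, num_bits: u8 },
    }

    impl Opnd {
        /// Return a copy of this operand using only `num_bits` bits.
        fn with_num_bits(self, num_bits: u8) -> Opnd {
            assert!(matches!(num_bits, 8 | 16 | 32 | 64));
            match self {
                Opnd::Reg { reg_no, .. } => Opnd::Reg { reg_no, num_bits },
            }
        }
    }

    fn main() {
        let rax = Opnd::Reg { reg_no: 0, num_bits: 64 };
        let al = rax.with_num_bits(8); // the AL view of RAX
        println!("{:?}", al);
    }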
2022-09-14 10:27:52 -04:00
Nobuyoshi Nakada 075df960c9 Add comments to touch libyjit 2022-09-14 21:24:40 +09:00
Nobuyoshi Nakada e1a9d88494 Touch libyjit.a, which may still be old due to the cache 2022-09-14 21:24:40 +09:00
Nobuyoshi Nakada f2429f0af5 Expand dependency for `$(YJIT_LIBS)`
Currently, miniruby is **always** rebuilt when yjit is enabled, even
if nothing has changed.
2022-09-14 21:24:40 +09:00
John Hawthorn 5e39b3b844 YJIT: Branch directly when nil? is known from types 2022-09-09 20:29:40 -07:00
John Hawthorn d319184390 YJIT: Branch directly when truthiness is known 2022-09-09 20:29:40 -07:00
Maxime Chevalier-Boisvert 5b5c627d37
YJIT: eliminate redundant mov in csel/cmov on x86 (#6348)
* Eliminate redundant mov in csel/cmov. Translate mov reg,0 into xor

* Fix x86 asm test

* Remove dbg!()

* xor optimization unsound because it resets flags
2022-09-09 18:41:19 -04:00
Kevin Newton 848037cadd
Better offsets (#6315)
* Introduce InstructionOffset for AArch64

There are a lot of instructions on AArch64 where we take an offset
from PC in terms of the number of instructions. This is for loading
a value relative to the PC or for jumping.

We were usually accepting an A64Opnd or an i32. It can get
confusing and inconsistent though because sometimes you would
divide by 4 to get the number of instructions or multiply by 4 to
get the number of bytes.

This commit adds a struct that wraps an i32 in order to keep all of
that logic in one place. It makes it much easier to read and reason
about how these offsets are getting used.
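
A minimal sketch of such a wrapper, with constructors assumed along these lines (simplified from the actual backend):

    /// Sketch: an offset from PC counted in instructions, not bytes.
    #[derive(Clone, Copy)]
    struct InstructionOffset(i32);

    impl InstructionOffset {
        /// Create an offset from a number of instructions.
        fn from_insns(insns: i32) -> Self {
            InstructionOffset(insns)
        }

        /// Create an offset from a number of bytes. Every AArch64
        /// instruction is 4 bytes wide, so this must divide evenly.
        fn from_bytes(bytes: i32) -> Self {
            assert_eq!(bytes % 4, 0, "offset must be a multiple of 4 bytes");
            InstructionOffset(bytes / 4)
        }
    }

    fn main() {
        // Both of these represent a jump 2 instructions forward.
        let _a = InstructionOffset::from_insns(2);
        let _b = InstructionOffset::from_bytes(8);
    }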

* Use b instruction when the offset fits on AArch64
2022-09-09 11:37:41 -04:00
Kevin Newton 35cfc9a3bb
Remove as many unnecessary moves as possible (#6342)
This commit makes several changes to eliminate as many unnecessary
mov instructions as possible.

First, it introduces the Insn::LoadInto instruction. Previously
when we needed a value to go into a specific register (like in
Insn::CCall when we're putting values into the argument registers
or in Insn::CRet when we're putting a value into the return
register) we would first load the value and then mov it into the
correct register. This resulted in a lot of duplicated work with
short live ranges, since they basically immediately became unnecessary.
The new instruction accepts a destination and does not interact
with the register allocator at all, making it much more efficient.

We then use the new instruction when we're loading values into
argument registers for AArch64 or X86_64, and when we're returning
a value from AArch64. Notably we don't do it when we're returning
a value from X86_64 because everything can be accomplished with a
single mov anyway.

A couple of unnecessary movs were also present because when we
called the split_load_opnd function in a lot of split passes we
were loading all registers and instruction outputs. We no longer do
that.

This commit also makes it so that UImm(0) passes through the
Insn::Store split without attempting to be loaded, which allows it
to take advantage of the zero register. So now instead of mov-ing
0 into a register and then calling store, it just stores XZR.
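
A hedged sketch of the two instruction shapes, trimmed down to illustrate the difference (field names illustrative):

    #[derive(Clone, Copy)]
    enum Opnd { Reg(u8), UImm(u64) }

    #[allow(dead_code)]
    enum Insn {
        // `out` is chosen later by the register allocator, so a value
        // destined for a fixed register needed a follow-up Mov.
        Load { opnd: Opnd, out: Opnd },
        // `dest` is fixed by the caller; the register allocator can
        // skip over this instruction entirely.
        LoadInto { dest: Opnd, opnd: Opnd },
    }

    fn main() {
        // e.g. loading an immediate argument straight into the first
        // argument register (register numbers illustrative):
        let _insn = Insn::LoadInto { dest: Opnd::Reg(0), opnd: Opnd::UImm(7) };
    }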
2022-09-08 17:09:50 -04:00
Kevin Newton 9b48edd932
Allow comparing against 64-bit immediates on x86 (#6314) 2022-09-01 22:14:23 -04:00
John Hawthorn 1cc97412cd Remove rb_iseq_each 2022-09-01 15:20:49 -07:00
John Hawthorn 679ef34586 New constant caching insn: opt_getconstant_path
Previously YARV bytecode implemented constant caching by having a pair
of instructions, opt_getinlinecache and opt_setinlinecache, wrapping a
series of getconstant calls (with putobject providing supporting
arguments).

This commit replaces that pattern with a new instruction,
opt_getconstant_path, handling both getting/setting the inline cache and
fetching the constant on a cache miss.

This is implemented by storing the full constant path as a
null-terminated array of IDs inside of the IC structure. idNULL is used
to signal an absolute constant reference.

    $ ./miniruby --dump=insns -e '::Foo::Bar::Baz'
    == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
    0000 opt_getconstant_path                   <ic:0 ::Foo::Bar::Baz>      (   1)[Li]
    0002 leave
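
A hedged sketch of reading that storage; the helper is hypothetical, and idNULL is assumed to be 0 purely for illustration:

    type ID = u64;          // stand-in for CRuby's ID
    const ID_NULL: ID = 0;  // stand-in for idNULL

    /// Collect the segments of a constant path from the null-terminated
    /// ID array held in the inline cache. (Per the commit message, idNULL
    /// also signals an absolute reference like ::Foo; that case is elided.)
    unsafe fn constant_path_segments(mut ids: *const ID) -> Vec<ID> {
        let mut segments = Vec::new();
        while *ids != ID_NULL {
            segments.push(*ids);
            ids = ids.add(1);
        }
        segments
    }

    fn main() {
        let raw = [10u64, 20, 30, ID_NULL]; // e.g. Foo::Bar::Baz
        let path = unsafe { constant_path_segments(raw.as_ptr()) };
        assert_eq!(path, vec![10, 20, 30]);
    }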

The motivation for this is that we had increasingly found the need to
disassemble the instructions between the opt_getinlinecache and
opt_setinlinecache in order to determine the constant we are fetching,
or otherwise store metadata.

This disassembly was done:
* In opt_setinlinecache, to register the IC against the constant names
  it is using for granular invalidation.
* In rb_iseq_free, to unregister the IC from the invalidation table.
* In YJIT to find the position of a opt_getinlinecache instruction to
  invalidate it when the cache is populated
* In YJIT to register the constant names being used for invalidation.

With this change we no longer need disassembly for these (in fact
rb_iseq_each is now unused), as the list of constant names being
referenced is held in the IC. This should also make it possible to make
more optimizations in the future.

This may also reduce the size of iseqs, as previously each constant
segment required 32 bytes (on 64-bit platforms). This implementation
only stores one ID per segment.

There should be no significant performance change between this and the
previous implementation. Previously opt_getinlinecache was a "leaf"
instruction, but it included a jump (almost always to a separate cache
line). Now opt_getconstant_path is a non-leaf (it may
raise/autoload/call const_missing) but it does not jump. These seem to
even out.
2022-09-01 15:20:49 -07:00
Takashi Kokubun 064944c902
Stop using a callee-saved register for scratch0 on aarch64 (#6312)
[Bug #18985]

* Callee-save x22 for aarch64

* Just use a caller-saved register

* Update yjit/src/backend/arm64/mod.rs

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-09-01 13:38:38 -07:00
Takashi Kokubun 4144abee42
Let --yjit-dump-disasm=all dump ocb code as well (#6309)
* Let --yjit-dump-disasm=all dump ocb code as well

* Use an enum instead

* Add a None Option to DumpDisasm (#444)

* Add a None Option to DumpDisasm

* Update yjit/src/asm/mod.rs

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>

* Fix a build failure

* Use only a single name

* Only None will be a disabled case

* Fix cargo test

* Fix --yjit-dump-disasm=all to print outlined cb

Co-authored-by: Jimmy Miller <jimmyhmiller@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-09-01 11:55:39 -07:00
Kevin Newton be55b77cc7
Better b.cond usage on AArch64 (#6305)
* Better b.cond usage on AArch64

When we're lowering a conditional jump, we previously had a bit of
a complicated setup where we could emit a conditional jump to skip
over a jump that was the next instruction, and then write out the
destination and use a branch register.

Now instead we use the b.cond instruction if our offset fits (not
common, but not unused either) and if it doesn't we write out an
inverse condition to jump past loading the destination and
branching directly.
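
A rough sketch of that decision, assuming b.cond's signed 19-bit instruction-count immediate is the limiting factor; the emit steps are left as comments:

    #[derive(Clone, Copy)]
    enum Condition { Eq, Ne }

    impl Condition {
        /// The inverse condition, as added in the follow-up commit below.
        fn invert(self) -> Condition {
            match self {
                Condition::Eq => Condition::Ne,
                Condition::Ne => Condition::Eq,
            }
        }
    }

    /// b.cond encodes a signed 19-bit offset in instructions (about +/-1MiB).
    fn bcond_offset_fits(offset_insns: i64) -> bool {
        let limit: i64 = 1 << 18;
        (-limit..limit).contains(&offset_insns)
    }

    fn emit_conditional_jump(cond: Condition, offset_insns: i64) {
        if bcond_offset_fits(offset_insns) {
            // Emit a single b.<cond> straight to the target.
        } else {
            // Emit b.<inverse cond> past the next few instructions, then
            // load the full destination address and branch via register.
            let _skip_cond = cond.invert();
        }
    }

    fn main() {
        emit_conditional_jump(Condition::Eq, 1 << 20); // falls back to br
    }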

* Added an inverse fn for Condition (#443)

Prevents the need to pass two params and potentially reduces errors.

Co-authored-by: Jimmy Miller <jimmyhmiller@jimmys-mbp.lan>

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Co-authored-by: Jimmy Miller <jimmyhmiller@jimmys-mbp.lan>
2022-08-31 15:44:26 -04:00
Takashi Kokubun 5dbc725f4d Skip linking rb_yjit_icache_invalidate on cargo test
Co-authored-by: Kevin Newton <kddnewton@gmail.com>
2022-08-30 14:21:43 -07:00
Takashi Kokubun ddca3482ef
Check only symbol flag bits (#6301)
* Check only symbol flag bits

* Check all 4 bits
2022-08-29 21:05:06 -04:00
Kevin Newton d694f320e4 Fixed width immediates (https://github.com/Shopify/ruby/pull/437)
There are a lot of times when encoding AArch64 instructions that we
need to represent an integer value with a custom fixed width. For
example, the offset for a B instruction is 26 bits, so we store an
i32 on the instruction struct and then mask it when we encode.

We've been doing this masking everywhere, which has worked, but
it's getting a bit copy-pasty all over the place. This commit
centralizes that logic to make sure we stay consistent.
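
A standalone sketch of what centralizing that masking can look like (the real helper lives in the AArch64 encoder; widths under 32 are assumed):

    /// Truncate a signed immediate to `width` bits for encoding,
    /// asserting that no significant bits are lost.
    fn truncate_imm(imm: i32, width: u32) -> u32 {
        assert!(width < 32);
        let mask = (1u32 << width) - 1;
        let result = (imm as u32) & mask;
        if imm < 0 {
            // A negative value must still have its sign bit set
            // within the truncated width.
            assert_eq!((result >> (width - 1)) & 1, 1);
        } else {
            // A non-negative value must fit entirely within the mask.
            assert_eq!(result, imm as u32);
        }
        result
    }

    fn main() {
        // The offset of a B instruction is 26 bits wide.
        assert_eq!(truncate_imm(-1, 26), 0x3FF_FFFF);
    }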
2022-08-29 09:09:41 -07:00
Alan Wu 46007b88af A64: Only clear icache when writing out new code (https://github.com/Shopify/ruby/pull/442)
Previously we cleared the cache for all the code in the system whenever
we flipped memory protection, which was prohibitively expensive since the
operation is not constant time. Instead, only clear the cache for the
memory region of newly written code when we write out new code.

This brings the runtime for the 30k_if_else test down to about 6 seconds
from the previous 45 seconds on my laptop.
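
A hedged sketch of the pattern; rb_yjit_icache_invalidate is the C helper mentioned elsewhere in this log, but its exact signature here is an assumption:

    use std::os::raw::c_void;

    extern "C" {
        // Implemented on the C side (assumed to wrap something like
        // __builtin___clear_cache over [start, end)).
        fn rb_yjit_icache_invalidate(start: *mut c_void, end: *mut c_void);
    }

    /// After writing new code, clear the icache for just that range
    /// rather than for the whole executable region.
    fn finish_write(new_code_start: *mut u8, new_code_end: *mut u8) {
        unsafe {
            rb_yjit_icache_invalidate(
                new_code_start as *mut c_void,
                new_code_end as *mut c_void,
            );
        }
    }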
2022-08-29 09:09:41 -07:00
Kevin Newton 29e0713a12 TBZ and TBNZ for AArch64 (https://github.com/Shopify/ruby/pull/434) 2022-08-29 09:09:41 -07:00
Kevin Newton 44c6bcff1d LDRH and STRH for AArch64 (https://github.com/Shopify/ruby/pull/438) 2022-08-29 09:09:41 -07:00
Maxime Chevalier-Boisvert 929a6a75eb Remove ir_ssa.rs as we aren't using it and it's now outdated 2022-08-29 09:09:41 -07:00
Takashi Kokubun def3ade8a8 Add --yjit-dump-disasm to dump every compiled code (https://github.com/Shopify/ruby/pull/430)
* Add --yjit-dump-disasm to dump every compiled code

* Just use get_option

* Carve out disasm_from_addr

* Avoid push_str with format!

* Share the logic through asm.compile

* This seems to negatively impact the compilation speed
2022-08-29 09:09:41 -07:00
Kevin Newton 54c7bc67a2 Various AArch64 optimizations (https://github.com/Shopify/ruby/pull/433)
* When we're storing an immediate 0 value at a memory address, we
  can use STUR XZR, [Xd] instead of loading 0 into a register and
  then storing that register (see the sketch below).
* When we're moving 0 into an argument register, we can use
  MOV Xd, XZR instead of loading the value into a register first.
* In the newarray instruction, we can skip looking at the stack at
  all if the number of values we're using is 0.
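
A minimal sketch of the first of these rewrites as a split rule, with the operand type heavily simplified:

    #[derive(Clone, Copy, PartialEq)]
    enum Opnd { UImm(u64), Reg(u8) }

    /// AArch64's zero register, which always reads as 0.
    const XZR: Opnd = Opnd::Reg(31);

    /// Split-rule sketch: a zero immediate store needs no scratch register.
    fn split_store_src(src: Opnd) -> Opnd {
        match src {
            // Becomes STUR XZR, [dest] with no preceding load.
            Opnd::UImm(0) => XZR,
            // Anything else still gets loaded into a register first (elided).
            other => other,
        }
    }

    fn main() {
        assert!(split_store_src(Opnd::UImm(0)) == XZR);
    }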
2022-08-29 09:09:41 -07:00
Noah Gibbs 93c5a5f023 Fix and re-enable String to_s, << and unary plus (https://github.com/Shopify/ruby/pull/429) 2022-08-29 09:09:41 -07:00
Alan Wu 29bda0ff81 Use shorter syntax for the same pattern (https://github.com/Shopify/ruby/pull/425) 2022-08-29 09:09:41 -07:00
Kevin Newton 932885244e Better variable name, no must_use on ccall (https://github.com/Shopify/ruby/pull/424) 2022-08-29 09:09:41 -07:00
Kevin Newton f883aabc13 Instruction enum (https://github.com/Shopify/ruby/pull/423)
* Remove references to explicit instruction parts

Previously we would reference individual instruction fields
manually. We can't do that with instructions that are enums, so
this commit removes those references. As a side effect, we can
remove the push_insn_parts() function from the assembler because we
now explicitly push instruction structs every time.

* Switch instructions to enum

Instructions are now no longer a large struct with a bunch of
optional fields. Instead they are an enum with individual shapes
for the variants.

In terms of size, the instruction struct was 120 bytes while the
new instruction enum is 106 bytes. The bigger win however is that
we're not allocating any vectors for instruction operands (except
for CCall), which should help cut down on memory usage.

Adding new instructions will be a little more complicated going
forward, but every mission-critical function that needs to be
touched will have an exhaustive match, so the compiler should guide
any additions.
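
A hedged sketch of the before/after shapes (fields and variants illustrative):

    #[allow(dead_code)]
    #[derive(Clone, Copy)]
    enum Opnd { Reg(u8), UImm(u64) }

    // Before: one struct for every opcode, with mostly-unused optional
    // fields and a heap-allocated operand vector per instruction.
    #[allow(dead_code)]
    struct InsnStruct {
        opcode: u8,
        opnds: Vec<Opnd>,
        out: Option<Opnd>,
        text: Option<String>,
    }

    // After: one variant per instruction, carrying exactly the fields it
    // needs. Only CCall still holds a Vec for its arguments.
    #[allow(dead_code)]
    enum Insn {
        Add { left: Opnd, right: Opnd, out: Opnd },
        Comment(String),
        CCall { fptr: *const u8, opnds: Vec<Opnd>, out: Opnd },
    }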
2022-08-29 09:09:41 -07:00
Kevin Newton 1c67e90bde More work toward instruction enum (https://github.com/Shopify/ruby/pull/421)
* Operand iterators

There are a couple of times when we're dealing with instructions
that we need to iterate through their operands. At the moment this
is relatively easy because there's an opnds field and we can work
with it directly. When the instructions become enums, however, the
shape of each variant will be different so we'll need an iterator
to make sense of the shape.

This commit introduces two new iterators that are created from an
instruction. One iterates over references to each operand (for
instances where they don't need to be mutable like updating live
ranges) and one iterates over mutable references to each operand
(for instances where you need to mutate them like loading values in
arm64).

Note that because iterators can't have generic items (i.e., be
associated with lifetimes) the mutable iterator forces you to use
the `while let Some` syntax, as opposed to the for-loop we use when
iterating over instructions.

This commit eliminates the last reference to insn.opnds, which is
going to make it much easier to transition to an enum.
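
A hedged sketch of why the mutable iterator needs `while let Some`: the returned borrow must be tied to each individual next() call, which the stable Iterator trait cannot express (no lending iterators):

    enum Opnd { Reg(u8), UImm(u64) }

    #[allow(dead_code)]
    enum Insn {
        Add { left: Opnd, right: Opnd },
        Ret,
    }

    struct OpndMutIter<'a> { insn: &'a mut Insn, idx: usize }

    impl<'a> OpndMutIter<'a> {
        // An inherent method can tie the returned borrow to `&mut self`;
        // Iterator::next cannot, so there is no for-loop support.
        fn next(&mut self) -> Option<&mut Opnd> {
            let idx = self.idx;
            self.idx += 1;
            match self.insn {
                Insn::Add { left, right } => match idx {
                    0 => Some(left),
                    1 => Some(right),
                    _ => None,
                },
                Insn::Ret => None,
            }
        }
    }

    fn main() {
        let mut insn = Insn::Add { left: Opnd::Reg(0), right: Opnd::UImm(1) };
        let mut iter = OpndMutIter { insn: &mut insn, idx: 0 };
        while let Some(opnd) = iter.next() {
            *opnd = Opnd::Reg(9); // e.g. mapping operands in place
        }
    }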

* Consolidate output operand fetching

Currently we always look at the .out field on instructions whenever
we want to access the output operand. When the instructions become
an enum, this is not going to be possible since the shape of the
variants will be different. Instead, this commit introduces two
functions on Insn: out_opnd() and out_opnd_mut(). These return an
Option containing a reference to the output operand and a mutable
reference to the output operand, respectively.

This commit then uses those functions to replace all instances of
accessing the output operand. For the most part this was
straightforward: where we previously checked for Opnd::None we now
check for None, and where we assumed there was an output operand we
now unwrap.
2022-08-29 09:09:41 -07:00
Alan Wu 342459576d Use VALUE for callinfos that are on the heap (https://github.com/Shopify/ruby/pull/420)
Yet another case of `jit_mov_gc_ptr()` being yanked out during the
transition to the new backend, causing a crash after object movement.
The interesting wrinkle with this one is that not all callinfos are GC'ed
objects, so the old code had an implicit assumption.

b0b9f7201a/yjit/src/codegen.rs (L4087-L4095)
2022-08-29 09:09:41 -07:00
Takashi Kokubun 5114ddce3f Avoid marking op_type on gen_defined (https://github.com/Shopify/ruby/pull/419) 2022-08-29 09:09:41 -07:00
Takashi Kokubun a78bbef12f Use VALUE for block_iseq (https://github.com/Shopify/ruby/pull/417)
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-08-29 09:09:41 -07:00
Takashi Kokubun e0e63b1a01 Fix a bus error on regenerate_branch (https://github.com/Shopify/ruby/pull/408)
* Fix a bus error on regenerate_branch

* Fix pad_size
2022-08-29 09:09:41 -07:00
Kevin Newton b00606eb64 Even more prep for instruction enum (https://github.com/Shopify/ruby/pull/413)
* Mutate in place for register allocation

Currently we allocate a new instruction every time when we're
doing register allocation by first splitting up the instruction
into its component parts, mapping the operands and the output, and
then pushing all of its parts onto the new assembler.

Since we don't need the old instruction, we can mutate the existing
one in place. While it's not that big of a win in and of itself, it
matches much more closely to what we're going to have to do when we
switch the instruction from being a struct to being an enum,
because it's much easier for the instruction to modify itself since
it knows its own shape than it is to push a new instruction that
very closely matches.

* Mutate in place for arm64 split

When we're splitting instructions for the arm64 backend, we map all
of the operands for a given instruction when it has an Opnd::Value.
We can do this in place with the existing operand instead of
allocating a new vector each time. This enables us to pattern match
against the entire instruction instead of just the opcode, which is
much closer to matching against an enum.

* Match against entire instruction in arm64_emit

Instead of matching against the opcode and then accessing all of
the various fields on the instruction when emitting bytecode for
arm64, we should instead match against the entire instruction.
This makes it much closer to what's going to happen when we switch
it over to being an enum.

* Match against entire instruction in x86_64 backend

When we're splitting or emitting code for x86_64, we should match
against the entire instruction instead of matching against just the
opcode. This gets us closer to matching against an enum instead of
a struct.

* Reuse instructions for arm64_split

When we're splitting, the default behavior was previously to split
up the instruction into its component parts and then reassemble
them in a new instruction. Instead, we can reuse the existing
instruction.
2022-08-29 09:09:41 -07:00
Alan Wu c70d1471c1 Only check lowest bit for _Bool type (https://github.com/Shopify/ruby/pull/412)
* Only check lowest bit for _Bool type

The `test AL, AL` got lost during porting and we were
generating `test RAX, RAX` instead. The upper bits of a `_Bool` return
value are unspecified, and we were failing
`TestClass#test_singleton_class_should_has_own_namespace`
due to interpreting the return value incorrectly.

* Enable test_class for test-all on x86_64
2022-08-29 09:09:41 -07:00
Kevin Newton d57a9f61a0
Build output operands explicitly (https://github.com/Shopify/ruby/pull/411)
When we're pushing instructions onto the assembler, we previously
would iterate through the instruction's operands and then assign
the output operand to it through the push_insn function. This is
easy when all instructions have a vector of operands, but is much
more difficult when the shape differs in an enum.

This commit changes it so that we explicitly define the output
operand for each instruction before it gets pushed onto the
assembler. This has the added benefit of changing the definition
of push_insn to no longer require a mutable instruction.

This paves the way to make the out field on the instructions an
Option<Opnd> instead which is going to more accurately reflect
the behavior we're going to have once we switch the instructions
over to an enum instead of a struct.
2022-08-29 08:47:11 -07:00
Kevin Newton b735eb5ef3
Instruction builders for backend IR (https://github.com/Shopify/ruby/pull/410)
Currently we use macros to define the shape of each of the
instruction building methods. This works while all of the
instructions share the same fields, but is really hard to get
working when they're an enum with different shapes. This is an
incremental step toward a bigger refactor of changing the Insn
from a struct to an enum.
2022-08-29 08:47:11 -07:00
Maxime Chevalier-Boisvert 1cf9f56c55
Fix issue with expandarray, add missing jl, enable tests (https://github.com/Shopify/ruby/pull/409) 2022-08-29 08:47:11 -07:00
Maxime Chevalier-Boisvert 95dce1ccac
Temporarily disable rb_str_concat, add CI tests (https://github.com/Shopify/ruby/pull/407)
Make sure we can load the test-all runner and run test_yjit.rb
2022-08-29 08:47:11 -07:00
Noah Gibbs (and/or Benchmark CI) 09c12111d4
Port jit_rb_str_concat to new backend, re-enable cfunc lookup (https://github.com/Shopify/ruby/pull/402) 2022-08-29 08:47:11 -07:00
Maple Ong 5a76a15a0f
YJIT: Implement concatarray in yjit (https://github.com/Shopify/ruby/pull/405)
* Create code generation func

* Make rb_vm_concat_array available to use in Rust

* Map opcode to code gen func

* Implement code gen for concatarray

* Add test for concatarray

* Use new asm backend

* Add comment to C func wrapper
2022-08-29 08:47:11 -07:00
Alan Wu 2f9df46654
Use bindgen for old manual extern declarations (https://github.com/Shopify/ruby/pull/404)
We have a large extern block in cruby.rs left over from the port. We can
use bindgen for it now and reserve the manual declarations for just a
handful of vm_insnhelper.c functions.

Fix up a few minor discrepancies bindgen found between the C declarations
and the manual ones, mostly missing `const` on the C side.
2022-08-29 08:47:11 -07:00
Kevin Newton ff3f1d15d2
Optimize bitmask immediates (https://github.com/Shopify/ruby/pull/403) 2022-08-29 08:47:11 -07:00
Kevin Newton be730cdae5
AArch64 Ruby immediates (https://github.com/Shopify/ruby/pull/400) 2022-08-29 08:47:10 -07:00
Maxime Chevalier-Boisvert c022a60540
Fix bugs in gen_opt_getinlinecache 2022-08-29 08:47:10 -07:00
Zack Deveau cb15886e61
Port opt_getinlinecache to the new backend (https://github.com/Shopify/ruby/pull/399) 2022-08-29 08:47:10 -07:00