
69 commits

Author SHA1 Message Date
Peter Zhu 7577c101ed
Unify length field for embedded and heap strings (#7908)
* Unify length field for embedded and heap strings

The length field is of the same type and position in RString for both
embedded and heap allocated strings, so we can unify it.

* Remove RSTRING_EMBED_LEN
2023-06-06 10:19:20 -04:00
Nobuyoshi Nakada 8d242a33af
`rb_bug` prints a newline after the message 2023-05-20 21:43:30 +09:00
John Hawthorn 2dff1d4fda
YJIT: Fix raw sample stack lengths in exit traces (#7728)
yjit-trace-exits appends a synthetic sample for the instruction being
exited, but we didn't increment the size of the stack. Fixing this count
correctly lets us successfully generate a flamegraph from the exits.

I also replaced the line number for instructions with 0, as I don't
think the previous value had meaning.

Co-authored-by: Adam Hess <HParker@github.com>
2023-04-18 10:09:16 -04:00
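The fix above comes down to keeping a length prefix consistent with the frames that follow it. The sketch below assumes a hypothetical flat layout, `[stack_len, frame_1, ..., frame_n, count]`, and hypothetical helpers `append_exit_sample` and `read_samples`. It is not YJIT's actual buffer format. It only illustrates why the synthetic frame has to be counted.

```ruby
# Hypothetical layout (an assumption, not YJIT's real buffer format):
#   [stack_len, frame_1, ..., frame_n, count]
# stack_len tells a reader where the frames stop and the count begins.
def append_exit_sample(buf, stack, exited_insn)
  frames = stack + [exited_insn] # add the synthetic frame for the exited insn
  buf << frames.size             # the length must include the synthetic frame
  buf.concat(frames)
  buf << 1                       # each new sample starts with a count of 1
  buf
end

# Read samples back by trusting each length prefix.
def read_samples(buf)
  samples = []
  i = 0
  while i < buf.size
    len = buf[i]
    samples << [buf[i + 1, len], buf[i + 1 + len]]
    i += len + 2
  end
  samples
end

buf = []
append_exit_sample(buf, [:main, :foo], :opt_plus)
append_exit_sample(buf, [:main], :getinstancevariable)
p read_samples(buf)
```

If the prefix were written as `stack.size`, `read_samples` would treat the synthetic frame as the count and misread every sample after it.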
Matt Valentine-House d91a82850a Pull the shape tree out of the vm object 2023-04-06 11:07:16 +01:00
Takashi Kokubun 1587494b0b
YJIT: Add codegen for Integer methods (#7665)
* YJIT: Add codegen for Integer methods

* YJIT: Update dependencies

* YJIT: Fix Integer#[] for argc=2
2023-04-05 13:19:31 -07:00
Takashi Kokubun 1b475fcd10 Remove an unneeded function copy 2023-04-01 23:09:05 -07:00
Maxime Chevalier-Boisvert 39a34694a0
YJIT: Add `--yjit-pause` and `RubyVM::YJIT.resume` (#7609)
* YJIT: Add --yjit-pause and RubyVM::YJIT.resume

This allows booting YJIT in a suspended state. We chose to add a new
command-line option instead of simply allowing YJIT.resume to work
without any command-line option, because it can be combined with
YJIT tuning command-line options. It also simplifies the implementation.

Paired with Kokubun and Maxime.

* Update yjit.rb

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>

---------

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2023-03-28 15:21:19 -04:00
Alan Wu 35e9b5348d
YJIT: Constify EC to avoid an `as` pointer cast (#7591) 2023-03-24 12:36:06 -04:00
Takashi Kokubun 32e0c97dfa RJIT: Optimize String#bytesize 2023-03-18 23:35:42 -07:00
Takashi Kokubun 9fd94d6a0c
YJIT: Support entry for multiple PCs per ISEQ (GH-7535) 2023-03-17 11:53:17 -07:00
Takashi Kokubun 9947574b9c Refactor jit_func_t and jit_exec
I closed https://github.com/ruby/ruby/pull/7543, but part of the diff
seems useful regardless, so I extracted it.
2023-03-16 10:42:17 -07:00
Alan Wu de174681f7 YJIT: Assert that we have the VM lock while marking
Somewhat important because having the lock is a key part of the
soundness reasoning for the `unsafe` usage here.
2023-03-15 15:45:20 -04:00
Takashi Kokubun 70ba310212
YJIT: Introduce no_gc attribute (#7511) 2023-03-14 15:38:58 -07:00
Jimmy Miller 45127c84d9
YJIT: Handle rest+splat where non-splat < required (#7499) 2023-03-13 11:12:23 -04:00
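The call shape in this title can be shown in plain Ruby. The example below is illustrative and not taken from the commit. It shows the semantics the JIT has to preserve when fewer positional arguments than the method requires appear before the splat.

```ruby
def my_func(x, y, *rest)
  [x, y, rest]
end

# One positional argument before the splat, but two required parameters:
# elements of the splatted array fill y first, then go into rest.
p my_func(1, *[2, 3, 4]) # => [1, 2, [3, 4]]
p my_func(1, *[2])       # => [1, 2, []]
```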
Takashi Kokubun 94da5f7c36 Rename builtin attr :inline to :leaf 2023-03-11 14:25:12 -08:00
Takashi Kokubun 0c0c88d383 Support multiple attributes with Primitive.attr! 2023-03-11 14:19:46 -08:00
Jimmy Miller 56df6d5f9d
YJIT: Handle splat+rest for args pass greater than required (#7468)
For example:

```ruby
def my_func(x, y, *rest)
    p [x, y, rest]
end

my_func(1, 2, 3, *[4, 5])
```
2023-03-07 17:03:43 -05:00
Nobuyoshi Nakada ef00c6da88
Adjust `else` style to be consistent in each files [ci skip] 2023-02-26 13:20:43 +09:00
Peter Zhu 3e09822407 Fix incorrect line numbers in GC hook
If the previous instruction is not a leaf instruction, then the PC was
incremented before the instruction was run (meaning the currently
executing instruction is actually the previous instruction), so we
should not increment the PC; otherwise we will calculate the source
line for the next instruction.

This bug can be reproduced with the following script, saved as `test.rb`:

```ruby
require "objspace"

ObjectSpace.trace_object_allocations_start
a =





  1.0 / 0.0
p [ObjectSpace.allocation_sourceline(a), ObjectSpace.allocation_sourcefile(a)]
```

Which outputs: [4, "test.rb"]

This is incorrect because the object was allocated on line 10 and not
line 4. The behaviour is correct when we use a leaf instruction (e.g.
if we replaced `1.0 / 0.0` with `"hello"`), then the output is:
[10, "test.rb"].

[Bug #19456]
2023-02-24 14:10:09 -05:00
Takashi Kokubun 21f9c92c71
YJIT: Show Context stats on exit (#7327) 2023-02-16 11:32:13 -08:00
Takashi Kokubun 15ef2b2d7c
YJIT: Optimize != for Integers and Strings (#7301) 2023-02-14 16:31:33 -05:00
Matt Valentine-House 72aba64fff Merge gc.h and internal/gc.h
[Feature #19425]
2023-02-09 10:32:29 -05:00
Jimmy Miller 1148fab7ae
YJIT: Handle splat with opt more fully (#7209)
* YJIT: Handle splat with opt more fully

* Update yjit/src/codegen.rs

---------

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2023-01-31 16:18:56 -05:00
Maxime Chevalier-Boisvert 6bb576fe75
YJIT: implement codegen for `String#empty?` (#7148)
YJIT: implement codegen for String#empty?
2023-01-18 15:41:28 -05:00
Takashi Kokubun c80edc9f98
YJIT: Add object shape count to stats (#6754) 2022-11-17 12:59:59 -08:00
Jimmy Miller 1a65ab20cb
Implement optimize call (#6691)
This dispatches to a C function that does the dynamic lookup. I experimented with chaining on the proc but wasn't able to detect which call sites would be monomorphic vs. polymorphic. There is definitely room for optimization here, but it does reduce exits.
2022-11-08 15:28:28 -05:00
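"Optimize call" here most likely refers to `Proc#call`, which the VM implements as an optimized method type. The sketch assumes that reading. It shows the monomorphic and polymorphic call sites the message mentions. `dispatch`, `double` and `negate` are illustrative names.

```ruby
# One call site, `handler.call(value)`, shared by every invocation of dispatch.
def dispatch(handler, value)
  handler.call(value)
end

double = ->(v) { v * 2 }
negate = proc { |v| -v }

# Monomorphic use: the same proc object reaches the call site every time.
mono = [1, 2, 3].map { |v| dispatch(double, v) }

# Polymorphic use: different procs reach the same call site.
poly = [double, negate].map { |h| dispatch(h, 5) }
```

A guard chained on the proc object would only pay off in the first pattern. Per the message, calling into C for the lookup handles both.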
Takashi Kokubun 81e84e0a4d
YJIT: Support invokeblock (#6640)
* YJIT: Support invokeblock

* Update yjit/src/backend/arm64/mod.rs

* Update yjit/src/codegen.rs

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-11-02 12:30:48 -04:00
Noah Gibbs ee7c031dc4
YJIT: don't show a full crash report if mmap is only out of memory (#6659) 2022-11-02 11:16:26 -04:00
Takashi Kokubun 2b39640b0b
YJIT: Add RubyVM::YJIT.code_gc (#6644)
* YJIT: Add RubyVM::YJIT.code_gc

* Rename compiled_page_count to live_page_count
2022-10-31 14:29:45 -04:00
Matthew Draper c746f380f2
YJIT: Support nil and blockparamproxy as blockarg in send (#6492)
Co-authored-by: John Hawthorn <john@hawthorn.email>
2022-10-26 15:27:59 -04:00
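The block-argument cases involved can be seen in plain Ruby. `yields_or_default`, `pass_nil` and `forward` are illustrative names, not YJIT code. Our understanding is that a `&blk` parameter that is only passed along is the case the VM handles with `getblockparamproxy`, which avoids allocating a Proc. The last call also covers the no-block case.

```ruby
def yields_or_default
  block_given? ? yield : :no_block
end

# Block argument is nil: the callee sees no block at all.
def pass_nil
  yields_or_default(&nil)
end

# Block argument is a block parameter that is only forwarded,
# never called directly (the "block param proxy" case).
def forward(&blk)
  yields_or_default(&blk)
end

p pass_nil                # => :no_block
p forward { :from_block } # => :from_block
p forward                 # => :no_block
```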
Takashi Kokubun b7644a2311
YJIT: GC and recompile all code pages (#6406)
when it fails to allocate a new page.

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-10-25 09:07:10 -07:00
Takashi Kokubun e7166c9bb7
Allow passing a Rust closure to rb_iseq_callback (#6575) 2022-10-18 09:07:11 -07:00
Takashi Kokubun e7c71c6c92
Make mjit_cont sharable with YJIT (#6556)
* Make mjit_cont sharable with YJIT

* Update dependencies

* Update YJIT binding
2022-10-17 09:27:59 -07:00
Tatsuya Kawano 07a93b1e37
YJIT: Do not call `mprotect` when `mem_size` is zero (#6563)
This allows x86_64 based YJIT to run on Docker Desktop on Apple silicon (arm64)
Mac because it will avoid a subtle behavior difference in `mprotect` system call
between the Linux kernel and `qemu-x86_64` user space emulator.
2022-10-17 12:26:36 -04:00
Jimmy Miller 467992ee35
Implement optimize send in yjit (#6488)
* Implement optimize send in yjit

This makes all our benchmarks exit far less often for optimize send reasons.
It makes some benchmarks faster, but not by as much as I'd like. I think this implementation
works, but there are definitely more optimal arrangements. For example, what if we compiled
send to a jump table? That seems like perhaps the best we could do, but it is not obvious (to me)
how to implement it given our current setup.

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* Attempt at fixing the issues raised by @XrXr

* fix allowlist

* returns 0 instead of nil when not found

* remove comment about encoding exception

* Fix up c changes

* Update assert

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* get rid of unneeded code and fix the flags

* Apply suggestions from code review

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* rename and fix typo

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2022-10-11 16:37:05 -04:00
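"Optimize send" refers to calls made through `Kernel#send` (and `__send__`), which the VM implements as an optimized method type. The `Calculator` class below is illustrative. It contrasts a call site whose method-name symbol never changes with one where it varies. The jump-table idea in the message would target the second kind.

```ruby
class Calculator
  def add(a, b)
    a + b
  end

  def sub(a, b)
    a - b
  end
end

calc = Calculator.new

# Same symbol on every call: the target could be resolved like a direct call.
fixed = calc.send(:add, 2, 3)

# Symbol varies at one call site: the target must be looked up per call.
varied = %i[add sub].map { |op| calc.send(op, 10, 4) }
```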
Alan Wu 7293bfe1bf
YJIT: add support for calling bmethods (#6489)
* YJIT: fix a parameter name

* YJIT: add support for calling bmethods

This commit adds support for the VM_METHOD_TYPE_BMETHOD method type in
YJIT. You can get these types of methods from facilities like
Kernel#define_singleton_method and Module#define_method.

Even though the bodies of these methods are blocks, the parameter setup
for them is exactly the same as VM_METHOD_TYPE_ISEQ, so we can reuse
the same logic in gen_send_iseq(). You can see this from how
vm_call_bmethod() eventually calls setup_parameters_complex() with
arg_setup_method.

Bmethods do need their frame environment to be set up differently. We
handle this by allowing callers of gen_send_iseq() to control the iseq,
the frame flag, and the prev_ep. The `prev_ep` goes into the same
location as the block handler would go into in an iseq method frame.

Co-authored-by: John Hawthorn <john@hawthorn.email>
2022-10-04 22:48:05 -04:00
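The claim that bmethods share parameter setup with ordinary methods is visible from Ruby. The `Greeter` class and `shouter` object are illustrative. The `define_method` version binds arguments exactly like the `def` version, including the strict arity check a plain proc would not perform.

```ruby
class Greeter
  # Ordinary method (VM_METHOD_TYPE_ISEQ).
  def greet_def(name, greeting = "Hello", punct: "!")
    "#{greeting}, #{name}#{punct}"
  end

  # Bmethod (VM_METHOD_TYPE_BMETHOD): the body is a block, but arguments
  # are bound with method semantics, including strict arity checks.
  define_method(:greet_bm) do |name, greeting = "Hello", punct: "!"|
    "#{greeting}, #{name}#{punct}"
  end
end

# define_singleton_method also creates a bmethod, on a single object.
shouter = Object.new
shouter.define_singleton_method(:shout) { |text| text.upcase }
```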
Noah Gibbs cc7f6fe734
YJIT should die if we compile on Aarch64 with no instruction cache clear available (#6380)
YJIT should die if we compile on ARM64 with no icache clear available
2022-09-15 10:14:27 -04:00
John Hawthorn f98d6d3f38
YJIT: Implement specialized respond_to? (#6363)
* Add rb_callable_method_entry_or_negative

* YJIT: Implement specialized respond_to?

This implements a specialized respond_to? in YJIT.

* Update yjit/src/codegen.rs

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-09-14 16:15:55 -04:00
Jimmy Miller 758a1d7302
Initial support for VM_CALL_ARGS_SPLAT (#6341)
* Initial support for VM_CALL_ARGS_SPLAT

This implements support for calls with splat (*) for some methods. This
made very little difference for most benchmarks, but a large difference
for binarytrees. Looking at side exits, many
benchmarks now don't exit for splat, but exit for some other
reason. Binarytrees, however, had a number of calls that used splat args
that are now much faster. In my non-scientific benchmarking this made
splat args performance on par with not using splat args at all.

* Fix wording and whitespace

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>

* Get rid of side_effect reassignment

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-09-14 10:32:22 -04:00
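A call site flagged with VM_CALL_ARGS_SPLAT is simply one that passes `*array`. The tree-counting code below is written in the style of a binary-trees benchmark to show such calls in a hot recursive path. It is illustrative and is not the actual benchmark source.

```ruby
# Build a perfect binary tree of the given depth as nested [left, right] pairs.
def bottom_up_tree(depth)
  return [nil, nil] if depth.zero?
  [bottom_up_tree(depth - 1), bottom_up_tree(depth - 1)]
end

# Count nodes; each recursive call splats a [left, right] pair
# into the two required parameters.
def item_check(left, right)
  return 1 if left.nil?
  1 + item_check(*left) + item_check(*right)
end

tree = bottom_up_tree(3)
nodes = item_check(*tree)
```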
Alan Wu 46007b88af A64: Only clear icache when writing out new code (https://github.com/Shopify/ruby/pull/442)
Previously we cleared the cache for all the code in the system when we
flip memory protection, which was prohibitively expensive since the
operation is not constant time. Instead, only clear the cache for the
memory region of newly written code when we write out new code.

This brings the runtime for the 30k_if_else test down to about 6 seconds
from the previous 45 seconds on my laptop.
2022-08-29 09:09:41 -07:00
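The change swaps a whole-region flush for a flush of just the bytes that were written. `CodeRegionSketch` below is hypothetical and only simulates memory with an array. `flush_icache` stands in for a real cache-maintenance call and just records the range it was given. Under the old approach, every protection flip would flush all `size` bytes instead.

```ruby
# Hypothetical model (not YJIT code): flush the instruction cache only for
# the range of newly written code, not for the whole code region.
class CodeRegionSketch
  attr_reader :flushed_ranges

  def initialize(size)
    @bytes = Array.new(size, 0)
    @flushed_ranges = []
  end

  # Write new machine code and flush only the range that changed.
  def write(start, code)
    @bytes[start, code.size] = code
    flush_icache(start, code.size)
  end

  def flushed_bytes
    @flushed_ranges.sum { |_start, len| len }
  end

  private

  # Stand-in for a real cache-maintenance call on [start, start + length).
  def flush_icache(start, length)
    @flushed_ranges << [start, length]
  end
end

region = CodeRegionSketch.new(4096)
region.write(0, [0x1f, 0x20])
region.write(128, [0xd5])
```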
Alan Wu 2f9df46654
Use bindgen for old manual extern declarations (https://github.com/Shopify/ruby/pull/404)
We have a large extern block in cruby.rs leftover from the port. We can
use bindgen for it now and reserve the manual declaration for just a
handful of vm_insnhelper.c functions.

Fix up a few minor discrepancies bindgen found between the C declaration
and the manual declaration. Mostly missing `const` on the C side.
2022-08-29 08:47:11 -07:00
Maxime Chevalier-Boisvert 4024553d13
Add ifdef to clear cache 2022-08-29 08:47:03 -07:00
Maxime Chevalier-Boisvert 7e22ec7439
Clear the icache on arm 2022-08-29 08:47:03 -07:00
Noah Gibbs b4be3c00c5 add --yjit-dump-iseqs param (https://github.com/Shopify/ruby/pull/332) 2022-08-24 10:42:45 -07:00
Matthew Draper ab08a43ec5
YJIT: Teach getblockparamproxy to handle the no-block case without exiting (#6191)
Teach getblockparamproxy to handle the no-block case without exiting

Co-authored-by: John Hawthorn <john@hawthorn.email>
2022-07-28 11:38:07 -04:00
Nobuyoshi Nakada f42230ff22
Adjust styles [ci skip] 2022-07-27 18:42:27 +09:00
Eileen M. Uchitelle 59c6b7b7ab
Speed up --yjit-trace-exits code (#6106)
In a small script the speed of this feature isn't really noticeable but
on Rails it's very noticeable how slow this can be. This PR aims to
speed up two parts of the functionality.

1) The Rust exit recording code

Instead of adding all samples as we see them to the yjit_raw_samples and
yjit_line_samples, we can increment the counter on the ones we've seen
before. This will be faster on traces where we are hitting the same
stack often. In a crude measurement of booting just the Active Record
base test (`test/cases/base_test.rb`), this reduced the boot time by
1 second.

This also results in a smaller marshal dump file which sped up the test
boot time by 4 seconds with trace exits on.

2) The Ruby parsing code

Previously we were allocating new arrays using `shift` and
`each_with_index`. This change avoids allocating new arrays by using an
index. This change saves the most time: 11 seconds.

Before this change the test boot time took 62 seconds, after it took 47
seconds. This is still too long but it's a step closer to faster
functionality. Next we're going to tackle allowing you to collect trace
exits for a specific instruction. There is also some potential slowness
in the GC code that I'd like to take a second look at.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2022-07-12 16:40:49 -04:00
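Both halves of this change can be sketched in Ruby under an assumed buffer layout, `[stack_len, frame_1, ..., frame_n, count]`. This layout is hypothetical, and the real `yjit_raw_samples` format may differ. `record_sample` merges a stack into the immediately preceding sample when the two are identical, which simplifies "the ones we've seen before". `tally_top_frames` walks the buffer with a running index instead of `shift` or `each_with_index`.

```ruby
# Hypothetical layout (an assumption): [stack_len, frame_1, ..., frame_n, count]
# with frame_1 being the top of the stack.

# Append +stack+, or bump the count of the previous sample if it is identical.
# +state+ remembers where the last sample starts in +buf+.
def record_sample(buf, state, stack)
  start = state[:last_start]
  if start && buf[start] == stack.size && buf[start + 1, stack.size] == stack
    buf[start + 1 + stack.size] += 1
  else
    state[:last_start] = buf.size
    buf << stack.size
    buf.concat(stack)
    buf << 1
  end
  buf
end

# Walk the buffer with an index; tallying by top frame means the loop
# itself allocates no per-sample arrays.
def tally_top_frames(buf)
  counts = Hash.new(0)
  i = 0
  while i < buf.size
    len = buf[i]
    counts[buf[i + 1]] += buf[i + 1 + len]
    i += len + 2
  end
  counts
end

buf = []
state = {}
record_sample(buf, state, [:a, :b])
record_sample(buf, state, [:a, :b])
record_sample(buf, state, [:c])
```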
Dave Schwantes b6f6fc6e87
YJIT: Refactor gen_opt_mod (#6078)
Refactor gen_opt_mod in YJIT
2022-06-30 10:26:46 -04:00
Alan Wu 41a024f2b9 YJIT: Update note about symbol prefixes [ci skip] 2022-06-17 18:12:55 -04:00
Alan Wu 9f09397bfe
YJIT: On-demand executable memory allocation; faster boot (#5944)
This commit makes YJIT allocate memory for generated code gradually as
needed. Previously, YJIT allocated all the memory it needed on boot in
one go, leading to a higher than necessary resident set size (RSS) and
time spent on boot initializing the memory with a large memset().

Users should no longer need to search for a magic number to pass to
`--yjit-exec-mem` since physical memory consumption should now more
accurately reflect the requirement of the workload.

YJIT now reserves a range of addresses on boot. This region starts out
with no access permission at all, so buggy attempts to jump to the region
crash as they did before this change. To get this hardening at finer
granularity than the page size, we fill each page with trapping
instructions when we first allocate physical memory for the page.

Most of the time applications don't need 256 MiB of executable code, so
allocating on-demand ends up doing less total work than before. Case in
point, a simple `ruby --yjit-call-threshold=1 -eitself` takes about
half as long after this change. In terms of memory consumption, here is
a table to give a rough summary of the impact:

    | Peak RSS in MiB | -eitself example | railsbench once |
    | :-------------: | ---------------: | --------------: |
    |     before      |              265 |             377 |
    |      after      |               11 |             143 |
    |     no YJIT     |               10 |             101 |

A new module is introduced to handle allocation bookkeeping.
`CodePtr` is moved into the module since it has a close relationship
with the new `VirtualMemory` struct. This new interface has a slightly
smaller surface than before in that marking a region as writable is no
longer a public operation.
2022-06-14 10:23:13 -04:00
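The allocation scheme described above can be modeled in a few lines. `VirtualMemorySketch` is hypothetical. It simulates pages with Ruby arrays and makes no mmap or mprotect calls. The 16-byte page size and the 0xCC fill byte (x86-64 `int3`) are assumptions chosen for readability. A reserved range costs nothing until a page is first written, and that page then starts out full of trap bytes.

```ruby
# Hypothetical, simplified model of on-demand allocation inside a reserved
# address range. No real memory is mapped; pages are simulated with arrays.
class VirtualMemorySketch
  PAGE_SIZE = 16   # tiny page size to keep the example readable
  TRAP_BYTE = 0xCC # x86-64 int3, standing in for "trapping instructions"

  attr_reader :mapped_pages

  def initialize(reserved_bytes)
    @reserved = reserved_bytes # address range reserved up front
    @pages = {}                # page index => bytes, created on first write
    @mapped_pages = 0
  end

  def write_byte(offset, byte)
    raise RangeError, "outside reserved region" unless (0...@reserved).cover?(offset)

    page = offset / PAGE_SIZE
    @pages[page] ||= begin
      @mapped_pages += 1
      Array.new(PAGE_SIZE, TRAP_BYTE) # a fresh page is filled with traps
    end
    @pages[page][offset % PAGE_SIZE] = byte
  end

  # Returns nil for a page that was never mapped (a real access would fault).
  def read_byte(offset)
    page = @pages[offset / PAGE_SIZE]
    page && page[offset % PAGE_SIZE]
  end
end

vm = VirtualMemorySketch.new(4096)
vm.write_byte(0, 0x90)
vm.write_byte(1, 0x90)
```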