Граф коммитов

883 Коммитов

Автор SHA1 Сообщение Дата
Peter Zhu ae8ef06580 [Feature #20470] Implement support for USE_SHARED_GC
This commit implements support to load Ruby's current GC as a DSO.
2024-07-03 09:03:40 -04:00
Peter Zhu 51bd816517 [Feature #20470] Split GC into gc_impl.c
This commit splits gc.c into two files:

- gc.c now only contains code not specific to Ruby GC. This includes
  code to mark objects (which the GC implementation may choose not to
  use) and wrappers for internal APIs that the implementation may need
  to use (e.g. locking the VM).

- gc_impl.c now contains the implementation of Ruby's GC. This includes
  marking, sweeping, compaction, and statistics. Most importantly,
  gc_impl.c only uses public APIs in Ruby and a limited set of functions
  exposed in gc.c. This allows us to build gc_impl.c independently of
  Ruby and plug Ruby's GC into itself.
2024-07-03 09:03:40 -04:00
yui-knk 9d76a0ab4a Add RB_GC_GUARD for ast_value
I think this change fixes the following assertion failure:

```
[BUG] unexpected rb_parser_ary_data_type (2114076960) for script lines
```

It seems that `ast_value` is collected then `rb_parser_build_script_lines_from`
touches invalid memory address.
This change prevents `ast_value` from being collected by RB_GC_GUARD.
2024-06-30 09:20:38 +09:00
Peter Zhu 90763e04ba Load external GC using command line argument
This commit changes the external GC to be loaded with the `--gc-library`
command line argument instead of the RUBY_GC_LIBRARY_PATH environment
variable because @nobu pointed out that loading binaries using environment
variables can pose a security risk.
2024-06-21 11:49:01 -04:00
Aaron Patterson cdf33ed5f3 Optimized forwarding callers and callees
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.

Calls it optimizes look like this:

```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```

```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```

```ruby
def bar(*a) = a

def foo(...)
  list = [1, 2]
  bar(*list, ...) # optimized
end
foo(123)
```

All variants of the above but using `super` are also optimized, including a bare super like this:

```ruby
def foo(...)
  super
end
```

This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:

```ruby
def m
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

def bar(a) = a
def foo(...) = bar(...)

def test
  m { foo(123) }
end

test
p test # allocates 1 object on master, but 0 objects with this patch
```

```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)

def test
  m { foo(1, b: 2) }
end

test
p test # allocates 2 objects on master, but 0 objects with this patch
```

How does it work?
-----------------

This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.

I think this description is kind of confusing, so let's walk through an example with code.

```ruby
def delegatee(a, b) = a + b

def delegator(...)
  delegatee(...)  # CI2 (FORWARDING)
end

def caller
  delegator(1, 2) # CI1 (argc: 2)
end
```

Before we call the delegator method, the stack looks like this:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   |
              5|   delegatee(...)  # CI2 (FORWARDING)  |
              6| end                                   |
              7|                                       |
              8| def caller                            |
          ->  9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method.  The `delegator` method has a special local called `...`
that holds the caller's CI object.

Here is the ISeq disasm fo `delegator`:

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

The local called `...` will contain the caller's CI: CI1.

Here is the stack when we enter `delegator`:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
           -> 4|   #                                   | CI1 (argc: 2)
              5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            |
              9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`.  In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`.  It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).

Before executing the `send` instruction, we push `...` on the stack.  The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

Instruction 001 puts the caller's CI on the stack.  `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   | CI1 (argc: 2)
           -> 5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            | self
              9|   delegator(1, 2) # CI1 (argc: 2)     | 1
             10| end                                   | 2
```

The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.

Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.

I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.

I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.

For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:

```ruby
SomeObject.new(foo: 1)
```

If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.

Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
2024-06-18 09:28:25 -07:00
yui-knk 899d9f79dd Rename `vast` to `ast_value`
There is an English word "vast".
This commit changes the name to be more clear name to avoid confusion.
2024-05-03 12:40:35 +09:00
HASUMI Hitoshi 2244c58b00 [Universal parser] Decouple IMEMO from rb_ast_t
This patch removes the `VALUE flags` member from the `rb_ast_t` structure making `rb_ast_t` no longer an IMEMO object.

## Background

We are trying to make the Ruby parser generated from parse.y a universal parser that can be used by other implementations such as mruby.
To achieve this, it is necessary to exclude VALUE and IMEMO from parse.y, AST, and NODE.

## Summary (file by file)

- `rubyparser.h`
  - Remove the `VALUE flags` member from `rb_ast_t`
- `ruby_parser.c` and `internal/ruby_parser.h`
  - Use TypedData_Make_Struct VALUE which wraps `rb_ast_t` `in ast_alloc()` so that GC can manage it
    - You can retrieve `rb_ast_t` from the VALUE by `rb_ruby_ast_data_get()`
  - Change the return type of `rb_parser_compile_XXXX()` functions from `rb_ast_t *` to `VALUE`
  - rb_ruby_ast_new() which internally `calls ast_alloc()` is to create VALUE vast outside ruby_parser.c
- `iseq.c` and `vm_core.h`
  - Amend the first parameter of `rb_iseq_new_XXXX()` functions from `rb_ast_body_t *` to `VALUE`
  - This keeps the VALUE of AST on the machine stack to prevent being removed by GC
- `ast.c`
  - Almost all change is replacement `rb_ast_t *ast` with `VALUE vast` (sorry for the big diff)
  - Fix `node_memsize()`
    - Now it includes `rb_ast_local_table_link`, `tokens` and script_lines
- `compile.c`, `load.c`, `node.c`, `parse.y`, `proc.c`, `ruby.c`, `template/prelude.c.tmpl`, `vm.c` and `vm_eval.c`
  - Follow-up due to the above changes
- `imemo.{c|h}`
  - If an object with `imemo_ast` appears, considers it a bug

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2024-04-26 11:21:08 +09:00
Peter Zhu f248e1008a Embed rb_gc_function_map_t in rb_vm_t
Avoids a pointer indirection and memory allocation.
2024-04-25 09:25:33 -04:00
Peter Zhu 480287d140 Add macro load_external_gc_func for loading functions from external GC 2024-04-24 10:31:03 -04:00
Koichi Sasada 662ce928a7 `RUBY_TRY_UNUSED_BLOCK_WARNING_STRICT`
`RUBY_TRY_UNUSED_BLOCK_WARNING_STRICT=1 ruby ...` will enable
strict check for unused block warning.

This option is only for trial to compare the results so the
envname is not considered well.
Should be removed before Ruby 3.4.0 release.
2024-04-19 14:28:54 +09:00
Koichi Sasada e9d7478ded relax unused block warning for duck typing
if a method `foo` uses a block, other (unrelated) method `foo`
can receives a block. So try to relax the unused block warning
condition.

```ruby
      class C0
        def f = yield
      end

      class C1 < C0
        def f = nil
      end

      [C0, C1].f{ block } # do not warn
```
2024-04-17 20:26:49 +09:00
Matt Valentine-House 065710c0f5 Initialize external GC Library
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
2024-04-15 19:50:47 +01:00
HASUMI Hitoshi 9b1e97b211 [Universal parser] DeVALUE of p->debug_lines and ast->body.script_lines
This patch is part of universal parser work.

## Summary
- Decouple VALUE from members below:
  - `(struct parser_params *)->debug_lines`
  - `(rb_ast_t *)->body.script_lines`
- Instead, they are now `rb_parser_ary_t *`
  - They can also be a `(VALUE)FIXNUM` as before to hold line count
- `ISEQ_BODY(iseq)->variable.script_lines` remains VALUE
  - In order to do this,
  - Add `VALUE script_lines` param to `rb_iseq_new_with_opt()`
  - Introduce `rb_parser_build_script_lines_from()` to convert `rb_parser_ary_t *` into `VALUE`

## Other details
- Extend `rb_parser_ary_t *`. It previously could only store `rb_parser_ast_token *`, now can store script_lines, too
- Change tactics of building the top-level `SCRIPT_LINES__` in `yycompile0()`
  - Before: While parsing, each line of the script is added to `SCRIPT_LINES__[path]`
  - After: After `yyparse(p)`, `SCRIPT_LINES__[path]` will be built from `p->debug_lines`
- Remove the second parameter of `rb_parser_set_script_lines()` to make it simple
- Introduce `script_lines_free()` to be called from `rb_ast_free()` because the GC no longer takes care of the script_lines
- Introduce `rb_parser_string_deep_copy()` in parse.y to maintain script_lines when `rb_ruby_parser_free()` called
  - With regard to this, please see *Future tasks* below

## Future tasks
- Decouple IMEMO from `rb_ast_t *`
  - This lifts the five-members-restriction of Ruby object,
  - So we will be able to move the ownership of the `lex.string_buffer` from parser to AST
  - Then we remove `rb_parser_string_deep_copy()` to make the whole thing simple
2024-04-15 20:51:54 +09:00
Koichi Sasada 9180e33ca3 show warning for unused block
With verbopse mode (-w), the interpreter shows a warning if
a block is passed to a method which does not use the given block.

Warning on:

* the invoked method is written in C
* the invoked method is not `initialize`
* not invoked with `super`
* the first time on the call-site with the invoked method
  (`obj.foo{}` will be warned once if `foo` is same method)

[Feature #15554]

`Primitive.attr! :use_block` is introduced to declare that primitive
functions (written in C) will use passed block.

For minitest, test needs some tweak, so use
ea9caafc07
for `test-bundled-gems`.
2024-04-15 12:08:07 +09:00
KJ Tsanaktsidis 2535a09e85 Check ASAN fake stacks when marking non-current threads
Currently, we check the values on the machine stack & register state to
see if they're actually a pointer to an ASAN fake stack, and mark the
values on the fake stack too if required. However, we are only doing
that for the _current_ thread (the one actually running the GC), not for
any other thread in the program.

Make rb_gc_mark_machine_context (which is called for marking non-current
threads) perform the same ASAN fake stack handling that
mark_current_machine_context performs.

[Bug #20310]
2024-03-25 14:57:04 +11:00
KJ Tsanaktsidis 48d3bdddba Move asan_fake_stack_handle to EC, not thread
It's really a property of the EC; each fiber (which has its own EC) also
has its own asan_fake_stack_handle.

[Bug #20310]
2024-03-25 14:57:04 +11:00
Peter Zhu c2170e5c2b Fix typo from gloabl_object_list to global_object_list 2024-03-14 13:52:20 -04:00
Peter Zhu 4559a161af Move gloabl_object_list from objspace to VM
This is to be consistent with the mark_object_ary that is in the VM.
2024-03-14 13:29:59 -04:00
Jean Boussier d4f3dcf4df Refactor VM root modules
This `st_table` is used to both mark and pin classes
defined from the C API. But `vm->mark_object_ary` already
does both much more efficiently.

Currently a Ruby process starts with 252 rooted classes,
which uses `7224B` in an `st_table` or `2016B` in an `RArray`.

So a baseline of 5kB saved, but since `mark_object_ary` is
preallocated with `1024` slots but only use `405` of them,
it's a net `7kB` save.

`vm->mark_object_ary` is also being refactored.

Prior to this changes, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.

This has the detrimental effect of marking these references on every
minors even though it's a mostly append only list.

But using a custom TypedData we can save from having to mark
all the references on minor GC runs.

Addtionally, immediate values are now ignored and not appended
to `vm->mark_object_ary` as it's just wasted space.
2024-03-06 15:33:43 -05:00
Kevin Newton 0f1ca9492c [PRISM] Provide runtime flag for prism in iseq 2024-02-21 11:44:40 -05:00
Peter Zhu 330830dd1a Add IMEMO_NEW
Rather than exposing that an imemo has a flag and four fields, this
changes the implementation to only expose one field (the klass) and
fills the rest with 0. The type will have to fill in the values themselves.
2024-02-21 11:33:05 -05:00
John Hawthorn 1c97abaaba De-dup identical callinfo objects
Previously every call to vm_ci_new (when the CI was not packable) would
result in a different callinfo being returned this meant that every
kwarg callsite had its own CI.

When calling, different CIs result in different CCs. These CIs and CCs
both end up persisted on the T_CLASS inside cc_tbl. So in an eval loop
this resulted in a memory leak of both types of object. This also likely
resulted in extra memory used, and extra time searching, in non-eval
cases.

For simplicity in this commit I always allocate a CI object inside
rb_vm_ci_lookup, but ideally we would lazily allocate it only when
needed. I hope to do that as a follow up in the future.
2024-02-20 18:55:00 -08:00
Nobuyoshi Nakada f3cc1f9a70
Show actual imemo type when unexpected type 2024-02-08 18:08:42 +09:00
Jeremy Evans 0f90a24a81 Introduce Allocationless Anonymous Splat Forwarding
Ruby makes it easy to delegate all arguments from one method to another:

```ruby
def f(*args, **kw)
  g(*args, **kw)
end
```

Unfortunately, this indirection decreases performance.  One reason it
decreases performance is that this allocates an array and a hash per
call to `f`, even if `args` and `kw` are not modified.

Due to Ruby's ability to modify almost anything at runtime, it's
difficult to avoid the array allocation in the general case. For
example, it's not safe to avoid the allocation in a case like this:

```ruby
def f(*args, **kw)
  foo(bar)
  g(*args, **kw)
end
```

Because `foo` may be `eval` and `bar` may be a string referencing `args`
or `kw`.

To fix this correctly, you need to perform something similar to escape
analysis on the variables.  However, there is a case where you can
avoid the allocation without doing escape analysis, and that is when
the splat variables are anonymous:

```ruby
def f(*, **)
  g(*, **)
end
```

When splat variables are anonymous, it is not possible to reference
them directly, it is only possible to use them as splats to other
methods.  Since that is the case, if `f` is called with a regular
splat and a keyword splat, it can pass the arguments directly to
`g` without copying them, avoiding allocation.  For example:

```ruby
def g(a, b:)
  a + b
end

def f(*, **)
  g(*, **)
end

a = [1]
kw = {b: 2}

f(*a, **kw)
```

I call this technique: Allocationless Anonymous Splat Forwarding.

This is implemented using a couple additional iseq param flags,
anon_rest and anon_kwrest.  If anon_rest is set, and an array splat
is passed when calling the method when the array splat can be used
without modification, `setup_parameters_complex` does not duplicate
it.  Similarly, if anon_kwest is set, and a keyword splat is passed
when calling the method, `setup_parameters_complex` does not
duplicate it.
2024-01-24 18:25:55 -08:00
Takashi Kokubun 27c1dd8634
YJIT: Allow inlining ISEQ calls with a block (#9622)
* YJIT: Allow inlining ISEQ calls with a block

* Leave a TODO comment about u16 inline_block
2024-01-23 19:36:23 +00:00
Nobuyoshi Nakada 127b19ab56 Use line numbers as builtin-index
The order of iseq may differ from the order of tokens, typically
`while`/`until` conditions are put after the body.

These orders can match by using line numbers as builtin-indexes, but
at the same time, it introduces the restriction that multiple `cexpr!`
and `cstmt!` cannot appear in the same line.

Another possible idea is to use `RubyVM::AbstractSyntaxTree` and
`node_id` instead of ripper, with making BASERUBY 3.1 or later.
2024-01-22 19:39:34 +09:00
KJ Tsanaktsidis 61da90c1b8 Mark asan fake stacks during machine stack marking
ASAN leaves a pointer to the fake frame on the stack; we can use the
__asan_addr_is_in_fake_stack API to work out the extent of the fake
stack and thus mark any VALUEs contained therein.

[Bug #20001]
2024-01-19 09:55:12 +11:00
KJ Tsanaktsidis 807714447e Pass down "stack start" variables from closer to the top of the stack
This commit changes how stack extents are calculated for both the main
thread and other threads. Ruby uses the address of a local variable as
part of the calculation for machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable

However, the local being used for this is actually too low (too close to
the leaf function call) in both the main thread case and the new thread
case.

In the main thread case, we have the `INIT_STACK` macro, which is used
for pthreads to set the `native_main_thread->stack_start` value. This
value is correctly captured at the very top level of the program (in
main.c). However, this is _not_ what's used to set the execution context
machine stack (`th->ec->machine_stack.stack_start`); that gets set as
part of a call to `ruby_thread_init_stack` in `Init_BareVM`, using the
address of a local variable allocated _inside_ `Init_BareVM`. This is
too low; we need to use a local allocated closer to the top of the
program.

In the new thread case, the lolcal is allocated inside
`native_thread_init_stack`, which is, again, too low.

In both cases, this means that we might have VALUEs lying outside the
bounds of `th->ec->machine.stack_{start,end}`, which won't be marked
correctly by the GC machinery.

To fix this,

* In the main thread case: We already have `INIT_STACK` at the right
  level, so just pass that local var to `ruby_thread_init_stack`.
* In the new thread case: Allocate the local one level above the call to
  `native_thread_init_stack` in `call_thread_start_func2`.

[Bug #20001]

fix
2024-01-19 09:55:12 +11:00
Takashi Kokubun 8642a573e6 Rename BUILTIN_ATTR_SINGLE_NOARG_INLINE
to BUILTIN_ATTR_SINGLE_NOARG_LEAF

The attribute was created when the other attribute was called BUILTIN_ATTR_INLINE.
Now that the original attribute is renamed to BUILTIN_ATTR_LEAF, it's
only confusing that we call it "_INLINE".
2024-01-16 17:31:27 -08:00
Takashi Kokubun e37a37e696 Drop obsoleted BUILTIN_ATTR_NO_GC attribute
The thing that has used this in the past was very buggy, and we've never
revisied it. Let's remove it until we need it again.
2024-01-16 17:27:53 -08:00
KJ Tsanaktsidis 396e94666b Revert "Pass down "stack start" variables from closer to the top of the stack"
This reverts commit 4ba8f0dc99.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis 688a6ff510 Revert "Mark asan fake stacks during machine stack marking"
This reverts commit d10bc3a2b8.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis d10bc3a2b8 Mark asan fake stacks during machine stack marking
ASAN leaves a pointer to the fake frame on the stack; we can use the
__asan_addr_is_in_fake_stack API to work out the extent of the fake
stack and thus mark any VALUEs contained therein.

[Bug #20001]
2024-01-12 17:29:48 +11:00
KJ Tsanaktsidis 4ba8f0dc99 Pass down "stack start" variables from closer to the top of the stack
The implementation of `native_thread_init_stack` for the various
threading models can use the address of a local variable as part of the
calculation of the machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable

However, the local being used for this is actually allocated _inside_
the `native_thread_init_stack` frame; that means the caller might
allocate a VALUE on the stack that actually lies outside the bounds
stored in machine.stack_{start,end}.

A local variable from one level above the topmost frame that stores
VALUEs on the stack must be drilled down into the call to
`native_thread_init_stack` to be used in the calculation. This probably
doesn't _really_ matter for the win32 case (they'll be in the same
memory mapping so VirtualQuery should return the same thing), but
definitely could matter for the pthreads case.

[Bug #20001]
2024-01-12 17:29:48 +11:00
Koichi Sasada d65d2fb6b5 Do not `poll` first
Before this patch, the MN scheduler waits for the IO with the
following steps:

1. `poll(fd, timeout=0)` to check fd is ready or not.
2. if fd is not ready, waits with MN thread scheduler
3. call `func` to issue the blocking I/O call

The advantage of advanced `poll()` is we can wait for the
IO ready for any fds. However `poll()` becomes overhead
for already ready fds.

This patch changes the steps like:

1. call `func` to issue the blocking I/O call
2. if the `func` returns `EWOULDBLOCK` the fd is `O_NONBLOCK`
   and we need to wait for fd is ready so that waits with MN
   thread scheduler.

In this case, we can wait only for `O_NONBLOCK` fds. Otherwise
it waits with blocking operations such as `read()` system call.
However we don't need to call `poll()` to check fd is ready
in advance.

With this patch we can observe performance improvement
on microbenchmark which repeats blocking I/O (not
`O_NONBLOCK` fd) with and without MN thread scheduler.

```ruby
require 'benchmark'

f = open('/dev/null', 'w')
f.sync = true

TN = 1
N = 1_000_000 / TN

Benchmark.bm{|x|
  x.report{
    TN.times.map{
      Thread.new{
        N.times{f.print '.'}
      }
    }.each(&:join)
  }
}
__END__
TN = 1
                 user     system      total        real
ruby32       0.393966   0.101122   0.495088 (  0.495235)
ruby33       0.493963   0.089521   0.583484 (  0.584091)
ruby33+MN    0.639333   0.200843   0.840176 (  0.840291) <- Slow
this+MN      0.512231   0.099091   0.611322 (  0.611074) <- Good
```
2024-01-05 05:51:25 +09:00
KJ Tsanaktsidis f8effa209a Change the semantics of rb_postponed_job_register
Our current implementation of rb_postponed_job_register suffers from
some safety issues that can lead to interpreter crashes (see bug #1991).
Essentially, the issue is that jobs can be called with the wrong
arguments.

We made two attempts to fix this whilst keeping the promised semantics,
but:
  * The first one involved masking/unmasking when flushing jobs, which
    was believed to be too expensive
  * The second one involved a lock-free, multi-producer, single-consumer
    ringbuffer, which was too complex

The critical insight behind this third solution is that essentially the
only user of these APIs are a) internal, or b) profiling gems.

For a), none of the usages actually require variable data; they will
work just fine with the preregistration interface.

For b), generally profiling gems only call a single callback with a
single piece of data (which is actually usually just zero) for the life
of the program. The ringbuffer is complex because it needs to support
multi-word inserts of job & data (which can't be atomic); but nobody
actually even needs that functionality, really.

So, this comit:
  * Introduces a pre-registration API for jobs, with a GVL-requiring
    rb_postponed_job_prereigster, which returns a handle which can be
    used with an async-signal-safe rb_postponed_job_trigger.
  * Deprecates rb_postponed_job_register (and re-implements it on top of
    the preregister function for compatability)
  * Moves all the internal usages of postponed job register
    pre-registration
2023-12-10 15:00:37 +09:00
Koichi Sasada 352a885a0f Thread specific storage APIs
This patch introduces thread specific storage APIs
for tools which use `rb_internal_thread_event_hook` APIs.

* `rb_internal_thread_specific_key_create()` to create a tool specific
  thread local storage key and allocate the storage if not available.
* `rb_internal_thread_specific_set()` sets a data to thread and tool
  specific storage.
* `rb_internal_thread_specific_get()` gets a data in thread and tool
  specific storage.

Note that `rb_internal_thread_specific_get|set(thread_val, key)`
can be called without GVL and safe for async signal and safe for
multi-threading (native threads). So you can call it in any internal
thread event hooks. Further more you can call it from other native
threads. Of course `thread_val` should be living while accessing the
data from this function.

Note that you should not forget to clean up the set data.
2023-12-08 13:16:19 +09:00
Yuta Saito 295d648f76 [wasm] Use xmalloc/xfree for jmpbuf allocation to trigger GC properly
`rb_vm_tag_jmpbuf_{init,deinit}` are safe to raise exception since the
given tag is not yet pushed to `ec->tag` or already popped from it at
the time, so `ec->tag` is always valid and it's safe to raise exception
when xmalloc fails.
2023-11-23 02:18:53 +09:00
Nobuyoshi Nakada 8f1ec6e171
Adjust spaces [ci skip] 2023-11-15 19:05:10 +09:00
Yuta Saito 50a5b76dec [wasm] allocate Asyncify setjmp buffer in heap
`rb_jmpbuf_t` type is considerably large due to inline-allocated
Asyncify buffer, and it leads to stack overflow even with small number
of C-method call frames. This commit allocates the Asyncify buffer used
by `rb_wasm_setjmp` in heap to mitigate the issue.

This patch introduces a new type `rb_vm_tag_jmpbuf_t` to abstract the
representation of a jump buffer, and init/deinit hook points to manage
lifetime of the buffer. These changes are effectively NFC for non-wasm
platforms.
2023-11-13 19:17:16 +09:00
Maxime Chevalier-Boisvert b2e1ddffa5
YJIT: port call threshold logic from Rust to C for performance (#8628)
* Port call threshold logic from Rust to C for performance

* Prefix global/field names with yjit_

* Fix linker error

* Fix preprocessor condition for rb_yjit_threshold_hit

* Fix third linker issue

* Exclude yjit_calls_at_interv from RJIT bindgen

---------

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2023-10-12 10:05:34 -04:00
Koichi Sasada be1bbd5b7d M:N thread scheduler for Ractors
This patch introduce M:N thread scheduler for Ractor system.

In general, M:N thread scheduler employs N native threads (OS threads)
to manage M user-level threads (Ruby threads in this case).
On the Ruby interpreter, 1 native thread is provided for 1 Ractor
and all Ruby threads are managed by the native thread.

From Ruby 1.9, the interpreter uses 1:1 thread scheduler which means
1 Ruby thread has 1 native thread. M:N scheduler change this strategy.

Because of compatibility issue (and stableness issue of the implementation)
main Ractor doesn't use M:N scheduler on default. On the other words,
threads on the main Ractor will be managed with 1:1 thread scheduler.

There are additional settings by environment variables:

`RUBY_MN_THREADS=1` enables M:N thread scheduler on the main ractor.
Note that non-main ractors use the M:N scheduler without this
configuration. With this configuration, single ractor applications
run threads on M:1 thread scheduler (green threads, user-level threads).

`RUBY_MAX_CPU=n` specifies maximum number of native threads for
M:N scheduler (default: 8).

This patch will be reverted soon if non-easy issues are found.

[Bug #19842]
2023-10-12 14:47:01 +09:00
Nobuyoshi Nakada 85984a53e8 Abort dumping when output failed 2023-09-25 22:57:28 +09:00
Nobuyoshi Nakada ac244938e8 Dump backtraces to an arbitrary stream 2023-09-25 22:57:28 +09:00
Alan Wu 39ee3e22bd Make Kernel#lambda raise when given non-literal block
Previously, Kernel#lambda returned a non-lambda proc when given a
non-literal block and issued a warning under the `:deprecated` category.
With this change, Kernel#lambda will always return a lambda proc, if it
returns without raising.

Due to interactions with block passing optimizations, we previously had
two separate code paths for detecting whether Kernel#lambda got a
literal block. This change allows us to remove one path, the hack done
with rb_control_frame_t::block_code introduced in 85a337f for supporting
situations where Kernel#lambda returned a non-lambda proc.

[Feature #19777]

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2023-09-12 11:25:07 -04:00
Maxime Chevalier-Boisvert 30a5b94517
YJIT: implement side chain fallback for setlocal to avoid exiting (#8227)
* YJIT: implement side chain fallback for setlocal to avoid exiting

* Update yjit/src/codegen.rs

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>

---------

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2023-08-17 10:11:17 -04:00
Takashi Kokubun 7740526b1c Reorder bp_check and jit_return in cfp
It's the actual cfp[6] in the default build, so it's confusing to say
otherwise in the comment.
2023-08-11 17:57:04 -07:00
Takashi Kokubun cd8d20cd1f
YJIT: Compile exception handlers (#8171)
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2023-08-08 16:06:22 -07:00
Koichi Sasada 280419d0e0 `calling->cd` instead of `calling->ci`
`struct rb_calling_info::cd` is introduced and `rb_calling_info::ci`
is replaced with it to manipulate the inline cache of iseq while
method invocation process. So that `ci` can be acessed with
`calling->cd->ci`. It adds one indirection but it can be justified
by the following points:

1) `vm_search_method_fastpath()` doesn't need `ci` and also
`vm_call_iseq_setup_normal()` doesn't need `ci`. It means
reducing `cd->ci` access in `vm_sendish()` can make it faster.

2) most of method types need to access `ci` once in theory
so that 1 additional indirection doesn't matter.
2023-07-31 17:13:43 +09:00
Takashi Kokubun 38be9a9b72
Clean up OPT_STACK_CACHING (#8132) 2023-07-27 17:27:05 -07:00