Граф коммитов

1184 Коммитов

Автор SHA1 Сообщение Дата
Aaron Patterson cdf33ed5f3 Optimized forwarding callers and callees
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.

Calls it optimizes look like this:

```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```

```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```

```ruby
def bar(*a) = a

def foo(...)
  list = [1, 2]
  bar(*list, ...) # optimized
end
foo(123)
```

All variants of the above but using `super` are also optimized, including a bare super like this:

```ruby
def foo(...)
  super
end
```

This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:

```ruby
def m
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

def bar(a) = a
def foo(...) = bar(...)

def test
  m { foo(123) }
end

test
p test # allocates 1 object on master, but 0 objects with this patch
```

```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)

def test
  m { foo(1, b: 2) }
end

test
p test # allocates 2 objects on master, but 0 objects with this patch
```

How does it work?
-----------------

This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.

I think this description is kind of confusing, so let's walk through an example with code.

```ruby
def delegatee(a, b) = a + b

def delegator(...)
  delegatee(...)  # CI2 (FORWARDING)
end

def caller
  delegator(1, 2) # CI1 (argc: 2)
end
```

Before we call the delegator method, the stack looks like this:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   |
              5|   delegatee(...)  # CI2 (FORWARDING)  |
              6| end                                   |
              7|                                       |
              8| def caller                            |
          ->  9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method.  The `delegator` method has a special local called `...`
that holds the caller's CI object.

Here is the ISeq disasm fo `delegator`:

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

The local called `...` will contain the caller's CI: CI1.

Here is the stack when we enter `delegator`:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
           -> 4|   #                                   | CI1 (argc: 2)
              5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            |
              9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`.  In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`.  It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).

Before executing the `send` instruction, we push `...` on the stack.  The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

Instruction 001 puts the caller's CI on the stack.  `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   | CI1 (argc: 2)
           -> 5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            | self
              9|   delegator(1, 2) # CI1 (argc: 2)     | 1
             10| end                                   | 2
```

The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.

Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.

I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.

I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.

For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:

```ruby
SomeObject.new(foo: 1)
```

If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.

Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
2024-06-18 09:28:25 -07:00
Nobuyoshi Nakada 49fcd33e13 Introduce a specialize instruction for Array#pack
Instructions for this code:

```ruby
  # frozen_string_literal: true

[a].pack("C")
```

Before this commit:

```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself                                                          (   3)[Li]
0001 opt_send_without_block                 <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 newarray                               1
0005 putobject                              "C"
0007 opt_send_without_block                 <calldata!mid:pack, argc:1, ARGS_SIMPLE>
0009 leave
```

After this commit:

```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself                                                          (   3)[Li]
0001 opt_send_without_block                 <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 putobject                              "C"
0005 opt_newarray_send                      2, :pack
0008 leave
```

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2024-05-23 12:11:50 -07:00
yui-knk 899d9f79dd Rename `vast` to `ast_value`
There is an English word "vast".
This commit changes the name to be more clear name to avoid confusion.
2024-05-03 12:40:35 +09:00
Peter Zhu 1ca4c52b64 Free unused_block_warning_table when RUBY_FREE_AT_EXIT 2024-04-30 10:57:32 -04:00
HASUMI Hitoshi 9ea77cb351 Remove unnecessary assignment to ast->body.line_count
This patch removes a code that assigns `-1` to `ast->body.line_count` because, at least as of now, it looks not necessary.
I made this commit atomically revertable if I was wrong.

## Relevant commits

- The preparation for this PR: https://github.com/ruby/ruby/pull/10655/files#diff-2af2e7f2e1c28da5e9d99ad117cba1c4dabd8b0bc3081da88e414c55c6aa9549R1484-R1493
- The original commit that introduced the code: d65f7458bc
2024-04-27 17:56:20 +09:00
HASUMI Hitoshi 55a402bb75 Add line_count field to rb_ast_body_t
This patch adds `int line_count` field to `rb_ast_body_t` structure.
Instead, we no longer cast `script_lines` to Fixnum.

## Background

Ref https://github.com/ruby/ruby/pull/10618

In the PR above, we have decoupled IMEMO from `rb_ast_t`.
This means we could lift the five-words-restriction of the structure
that forced us to unionize `rb_ast_t *` and `FIXNUM` in one field.

## Relating refactor

- Remove the second parameter of `rb_ruby_ast_new()` function

## Attention

I will remove a code that assigns -1 to line_count, in `rb_binding_add_dynavars()`
of vm.c, because I don't think it is necessary.
But I will make another PR for this so that we can atomically revert
in case I was wrong (See the comment on the code)
2024-04-27 12:08:26 +09:00
HASUMI Hitoshi 2244c58b00 [Universal parser] Decouple IMEMO from rb_ast_t
This patch removes the `VALUE flags` member from the `rb_ast_t` structure making `rb_ast_t` no longer an IMEMO object.

## Background

We are trying to make the Ruby parser generated from parse.y a universal parser that can be used by other implementations such as mruby.
To achieve this, it is necessary to exclude VALUE and IMEMO from parse.y, AST, and NODE.

## Summary (file by file)

- `rubyparser.h`
  - Remove the `VALUE flags` member from `rb_ast_t`
- `ruby_parser.c` and `internal/ruby_parser.h`
  - Use TypedData_Make_Struct VALUE which wraps `rb_ast_t` `in ast_alloc()` so that GC can manage it
    - You can retrieve `rb_ast_t` from the VALUE by `rb_ruby_ast_data_get()`
  - Change the return type of `rb_parser_compile_XXXX()` functions from `rb_ast_t *` to `VALUE`
  - rb_ruby_ast_new() which internally `calls ast_alloc()` is to create VALUE vast outside ruby_parser.c
- `iseq.c` and `vm_core.h`
  - Amend the first parameter of `rb_iseq_new_XXXX()` functions from `rb_ast_body_t *` to `VALUE`
  - This keeps the VALUE of AST on the machine stack to prevent being removed by GC
- `ast.c`
  - Almost all change is replacement `rb_ast_t *ast` with `VALUE vast` (sorry for the big diff)
  - Fix `node_memsize()`
    - Now it includes `rb_ast_local_table_link`, `tokens` and script_lines
- `compile.c`, `load.c`, `node.c`, `parse.y`, `proc.c`, `ruby.c`, `template/prelude.c.tmpl`, `vm.c` and `vm_eval.c`
  - Follow-up due to the above changes
- `imemo.{c|h}`
  - If an object with `imemo_ast` appears, considers it a bug

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2024-04-26 11:21:08 +09:00
Takashi Kokubun 7ab1a608e7
YJIT: Optimize local variables when EP == BP (take 2) (#10607)
* Revert "Revert "YJIT: Optimize local variables when EP == BP" (#10584)"

This reverts commit c878344195.

* YJIT: Take care of GC references in ISEQ invariants

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

---------

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2024-04-25 10:04:53 -04:00
Peter Zhu 214811974b Add ruby_mimcalloc
Many places call ruby_mimmalloc then MEMZERO. This can be reduced by
using ruby_mimcalloc instead.
2024-04-24 15:30:43 -04:00
Alan Wu c878344195
Revert "YJIT: Optimize local variables when EP == BP" (#10584)
This reverts commit 4cc58ea0b8.

Since the change landed call-threshold=1 CI runs have been timing out.
There has also been `verify-ctx` violations. Revert for now while we debug.
2024-04-19 16:47:25 +00:00
Koichi Sasada 662ce928a7 `RUBY_TRY_UNUSED_BLOCK_WARNING_STRICT`
`RUBY_TRY_UNUSED_BLOCK_WARNING_STRICT=1 ruby ...` will enable
strict check for unused block warning.

This option is only for trial to compare the results so the
envname is not considered well.
Should be removed before Ruby 3.4.0 release.
2024-04-19 14:28:54 +09:00
Takashi Kokubun 4cc58ea0b8
YJIT: Optimize local variables when EP == BP (#10487) 2024-04-17 15:00:03 -04:00
Koichi Sasada e9d7478ded relax unused block warning for duck typing
if a method `foo` uses a block, other (unrelated) method `foo`
can receives a block. So try to relax the unused block warning
condition.

```ruby
      class C0
        def f = yield
      end

      class C1 < C0
        def f = nil
      end

      [C0, C1].f{ block } # do not warn
```
2024-04-17 20:26:49 +09:00
Jean Boussier d019b3baec Emit a performance warning when redefining specially optimized methods
This makes it easier to notice a dependency is causing interpreter or
JIT deoptimization.

```ruby
Warning[:performance] = true

class String
  def freeze
    super
  end
end
```

```
./test.rb:4: warning: Redefining 'String#freeze' disable multiple interpreter and JIT optimizations
```
2024-04-15 18:21:41 +02:00
HASUMI Hitoshi 9b1e97b211 [Universal parser] DeVALUE of p->debug_lines and ast->body.script_lines
This patch is part of universal parser work.

## Summary
- Decouple VALUE from members below:
  - `(struct parser_params *)->debug_lines`
  - `(rb_ast_t *)->body.script_lines`
- Instead, they are now `rb_parser_ary_t *`
  - They can also be a `(VALUE)FIXNUM` as before to hold line count
- `ISEQ_BODY(iseq)->variable.script_lines` remains VALUE
  - In order to do this,
  - Add `VALUE script_lines` param to `rb_iseq_new_with_opt()`
  - Introduce `rb_parser_build_script_lines_from()` to convert `rb_parser_ary_t *` into `VALUE`

## Other details
- Extend `rb_parser_ary_t *`. It previously could only store `rb_parser_ast_token *`, now can store script_lines, too
- Change tactics of building the top-level `SCRIPT_LINES__` in `yycompile0()`
  - Before: While parsing, each line of the script is added to `SCRIPT_LINES__[path]`
  - After: After `yyparse(p)`, `SCRIPT_LINES__[path]` will be built from `p->debug_lines`
- Remove the second parameter of `rb_parser_set_script_lines()` to make it simple
- Introduce `script_lines_free()` to be called from `rb_ast_free()` because the GC no longer takes care of the script_lines
- Introduce `rb_parser_string_deep_copy()` in parse.y to maintain script_lines when `rb_ruby_parser_free()` called
  - With regard to this, please see *Future tasks* below

## Future tasks
- Decouple IMEMO from `rb_ast_t *`
  - This lifts the five-members-restriction of Ruby object,
  - So we will be able to move the ownership of the `lex.string_buffer` from parser to AST
  - Then we remove `rb_parser_string_deep_copy()` to make the whole thing simple
2024-04-15 20:51:54 +09:00
Matt Valentine-House ef19234b10 Merge rb_objspace_alloc and Init_heap.
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
2024-04-04 15:00:57 +01:00
Peter Zhu aa794cc5a2 Turn GC off at boot on Windows
This is to stop crashes like:

    .\miniruby.exe: [BUG] Segmentation fault
    ruby 3.4.0dev (2024-03-26T15:38:26Z pull/10370/merge 040ea2ae2f) [x64-mswin64_140]

    -- Control frame information -----------------------------------------------
    c:0001 p:0000 s:0003 E:000d00 DUMMY  [FINISH]

    -- Threading information ---------------------------------------------------
    Total ractor count: 1
    Ruby thread count for this ractor: 1

    -- C level backtrace information -------------------------------------------
    C:\Windows\SYSTEM32\ntdll.dll(NtWaitForSingleObject+0x14) [0x00007FFA091AFC74]
    C:\Windows\System32\KERNELBASE.dll(WaitForSingleObjectEx+0x93) [0x00007FFA05BB4513]
    D:\a\ruby\ruby\build\miniruby.exe(rb_print_backtrace+0x3e) [0x00007FF64E536EFE] d:\a\ruby\ruby\src\vm_dump.c:844
    D:\a\ruby\ruby\build\miniruby.exe(rb_vm_bugreport+0x1ae) [0x00007FF64E5370B2] d:\a\ruby\ruby\src\vm_dump.c:1154
    D:\a\ruby\ruby\build\miniruby.exe(rb_bug_for_fatal_signal+0x77) [0x00007FF64E3FF357] d:\a\ruby\ruby\src\error.c:1087
    D:\a\ruby\ruby\build\miniruby.exe(sigsegv+0x71) [0x00007FF64E4C79E5] d:\a\ruby\ruby\src\signal.c:926
    C:\Windows\System32\ucrtbase.dll(seh_filter_exe+0x233) [0x00007FFA0521CE03]
    D:\a\ruby\ruby\build\miniruby.exe(`__scrt_common_main_seh'::`1'::filt$0+0x16) [0x00007FF64E594DA0] f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:269
    C:\Windows\SYSTEM32\VCRUNTIME140.dll(_C_specific_handler+0x9f) [0x00007FF9E54AF73F]
    C:\Windows\SYSTEM32\ntdll.dll(_chkstk+0x11f) [0x00007FFA091B4C2F]
    C:\Windows\SYSTEM32\ntdll.dll(RtlWalkFrameChain+0x14bf) [0x00007FFA09114CEF]
    C:\Windows\SYSTEM32\ntdll.dll(KiUserExceptionDispatcher+0x2e) [0x00007FFA091B399E]
    D:\a\ruby\ruby\build\miniruby.exe(newobj_of+0x6d) [0x00007FF64E418615] d:\a\ruby\ruby\src\gc.c:2949
    D:\a\ruby\ruby\build\miniruby.exe(rb_wb_protected_newobj_of+0x32) [0x00007FF64E41C7DA] d:\a\ruby\ruby\src\gc.c:2974
    D:\a\ruby\ruby\build\miniruby.exe(str_new0+0x64) [0x00007FF64E4E7F48] d:\a\ruby\ruby\src\string.c:887
    D:\a\ruby\ruby\build\miniruby.exe(rb_enc_str_new+0x40) [0x00007FF64E4D89B8] d:\a\ruby\ruby\src\string.c:945
    D:\a\ruby\ruby\build\miniruby.exe(iseq_compile_each0+0xdd7) [0x00007FF64E3B4A23] d:\a\ruby\ruby\src\compile.c:10368
    D:\a\ruby\ruby\build\miniruby.exe(iseq_compile_each+0x74) [0x00007FF64E3B3C40] d:\a\ruby\ruby\src\compile.c:9971
2024-03-27 09:39:23 -04:00
Peter Zhu 1d99fe430a Register classpath of FrozenCore before converting to ICLASS
Since ICLASS do not mark the classpath, we need to register it as a
global object before we convert RubyVM::FrozenCore as a ICLASS.
2024-03-27 09:39:23 -04:00
KJ Tsanaktsidis 2535a09e85 Check ASAN fake stacks when marking non-current threads
Currently, we check the values on the machine stack & register state to
see if they're actually a pointer to an ASAN fake stack, and mark the
values on the fake stack too if required. However, we are only doing
that for the _current_ thread (the one actually running the GC), not for
any other thread in the program.

Make rb_gc_mark_machine_context (which is called for marking non-current
threads) perform the same ASAN fake stack handling that
mark_current_machine_context performs.

[Bug #20310]
2024-03-25 14:57:04 +11:00
Nobuyoshi Nakada 28a2105a55
Prefer `enum ruby_tag_type` over `int` 2024-03-17 15:57:19 +09:00
Alan Wu def7023ee4 Initialize VM stack if VM_CHECK_MODE
Lately there has been a few flaky YJIT CI failures where a new Ruby
thread is finding the canary on the VM stack. For example:

https://github.com/ruby/ruby/actions/runs/8287357784/job/22679508482#step:14:109

After checking a local rr recording, it's clear that the canary was
written there when YJIT was using a temporary malloc region, and then
later handed to the new Ruby thread. Previously, the VM stack was
uninitialized, so it can have stale values in it, like the canary.

Though unlikely, this can happen without YJIT too. Initialize the stack
if we're spawning canaries.
2024-03-15 19:15:58 -04:00
Peter Zhu c2170e5c2b Fix typo from gloabl_object_list to global_object_list 2024-03-14 13:52:20 -04:00
Peter Zhu 4559a161af Move gloabl_object_list from objspace to VM
This is to be consistent with the mark_object_ary that is in the VM.
2024-03-14 13:29:59 -04:00
Peter Zhu 83618f2cfa [Feature #20306] Implement ruby_free_at_exit_p
ruby_free_at_exit_p is a way for extensions to determine whether they
should free all memory at shutdown.
2024-03-14 08:33:30 -04:00
Jean Boussier 2d80b6093f Retire RUBY_MARK_UNLESS_NULL
Marking `Qnil` or `Qfalse` works fine, having
an extra macro to avoid it isn't needed.
2024-03-08 14:13:14 +01:00
Jean Boussier d4f3dcf4df Refactor VM root modules
This `st_table` is used to both mark and pin classes
defined from the C API. But `vm->mark_object_ary` already
does both much more efficiently.

Currently a Ruby process starts with 252 rooted classes,
which uses `7224B` in an `st_table` or `2016B` in an `RArray`.

So a baseline of 5kB saved, but since `mark_object_ary` is
preallocated with `1024` slots but only use `405` of them,
it's a net `7kB` save.

`vm->mark_object_ary` is also being refactored.

Prior to this changes, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.

This has the detrimental effect of marking these references on every
minors even though it's a mostly append only list.

But using a custom TypedData we can save from having to mark
all the references on minor GC runs.

Addtionally, immediate values are now ignored and not appended
to `vm->mark_object_ary` as it's just wasted space.
2024-03-06 15:33:43 -05:00
Jean Boussier b4a69351ec Move FL_SINGLETON to FL_USER1
This frees FL_USER0 on both T_MODULE and T_CLASS.

Note: prior to this, FL_SINGLETON was never set on T_MODULE,
so checking for `FL_SINGLETON` without first checking that
`FL_TYPE` was `T_CLASS` was valid. That's no longer the case.
2024-03-06 13:11:41 -05:00
Peter Zhu e8e2415bb3 Use RB_SPECIAL_CONST_P instead of rb_special_const_p
rb_special_const_p returns a VALUE (Qtrue or Qfalse), so we shouldn't
assume that Qfalse is 0. We should instead use RB_SPECIAL_CONST_P.
2024-02-27 21:11:11 -05:00
Peter Zhu 330830dd1a Add IMEMO_NEW
Rather than exposing that an imemo has a flag and four fields, this
changes the implementation to only expose one field (the klass) and
fills the rest with 0. The type will have to fill in the values themselves.
2024-02-21 11:33:05 -05:00
John Hawthorn 1c97abaaba De-dup identical callinfo objects
Previously every call to vm_ci_new (when the CI was not packable) would
result in a different callinfo being returned this meant that every
kwarg callsite had its own CI.

When calling, different CIs result in different CCs. These CIs and CCs
both end up persisted on the T_CLASS inside cc_tbl. So in an eval loop
this resulted in a memory leak of both types of object. This also likely
resulted in extra memory used, and extra time searching, in non-eval
cases.

For simplicity in this commit I always allocate a CI object inside
rb_vm_ci_lookup, but ideally we would lazily allocate it only when
needed. I hope to do that as a follow up in the future.
2024-02-20 18:55:00 -08:00
Yusuke Endoh 25d74b9527 Do not include a backtick in error messages and backtraces
[Feature #16495]
2024-02-15 18:42:31 +09:00
Nobuyoshi Nakada 84d8dbe7a5 Enable redefinition check for rbinc methods 2024-02-12 11:51:06 -08:00
Peter Zhu d0b774cfb8 Remove null checks for xfree
xfree can handle null values, so we don't need to check it.
2024-01-19 10:25:02 -05:00
KJ Tsanaktsidis 807714447e Pass down "stack start" variables from closer to the top of the stack
This commit changes how stack extents are calculated for both the main
thread and other threads. Ruby uses the address of a local variable as
part of the calculation for machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable

However, the local being used for this is actually too low (too close to
the leaf function call) in both the main thread case and the new thread
case.

In the main thread case, we have the `INIT_STACK` macro, which is used
for pthreads to set the `native_main_thread->stack_start` value. This
value is correctly captured at the very top level of the program (in
main.c). However, this is _not_ what's used to set the execution context
machine stack (`th->ec->machine_stack.stack_start`); that gets set as
part of a call to `ruby_thread_init_stack` in `Init_BareVM`, using the
address of a local variable allocated _inside_ `Init_BareVM`. This is
too low; we need to use a local allocated closer to the top of the
program.

In the new thread case, the lolcal is allocated inside
`native_thread_init_stack`, which is, again, too low.

In both cases, this means that we might have VALUEs lying outside the
bounds of `th->ec->machine.stack_{start,end}`, which won't be marked
correctly by the GC machinery.

To fix this,

* In the main thread case: We already have `INIT_STACK` at the right
  level, so just pass that local var to `ruby_thread_init_stack`.
* In the new thread case: Allocate the local one level above the call to
  `native_thread_init_stack` in `call_thread_start_func2`.

[Bug #20001]

fix
2024-01-19 09:55:12 +11:00
Jeremy Evans 5c823aa686
Support keyword splatting nil
nil is treated similarly to the empty hash in this case, passing
no keywords and not calling any conversion methods.

Fixes [Bug #20064]

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2024-01-14 11:41:02 -08:00
KJ Tsanaktsidis 396e94666b Revert "Pass down "stack start" variables from closer to the top of the stack"
This reverts commit 4ba8f0dc99.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis 4ba8f0dc99 Pass down "stack start" variables from closer to the top of the stack
The implementation of `native_thread_init_stack` for the various
threading models can use the address of a local variable as part of the
calculation of the machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable

However, the local being used for this is actually allocated _inside_
the `native_thread_init_stack` frame; that means the caller might
allocate a VALUE on the stack that actually lies outside the bounds
stored in machine.stack_{start,end}.

A local variable from one level above the topmost frame that stores
VALUEs on the stack must be drilled down into the call to
`native_thread_init_stack` to be used in the calculation. This probably
doesn't _really_ matter for the win32 case (they'll be in the same
memory mapping so VirtualQuery should return the same thing), but
definitely could matter for the pthreads case.

[Bug #20001]
2024-01-12 17:29:48 +11:00
Peter Zhu 057df4379f Free environ when RUBY_FREE_AT_EXIT
The environ is malloc'd, so it gets reported as a memory leak. This
commit adds ruby_free_proctitle which frees it during shutdown when
RUBY_FREE_AT_EXIT is set.

    STACK OF 1 INSTANCE OF 'ROOT LEAK: <calloc in ruby_init_setproctitle>':
    5   dyld                                  0x18b7090e0 start + 2360
    4   ruby                                  0x10000e3a8 main + 100  main.c:58
    3   ruby                                  0x1000b4dfc ruby_options + 180  eval.c:121
    2   ruby                                  0x1001c5f70 ruby_process_options + 200  ruby.c:3014
    1   ruby                                  0x10035c9fc ruby_init_setproctitle + 76  setproctitle.c:105
    0   libsystem_malloc.dylib                0x18b8c7b78 _malloc_zone_calloc_instrumented_or_legacy + 100
2024-01-11 10:09:53 -05:00
Peter Zhu 46f7fac878 Free rb_native_thread of main thread
The rb_native_thread gets reported as a memory leak by the macOS leaks
tool:

    $ RUBY_FREE_AT_EXIT=1 leaks -q --atExit -- ./miniruby -e ""
    STACK OF 1 INSTANCE OF 'ROOT LEAK: <calloc in ruby_xcalloc_body>':
    6   dyld                                  0x1818a90e0 start + 2360
    5   miniruby                              0x104730128 main + 88  main.c:58
    4   miniruby                              0x1047d7710 ruby_init + 16  eval.c:102
    3   miniruby                              0x1047d757c ruby_setup + 112  eval.c:83
    2   miniruby                              0x104977278 Init_BareVM + 524  vm.c:4193
    1   miniruby                              0x1047f2478 ruby_xcalloc_body + 176  gc.c:12878
    0   libsystem_malloc.dylib                0x181a67b78 _malloc_zone_calloc_instrumented_or_legacy + 100
    ====
        1 (176 bytes) ROOT LEAK: <calloc in ruby_xcalloc_body 0x126604780> [176]
2024-01-03 15:41:08 -05:00
John Hawthorn f1b7424cbe FREE_AT_EXIT: Don't free main stack post-fork
When a forked process was started in a thread, this would result in a
double-free during the child process exit.

    RUBY_FREE_AT_EXIT=1 ./miniruby -e 'Thread.new { fork { } }.join; Process.waitpid'

This is because the main thread in the forked process was not the
initial VM thread, and the new thread's stack was freed as part of
objectspace iteration.

This change also allows rb_threadptr_root_fiber_release to run without
EC being available.
2023-12-22 18:07:22 -08:00
John Hawthorn 339978ef38 Free default_rand_key after freeing Ractors
Ractor's free iterates through its TLS keys so we need to keep this
memory available until after Ractors are freed.

Minimal reproduction:

    RUBY_FREE_AT_EXIT=1 ./miniruby -e rand
2023-12-22 18:07:22 -08:00
HParker 7ef90b3978 Correct free_on_exit env var to free_at_exit 2023-12-20 14:36:32 +09:00
Nobuyoshi Nakada 40fc9b070c
[DOC] No document for internal or debug methods 2023-12-18 20:17:45 +09:00
HParker 55326a915f Introduce --parser runtime flag
Introduce runtime flag for specifying the parser,

```
ruby --parser=prism
```

also update the description:

```
$ ruby --parser=prism --version
ruby 3.3.0dev (2023-12-08T04:47:14Z add-parser-runtime.. 0616384c9f) +PRISM [x86_64-darwin23]
```

[Bug #20044]
2023-12-15 13:42:19 -05:00
HParker 474b4c42f4 free ractors with ractor_free
Previously with RUBY_FREE_ON_EXIT, ractors where being xfree-ed which is incorrect since they are not xmalloced.
Instead we can free ractors with ractor free during shutdown. This change only effects main ractor freeing when RUBY_FREE_ON_EXIT is set.

Co-authored-by: John Hawthorn <john@hawthorn.email>
2023-12-15 10:31:15 -05:00
KJ Tsanaktsidis f8effa209a Change the semantics of rb_postponed_job_register
Our current implementation of rb_postponed_job_register suffers from
some safety issues that can lead to interpreter crashes (see bug #1991).
Essentially, the issue is that jobs can be called with the wrong
arguments.

We made two attempts to fix this whilst keeping the promised semantics,
but:
  * The first one involved masking/unmasking when flushing jobs, which
    was believed to be too expensive
  * The second one involved a lock-free, multi-producer, single-consumer
    ringbuffer, which was too complex

The critical insight behind this third solution is that essentially the
only user of these APIs are a) internal, or b) profiling gems.

For a), none of the usages actually require variable data; they will
work just fine with the preregistration interface.

For b), generally profiling gems only call a single callback with a
single piece of data (which is actually usually just zero) for the life
of the program. The ringbuffer is complex because it needs to support
multi-word inserts of job & data (which can't be atomic); but nobody
actually even needs that functionality, really.

So, this comit:
  * Introduces a pre-registration API for jobs, with a GVL-requiring
    rb_postponed_job_prereigster, which returns a handle which can be
    used with an async-signal-safe rb_postponed_job_trigger.
  * Deprecates rb_postponed_job_register (and re-implements it on top of
    the preregister function for compatability)
  * Moves all the internal usages of postponed job register
    pre-registration
2023-12-10 15:00:37 +09:00
Koichi Sasada 352a885a0f Thread specific storage APIs
This patch introduces thread specific storage APIs
for tools which use `rb_internal_thread_event_hook` APIs.

* `rb_internal_thread_specific_key_create()` to create a tool specific
  thread local storage key and allocate the storage if not available.
* `rb_internal_thread_specific_set()` sets a data to thread and tool
  specific storage.
* `rb_internal_thread_specific_get()` gets a data in thread and tool
  specific storage.

Note that `rb_internal_thread_specific_get|set(thread_val, key)`
can be called without GVL and safe for async signal and safe for
multi-threading (native threads). So you can call it in any internal
thread event hooks. Further more you can call it from other native
threads. Of course `thread_val` should be living while accessing the
data from this function.

Note that you should not forget to clean up the set data.
2023-12-08 13:16:19 +09:00
Adam Hess 6816e8efcf Free everything at shutdown
when the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
2023-12-07 15:52:35 -05:00
Alan Wu 195dbf241f Fix potential compaction issue in env_copy()
`src_ep[VM_ENV_DATA_INDEX_ME_CREF]` was read out and held without
marking across the allocation in vm_env_new(). In case vm_env_new() ran
compaction, an invalid reference could have been written into
`copied_env`.

It might've been hard to actually produce a crash with this issue due to
the pinning marking of the field in rb_execution_context_mark().
2023-12-07 11:16:12 -05:00
Alan Wu 050806f425 Add missing write barrier to env_copy()
Previously, the following crashed with
`vm_assert_env:imemo_type_p(obj, imemo_env)` due to missing a missing
WB:

    o = Object.new
    def o.foo(n)
      freeze

      GC.stress = 1

      # inflate block nesting get an imemo_env for each level
      n.tap do |i|
        i.tap do |local|
          return Ractor.make_shareable(-> do
            local + i + n
          end)
        end
      end
    ensure
      GC.stress = false
      GC.verify_internal_consistency
    end

    p o.foo(1)[]

By the time the recursive env_copy() call returns, `copied_env` could
have aged or have turned greyed, so we need a WB for the
`ep[VM_ENV_DATA_INDEX_SPECVAL]` assignment which adds an edge.

Fix: 674eb7df7f
2023-12-07 11:16:12 -05:00