Our current implementation of rb_postponed_job_register suffers from
some safety issues that can lead to interpreter crashes (see bug #1991).
Essentially, the issue is that jobs can be called with the wrong
arguments.
We made two attempts to fix this whilst keeping the promised semantics,
but:
* The first one involved masking/unmasking when flushing jobs, which
was believed to be too expensive
* The second one involved a lock-free, multi-producer, single-consumer
ringbuffer, which was too complex
The critical insight behind this third solution is that essentially the
only users of these APIs are a) internal, or b) profiling gems.
For a), none of the usages actually require variable data; they will
work just fine with the preregistration interface.
For b), profiling gems generally only call a single callback with a
single piece of data (which is usually just zero) for the life of the
program. The ringbuffer is complex because it needs to support
multi-word inserts of job & data (which can't be atomic); but in
practice nobody actually needs that functionality.
So, this commit:
* Introduces a pre-registration API for jobs, with a GVL-requiring
rb_postponed_job_preregister, which returns a handle which can be
used with an async-signal-safe rb_postponed_job_trigger (see the
sketch below).
* Deprecates rb_postponed_job_register (and re-implements it on top of
the preregister function for compatibility)
* Moves all the internal usages of postponed job registration to
pre-registration
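A minimal sketch of the intended usage from a profiling extension's point of
view (`Init_my_profiler` and `on_profile_signal` are illustrative names, the
handle type name `rb_postponed_job_handle_t` is assumed, and the `flags`
argument is assumed to be reserved and passed as 0):

```c
#include <ruby/ruby.h>
#include <ruby/debug.h>
#include <signal.h>

/* Handle obtained once, with the GVL held, at extension initialization. */
static rb_postponed_job_handle_t sample_job_handle;

static void
sample_job(void *data)
{
    /* Runs later, at a safe point, with the GVL held. */
    (void)data;
}

void
Init_my_profiler(void)
{
    /* GVL required: pre-register the callback together with its fixed data. */
    sample_job_handle = rb_postponed_job_preregister(0, sample_job, NULL);
}

/* e.g. installed as a SIGPROF handler: async-signal-safe, no GVL needed. */
static void
on_profile_signal(int sig)
{
    (void)sig;
    rb_postponed_job_trigger(sample_job_handle);
}
```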
This patch introduces thread specific storage APIs
for tools which use `rb_internal_thread_event_hook` APIs.
* `rb_internal_thread_specific_key_create()` to create a tool specific
thread local storage key and allocate the storage if not available.
* `rb_internal_thread_specific_set()` sets data in the thread- and
tool-specific storage.
* `rb_internal_thread_specific_get()` gets data from the thread- and
tool-specific storage.
Note that `rb_internal_thread_specific_get|set(thread_val, key)`
can be called without the GVL, is async-signal-safe, and is safe for
multi-threading (native threads), so you can call it in any internal
thread event hook. Furthermore, you can call it from other native
threads. Of course, `thread_val` must remain alive while the data is
accessed through these functions.
Note that you should not forget to clean up the data you set.
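A sketch of how a tool might combine these calls (the header location, the
key type name `rb_internal_thread_specific_key_t`, and the `Init_my_tool` /
`bump_counter` helpers are assumptions for illustration, not part of the API):

```c
#include <ruby/ruby.h>
#include <ruby/thread_native.h> /* assumed location of the declarations */
#include <stdlib.h>

/* One key per tool, created once with the GVL held (e.g. in Init_*). */
static rb_internal_thread_specific_key_t my_tool_key;

struct my_thread_data {
    unsigned long event_count;
};

void
Init_my_tool(void)
{
    my_tool_key = rb_internal_thread_specific_key_create();
}

/* May be called from an internal thread event hook, without the GVL.
 * `thread_val` must outlive this access; the tool must free the data
 * it attached once it is done with the thread. */
static void
bump_counter(VALUE thread_val)
{
    struct my_thread_data *data =
        rb_internal_thread_specific_get(thread_val, my_tool_key);
    if (data == NULL) {
        data = calloc(1, sizeof(*data));
        rb_internal_thread_specific_set(thread_val, my_tool_key, data);
    }
    data->event_count++;
}
```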
`rb_vm_tag_jmpbuf_{init,deinit}` are safe places to raise an exception,
since at that point the given tag has either not yet been pushed to
`ec->tag` or has already been popped from it; `ec->tag` is therefore
always valid, and it is safe to raise an exception when xmalloc fails.
The `rb_jmpbuf_t` type is considerably large due to its inline-allocated
Asyncify buffer, and this leads to stack overflow even with a small number
of C-method call frames. This commit allocates the Asyncify buffer used
by `rb_wasm_setjmp` on the heap to mitigate the issue.
This patch introduces a new type `rb_vm_tag_jmpbuf_t` to abstract the
representation of a jump buffer, and init/deinit hook points to manage
lifetime of the buffer. These changes are effectively NFC for non-wasm
platforms.
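Conceptually, the abstraction looks something like the following sketch (the
exact preprocessor condition and inline definitions are assumptions; the real
code lives in vm_core.h):

```c
/* Sketch only -- not the exact vm_core.h definitions. */
#if defined(__wasm__) && !defined(__EMSCRIPTEN__)
/* wasm: rb_jmpbuf_t embeds a large Asyncify buffer, so keep it on the heap. */
typedef rb_jmpbuf_t *rb_vm_tag_jmpbuf_t;

static inline void
rb_vm_tag_jmpbuf_init(rb_vm_tag_jmpbuf_t *jmpbuf)
{
    /* May raise NoMemoryError; safe because the tag is not on ec->tag yet. */
    *jmpbuf = xmalloc(sizeof(rb_jmpbuf_t));
}

static inline void
rb_vm_tag_jmpbuf_deinit(const rb_vm_tag_jmpbuf_t jmpbuf)
{
    xfree(jmpbuf);
}
#else
/* Other platforms: keep the buffer inline; init/deinit are no-ops. */
typedef rb_jmpbuf_t rb_vm_tag_jmpbuf_t;
static inline void rb_vm_tag_jmpbuf_init(rb_vm_tag_jmpbuf_t *jmpbuf) { (void)jmpbuf; }
static inline void rb_vm_tag_jmpbuf_deinit(const rb_vm_tag_jmpbuf_t jmpbuf) { (void)jmpbuf; }
#endif
```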
* Port call threshold logic from Rust to C for performance
* Prefix global/field names with yjit_
* Fix linker error
* Fix preprocessor condition for rb_yjit_threshold_hit
* Fix third linker issue
* Exclude yjit_calls_at_interv from RJIT bindgen
---------
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
This patch introduces an M:N thread scheduler for the Ractor system.
In general, an M:N thread scheduler employs N native threads (OS threads)
to manage M user-level threads (Ruby threads in this case).
In the Ruby interpreter, 1 native thread is provided per Ractor,
and all Ruby threads in that Ractor are managed by that native thread.
Since Ruby 1.9, the interpreter has used a 1:1 thread scheduler, which means
1 Ruby thread has 1 native thread. The M:N scheduler changes this strategy.
Because of compatibility issues (and stability issues in the implementation),
the main Ractor doesn't use the M:N scheduler by default. In other words,
threads on the main Ractor will be managed with the 1:1 thread scheduler.
There are additional settings via environment variables:
`RUBY_MN_THREADS=1` enables the M:N thread scheduler on the main ractor.
Note that non-main ractors use the M:N scheduler even without this
configuration. With this configuration, single-ractor applications
run threads on an M:1 thread scheduler (green threads, user-level threads).
`RUBY_MAX_CPU=n` specifies the maximum number of native threads for the
M:N scheduler (default: 8).
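For illustration only, a setting like this is typically read once at boot,
roughly along these lines (`mn_max_cpu` is a hypothetical helper, not the
actual scheduler code):

```c
#include <stdlib.h>

/* Hypothetical sketch of reading RUBY_MAX_CPU with its documented default. */
static int
mn_max_cpu(void)
{
    const char *s = getenv("RUBY_MAX_CPU");
    return (s && *s) ? atoi(s) : 8;
}
```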
This patch will be reverted soon if hard-to-solve issues are found.
[Bug #19842]
Previously, Kernel#lambda returned a non-lambda proc when given a
non-literal block and issued a warning under the `:deprecated` category.
With this change, Kernel#lambda will always return a lambda proc, if it
returns without raising.
Due to interactions with block passing optimizations, we previously had
two separate code paths for detecting whether Kernel#lambda got a
literal block. This change allows us to remove one path, the hack done
with rb_control_frame_t::block_code introduced in 85a337f for supporting
situations where Kernel#lambda returned a non-lambda proc.
[Feature #19777]
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
`struct rb_calling_info::cd` is introduced and `rb_calling_info::ci`
is replaced with it to manipulate the iseq's inline cache during the
method invocation process, so that `ci` can be accessed with
`calling->cd->ci`. It adds one indirection, but it can be justified
by the following points:
1) Neither `vm_search_method_fastpath()` nor
`vm_call_iseq_setup_normal()` needs `ci`, which means that
reducing `cd->ci` accesses in `vm_sendish()` can make it faster.
2) Most method types need to access `ci` only once in theory,
so one additional indirection doesn't matter.
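A simplified sketch of the resulting shape (field sets abbreviated; the real
structs in the VM headers have more members and may differ in detail):

```c
/* Simplified sketch -- not the actual VM header definitions. */
struct rb_call_data {
    const struct rb_callinfo  *ci;  /* method id, argc, flags, ... */
    const struct rb_callcache *cc;  /* inline cache entry */
};

struct rb_calling_info {
    struct rb_call_data *cd;        /* replaces the former `ci` member */
    VALUE recv;
    int argc;
    /* ... */
};

/* Call paths that still need the call info take one extra hop:
 *     const struct rb_callinfo *ci = calling->cd->ci;
 */
```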
Remove rb_control_frame_t::__bp__ and optimize bmethod calls
This commit removes the __bp__ field from rb_control_frame_t. It was
introduced to help MJIT, but since MJIT was replaced by RJIT, we can use
vm_base_ptr() to compute it from the SP of the previous control frame
instead. Removing the field avoids needing to set it up when pushing new
frames.
Simply removing __bp__ would cause crashes since RJIT and YJIT used a
slightly different stack layout for bmethod calls than the interpreter.
At the moment of the call, the two layouts looked as follows:
               ┌────────────┐    ┌────────────┐
               │ frame_base │    │ frame_base │
               ├────────────┤    ├────────────┤
               │    ...     │    │    ...     │
               ├────────────┤    ├────────────┤
               │    args    │    │    args    │
               ├────────────┤    └────────────┘<─prev_frame_sp
               │  receiver  │
prev_frame_sp─>└────────────┘
                 RJIT & YJIT       interpreter
Essentially, vm_base_ptr() needs to compute the address of frame_base
given prev_frame_sp in the diagrams. The presence of the receiver
created an off-by-one situation.
Make the interpreter use the layout the JITs use for iseq-to-iseq
bmethod calls. Doing so removes unnecessary argument shifting and
vm_exec_core() re-entry from the interpreter, yielding a speed
improvement visible through `benchmark/vm_defined_method.yml`:
patched: 7578743.1 i/s
master: 4796596.3 i/s - 1.58x slower
C-to-iseq bmethod calls now store one more VALUE than before, but that
should have negligible impact on overall performance.
Note that re-entering vm_exec_core() used to be necessary for firing
TracePoint events, but that's no longer the case since
9121e57a5f.
Closes ruby/ruby#6428
After [1], using ext/Setup to statically link some, but not all,
extensions failed during linking. I did not know about this option,
and had assumed that only `--with-static-linked-ext` builds could
include statically linked extensions.
Include the support code for statically linked extensions in all
configurations like before [1]. Initialize the table lazily to minimize
footprint on builds that have no statically linked extensions.
[1]: 790cf4b6d0 "Fix autoload status of
statically linked extensions"
Originally, when 2e7bceb34e fixed cfuncs to no
longer use the VM stack for large array splats, it was thought to have fully
fixed Bug #4040, since the issue was fixed for methods defined in Ruby (iseqs)
back in Ruby 2.2.
After additional research, I determined that the same issue affects almost
all types of method calls, not just iseq and cfunc calls. There were two
main types of remaining issues: important cases (where a large array splat
should work) and pedantic cases (where a large array splat raised
SystemStackError instead of ArgumentError).
Important cases:
```ruby
define_method(:a){|*a|}
a(*1380888.times)
def b(*a); end
send(:b, *1380888.times)
:b.to_proc.call(self, *1380888.times)
def d; yield(*1380888.times) end
d(&method(:b))
def self.method_missing(*a); end
not_a_method(*1380888.times)
```
Pedantic cases:
```ruby
def a; end
a(*1380888.times)
def b(_); end
b(*1380888.times)
def c(_=nil); end
c(*1380888.times)
c = Class.new do
attr_accessor :a
alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)
c = Struct.new(:a) do
alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)
```
This patch fixes all usage of CALLER_SETUP_ARG with splatting a large
number of arguments, and required similar fixes to use a temporary
hidden array in three other cases where the VM would use the VM stack
for handling a large number of arguments. However, it is possible
there may be additional cases where splatting a large number
of arguments still causes a SystemStackError.
This has a measurable performance impact, as it requires additional
checks for a large number of arguments in many additional cases.
This change is fairly invasive, as there were many different VM
functions that needed to be modified to support this. To avoid
too much API change, I modified struct rb_calling_info to add a
heap_argv member for storing the array, so I would not have to
thread it through many functions. This struct is always stack
allocated, which helps ensure the GC doesn't collect it early.
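Conceptually, the heap_argv handling looks something like this sketch
(`setup_splat_args` and `splat_ary` are hypothetical stand-ins; the actual
logic lives in the VM's argument setup code and differs in detail):

```c
/* Conceptual sketch only -- not the actual VM code. */
static void
setup_splat_args(struct rb_calling_info *calling, VALUE splat_ary)
{
    if (RARRAY_LEN(splat_ary) > VM_ARGC_STACK_MAX) {
        /* Too many elements for the VM stack: keep them in a hidden array and
         * let the callee's argument setup read from it directly. */
        calling->heap_argv = splat_ary;
    }
    else {
        /* Small splat: expand the elements onto the VM stack as before. */
        calling->heap_argv = 0;
    }
}
```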
Because of how invasive the changes are, and how rarely large
arrays are actually splatted in Ruby code, the existing test/spec
suites are not great at testing for correct behavior. To try to
find and fix all issues, I tested this in CI with
VM_ARGC_STACK_MAX set to -1, ensuring that a temporary array is used
for all array splat method calls. This was very helpful in
finding breaking cases, especially ones involving flagged keyword
hashes.
Fixes [Bug #4040]
Co-authored-by: Jimmy Miller <jimmy.miller@shopify.com>
Rebuilding the loaded feature index slowed down with the bug fix
for #17885 in 79a4484a07. The
slowdown was extreme if realpath emulation was used, but even when
not emulated, it could be about 10x slower.
This adds loaded_features_realpath_map to rb_vm_struct. This is a
hidden hash mapping loaded feature paths to realpaths. When
rebuilding the loaded feature index, look at this hash to get
cached realpath values, and skip calling rb_check_realpath if a
cached value is found.
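Roughly, the lookup works like the following sketch (`cached_realpath` and
`feature_path` are hypothetical names; the surrounding load.c code is
simplified away):

```c
/* Sketch of the cached-realpath lookup -- simplified, not the actual load.c code. */
static VALUE
cached_realpath(VALUE feature_path)
{
    VALUE realpath_map = GET_VM()->loaded_features_realpath_map; /* hidden Hash */
    VALUE realpath = rb_hash_aref(realpath_map, feature_path);
    if (NIL_P(realpath)) {
        /* Not cached yet: fall back to the (slow) filesystem check and cache it. */
        realpath = rb_check_realpath(Qnil, feature_path, NULL);
        rb_hash_aset(realpath_map, feature_path, realpath);
    }
    return realpath;
}
```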
Fixes [Bug #19246]
The `catch_except_p` flag is used for communicating between parent and
child iseqs that a throw instruction was emitted. So, for example, if a
child iseq has a throw in it and the parent wants to catch the throw, we
use this flag to communicate to the parent iseq that a throw instruction
was emitted.
This flag is only useful at compile time; it only impacts the
compilation process, so it seems fine to move it from the iseq body
to the compile_data struct.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
`rb_current_ractor()` expects that a valid `ec` and `r` are available.
`rb_current_ractor_raw()` with the parameter `false` is allowed to return
NULL if `ec` is not available.
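A small sketch of the intended usage (`have_current_ractor` is an illustrative
helper, not part of the API):

```c
/* Sketch: probe whether a current Ractor is available without asserting,
 * e.g. from code that may run on a bare native thread. */
static int
have_current_ractor(void)
{
    /* Passing false: allowed to return NULL when no `ec` is set up. */
    return rb_current_ractor_raw(false) != NULL;
}
```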
If the iseq only contains the `opt_invokebuiltin_delegate_leave` insn and
the builtin function (bf) is inline-able, the caller doesn't need to
build a method frame.
`vm_call_single_noarg_inline_builtin` is a fast path for such cases.
* Revert "Remove special handling of `SIGCHLD`. (#7482)"
This reverts commit 44a0711eab.
* Revert "Remove prototypes for functions that are no longer used. (#7497)"
This reverts commit 4dce12bead.
* Revert "Remove SIGCHLD `waidpid`. (#7476)"
This reverts commit 1658e7d966.
* Fix change to rjit variable name.
We used to require that MJIT be supported when YJIT is supported. However,
now that RJIT has dropped some platforms that YJIT supports, this no longer
makes sense. We should be able to enable only YJIT, and vice versa.
This patch is a follow-up of 0a82bfe.
Without this patch, if an env is escaped (turned into a Proc), a wrong svar
can be touched.
This patch tracks the escaped env and uses it.
The interrupt check will unintentionally release the VM lock when loading an iseq.
This will cause issues with the `debug` gem's
[`ObjectSpace.each_iseq` method](0fcfc28aca/ext/debug/iseq_collector.c (L61-L67)),
which wraps iseqs and exposes their internal state when they're actually not ready to be used.
When that happens, errors like the following occur and kill the `debug` gem's thread:
```
DEBUGGER: ReaderThreadError: uninitialized InstructionSequence
┃ DEBUGGER: Disconnected.
┃ ["/opt/rubies/ruby-3.2.0/lib/ruby/gems/3.2.0/gems/debug-1.7.1/lib/debug/breakpoint.rb:247:in `absolute_path'",
┃ "/opt/rubies/ruby-3.2.0/lib/ruby/gems/3.2.0/gems/debug-1.7.1/lib/debug/breakpoint.rb:247:in `block in iterate_iseq'",
┃ "/opt/rubies/ruby-3.2.0/lib/ruby/gems/3.2.0/gems/debug-1.7.1/lib/debug/breakpoint.rb:246:in `each_iseq'",
...
```
A way to reproduce the issue is to satisfy these conditions at the same time:
1. `debug` gem calling `ObjectSpace.each_iseq` (e.g. [activating a `LineBreakpoint`](0fcfc28aca/lib/debug/breakpoint.rb (L246))).
2. A large number of iseqs being loaded from another thread (possibly through the `bootsnap` gem).
3. 1 and 2 iterating through the same iseq(s) at the same time.
Because this issue requires external dependencies and a rather complicated timing setup to reproduce, I wasn't able to write a test case for it.
But here's some pseudo code to help reproduce it:
```rb
require "debug/session"
Thread.new do
  100.times do
    ObjectSpace.each_iseq do |iseq|
      iseq.absolute_path
    end
  end
end
sleep 0.1
load_a_bunch_of_iseq
possibly_through_bootsnap
```
[Bug #19348]
Co-authored-by: Peter Zhu <peter@peterzhu.ca>