* [win32] fix compilation for windows-arm64
Credits to MSYS2 Ruby package using this patch.
* [win32] nm use full options
Fix compilation error when using MSYS2 environment.
Credits to MSYS2 Ruby package using this patch.
* [win32] detect llvm-windres (used for windows-arm64)
When adding preprocessor option for llvm-windres (using clang as
parameter), it fails. Thus, do not add this.
It's needed to be able to compile windows-arm64 version, because MSYS2
toolchain is LLVM based (instead of GCC/binutils).
* [win32] pioinfo detection for windows-arm64
This fixes "unexpected ucrtbase.dll" for native windows-arm64 ruby
binary. It does not solve issue with x64 version emulated on this
platform.
Value of pioinfo pointer can be found in ucrtbase.dll at latest adrp/add
sequence before return of _isatty function. This works for both release
and debug ucrt.
Due to the nature of aarch64 ISA (vs x86 or x64), it's needed to
disassemble instructions to retrieve offset value, which is a bit more
complicated than matching specific string patterns.
Details about adrp/add usage can be found in this blog post:
https://devblogs.microsoft.com/oldnewthing/20220809-00/?p=106955
For instruction decoding, the Arm documentation was used as a reference.
Remove rb_control_frame_t::__bp__ and optimize bmethod calls
This commit removes the __bp__ field from rb_control_frame_t. It was
introduced to help MJIT, but since MJIT was replaced by RJIT, we can use
vm_base_ptr() to compute it from the SP of the previous control frame
instead. Removing the field avoids needing to set it up when pushing new
frames.
Simply removing __bp__ would cause crashes since RJIT and YJIT used a
slightly different stack layout for bmethod calls than the interpreter.
At the moment of the call, the two layouts looked as follows:
┌────────────┐ ┌────────────┐
│ frame_base │ │ frame_base │
├────────────┤ ├────────────┤
│ ... │ │ ... │
├────────────┤ ├────────────┤
│ args │ │ args │
├────────────┤ └────────────┘<─prev_frame_sp
│ receiver │
prev_frame_sp─>└────────────┘
RJIT & YJIT interpreter
Essentially, vm_base_ptr() needs to compute the address to frame_base
given prev_frame_sp in the diagrams. The presence of the receiver
created an off-by-one situation.
Make the interpreter use the layout the JITs use for iseq-to-iseq
bmethod calls. Doing so removes unnecessary argument shifting and
vm_exec_core() re-entry from the interpreter, yielding a speed
improvement visible through `benchmark/vm_defined_method.yml`:
patched: 7578743.1 i/s
master: 4796596.3 i/s - 1.58x slower
C-to-iseq bmethod calls now store one more VALUE than before, but that
should have negligible impact on overall performance.
Note that re-entering vm_exec_core() used to be necessary for firing
TracePoint events, but that's no longer the case since
9121e57a5f.
Closesruby/ruby#6428
This is useful for crash triaging. It also helps to hint extension
developers about the misuse of `rb_thread_call_without_gvl()`.
Example:
$ ./miniruby -e 'Ractor.new{Ractor.receive};
Thread.new{sleep}; Process.kill:SEGV,Process.pid'
<snip>
-- Threading information ---------------------------------------------------
Total ractor count: 2
Ruby thread count for this ractor: 2
This patch pushes dummy frames when loading code for the
profiling purpose.
The following methods push a dummy frame:
* `Kernel#require`
* `Kernel#load`
* `RubyVM::InstructionSequence.compile_file`
* `RubyVM::InstructionSequence.load_from_binary`
https://bugs.ruby-lang.org/issues/18559
In the event that we are crashing due to a corrupt Ruby stack, we might
re-enter rb_vm_bugreport() due to failed assertions in
rb_backtrace_print_as_bugreport() or SDR(). In these cases we were
printing the bug report ad infinitum with unbounded recusion.
I seem to run into this every once in a while and the amount of log it
prints out is pretty distracting. On CI environments it makes the log
output unnecessarily big. Let's fix this.
`NON_SCALAR_THREAD_ID` shows `pthread_t` is non-scalar (non-pointer)
and only s390x is known platform. However, the supporting code is
very complex and it is only used for deubg print information.
So this patch removes the support of `NON_SCALAR_THREAD_ID`
and make the code simple.
`rb_thread_t` contained `native_thread_data_t` to represent
thread implementation dependent data. This patch separates
them and rename it `rb_native_thread` and point it from
`rb_thraed_t`.
Now, 1 Ruby thread (`rb_thread_t`) has 1 native thread (`rb_native_thread`).
Use ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.
If we crash during GC, allocating new objects in the segv handler can
cause an infinite loop. This commit is to avoid creating new objects in
the crash handler
* HAVE_ macros should only be defined or undefined, not used for their value.
* See [Feature #17752]
Co-authored-by: xtkoba (Tee KOBAYASHI) <xtkoba+ruby@gmail.com>
To make some kind of Ractor related extensions, some functions
should be exposed.
* include/ruby/thread_native.h
* rb_native_mutex_*
* rb_native_cond_*
* include/ruby/ractor.h
* RB_OBJ_SHAREABLE_P(obj)
* rb_ractor_shareable_p(obj)
* rb_ractor_std*()
* rb_cRactor
and rm ractor_pub.h
and rename srcdir/ractor.h to srcdir/ractor_core.h
(to avoid conflict with include/ruby/ractor.h)
This commit introduces Ractor mechanism to run Ruby program in
parallel. See doc/ractor.md for more details about Ractor.
See ticket [Feature #17100] to see the implementation details
and discussions.
[Feature #17100]
This commit does not complete the implementation. You can find
many bugs on using Ractor. Also the specification will be changed
so that this feature is experimental. You will see a warning when
you make the first Ractor with `Ractor.new`.
I hope this feature can help programmers from thread-safety issues.
Since this commit (9e1b06e17d), the VM Debug Level constant is moved from `vm_insnhelper.h` to `vm_core.h`. This PR is a super tiny update to reflect that change so that people won't waste time on searching in a wrong file.
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.