Граф коммитов

497 Коммитов

Автор SHA1 Сообщение Дата
KJ Tsanaktsidis dcf3add96b Delete reserve_stack code
This code was working around a bug in the Linux kernel. It was
previously possible for the kernel to place heap pages in a region where
the stack was allowed to grow into, and then therefore run out of usable
stack memory before RLIMIT_STACK was reached.

This bug was fixed in Linux commit
c204d21f22
for kernel 4.13 in 2017. Therefore, in 2024, we should be safe to delete
this workaround.

[Bug #20804]
2024-10-22 17:27:20 +11:00
John Bampton 3fc1495c30 Fix spelling 2024-10-09 07:14:44 +09:00
Nobuyoshi Nakada e1889dd7de
Assertions should not have side effects 2024-09-29 11:42:10 +09:00
Nobuyoshi Nakada c193ca5218
Adjust indent [ci skip] 2024-09-19 14:33:32 +09:00
Jean Boussier 63cbe3f6ac Proof of Concept: Allow to prevent fork from happening in known fork unsafe API
[Feature #20590]

For better of for worse, fork(2) remain the primary provider of
parallelism in Ruby programs. Even though it's frowned uppon in
many circles, and a lot of literature will simply state that only
async-signal safe APIs are safe to use after `fork()`, in practice
most APIs work well as long as you are careful about not forking
while another thread is holding a pthread mutex.

One of the APIs that is known cause fork safety issues is `getaddrinfo`.
If you fork while another thread is inside `getaddrinfo`, a mutex
may be left locked in the child, with no way to unlock it.

I think we could reduce the impact of these problem by preventing
in for the most notorious and common cases, by locking around
`fork(2)` and known unsafe APIs with a read-write lock.
2024-09-05 11:43:46 +02:00
John Hawthorn 87a85550ed Re-initialize vm->ractor.sched.lock after fork
Previously under certain conditions it was possible to encounter a
deadlock in the forked child process if ractor.sched.lock was held.

Co-authored-by: Nathan Froyd <froydnj@gmail.com>
2024-08-13 11:52:24 -07:00
Koichi Sasada e500222de1 fix last commit
`th` is gone.
2024-07-09 06:43:28 +09:00
Koichi Sasada ffc69eec0a `struct rb_thread_sched_waiting`
Introduce `struct rb_thread_sched_waiting` and `timer_th.waiting`
can contain other than `rb_thread_t`.
2024-07-09 05:57:03 +09:00
Koichi Sasada 685a4e5be7 VM barrier needs to store GC root
On the VM barrier waiting, it needs to store machine context
as a GC root.

Also it needs to wait for barrier synchronization correctly
by `while` (for continuous barrier sync by another ractor).

This is why GC marking misses are occerred on ARM machines.
like: https://rubyci.s3.amazonaws.com/fedora40-arm/ruby-master/log/20240702T063002Z.fail.html.gz
2024-07-05 21:20:54 +09:00
Jeremy Evans ef3803ed40 Ignore the result of pthread_kill in ubf_wakeup_thread
After an upgrade to Ruby 3.3.0, I experienced reproducible production crashes
of the form:

[BUG] pthread_kill: No such process (ESRCH)

This is the only pthread_kill call in Ruby. The result of pthread_kill was
previously ignored in Ruby 3.2 and below. Checking the result was added in
be1bbd5b7d (MaNy).

I have not yet been able to create a minimal self-contained example,
but it should be safe to remove the checks.
2024-05-07 09:44:25 -07:00
Daisuke Fujimura (fd0) 02d40b6c17 Use ubf list on cygwin 2024-03-29 08:13:25 +09:00
KJ Tsanaktsidis 48d3bdddba Move asan_fake_stack_handle to EC, not thread
It's really a property of the EC; each fiber (which has its own EC) also
has its own asan_fake_stack_handle.

[Bug #20310]
2024-03-25 14:57:04 +11:00
Koichi Sasada d578684989 `rb_thread_lock_native_thread()`
Introduce `rb_thread_lock_native_thread()` to allocate dedicated
native thread to the current Ruby thread for M:N threads.
This C API is similar to Go's `runtime.LockOSThread()`.

Accepted at https://github.com/ruby/dev-meeting-log/blob/master/2023/DevMeeting-2023-08-24.md
(and missed to implement on Ruby 3.3.0)
2024-02-21 15:38:29 +09:00
KJ Tsanaktsidis 719db18b50 notify ASAN about M:N threading stack switches
In a similar way to how we do it with fibers in cont.c, we need to call
__sanitize_start_switch_fiber and __sanitize_finish_switch_fiber around
the call to coroutine_transfer to let ASAN save & restore the fake stack
pointer.

When a M:N thread is exiting, we pass `to_dead` to the new
coroutine_transfer0 function, so that we can pass NULL for saving the
stack pointer. This signals to ASAN that the fake stack can be freed
(otherwise it would be leaked)

[Bug #20220]
2024-02-06 22:23:42 +11:00
Nobuyoshi Nakada c7e87b2118
Fix up [Bug #20001] 2024-01-23 01:42:32 +09:00
Peter Zhu d0b774cfb8 Remove null checks for xfree
xfree can handle null values, so we don't need to check it.
2024-01-19 10:25:02 -05:00
KJ Tsanaktsidis cabdaebc70 Make stack bounds detection work with ASAN
Where a local variable is used as part of the stack bounds detection, it
has to actually be on the stack. ASAN can put local variable on "fake
stacks", however, with addresses in different memory mappings. This
completely destroys the stack bounds calculation, and can lead to e.g.
things not getting GC marked on the machine stack or stackoverflow
checks that always fail.

The __asan_addr_is_in_fake_stack helper can be used to get the _real_
stack address of such variables, and thus perform the stack size
calculation properly

[Bug #20001]
2024-01-19 09:55:12 +11:00
KJ Tsanaktsidis 807714447e Pass down "stack start" variables from closer to the top of the stack
This commit changes how stack extents are calculated for both the main
thread and other threads. Ruby uses the address of a local variable as
part of the calculation for machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable

However, the local being used for this is actually too low (too close to
the leaf function call) in both the main thread case and the new thread
case.

In the main thread case, we have the `INIT_STACK` macro, which is used
for pthreads to set the `native_main_thread->stack_start` value. This
value is correctly captured at the very top level of the program (in
main.c). However, this is _not_ what's used to set the execution context
machine stack (`th->ec->machine_stack.stack_start`); that gets set as
part of a call to `ruby_thread_init_stack` in `Init_BareVM`, using the
address of a local variable allocated _inside_ `Init_BareVM`. This is
too low; we need to use a local allocated closer to the top of the
program.

In the new thread case, the lolcal is allocated inside
`native_thread_init_stack`, which is, again, too low.

In both cases, this means that we might have VALUEs lying outside the
bounds of `th->ec->machine.stack_{start,end}`, which won't be marked
correctly by the GC machinery.

To fix this,

* In the main thread case: We already have `INIT_STACK` at the right
  level, so just pass that local var to `ruby_thread_init_stack`.
* In the new thread case: Allocate the local one level above the call to
  `native_thread_init_stack` in `call_thread_start_func2`.

[Bug #20001]

fix
2024-01-19 09:55:12 +11:00
KJ Tsanaktsidis 396e94666b Revert "Pass down "stack start" variables from closer to the top of the stack"
This reverts commit 4ba8f0dc99.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis 6af0f442c7 Revert "Make stack bounds detection work with ASAN"
This reverts commit 6185cfdf38.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis 6185cfdf38 Make stack bounds detection work with ASAN
Where a local variable is used as part of the stack bounds detection, it
has to actually be on the stack. ASAN can put local variable on "fake
stacks", however, with addresses in different memory mappings. This
completely destroys the stack bounds calculation, and can lead to e.g.
things not getting GC marked on the machine stack or stackoverflow
checks that always fail.

The __asan_addr_is_in_fake_stack helper can be used to get the _real_
stack address of such variables, and thus perform the stack size
calculation properly

[Bug #20001]
2024-01-12 17:29:48 +11:00
KJ Tsanaktsidis 4ba8f0dc99 Pass down "stack start" variables from closer to the top of the stack
The implementation of `native_thread_init_stack` for the various
threading models can use the address of a local variable as part of the
calculation of the machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable

However, the local being used for this is actually allocated _inside_
the `native_thread_init_stack` frame; that means the caller might
allocate a VALUE on the stack that actually lies outside the bounds
stored in machine.stack_{start,end}.

A local variable from one level above the topmost frame that stores
VALUEs on the stack must be drilled down into the call to
`native_thread_init_stack` to be used in the calculation. This probably
doesn't _really_ matter for the win32 case (they'll be in the same
memory mapping so VirtualQuery should return the same thing), but
definitely could matter for the pthreads case.

[Bug #20001]
2024-01-12 17:29:48 +11:00
Shia 9368782d5c Use max_cpu when RUBY_MAX_CPU given 2024-01-02 08:16:29 +09:00
Daisuke Fujimura (fd0) daefbf8fbf Use native_thread_init_stack on cygwin 2023-12-24 08:06:03 +09:00
Koichi Sasada a4b737213e MN: ceil timeout milli seconds
`hrrel / RB_HRTIME_PER_MSEC` floor the timeout value and it can
return wrong value by `Mutex#sleep` (return Integer even if
it should return nil (timeout'ed)).

This patch ceil the value and the issue was solved.
2023-12-23 05:56:02 +09:00
JP Camara 8782e02138 KQueue support for M:N threads
* Allows macOS users to use M:N threads (and technically FreeBSD, though it has not been verified on FreeBSD)

* Include sys/event.h header check for macros, and include sys/event.h when present

* Rename epoll_fd to more generic kq_fd (Kernel event Queue) for use by both epoll and kqueue

* MAP_STACK is not available on macOS so conditionall apply it to mmap flags

* Set fd to close on exec

* Log debug messages specific to kqueue and epoll on creation

* close_invalidate raises an error for the kqueue fd on child process fork. It's unclear rn if that's a bug, or if it's kqueue specific behavior

Use kq with rb_thread_wait_for_single_fd

* Only platforms with `USE_POLL` (linux) had changes applied to take advantage of kernel event queues. It needed to be applied to the `select` so that kqueue could be properly applied

* Clean up kqueue specific code and make sure only flags that were actually set are removed (or an error is raised)

* Also handle kevent specific errnos, since most don't apply from epoll to kqueue

* Use the more platform standard close-on-exec approach of `fcntl` and `FD_CLOEXEC`. The io-event gem uses `ioctl`, but fcntl seems to be the recommended choice. It is also what Go, Bun, and Libuv use

* We're making changes in this file anyways - may as well fix a couple spelling mistakes while here

Make sure FD_CLOEXEC carries over in dup

* Otherwise the kqueue descriptor should have FD_CLOEXEC, but doesn't and fails in assert_close_on_exec
2023-12-20 16:23:38 +09:00
Koichi Sasada 6c4b04de5c clear `sched->lock_onwer` at fork
`sched->lock_owner` can be non-NULL at fork because the timer thread
can acquire the lock while forking. `lock_owner` information is for
debugging, so we only need to clear it at fork.

I hope this patch fixes the following assertion failure:
```
thread_pthread.c:354:thread_sched_lock_:sched->lock_owner == NULL
```
2023-12-19 02:26:37 +09:00
John Hawthorn b2ad4fec1a Add missing GVL hooks for M:N threads and ractors 2023-12-09 09:31:41 -08:00
John Hawthorn 85bc80a51b Revert "Add missing GVL hooks for M:N threads and ractors"
This reverts commit ad54fbf281.
2023-12-03 18:37:06 -08:00
John Hawthorn ad54fbf281 Add missing GVL hooks for M:N threads and ractors
[Bug #20019]

This fixes GVL instrumentation in three locations it was missing:
- Suspending when blocking on a Ractor
- Suspending when doing a coroutine transfer from an M:N thread
- Resuming after an M:N thread starts

Co-authored-by: Matthew Draper <matthew@trebex.net>
2023-12-02 10:06:07 -08:00
Jean Boussier 982641939c Further fix the GVL instrumentation API
Followup: https://github.com/ruby/ruby/pull/9029

[Bug #20019]

Some events still weren't triggered from the right place.

The test suite was also improved a bit more.
2023-11-28 20:06:55 +01:00
Jean Boussier 23a7714343 Refactor and fix the GVL instrumentation API
This entirely changes how it is tested. Rather than to use counters
we now record the timeline of events with associated threads which
makes it much easier to assert that certains events are only preceded
by a specific event, and makes it much easier to debug unexpected
timelines.

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
Co-Authored-By: JP Camara <jp@jpcamara.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
2023-11-27 17:37:57 +01:00
Jean Boussier 9ca41e9991 GVL Instrumentation: pass thread->self as part of event data
Context: https://github.com/ivoanjo/gvl-tracing/pull/4

Some hooks may want to collect data on a per thread basis.
Right now the only way to identify the concerned thread is to
use `rb_nativethread_self()` or similar, but even then because
of the thread cache or MaNy, two distinct Ruby threads may report
the same native thread id.

By passing `thread->self`, hooks can use it as a key to store
the metadata.

NB: Most hooks are executed outside the GVL, so such data collection
need to use a thread-safe data-structure, and shouldn't use the
reference in other ways from inside the hook.

They must also either pin that value or handle compaction.
2023-11-13 08:45:20 +01:00
Sergey Fedorov e3b4fe1b76 thread_pthread.c: unbreak 10.5 Intel by restoring accidentally deleted macro 2023-11-01 23:29:15 +09:00
Koichi Sasada c9990c8d0f "+MN" in description
If `RUBY_MN_THREADS=1` is given, this patch shows `+MN` in
`RUBY_DESCRIPTION` like:

```
$ RUBY_MN_THREADS=1 ./miniruby  --yjit -v
ruby 3.3.0dev (2023-10-17T04:10:14Z master 908f8fffa2) +YJIT +MN [x86_64-linux]
```

Before this patch, a warning is displayed if `$VERBOSE` is given.
However it can make troubles with tests (with `$VERBOSE`), do not
show any warning with a MN threads configuration.
2023-10-17 17:43:52 +09:00
Kazuhiro NISHIYAMA fe08839d8a
Fix typos [ci skip] 2023-10-16 18:29:59 +09:00
Koichi Sasada eb79b0319b release sched_lock before VM lock
to avoid deadlock

```ruby
r = Ractor.new do
  obj = Thread.new{}
  Ractor.yield obj
rescue => e
  e.message
end
p r.take
```

```
(lldb) bt
* thread #1, name = 'miniruby', stop reason = signal SIGSTOP
  * frame #0: 0x0000ffff44881410 libpthread.so.0`__lll_lock_wait + 88
    frame #1: 0x0000ffff4487a078 libpthread.so.0`__pthread_mutex_lock + 232
    frame #2: 0x0000aaab617c0980 miniruby`rb_native_mutex_lock(lock=<unavailable>) at thread_pthread.c:109:14
    frame #3: 0x0000aaab617c1d58 miniruby`ubf_event_waiting [inlined] thread_sched_lock_(th=0x0000aaab9df82980, file=<unavailable>, line=46, sched=0x0000aaab9dec79b8) at thread_pthread.c:351:5
    frame #4: 0x0000aaab617c1d50 miniruby`ubf_event_waiting(ptr=0x0000aaab9df82980) at thread_pthread_mn.c:46:5
    frame #5: 0x0000aaab617c6020 miniruby`rb_threadptr_interrupt [inlined] rb_threadptr_interrupt_common(trap=0, th=0x0000aaab9df82980) at thread.c:352:25
    frame #6: 0x0000aaab617c5fec miniruby`rb_threadptr_interrupt(th=0x0000aaab9df82980) at thread.c:365:5
    frame #7: 0x0000aaab617379b0 miniruby`rb_ractor_terminate_all at ractor.c:2364:13
    frame #8: 0x0000aaab6173797c miniruby`rb_ractor_terminate_all at ractor.c:2383:17
    frame #9: 0x0000aaab61737958 miniruby`rb_ractor_terminate_all [inlined] ractor_terminal_interrupt_all(vm=0x0000aaab9dea3320) at ractor.c:2375:1
    frame #10: 0x0000aaab61737950 miniruby`rb_ractor_terminate_all at ractor.c:2424:13
    frame #11: 0x0000aaab6164f108 miniruby`rb_ec_cleanup(ec=0x0000aaab9dea5900, ex=RUBY_TAG_NONE) at eval.c:239:9
    frame #12: 0x0000aaab6164fa3c miniruby`ruby_run_node(n=0x0000ffff417ed178) at eval.c:328:12
    frame #13: 0x0000aaab615a5ab0 miniruby`main at main.c:39:12
    frame #14: 0x0000aaab615a5a98 miniruby`main(argc=<unavailable>, argv=<unavailable>) at main.c:58:12
    frame #15: 0x0000ffff44714b2c libc.so.6`__libc_start_main + 228
    frame #16: 0x0000aaab615a5b0c miniruby`_start + 52
(lldb) thread select 3
* thread #3, name = 'bootstraptest.*', stop reason = signal SIGSTOP
    frame #0: 0x0000ffff448813ec libpthread.so.0`__lll_lock_wait + 52
libpthread.so.0`__lll_lock_wait:
->  0xffff448813ec <+52>: svc    #0
    0xffff448813f0 <+56>: eor    w20, w20, #0x80
    0xffff448813f4 <+60>: sxtw   x20, w20
    0xffff448813f8 <+64>: b      0xffff44881414            ; <+92>
(lldb) bt
* thread #3, name = 'bootstraptest.*', stop reason = signal SIGSTOP
  * frame #0: 0x0000ffff448813ec libpthread.so.0`__lll_lock_wait + 52
    frame #1: 0x0000ffff4487a078 libpthread.so.0`__pthread_mutex_lock + 232
    frame #2: 0x0000aaab617c0980 miniruby`rb_native_mutex_lock(lock=<unavailable>) at thread_pthread.c:109:14
    frame #3: 0x0000aaab61823d68 miniruby`rb_vm_lock_enter_body [inlined] vm_lock_enter(no_barrier=false, lev=0x0000ffff215bfbe4, locked=false, vm=0x0000aaab9dea3320, cr=0x0000aaab9dec7890) at vm_sync.c:57:9
    frame #4: 0x0000aaab61823d60 miniruby`rb_vm_lock_enter_body(lev=0x0000ffff215bfbe4) at vm_sync.c:119:9
    frame #5: 0x0000aaab617c1b30 miniruby`thread_sched_setup_running_threads [inlined] rb_vm_lock_enter(file=<unavailable>, line=597, lev=0x0000ffff215bfbe4) at vm_sync.h:75:9
    frame #6: 0x0000aaab617c1b14 miniruby`thread_sched_setup_running_threads(vm=0x0000aaab9dea3320, add_th=0x0000aaab9df82980, del_th=<unavailable>, add_timeslice_th=0x0000000000000000, cr=<unavailable>, sched=<unavailable>, sched=<unavailable>) at thread_pthread.c:597:9
    frame #7: 0x0000aaab617c29b4 miniruby`thread_sched_wait_running_turn at thread_pthread.c:614:5
    frame #8: 0x0000aaab617c298c miniruby`thread_sched_wait_running_turn(sched=0x0000aaab9dec79b8, th=0x0000aaab9df82980, can_direct_transfer=true) at thread_pthread.c:868:9
    frame #9: 0x0000aaab617c6f0c miniruby`thread_sched_wait_events(sched=0x0000aaab9dec79b8, th=0x0000aaab9df82980, fd=<unavailable>, events=<unavailable>, rel=<unavailable>) at thread_pthread_mn.c:90:17
    frame #10: 0x0000aaab617c7354 miniruby`rb_thread_terminate_all at thread_pthread.c:3248:13
    frame #11: 0x0000aaab617c733c miniruby`rb_thread_terminate_all(th=0x0000aaab9df82980) at thread.c:466:13
    frame #12: 0x0000aaab617c7a64 miniruby`thread_start_func_2(th=0x0000aaab9df82980, stack_start=<unavailable>) at thread.c:713:9
    frame #13: 0x0000aaab617c7d1c miniruby`co_start [inlined] call_thread_start_func_2(th=0x0000aaab9df82980) at thread_pthread.c:2165:5
    frame #14: 0x0000aaab617c7cd0 miniruby`co_start(from=<unavailable>, self=0x0000aaab9df0f760) at thread_pthread_mn.c:421:9
```
2023-10-14 13:26:02 +09:00
Koichi Sasada 275c18525c Allow `NON_SCALAR_THREAD_ID` machines
s390x (Ubuntu) still fails tests with 62dfaeec2c.
2023-10-14 08:12:53 +09:00
Koichi Sasada 62dfaeec2c disable MN schedulers for some platforms
* on `__EMSCRIPTEN__` provides epoll* declarations, but no implementations.
* on `NON_SCALAR_THREAD_ID`, now we can not debug issues on x390s/Ubuntu so skip it.

x390s/RHEL works fine, so I think we can remove second limitation but
I could not login to it so it seems hard to debug now.
2023-10-14 00:52:51 +09:00
Koichi Sasada cdb36dfe7d fix `native_thread_destroy()` timing
With M:N thread scheduler, the native thread (NT) related resources
should be freed when the NT is no longer needed. So the calling
`native_thread_destroy()` at the end of `is will be freed when
`thread_cleanup_func()` (at the end of Ruby thread) is not correct
timing. Call it when the corresponding Ruby thread is collected.
2023-10-13 09:19:31 +09:00
Koichi Sasada 2dca02e273 disable MN scheduler on !`USE_MN_THREADS` 2023-10-13 02:11:29 +09:00
Nobuyoshi Nakada 2cd9aae4b7
Fix unused-function warning for 'ruby_ppoll' [ci skip] 2023-10-12 17:36:24 +09:00
Koichi Sasada be1bbd5b7d M:N thread scheduler for Ractors
This patch introduce M:N thread scheduler for Ractor system.

In general, M:N thread scheduler employs N native threads (OS threads)
to manage M user-level threads (Ruby threads in this case).
On the Ruby interpreter, 1 native thread is provided for 1 Ractor
and all Ruby threads are managed by the native thread.

From Ruby 1.9, the interpreter uses 1:1 thread scheduler which means
1 Ruby thread has 1 native thread. M:N scheduler change this strategy.

Because of compatibility issue (and stableness issue of the implementation)
main Ractor doesn't use M:N scheduler on default. On the other words,
threads on the main Ractor will be managed with 1:1 thread scheduler.

There are additional settings by environment variables:

`RUBY_MN_THREADS=1` enables M:N thread scheduler on the main ractor.
Note that non-main ractors use the M:N scheduler without this
configuration. With this configuration, single ractor applications
run threads on M:1 thread scheduler (green threads, user-level threads).

`RUBY_MAX_CPU=n` specifies maximum number of native threads for
M:N scheduler (default: 8).

This patch will be reverted soon if non-easy issues are found.

[Bug #19842]
2023-10-12 14:47:01 +09:00
KJ Tsanaktsidis 0117a6d389
Fix Thread#native_thread_id being cached across fork (#8418)
The native thread ID can and does change on some operating systems (e.g.
Linux) after forking, so it needs to be re-queried.

[Bug #19873]
2023-09-15 10:33:32 +09:00
Nobuyoshi Nakada 0765b890b5
Fix `USE_THREAD_CACHE=0` 2023-07-19 14:35:43 +09:00
Nobuyoshi Nakada 1c4a523006
Move `posix_signal` declaration internal with prefix `ruby_` 2023-07-17 21:31:59 +09:00
Nobuyoshi Nakada c1432a4816
Compile disabled code for thread cache always 2023-06-30 23:59:05 +09:00
KJ Tsanaktsidis 8e1abef469 Fix a potential busy-loop in the thread scheduler (esp. on FreeBSD)
This patch fixes a potential busy-loop in the thread scheduler. If there
are two threads, the main thread (where Ruby signal handlers must run)
and a sleeping thread, it is possible for the following sequence of
events to occur:

* The sleeping thread is in native_sleep -> sigwait_sleep A signal
* arives, kicking this thread out of rb_sigwait_sleep The sleeping
* thread calls THREAD_BLOCKING_END and eventually
  thread_sched_to_running_common
* the sleeping thread writes into the sigwait_fd pipe by calling
  rb_thread_wakeup_timer_thread
* the sleeping thread re-loops around in native_sleep() because
  the desired sleep time has not actually yet expired
* that calls rb_sigwait_sleep again the ppoll() in rb_sigwait_sleep
* immediately returns because
  of the byte written into the sigwait_fd by
rb_thread_wakeup_timer_thread
* that wakes the thread up again and kicks the whole cycle off again.

Such a loop can only be broken by the main thread waking up and handling
the signal, such that ubf_threads_empty() below becomes true again;
however this loop can actually keep things so busy (and cause so much
contention on the main thread's interrupt_lock) that the main thread
doesn't deal with the signal for many seconds. This seems particuarly
likely on FreeBSD 13.

(the cycle can also be broken by the sleeping thread finally elapsing
its desired sleep time).

The fix for _this_ loop is to only wakeup the timer thrad in
thread_sched_to_running_common if the current thread is not itself the
sigwait thread.

An almost identical loop also happens in the same circumstances because
the call to check_signals_nogvl (through sigwait_timeout) in
rb_sigwait_sleep returns true if there is any pending signal for the
main thread to handle. That then causes rb_sigwait_sleep to skip over
sleeping entirely.

This is unnescessary and counterproductive, I believe; if the main
thread needs to be woken up that is done inline in check_signals_nogvl
anyway.

See https://bugs.ruby-lang.org/issues/19680
2023-05-26 14:48:08 +09:00
Nobuyoshi Nakada 8d242a33af
`rb_bug` prints a newline after the message 2023-05-20 21:43:30 +09:00
Koichi Sasada f803bcfc87 pass `th` to `thread_sched_to_waiting()`
for future extension
2023-03-31 18:50:10 +09:00