This patch fixes a potential busy-loop in the thread scheduler. If there
are two threads, the main thread (where Ruby signal handlers must run)
and a sleeping thread, it is possible for the following sequence of
events to occur:
* The sleeping thread is in native_sleep -> sigwait_sleep
* A signal arrives, kicking this thread out of rb_sigwait_sleep
* The sleeping thread calls THREAD_BLOCKING_END and eventually
  thread_sched_to_running_common
* the sleeping thread writes into the sigwait_fd pipe by calling
  rb_thread_wakeup_timer_thread
* the sleeping thread re-loops around in native_sleep() because
  the desired sleep time has not actually yet expired
* that calls rb_sigwait_sleep again
* the ppoll() in rb_sigwait_sleep immediately returns because
  of the byte written into the sigwait_fd by
  rb_thread_wakeup_timer_thread
* that wakes the thread up again and kicks the whole cycle off again.
Such a loop can only be broken by the main thread waking up and handling
the signal, such that ubf_threads_empty() below becomes true again;
however this loop can actually keep things so busy (and cause so much
contention on the main thread's interrupt_lock) that the main thread
doesn't deal with the signal for many seconds. This seems particularly
likely on FreeBSD 13.
(the cycle can also be broken by the sleeping thread finally elapsing
its desired sleep time).
The fix for _this_ loop is to only wake up the timer thread in
thread_sched_to_running_common if the current thread is not itself the
sigwait thread.
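
The mechanism behind the loop can be reproduced in isolation. The sketch below is not the Ruby code: every `demo_*` name is hypothetical, a plain pipe stands in for sigwait_fd, and portable poll() replaces ppoll(). It shows that once a byte sits unread in the pipe, every sleep on it returns at once, and that skipping the self-notification when the current thread is the sigwait thread avoids this.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for the sigwait_fd pipe: [0] is read, [1] is write. */
static int demo_pipe[2];

static void demo_pipe_open(void)
{
    int rc = pipe(demo_pipe);
    assert(rc == 0);
    (void)rc;
}

/* Models rb_thread_wakeup_timer_thread: leave one byte in the pipe. */
static void demo_wakeup(void)
{
    ssize_t n = write(demo_pipe[1], "!", 1);
    assert(n == 1);
    (void)n;
}

/* Models the sleep in rb_sigwait_sleep. Returns true when the pipe woke us
 * before the timeout, false when the full timeout elapsed. The byte is
 * deliberately not drained, as in the sequence described above. */
static bool demo_sleep_woken_early(int timeout_ms)
{
    struct pollfd pfd = { .fd = demo_pipe[0], .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    return rc > 0 && (pfd.revents & POLLIN) != 0;
}

/* Models the guard from the fix: a thread that is itself the sigwait
 * thread does not write to the pipe it is about to sleep on. */
static void demo_to_running(bool current_is_sigwait_thread)
{
    if (!current_is_sigwait_thread) {
        demo_wakeup();
    }
}
```

After `demo_pipe_open()`, calling `demo_to_running(true)` leaves the pipe empty, so `demo_sleep_woken_early(20)` waits out its timeout. Calling `demo_to_running(false)` leaves a byte behind, and every later `demo_sleep_woken_early()` call returns true immediately, which is the busy-loop.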
An almost identical loop also happens in the same circumstances because
the call to check_signals_nogvl (through sigwait_timeout) in
rb_sigwait_sleep returns true if there is any pending signal for the
main thread to handle. That then causes rb_sigwait_sleep to skip over
sleeping entirely.
This is unnecessary and counterproductive, I believe; if the main
thread needs to be woken up that is done inline in check_signals_nogvl
anyway.
See https://bugs.ruby-lang.org/issues/19680
Show native thread's serial on `RUBY_DEBUG_LOG`.
`nt->serial` is also stored into `ruby_nt_serial` if the compiler
supports `RB_THREAD_LOCAL_SPECIFIER`.
[Feature #19443]
Until recently most libc implementations cached `getpid()`, so this was a cheap check to make.
However, as of glibc version 2.25 the PID cache is removed and calls to getpid() always
invoke the actual system call, which significantly degrades the performance of existing applications.
The reason glibc removed the cache is that some libraries were bypassing fork(2)
by issuing system calls themselves, causing stale cache issues.
That isn't a concern for Ruby as bypassing MRI's primitive for forking would
render the VM unusable, so we can safely cache the PID.
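
A minimal sketch of the idea, assuming every fork goes through a single wrapper. The names `demo_cached_getpid`, `demo_fork`, and `demo_child_pid_cache_is_fresh` are hypothetical and are not the functions used in MRI.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* 0 means "not cached yet"; real PIDs are always positive. */
static pid_t demo_pid_cache = 0;

/* Cheap replacement for getpid(): one system call, then a memory read. */
static pid_t demo_cached_getpid(void)
{
    if (demo_pid_cache == 0) {
        demo_pid_cache = getpid();
    }
    return demo_pid_cache;
}

/* The only sanctioned way to fork in this sketch. The child refreshes the
 * cache, which is what keeps the cached value from going stale. */
static pid_t demo_fork(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        demo_pid_cache = getpid();
    }
    return pid;
}

/* Forks through demo_fork() and reports whether the child observed a
 * cached PID equal to its real PID. */
static bool demo_child_pid_cache_is_fresh(void)
{
    (void)demo_cached_getpid(); /* populate the parent's cache first */
    pid_t child = demo_fork();
    if (child < 0) {
        return false;
    }
    if (child == 0) {
        _exit(demo_cached_getpid() == getpid() ? 0 : 1);
    }
    int status = 0;
    if (waitpid(child, &status, 0) != child) {
        return false;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A process that forked with a raw system call instead of `demo_fork()` would keep the parent's PID in the cache, which is the stale-cache problem that led glibc to drop its own cache.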
* Revert "Remove special handling of `SIGCHLD`. (#7482)"
This reverts commit 44a0711eab.
* Revert "Remove prototypes for functions that are no longer used. (#7497)"
This reverts commit 4dce12bead.
* Revert "Remove SIGCHLD `waidpid`. (#7476)"
This reverts commit 1658e7d966.
* Fix change to rjit variable name.
```
1) Failure:
TestThreadInstrumentation#test_thread_instrumentation [/tmp/ruby/src/trunk-repeat20-asserts/test/-ext-/thread/test_instrumentation_api.rb:33]:
Call counters[4]: [3, 4, 4, 4, 0].
Expected 0 to be > 0.
```
We fire the EXIT hook after the call to `thread_sched_to_dead`, which
means another thread might be running before the `EXIT` hook has been
executed.
[Bug #18900]
Thread#join and a few other codepaths are using native sleep as
a way to suspend the current thread. So we should call the relevant
hook when this happens; otherwise some threads may transition
directly from `RESUMED` to `READY`.
[Feature #18339]
After experimenting with the initial version of the API I figured there is a need
for an exit event to clean up instrumentation data. For example, if you record data in a
{thread_id -> data} table, you need to free the associated data when a thread goes away.
Ref: https://bugs.ruby-lang.org/issues/18339
Design:
- This tries to minimize the overhead when no hook is registered.
It should only incur an extra unsynchronized boolean check.
- The hook list is protected with a read-write lock so as to limit
contention when some hooks are registered.
- The hooks MUST be thread safe, and MUST NOT call into Ruby as they
are executed outside the GVL.
- It's simply a noop on Windows.
API:
```
rb_internal_thread_event_hook_t * rb_internal_thread_add_event_hook(rb_internal_thread_event_callback callback, rb_event_flag_t internal_event, void *user_data);
bool rb_internal_thread_remove_event_hook(rb_internal_thread_event_hook_t * hook);
```
You can subscribe to 3 events:
- READY: called right before attempting to acquire the GVL
- RESUMED: called right after successfully acquiring the GVL
- SUSPENDED: called right after releasing the GVL.
The hooks MUST be thread-safe, as they are executed outside of the GVL; they also MUST NOT call any Ruby API.
`NON_SCALAR_THREAD_ID` indicates that `pthread_t` is non-scalar (non-pointer),
and s390x is the only platform known to need it. However, the supporting code is
very complex and it is only used for debug print information.
So this patch removes the support of `NON_SCALAR_THREAD_ID`
and makes the code simpler.
* add coroutines for ppc & ppc64
* fix universal coroutine to include ppc & ppc64
* add powerpc*-darwin to configure.ac
* fix thread_pthread for older systems
`rb_thread_t` contained `native_thread_data_t` to represent
thread implementation dependent data. This patch separates
them, renames the latter to `rb_native_thread`, and points to it from
`rb_thread_t`.
Now, 1 Ruby thread (`rb_thread_t`) has 1 native thread (`rb_native_thread`).
Now the GVL is not process-*Global*, so this patch tries to use
other words.
* `rb_global_vm_lock_t` -> `struct rb_thread_sched`
* `gvl->owner` -> `sched->running`
* `gvl->waitq` -> `sched->readyq`
* `rb_gvl_init` -> `rb_thread_sched_init`
* `gvl_destroy` -> `rb_thread_sched_destroy`
* `gvl_acquire` -> `thread_sched_to_running` # waiting -> ready -> running
* `gvl_release` -> `thread_sched_to_waiting` # running -> waiting
* `gvl_yield` -> `thread_sched_yield`
* `GVL_UNLOCK_BEGIN` -> `THREAD_BLOCKING_BEGIN`
* `GVL_UNLOCK_END` -> `THREAD_BLOCKING_END`
* removed
* `rb_ractor_gvl`
* `rb_vm_gvl_destroy` (not used)
There are still GVL functions such as `rb_thread_call_without_gvl()`,
but I don't have a good name to replace them. Maybe GVL stands for
"Great Valuable Lock" or something like that.
The last parameter of `ccan_list_top()` is used to compute the pointer
to the top element, so `node.ubf` is no problem. But in this context
it accesses the gvl list, so `node.gvl` is better.
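
A small sketch of why either member name yields the same pointer. It assumes, as a reading of "no problem" above, that the two list nodes share storage in a union. The `demo_*` types and the `demo_container_of` macro are hypothetical stand-ins, not the ccan definitions.

```c
#include <stddef.h>

struct demo_list_node {
    struct demo_list_node *next, *prev;
};

/* Assumed layout: both list memberships share one node via a union. */
struct demo_thread {
    int serial;
    union {
        struct demo_list_node ubf;
        struct demo_list_node gvl;
    } node;
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct demo_thread *demo_top_via_ubf(struct demo_list_node *first)
{
    return demo_container_of(first, struct demo_thread, node.ubf);
}

static struct demo_thread *demo_top_via_gvl(struct demo_list_node *first)
{
    return demo_container_of(first, struct demo_thread, node.gvl);
}
```

Because `node.ubf` and `node.gvl` sit at the same offset in this layout, both helpers return the same `struct demo_thread *` for a given node. Naming the member after the list being walked only makes the intent clearer.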
Must not be a bad idea to improve documents. [ci skip]
In fact, many functions declared in the header file are already
more or less documented. The comments were just copied and pasted, with
some style updates applied.