This patch introduce M:N thread scheduler for Ractor system.
In general, M:N thread scheduler employs N native threads (OS threads)
to manage M user-level threads (Ruby threads in this case).
On the Ruby interpreter, 1 native thread is provided for 1 Ractor
and all Ruby threads are managed by the native thread.
From Ruby 1.9, the interpreter uses 1:1 thread scheduler which means
1 Ruby thread has 1 native thread. M:N scheduler change this strategy.
Because of compatibility issue (and stableness issue of the implementation)
main Ractor doesn't use M:N scheduler on default. On the other words,
threads on the main Ractor will be managed with 1:1 thread scheduler.
There are additional settings by environment variables:
`RUBY_MN_THREADS=1` enables M:N thread scheduler on the main ractor.
Note that non-main ractors use the M:N scheduler without this
configuration. With this configuration, single ractor applications
run threads on M:1 thread scheduler (green threads, user-level threads).
`RUBY_MAX_CPU=n` specifies maximum number of native threads for
M:N scheduler (default: 8).
This patch will be reverted soon if non-easy issues are found.
[Bug #19842]
* Revert "Extract `do_mutex_lock_check_interrupts` to try and fix `ppc64le`. (#8393)"
This reverts commit 5184b40dd4.
* .travis.yml: Try default gcc 9.4.0 instead of gcc-10 in ppc64le and s390x.
Use gcc 9.4.0 instead of gcc-10 to avoid the current failures by a possible GCC
10 compiler bug in the Travis ppc64le and s390x cases. And it also aligns with
RubyCI Ubuntu ppc64le and s390x where the default gcc is used.
---------
Co-authored-by: Jun Aruga <jaruga@ruby-lang.org>
We found some tests were hanging in `do_mutex_lock`, specifically the
fiber scheduler autoload test. After much investigation, it may be a code
generation bug. Because we didn't change the code, but only extracted it
into a separate function, and it appears to fix the problem.
It's possible (but very rare) to have a race condition between setting
`mutex->fiber = NULL` and `thread_mutex_remove(th, mutex)` which results
in the following bug:
```
[BUG] invalid keeping_mutexes: Attempt to unlock a mutex which is not locked
```
Fixes <https://bugs.ruby-lang.org/issues/19480>.
Fixes the following compilation warnings:
thread_sync.c:908:48: warning: taking address of packed member of `struct rb_queue` may result in an unaligned pointer value [-Waddress-of-packed-member]
thread_sync.c:1181:48: warning: taking address of packed member of `struct rb_queue` may result in an unaligned pointer value [-Waddress-of-packed-member]
[Bug #19105]
If no fiber scheduler is registered and the fiber that
owns the lock and the one that try to acquire it
both belong to the same thread, we're in a deadlock case.
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
[Feature #18982]
Instead of introducing an `exception: false` argument to have `non_block`
return nil rather than raise, we can clearly document that a timeout of 0
immediately returns.
The code is refactored a bit to avoid doing a time calculation in
such case.
When I removed the SizeQueue#push timeout from my PR, I forgot to
update the `queue_sleep` parameters to be a `queue_sleep_arg`.
Somehow this worked on most archs, but on Solaris/Sparc it would
legitimately crash when trying to access the `timeout` and `end`
members of the struct.
rb_ary_tmp_new suggests that the array is temporary in some way, but
that's not true, it just creates an array that's hidden and not on the
transient heap. This commit renames it to rb_ary_hidden_new.
mutex_mark is (basically) NULL, so we don't have any references to mark.
This means we should safely be able to mark Mutex as WB_PROTECTED
without changing anything else.
* Rename `rb_scheduler` to `rb_fiber_scheduler`.
* Use public interface if available.
* Use `rb_check_funcall` where possible.
* Don't use `unblock` unless the fiber was non-blocking.
When a scheduler is present, it's entirely possible for
`th->keeping_mutexes` to be updated while enumerating the waitq. Therefore
it must be fetched only during the removal operation.
* When there is a scheduler, the Fiber that would be blocked has already
been rescheduled and there is no point to interrupt something else.
That blocked Fiber will be rescheduled as the next call to the scheduler
(e.g., IO, sleep, other blocking sync).
* See discussion on https://github.com/ruby/ruby/commit/d01954632d