It seems that the Ractor sleep GVL event arrives very slightly after the
value becomes available and other threads wake (which makes sense) so we
need a little additional time to ensure we end up in a consisteny state.
[Bug #20019]
This fixes GVL instrumentation in three locations it was missing:
- Suspending when blocking on a Ractor
- Suspending when doing a coroutine transfer from an M:N thread
- Resuming after an M:N thread starts
Co-authored-by: Matthew Draper <matthew@trebex.net>
This entirely changes how it is tested. Rather than to use counters
we now record the timeline of events with associated threads which
makes it much easier to assert that certains events are only preceded
by a specific event, and makes it much easier to debug unexpected
timelines.
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
Co-Authored-By: JP Camara <jp@jpcamara.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Context: https://github.com/ivoanjo/gvl-tracing/pull/4
Some hooks may want to collect data on a per thread basis.
Right now the only way to identify the concerned thread is to
use `rb_nativethread_self()` or similar, but even then because
of the thread cache or MaNy, two distinct Ruby threads may report
the same native thread id.
By passing `thread->self`, hooks can use it as a key to store
the metadata.
NB: Most hooks are executed outside the GVL, so such data collection
need to use a thread-safe data-structure, and shouldn't use the
reference in other ways from inside the hook.
They must also either pin that value or handle compaction.
We saw the following failure:
```
TestThreadInstrumentation#test_thread_instrumentation [/tmp/ruby/v3/src/trunk-random3/test/-ext-/thread/test_instrumentation_api.rb:25]:
Expected 0..3 to include 4.
```
Which shouldn't happen unless somehow there was a leaked thread.
Recently `TestThreadInstrumentation#test_join_counters` often fails as
```
<[1, 1, 1]> expected but was
<[2, 2, 2]>.
```
Probably it seems that the thread is suspended more than once.
There may be no guarantee that the subject thread never be suspended
more than once.
[Bug #18900]
Thread#join and a few other codepaths are using native sleep as
a way to suspend the current thread. So we should call the relevant
hook when this happen, otherwise some thread may transition
directly from `RESUMED` to `READY`.
I suspect that sometimes on CI the last thread is prempted before eaching the exit hook
causing the test to flake. I can't find a good way to force it to run.
It previously failed with:
```
1) Failure:
TestThreadInstrumentation#test_thread_instrumentation_fork_safe [/home/runner/work/ruby/ruby/src/test/-ext-/thread/test_instrumentation_api.rb:50]:
<5> expected but was
<4>.
```
Suggesting one `EXIT` event wasn't fired or processed.
Adding an assetion on `Thead#status` may help figure out what is wrong.
[Feature #18339]
After experimenting with the initial version of the API I figured there is a need
for an exit event to cleanup instrumentation data. e.g. if you record data in a
{thread_id -> data} table, you need to free associated data when a thread goes away.
`test_thread_instrumentation_fork_safe` has been failing occasionaly
on Ubuntu and Arch. At this stage we're not sure why, all we know is
that the child exit with status 1.
I suspect that something entirely unrelated cause the forked children
to fail on exit, so by using `exit!(0)` and doing assertions in the
parent I hope to be resilient to that.
Ref: https://bugs.ruby-lang.org/issues/18339
Design:
- This tries to minimize the overhead when no hook is registered.
It should only incur an extra unsynchronized boolean check.
- The hook list is protected with a read-write lock as to cause
contention when some hooks are registered.
- The hooks MUST be thread safe, and MUST NOT call into Ruby as they
are executed outside the GVL.
- It's simply a noop on Windows.
API:
```
rb_internal_thread_event_hook_t * rb_internal_thread_add_event_hook(rb_internal_thread_event_callback callback, rb_event_flag_t internal_event, void *user_data);
bool rb_internal_thread_remove_event_hook(rb_internal_thread_event_hook_t * hook);
```
You can subscribe to 3 events:
- READY: called right before attempting to acquire the GVL
- RESUMED: called right after successfully acquiring the GVL
- SUSPENDED: called right after releasing the GVL.
The hooks MUST be threadsafe, as they are executed outside of the GVL, they also MUST NOT call any Ruby API.