Граф коммитов

1138 Коммитов

Автор SHA1 Сообщение Дата
Jeremy Evans 7f1fe5f091 Raise a TypeError for Thread#thread_variable{?,_get} for non-symbol
Previously, a TypeError was not raised if there were no thread
variables, because the conversion to symbol was done after that
check.  Convert to symbol before checking for whether thread
variables are set to make the behavior consistent.

Fixes [Bug #20606]
2024-07-06 12:07:32 +02:00
Aaron Patterson d9487dd011
Speed up chunkypng benchmark (#11087)
* Speed up chunkypng benchmark

Since d037c5196a we're seeing a slowdown
in ChunkyPNG benchmarks in YJIT bench. This patch addresses the
slowdown. Making the thread volatile speeds up the benchmark by 2 or 3%
on my machine.

```
before: ruby 3.4.0dev (2024-07-02T18:48:43Z master b2b8306b46) [x86_64-linux]
after: ruby 3.4.0dev (2024-07-02T20:07:44Z speed-chunkypng 418334dba9) [x86_64-linux]

----------  -----------  ----------  ----------  ----------  -------------  ------------
bench       before (ms)  stddev (%)  after (ms)  stddev (%)  after 1st itr  before/after
chunky-png  1000.2       0.1         980.6       0.1         1.02           1.02
----------  -----------  ----------  ----------  ----------  -------------  ------------
Legend:
- after 1st itr: ratio of before/after time for the first benchmarking iteration.
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.

Output:
./data/output_015.csv
```

* Update thread.c

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

---------

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2024-07-02 22:20:01 +00:00
Nobuyoshi Nakada c05f60a600
Suppress -Wclobbered warning for BLOCKING_REGION 2024-06-01 16:25:12 +09:00
Nobuyoshi Nakada da69c9235f
Fix -Wclobbered warnings 2024-05-29 17:38:26 +09:00
Nobuyoshi Nakada d037c5196a
Suppress -Wclobbered warnings
At 7afc16aa48, now `BLOCKING_REGION`
contains `setjmp` call in `RB_VM_SAVE_MACHINE_CONTEXT`.  By this
change, variables in blocks for this macro may be clobbered.
2024-05-20 01:30:49 +09:00
KJ Tsanaktsidis 7afc16aa48 Inline RB_VM_SAVE_MACHINE_CONTEXT into BLOCKING_REGION
There's an exhaustive explanation of this in the linked redmine bug, but
the short version is as follows:

blocking_region_begin can spill callee-saved registers to the stack for
its own use. That means they're not saved to ec->machine by the call to
setjmp, since by that point they're already on the stack and new,
different values are in the real registers. ec->machine's end-of-stack
pointer is also bumped to accomodate this, BUT, after
blocking_region_begin returns, that points past the end of the stack!

As far as C is concerned, that's fine; the callee-saved registers are
restored when blocking_region_begin returns. But, if another thread
triggers GC, it is relying on finding references to Ruby objects by
walking the stack region pointed to by ec->machine.

If the C code in exec; subsequently does things that use that stack
memory, then the value will be overwritten and the GC might prematurely
collect something it shouldn't.

[Bug #20493]
2024-05-19 12:08:35 +09:00
Jean Boussier f06670c5a2 Eliminate usage of OBJ_FREEZE_RAW
Previously it would bypass the `FL_ABLE` check, but
since shapes introduction, it started having a different
behavior than `OBJ_FREEZE`, as it would onyl set the `FL_FREEZE`
flag, but not update the shape.

I have no indication of this causing a bug yet, but it seems
like a trap waiting to happen.
2024-04-16 17:20:35 +02:00
Samuel Williams a7ff264477
Don't clear pending interrupts in the parent process. (#10365) 2024-03-27 10:10:07 +13:00
Takashi Kokubun 696b2716e0 Return stdbool from recursive_check()
The return value is used as a boolean value in C. Since it's not used as
a Ruby object, it just seems confusing that it returns a VALUE.
2024-03-26 11:33:02 -07:00
Takashi Kokubun 16cf9047c6 [DOC] Fix a couple other descriptions
similarly to 332f4938cf
2024-03-26 11:24:44 -07:00
Takashi Kokubun 332f4938cf [DOC] Fix a description about rb_exec_recursive_outer
It gives true/TRUE (int) instead of Qtrue (VALUE).
2024-03-26 11:21:32 -07:00
KJ Tsanaktsidis 48d3bdddba Move asan_fake_stack_handle to EC, not thread
It's really a property of the EC; each fiber (which has its own EC) also
has its own asan_fake_stack_handle.

[Bug #20310]
2024-03-25 14:57:04 +11:00
Nobuyoshi Nakada e2a9b87126
`rb_thread_sched_destroy` is not used now at all 2024-03-22 18:53:44 +09:00
Nobuyoshi Nakada 127d7a35df
Some functions are not used when `THREAD_MODEL=none` 2024-03-22 18:20:13 +09:00
Nobuyoshi Nakada 28a2105a55
Prefer `enum ruby_tag_type` over `int` 2024-03-17 15:57:19 +09:00
Samuel Williams 6a0b05f413
Remove `SAVE_ROOT_JMPBUF` as it no longer has any effect. (#10066) 2024-02-22 22:35:54 +13:00
Samuel Williams 78d9fe6947
Ensure that exiting thread invokes end-of-life behaviour. (#10039) 2024-02-22 00:32:59 +13:00
Yusuke Endoh 25d74b9527 Do not include a backtick in error messages and backtraces
[Feature #16495]
2024-02-15 18:42:31 +09:00
Nobuyoshi Nakada c7e87b2118
Fix up [Bug #20001] 2024-01-23 01:42:32 +09:00
KJ Tsanaktsidis 61da90c1b8 Mark asan fake stacks during machine stack marking
ASAN leaves a pointer to the fake frame on the stack; we can use the
__asan_addr_is_in_fake_stack API to work out the extent of the fake
stack and thus mark any VALUEs contained therein.

[Bug #20001]
2024-01-19 09:55:12 +11:00
KJ Tsanaktsidis 807714447e Pass down "stack start" variables from closer to the top of the stack
This commit changes how stack extents are calculated for both the main
thread and other threads. Ruby uses the address of a local variable as
part of the calculation for machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable

However, the local being used for this is actually too low (too close to
the leaf function call) in both the main thread case and the new thread
case.

In the main thread case, we have the `INIT_STACK` macro, which is used
for pthreads to set the `native_main_thread->stack_start` value. This
value is correctly captured at the very top level of the program (in
main.c). However, this is _not_ what's used to set the execution context
machine stack (`th->ec->machine_stack.stack_start`); that gets set as
part of a call to `ruby_thread_init_stack` in `Init_BareVM`, using the
address of a local variable allocated _inside_ `Init_BareVM`. This is
too low; we need to use a local allocated closer to the top of the
program.

In the new thread case, the lolcal is allocated inside
`native_thread_init_stack`, which is, again, too low.

In both cases, this means that we might have VALUEs lying outside the
bounds of `th->ec->machine.stack_{start,end}`, which won't be marked
correctly by the GC machinery.

To fix this,

* In the main thread case: We already have `INIT_STACK` at the right
  level, so just pass that local var to `ruby_thread_init_stack`.
* In the new thread case: Allocate the local one level above the call to
  `native_thread_init_stack` in `call_thread_start_func2`.

[Bug #20001]

fix
2024-01-19 09:55:12 +11:00
KJ Tsanaktsidis 396e94666b Revert "Pass down "stack start" variables from closer to the top of the stack"
This reverts commit 4ba8f0dc99.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis 688a6ff510 Revert "Mark asan fake stacks during machine stack marking"
This reverts commit d10bc3a2b8.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis d10bc3a2b8 Mark asan fake stacks during machine stack marking
ASAN leaves a pointer to the fake frame on the stack; we can use the
__asan_addr_is_in_fake_stack API to work out the extent of the fake
stack and thus mark any VALUEs contained therein.

[Bug #20001]
2024-01-12 17:29:48 +11:00
KJ Tsanaktsidis 4ba8f0dc99 Pass down "stack start" variables from closer to the top of the stack
The implementation of `native_thread_init_stack` for the various
threading models can use the address of a local variable as part of the
calculation of the machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable

However, the local being used for this is actually allocated _inside_
the `native_thread_init_stack` frame; that means the caller might
allocate a VALUE on the stack that actually lies outside the bounds
stored in machine.stack_{start,end}.

A local variable from one level above the topmost frame that stores
VALUEs on the stack must be drilled down into the call to
`native_thread_init_stack` to be used in the calculation. This probably
doesn't _really_ matter for the win32 case (they'll be in the same
memory mapping so VirtualQuery should return the same thing), but
definitely could matter for the pthreads case.

[Bug #20001]
2024-01-12 17:29:48 +11:00
Koichi Sasada 41dd15944f fix `rb_thread_wait_for_single_fd` on non MN case
`rb_thread_wait_for_single_fd(fd)` waits until `fd` is ready.
Without MN it shouldn't use `thread_io_wait_events()` for the
retry checking (alwasy false if MN is not active).
2024-01-09 05:43:28 +09:00
Nobuyoshi Nakada c30b8ae947
Adjust styles and indents [ci skip] 2024-01-08 00:50:41 +09:00
Koichi Sasada d65d2fb6b5 Do not `poll` first
Before this patch, the MN scheduler waits for the IO with the
following steps:

1. `poll(fd, timeout=0)` to check fd is ready or not.
2. if fd is not ready, waits with MN thread scheduler
3. call `func` to issue the blocking I/O call

The advantage of advanced `poll()` is we can wait for the
IO ready for any fds. However `poll()` becomes overhead
for already ready fds.

This patch changes the steps like:

1. call `func` to issue the blocking I/O call
2. if the `func` returns `EWOULDBLOCK` the fd is `O_NONBLOCK`
   and we need to wait for fd is ready so that waits with MN
   thread scheduler.

In this case, we can wait only for `O_NONBLOCK` fds. Otherwise
it waits with blocking operations such as `read()` system call.
However we don't need to call `poll()` to check fd is ready
in advance.

With this patch we can observe performance improvement
on microbenchmark which repeats blocking I/O (not
`O_NONBLOCK` fd) with and without MN thread scheduler.

```ruby
require 'benchmark'

f = open('/dev/null', 'w')
f.sync = true

TN = 1
N = 1_000_000 / TN

Benchmark.bm{|x|
  x.report{
    TN.times.map{
      Thread.new{
        N.times{f.print '.'}
      }
    }.each(&:join)
  }
}
__END__
TN = 1
                 user     system      total        real
ruby32       0.393966   0.101122   0.495088 (  0.495235)
ruby33       0.493963   0.089521   0.583484 (  0.584091)
ruby33+MN    0.639333   0.200843   0.840176 (  0.840291) <- Slow
this+MN      0.512231   0.099091   0.611322 (  0.611074) <- Good
```
2024-01-05 05:51:25 +09:00
Koichi Sasada 541371e286 accept `RB_WAITFD_IN | RB_WAITFD_OUT` for waiting events
Assrsion was `events == RB_WAITFD_IN || events == RB_WAITFD_OUT`
but it should accept `RB_WAITFD_IN | RB_WAITFD_OUT`.
2023-12-24 14:53:46 +09:00
Koichi Sasada fa5de8f68d MN: skip waiting on fiber schedulers
If the Fiber is nonblocking mode, fiber scheduler needs to handle
IO events.
2023-12-23 08:10:41 +09:00
Koichi Sasada bbfc262c99 MN: fix "raise on close"
Introduce `thread_io_wait_events()` to make 1 function to call
`thread_sched_wait_events()`.

In ``thread_io_wait_events()`, manipulate `waiting_fd` to raise
an exception when closing the IO correctly.
2023-12-23 05:56:02 +09:00
JP Camara 256f34ab6b Hand thread into `thread_sched_wait_events_timeval`
* When we have the thread already, it saves a lookup

* `event_wait`, not `kq`

Clean up the `thread_sched_wait_events_timeval` calls

* By handling the PTHREAD check inside the function, all the other code can become much simpler and just call the function directly without additional checks
2023-12-20 16:23:38 +09:00
JP Camara 8782e02138 KQueue support for M:N threads
* Allows macOS users to use M:N threads (and technically FreeBSD, though it has not been verified on FreeBSD)

* Include sys/event.h header check for macros, and include sys/event.h when present

* Rename epoll_fd to more generic kq_fd (Kernel event Queue) for use by both epoll and kqueue

* MAP_STACK is not available on macOS so conditionall apply it to mmap flags

* Set fd to close on exec

* Log debug messages specific to kqueue and epoll on creation

* close_invalidate raises an error for the kqueue fd on child process fork. It's unclear rn if that's a bug, or if it's kqueue specific behavior

Use kq with rb_thread_wait_for_single_fd

* Only platforms with `USE_POLL` (linux) had changes applied to take advantage of kernel event queues. It needed to be applied to the `select` so that kqueue could be properly applied

* Clean up kqueue specific code and make sure only flags that were actually set are removed (or an error is raised)

* Also handle kevent specific errnos, since most don't apply from epoll to kqueue

* Use the more platform standard close-on-exec approach of `fcntl` and `FD_CLOEXEC`. The io-event gem uses `ioctl`, but fcntl seems to be the recommended choice. It is also what Go, Bun, and Libuv use

* We're making changes in this file anyways - may as well fix a couple spelling mistakes while here

Make sure FD_CLOEXEC carries over in dup

* Otherwise the kqueue descriptor should have FD_CLOEXEC, but doesn't and fails in assert_close_on_exec
2023-12-20 16:23:38 +09:00
Koichi Sasada 2fe5fc176b setup `waiting_fd` for `thread_sched_wait_events()`
`thread_sched_wait_events()` suspend the thread until the target
fd is ready. Howver, other threads can close the target fd and
suspended thread should be awake. To support it, setup `waiting_fd`
before `thread_sched_wait_events()`.

`rb_thread_io_wake_pending_closer()` should be called before
`RUBY_VM_CHECK_INTS_BLOCKING()` because it can return this function.

This patch introduces additional overhead (setup/cleanup `waiting_fd`)
and maybe we can reduce the overhead.
2023-12-20 07:00:41 +09:00
KJ Tsanaktsidis f8effa209a Change the semantics of rb_postponed_job_register
Our current implementation of rb_postponed_job_register suffers from
some safety issues that can lead to interpreter crashes (see bug #1991).
Essentially, the issue is that jobs can be called with the wrong
arguments.

We made two attempts to fix this whilst keeping the promised semantics,
but:
  * The first one involved masking/unmasking when flushing jobs, which
    was believed to be too expensive
  * The second one involved a lock-free, multi-producer, single-consumer
    ringbuffer, which was too complex

The critical insight behind this third solution is that essentially the
only user of these APIs are a) internal, or b) profiling gems.

For a), none of the usages actually require variable data; they will
work just fine with the preregistration interface.

For b), generally profiling gems only call a single callback with a
single piece of data (which is actually usually just zero) for the life
of the program. The ringbuffer is complex because it needs to support
multi-word inserts of job & data (which can't be atomic); but nobody
actually even needs that functionality, really.

So, this comit:
  * Introduces a pre-registration API for jobs, with a GVL-requiring
    rb_postponed_job_prereigster, which returns a handle which can be
    used with an async-signal-safe rb_postponed_job_trigger.
  * Deprecates rb_postponed_job_register (and re-implements it on top of
    the preregister function for compatability)
  * Moves all the internal usages of postponed job register
    pre-registration
2023-12-10 15:00:37 +09:00
Koichi Sasada 352a885a0f Thread specific storage APIs
This patch introduces thread specific storage APIs
for tools which use `rb_internal_thread_event_hook` APIs.

* `rb_internal_thread_specific_key_create()` to create a tool specific
  thread local storage key and allocate the storage if not available.
* `rb_internal_thread_specific_set()` sets a data to thread and tool
  specific storage.
* `rb_internal_thread_specific_get()` gets a data in thread and tool
  specific storage.

Note that `rb_internal_thread_specific_get|set(thread_val, key)`
can be called without GVL and safe for async signal and safe for
multi-threading (native threads). So you can call it in any internal
thread event hooks. Further more you can call it from other native
threads. Of course `thread_val` should be living while accessing the
data from this function.

Note that you should not forget to clean up the set data.
2023-12-08 13:16:19 +09:00
Jean Boussier 23a7714343 Refactor and fix the GVL instrumentation API
This entirely changes how it is tested. Rather than to use counters
we now record the timeline of events with associated threads which
makes it much easier to assert that certains events are only preceded
by a specific event, and makes it much easier to debug unexpected
timelines.

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
Co-Authored-By: JP Camara <jp@jpcamara.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
2023-11-27 17:37:57 +01:00
Jean Boussier c8d99fa662 Embed ThreadGroup object
These are rare but embedding them is trivial and make them fit
in a 40B slot.
2023-11-22 18:42:04 +01:00
Jean Boussier 9ca41e9991 GVL Instrumentation: pass thread->self as part of event data
Context: https://github.com/ivoanjo/gvl-tracing/pull/4

Some hooks may want to collect data on a per thread basis.
Right now the only way to identify the concerned thread is to
use `rb_nativethread_self()` or similar, but even then because
of the thread cache or MaNy, two distinct Ruby threads may report
the same native thread id.

By passing `thread->self`, hooks can use it as a key to store
the metadata.

NB: Most hooks are executed outside the GVL, so such data collection
need to use a thread-safe data-structure, and shouldn't use the
reference in other ways from inside the hook.

They must also either pin that value or handle compaction.
2023-11-13 08:45:20 +01:00
Koichi Sasada cdb36dfe7d fix `native_thread_destroy()` timing
With M:N thread scheduler, the native thread (NT) related resources
should be freed when the NT is no longer needed. So the calling
`native_thread_destroy()` at the end of `is will be freed when
`thread_cleanup_func()` (at the end of Ruby thread) is not correct
timing. Call it when the corresponding Ruby thread is collected.
2023-10-13 09:19:31 +09:00
Koichi Sasada be1bbd5b7d M:N thread scheduler for Ractors
This patch introduce M:N thread scheduler for Ractor system.

In general, M:N thread scheduler employs N native threads (OS threads)
to manage M user-level threads (Ruby threads in this case).
On the Ruby interpreter, 1 native thread is provided for 1 Ractor
and all Ruby threads are managed by the native thread.

From Ruby 1.9, the interpreter uses 1:1 thread scheduler which means
1 Ruby thread has 1 native thread. M:N scheduler change this strategy.

Because of compatibility issue (and stableness issue of the implementation)
main Ractor doesn't use M:N scheduler on default. On the other words,
threads on the main Ractor will be managed with 1:1 thread scheduler.

There are additional settings by environment variables:

`RUBY_MN_THREADS=1` enables M:N thread scheduler on the main ractor.
Note that non-main ractors use the M:N scheduler without this
configuration. With this configuration, single ractor applications
run threads on M:1 thread scheduler (green threads, user-level threads).

`RUBY_MAX_CPU=n` specifies maximum number of native threads for
M:N scheduler (default: 8).

This patch will be reverted soon if non-easy issues are found.

[Bug #19842]
2023-10-12 14:47:01 +09:00
Matthew Draper aed5215104 Optimize handle_interrupt(Exception => ..) as a common case
When interrupt behavior is configured for all possible exceptions using
'Exception', there's no need to iterate the pending exception's
ancestors for hash lookups.

More significantly, by storing the catch-all timing symbol directly in
the mask stack, we can skip allocating the hash we would otherwise need.
2023-09-07 13:51:15 -07:00
Matthew Draper ed712e0e9d Skip allocation if handle_interrupt arg is already usable
If the supplied hash is already frozen and compare-by-identity, we can
use it directly (still checking its contents are valid symbols), without
making a new copy.
2023-09-07 13:51:15 -07:00
Peter Zhu 3223181284 Remove RARRAY_CONST_PTR_TRANSIENT
RARRAY_CONST_PTR now does the same things as RARRAY_CONST_PTR_TRANSIENT.
2023-07-13 14:48:14 -04:00
Nobuyoshi Nakada c1432a4816
Compile disabled code for thread cache always 2023-06-30 23:59:05 +09:00
Peter Zhu 58386814a7 Don't check for null pointer in calls to free
According to the C99 specification section 7.20.3.2 paragraph 2:

> If ptr is a null pointer, no action occurs.

So we do not need to check that the pointer is a null pointer.
2023-06-30 09:13:31 -04:00
Samuel Williams 0402193723
Fix `Thread#join(timeout)` when running inside the fiber scheduler. (#7903) 2023-06-03 12:41:36 +09:00
KJ Tsanaktsidis edee9b6a12
Use a real Ruby mutex in rb_io_close_wait_list (#7884)
Because a thread calling IO#close now blocks in a native condvar wait,
it's possible for there to be _no_ threads left to actually handle
incoming signals/ubf calls/etc.

This manifested as failing tests on Solaris 10 (SPARC), because:

* One thread called IO#close, which sent a SIGVTALRM to the other
  thread to interrupt it, and then waited on the condvar to be notified
  that the reading thread was done.
* One thread was calling IO#read, but it hadn't yet reached the actual
  call to select(2) when the SIGVTALRM arrived, so it never unblocked
  itself.

This results in a deadlock.

The fix is to use a real Ruby mutex for the close lock; that way, the
closing thread goes into sigwait-sleep and can keep trying to interrupt
the select(2) thread.

See the discussion in: https://github.com/ruby/ruby/pull/7865/
2023-06-01 17:37:18 +09:00
git cc698c6cc2 * expand tabs. [ci skip]
Please consider using misc/expand_tabs.rb as a pre-commit hook.
2023-05-26 05:51:35 +00:00
KJ Tsanaktsidis 66871c5a06 Fix busy-loop when waiting for file descriptors to close
When one thread is closing a file descriptor whilst another thread is
concurrently reading it, we need to wait for the reading thread to be
done with it to prevent a potential EBADF (or, worse, file descriptor
reuse).

At the moment, that is done by keeping a list of threads still using the
file descriptor in io_close_fptr. It then continually calls
rb_thread_schedule() in fptr_finalize_flush until said list is empty.

That busy-looping seems to behave rather poorly on some OS's,
particulary FreeBSD. It can cause the TestIO#test_race_gets_and_close
test to fail (even with its very long 200 second timeout) because the
closing thread starves out the using thread.

To fix that, I introduce the concept of struct rb_io_close_wait_list; a
list of threads still using a file descriptor that we want to close. We
call `rb_notify_fd_close` to let the thread scheduler know we're closing
a FD, which fills the list with threads. Then, we call
rb_notify_fd_close_wait which will block the thread until all of the
still-using threads are done.

This is implemented with a condition variable sleep, so no busy-looping
is required.
2023-05-26 14:51:23 +09:00