Commit graph

416 commits

Author SHA1 Message Date
KJ Tsanaktsidis 6af0f442c7 Revert "Make stack bounds detection work with ASAN"
This reverts commit 6185cfdf38.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis 33a03cb236 Revert "Define special macros for asan/msan being enabled"
This reverts commit bdafad8790.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis 688a6ff510 Revert "Mark asan fake stacks during machine stack marking"
This reverts commit d10bc3a2b8.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis d10bc3a2b8 Mark asan fake stacks during machine stack marking
ASAN leaves a pointer to the fake frame on the stack; we can use the
__asan_addr_is_in_fake_stack API to work out the extent of the fake
stack and thus mark any VALUEs contained therein.

[Bug #20001]
2024-01-12 17:29:48 +11:00
KJ Tsanaktsidis bdafad8790 Define special macros for asan/msan being enabled
__has_feature is a clang-ism, and GCC has a different way to tell if
sanitizers are enabled. For this reason, I don't want to spray
__has_feature all over the codebase for other places where conditional
compilation based on sanitizers is required.

[Bug #20001]
2024-01-12 17:29:48 +11:00
KJ Tsanaktsidis 6185cfdf38 Make stack bounds detection work with ASAN
Where a local variable is used as part of the stack bounds detection, it
has to actually be on the stack. ASAN can put local variables on "fake
stacks", however, with addresses in different memory mappings. This
completely destroys the stack bounds calculation, and can lead to e.g.
things not getting GC marked on the machine stack or stack overflow
checks that always fail.

The __asan_addr_is_in_fake_stack helper can be used to get the _real_
stack address of such variables, and thus perform the stack size
calculation properly.

[Bug #20001]
2024-01-12 17:29:48 +11:00
KJ Tsanaktsidis 4ba8f0dc99 Pass down "stack start" variables from closer to the top of the stack
The implementation of `native_thread_init_stack` for the various
threading models can use the address of a local variable as part of the
calculation of the machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable.

However, the local being used for this is actually allocated _inside_
the `native_thread_init_stack` frame; that means the caller might
allocate a VALUE on the stack that actually lies outside the bounds
stored in machine.stack_{start,end}.

A local variable from one level above the topmost frame that stores
VALUEs on the stack must be drilled down into the call to
`native_thread_init_stack` to be used in the calculation. This probably
doesn't _really_ matter for the win32 case (they'll be in the same
memory mapping so VirtualQuery should return the same thing), but
definitely could matter for the pthreads case.

[Bug #20001]
2024-01-12 17:29:48 +11:00
Peter Zhu 057df4379f Free environ when RUBY_FREE_AT_EXIT
The environ is malloc'd, so it gets reported as a memory leak. This
commit adds ruby_free_proctitle which frees it during shutdown when
RUBY_FREE_AT_EXIT is set.

    STACK OF 1 INSTANCE OF 'ROOT LEAK: <calloc in ruby_init_setproctitle>':
    5   dyld                                  0x18b7090e0 start + 2360
    4   ruby                                  0x10000e3a8 main + 100  main.c:58
    3   ruby                                  0x1000b4dfc ruby_options + 180  eval.c:121
    2   ruby                                  0x1001c5f70 ruby_process_options + 200  ruby.c:3014
    1   ruby                                  0x10035c9fc ruby_init_setproctitle + 76  setproctitle.c:105
    0   libsystem_malloc.dylib                0x18b8c7b78 _malloc_zone_calloc_instrumented_or_legacy + 100
2024-01-11 10:09:53 -05:00
KJ Tsanaktsidis 25f5b83689 Fix crash when printing RGENGC_DEBUG=5 output from GC
I was trying to debug an (unrelated) issue in the GC, and wanted to turn
on the trace-level GC output by compiling it with -DRGENGC_DEBUG=5.
Unfortunately, this actually causes a crash in newobj_init() because the
code there tries to log the obj_info() of the newly created object.
However, the object is not actually sufficiently set up for some of the
things that obj_info() tries to do:

* The instance variable table for a class is not yet initialized, and
  when using variable-length RVALUES, said ivar table is embedded in
  as-yet uninitialized memory after the struct RValue. Attempting to read
  this, as obj_info() does, causes a crash.
* T_DATA variables need to dereference their ->type field to print out
  the underlying C type name, which is not set up until newobj_fill() is
  called.

To fix this, create a new method `obj_info_basic`, which dumps out only
the parts of the object that are valid before the object is fully
initialized.

[Fixes #18795]
2024-01-11 10:44:57 +11:00
yui-knk db476cc71c Introduce NODE_SYM to manage symbol literal
`:sym` was managed by `NODE_LIT` with a `Symbol` object.
This commit introduces `NODE_SYM` so that

1. Symbol literals are detectable from the AST node
2. Dependency on Ruby objects is reduced
2024-01-09 16:07:19 +09:00
yui-knk 7ffff3e043 Change numeric node value functions argument to `NODE *`
Change the argument to align with other node value functions
like `rb_node_line_lineno_val`.
2024-01-08 14:02:48 +09:00
S-H-GAMELINKS 1b8d01136c Introduce Numeric Node's 2024-01-07 09:24:34 +09:00
Koichi Sasada d65d2fb6b5 Do not `poll` first
Before this patch, the MN scheduler waited for I/O with the
following steps:

1. `poll(fd, timeout=0)` to check whether the fd is ready.
2. If the fd is not ready, wait with the MN thread scheduler.
3. Call `func` to issue the blocking I/O call.

The advantage of calling `poll()` in advance is that we can wait
for readiness on any fd. However, `poll()` is pure overhead
for fds that are already ready.

This patch changes the steps to:

1. Call `func` to issue the blocking I/O call.
2. If `func` returns `EWOULDBLOCK`, the fd is `O_NONBLOCK`
   and we need to wait until the fd is ready, so wait with the MN
   thread scheduler.

In this case, we can wait only for `O_NONBLOCK` fds. Otherwise
it waits with a blocking operation such as the `read()` system call.
However, we no longer need to call `poll()` in advance to check
whether the fd is ready.

With this patch we can observe a performance improvement
on a microbenchmark that repeats blocking I/O (not an
`O_NONBLOCK` fd) with and without the MN thread scheduler.

```ruby
require 'benchmark'

f = open('/dev/null', 'w')
f.sync = true

TN = 1
N = 1_000_000 / TN

Benchmark.bm{|x|
  x.report{
    TN.times.map{
      Thread.new{
        N.times{f.print '.'}
      }
    }.each(&:join)
  }
}
__END__
TN = 1
                 user     system      total        real
ruby32       0.393966   0.101122   0.495088 (  0.495235)
ruby33       0.493963   0.089521   0.583484 (  0.584091)
ruby33+MN    0.639333   0.200843   0.840176 (  0.840291) <- Slow
this+MN      0.512231   0.099091   0.611322 (  0.611074) <- Good
```
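
The try-first ordering described above can be sketched at the Ruby level. This is only an analogy under stated assumptions: `read_try_first` is a hypothetical helper, `read_nonblock` puts the fd into `O_NONBLOCK` mode so the would-block path can be shown, and `IO.select` stands in for waiting with the MN thread scheduler. The patch itself changes the interpreter's C code, which is not reproduced here.

```ruby
# Hypothetical helper (not part of the patch): issue the I/O call first and
# wait for readiness only when the call reports that it would block.
def read_try_first(io, maxlen)
  loop do
    result = io.read_nonblock(maxlen, exception: false)
    # A String (data) or nil (EOF) means the call completed without waiting.
    return result unless result == :wait_readable
    # Only now pay for a readiness wait; IO.select stands in for the scheduler.
    IO.select([io])
  end
end

r, w = IO.pipe
w.write("hello")
puts read_try_first(r, 5)  # data is already buffered, so no wait happens
```

When the data is already available, the helper returns after a single read call and never performs the readiness check, which is the case the patch optimizes.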
2024-01-05 05:51:25 +09:00
yui-knk 7a050638b1 Introduce NODE_FILE
`__FILE__` was managed by `NODE_STR` with a `String` object.
This commit introduces `NODE_FILE` and `struct rb_parser_string` so that

1. `__FILE__` is detectable from the AST node
2. Dependency on Ruby objects is reduced
2024-01-02 14:19:42 +09:00
yui-knk 1ade170a6c Introduce NODE_LINE
`__LINE__` was managed by `NODE_LIT` with an `Integer` object.
This commit introduces `NODE_LINE` so that

1. `__LINE__` is detectable from the AST node
2. Dependency on Ruby objects is reduced
2023-12-29 18:32:27 +09:00
Peter Zhu 824ff48adc Move internal ST functions to internal/st.h
st_replace and st_init_existing_table_with_size are functions used
internally in Ruby and should not be publicly visible.
2023-12-25 10:41:12 -05:00
HParker 7ef90b3978 Correct free_on_exit env var to free_at_exit 2023-12-20 14:36:32 +09:00
Koichi Sasada ec51a3c818 declare `rb_thread_io_blocking_call` 2023-12-20 07:00:41 +09:00
Peter Zhu 28a6e4ea9d Set m_tbl right after allocation
We should set the m_tbl right after allocation before anything that can
trigger GC to avoid clone_p from becoming old and needing to fire write
barriers.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2023-12-19 13:09:36 -08:00
Adam Hess 6816e8efcf Free everything at shutdown
When the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
2023-12-07 15:52:35 -05:00
Peter Zhu 12e3b07455 Re-embed when removing Object instance variables
Objects with the same shape must always have the same "embeddedness"
(either embedded or heap allocated) because YJIT assumes so. However,
using remove_instance_variable, it's possible that some objects are
embedded and some are heap allocated because it does not re-embed heap
allocated objects.

This commit changes remove_instance_variable to re-embed Object
instance variables when it becomes small enough.
2023-12-06 11:34:07 -05:00
Peter Zhu 80ea7fbad8 Pin embedded shared strings
Embedded shared strings cannot be moved because strings point into the
slot of the shared string. There may be code using the RSTRING_PTR on
the stack, which would pin the string but not pin the shared string,
causing it to move.
2023-12-01 15:04:31 -05:00
Peter Zhu 269c705f93 Fix compaction for generic ivars
When a generic instance variable has a shape, it is marked movable. If
it transitions to too complex, it needs to update references, otherwise
it may have incorrect references.
2023-11-24 13:29:04 -05:00
Aaron Patterson 6fce8c7980 Don't try compacting ivars on Classes that are "too complex"
Too complex classes use a hash table to store ivs, and should always pin
their IVs.  We shouldn't touch those classes in compaction.
2023-11-20 16:09:48 -08:00
Jean Boussier 94c9f16663 Refactor rb_obj_evacuate_ivs_to_hash_table
That function is a bit too low level to be called from multiple
places. It's always used in tandem with `rb_shape_set_too_complex`
and both have to know how the object is laid out to update the
`iv_ptr`.

So instead we can provide two higher level functions:

  - `rb_obj_copy_ivs_to_hash_table` to prepare a `st_table` from an
    arbitrary object.
  - `rb_obj_convert_to_too_complex` to assign the new `st_table`
    to the old object, and safely free the old `iv_ptr`.

Unfortunately both can't be combined into one, because `rb_obj_copy_ivar`
needs `rb_obj_copy_ivs_to_hash_table` to copy from one object
to another.
2023-11-17 09:19:21 +01:00
Jean Boussier 81b35fe729 rb_evict_ivars_to_hash: get rid of the shape parameter
It's only used to allocate the table with the right size,
but in some cases we were passing `rb_shape_get_shape_by_id(SHAPE_OBJ_TOO_COMPLEX)`
whose `next_iv_index` is a bit undefined.

So overall we're better off just allocating a table the size of the existing
object; it should be close enough in the vast majority of cases,
and that's already a de-optimization path anyway.
2023-11-16 17:49:59 +01:00
Jean Boussier d898e8d6f8 Refactor rb_shape_transition_shape_capa out
Right now the caller of `rb_shape_get_next` needs to
first check if there is capacity left, and if not call
`rb_shape_transition_shape_capa` before it can call `rb_shape_get_next`.

And on each of these it needs to check if we got a TOO_COMPLEX
back.

All this logic is duplicated in the interpreter, YJIT and RJIT.

Instead we can have `rb_shape_get_next` do the capacity transition
when needed. The caller can compare the old and new shapes' capacities
to know if resizing is needed. It can also check for TOO_COMPLEX
only once.
2023-11-08 11:02:55 +01:00
Nobuyoshi Nakada 4da6333615 Export functions used for builtins 2023-11-08 13:02:55 +09:00
Nobuyoshi Nakada 8becc889db Suppress array-bounds warnings from gcc 13 2023-11-07 23:19:51 +09:00
Peter Zhu 1321df773b Use shape capacity transitions for generic ivars
This commit changes generic ivars to respect the capacity transition in
shapes rather than growing the capacity independently.
2023-11-03 10:15:32 -04:00
Aaron Patterson a3f66e09f6 geniv objects can become too complex 2023-10-24 10:52:06 -07:00
Jean Boussier e5364ea496 rb_shape_transition_shape_capa: use optimal sizes transitions
Previously the growth was 3(embed), 6, 12, 24, ...

With this change it's now 3(embed), 8, 16, 32, 64, ... by default.

However, since a power of two isn't the best size for all allocators,
if `malloc_usable_size` is available, we use it to discover the best
offset.

On Linux/glibc 2.35 for instance, the growth will be 3(embed), 7, 15, 31
to avoid wasting 8B per object.

Test program:

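```c
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>  /* malloc_usable_size (glibc) */

/* Assumption: one slot is pointer-sized (8 bytes on 64-bit platforms). */
#define VALUE_SIZE sizeof(void *)

/* Allocate `slots` slots and report how many usable bytes were not requested. */
size_t test(size_t slots) {
    size_t allocated = slots * VALUE_SIZE;
    void *test_ptr = malloc(allocated);
    size_t wasted = malloc_usable_size(test_ptr) - allocated;
    free(test_ptr);
    fprintf(stderr, "slots = %zu, wasted_bytes = %zu\n", slots, wasted);
    return wasted;
}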

int main(int argc, char *argv[]) {
    size_t best_padding = 0;
    size_t padding = 0;
    for (padding = 0; padding <= 2; padding++) {
        size_t wasted = test(8 - padding);
        if (wasted == 0) {
            best_padding = padding;
            break;
        }
    }

    size_t index = 0;
    fprintf(stderr, "=============== naive ================\n");

    size_t list_size = 4;
    for (index = 0; index < 10; index++) {
        test(list_size);
        list_size *= 2;
    }

    fprintf(stderr, "=============== auto-padded (-%zu) ================\n", best_padding);

    list_size = 4;
    for (index = 0; index < 10; index ++) {
        test(list_size - best_padding);
        list_size *= 2;
    }

    fprintf(stderr, "\n\n");
    return 0;
}
```

```
===== glibc ======
slots = 8, wasted_bytes = 8
slots = 7, wasted_bytes = 0
=============== naive ================
slots = 4, wasted_bytes = 8
slots = 8, wasted_bytes = 8
slots = 16, wasted_bytes = 8
slots = 32, wasted_bytes = 8
slots = 64, wasted_bytes = 8
slots = 128, wasted_bytes = 8
slots = 256, wasted_bytes = 8
slots = 512, wasted_bytes = 8
slots = 1024, wasted_bytes = 8
slots = 2048, wasted_bytes = 8
=============== auto-padded (-1) ================
slots = 3, wasted_bytes = 0
slots = 7, wasted_bytes = 0
slots = 15, wasted_bytes = 0
slots = 31, wasted_bytes = 0
slots = 63, wasted_bytes = 0
slots = 127, wasted_bytes = 0
slots = 255, wasted_bytes = 0
slots = 511, wasted_bytes = 0
slots = 1023, wasted_bytes = 0
slots = 2047, wasted_bytes = 0
```

```
==========  jemalloc =======
slots = 8, wasted_bytes = 0
=============== naive ================
slots = 4, wasted_bytes = 0
slots = 8, wasted_bytes = 0
slots = 16, wasted_bytes = 0
slots = 32, wasted_bytes = 0
slots = 64, wasted_bytes = 0
slots = 128, wasted_bytes = 0
slots = 256, wasted_bytes = 0
slots = 512, wasted_bytes = 0
slots = 1024, wasted_bytes = 0
slots = 2048, wasted_bytes = 0
=============== auto-padded (-0) ================
slots = 4, wasted_bytes = 0
slots = 8, wasted_bytes = 0
slots = 16, wasted_bytes = 0
slots = 32, wasted_bytes = 0
slots = 64, wasted_bytes = 0
slots = 128, wasted_bytes = 0
slots = 256, wasted_bytes = 0
slots = 512, wasted_bytes = 0
slots = 1024, wasted_bytes = 0
slots = 2048, wasted_bytes = 0
```
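
The growth sequences quoted in this message can be reproduced with a few lines of arithmetic. This is a minimal sketch: `capacity_growth` and `EMBED_CAPACITY` are hypothetical names, and the padding is passed in by hand rather than discovered through `malloc_usable_size`.

```ruby
# Sketch of the capacity sequence described in the commit message: the first
# step is the embedded capacity (3); later steps are powers of two starting
# at 8, each reduced by `padding` slots (0 by default, 1 in the glibc run).
EMBED_CAPACITY = 3

def capacity_growth(padding, steps)
  capacities = [EMBED_CAPACITY]
  size = 8
  (steps - 1).times do
    capacities << size - padding
    size *= 2
  end
  capacities
end

p capacity_growth(0, 5)  # => [3, 8, 16, 32, 64]
p capacity_growth(1, 4)  # => [3, 7, 15, 31]
```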
2023-10-23 09:33:15 +02:00
Yusuke Endoh 591336a0f2 Avoid the pointer hack in RCLASS_EXT
... because GCC 13 warns about it.

```
In file included from class.c:24:
In function ‘RCLASS_SET_ALLOCATOR’,
    inlined from ‘class_alloc’ at class.c:251:5,
    inlined from ‘rb_module_s_alloc’ at class.c:1045:17:
internal/class.h:159:43: warning: array subscript 0 is outside array bounds of ‘rb_classext_t[0]’ {aka ‘struct rb_classext_struct[]’} [-Warray-bounds=]
  159 |     RCLASS_EXT(klass)->as.class.allocator = allocator;
      |                                           ^
```
https://rubyci.s3.amazonaws.com/arch/ruby-master/log/20231015T030003Z.log.html.gz
2023-10-15 15:35:45 +09:00
Nobuyoshi Nakada 5fc9810bf3 Shorten `rb_strterm_literal_t` members 2023-10-14 11:08:43 +09:00
Nobuyoshi Nakada a075c55d0c Manage `rb_strterm_t` without imemo 2023-10-14 11:08:43 +09:00
Nobuyoshi Nakada cb06b6632a Remove unions in `rb_strterm` structs for alignment 2023-10-14 11:08:43 +09:00
Koichi Sasada be1bbd5b7d M:N thread scheduler for Ractors
This patch introduces an M:N thread scheduler for the Ractor system.

In general, an M:N thread scheduler employs N native threads (OS threads)
to manage M user-level threads (Ruby threads in this case).
In the Ruby interpreter, 1 native thread is provided for 1 Ractor
and all Ruby threads are managed by the native thread.

Since Ruby 1.9, the interpreter has used a 1:1 thread scheduler, which means
1 Ruby thread has 1 native thread. The M:N scheduler changes this strategy.

Because of compatibility issues (and stability issues in the implementation),
the main Ractor doesn't use the M:N scheduler by default. In other words,
threads on the main Ractor will be managed with the 1:1 thread scheduler.

There are additional settings via environment variables:

`RUBY_MN_THREADS=1` enables the M:N thread scheduler on the main Ractor.
Note that non-main Ractors use the M:N scheduler without this
configuration. With this configuration, single-Ractor applications
run threads on an M:1 thread scheduler (green threads, user-level threads).

`RUBY_MAX_CPU=n` specifies the maximum number of native threads for
the M:N scheduler (default: 8).

This patch will be reverted soon if non-trivial issues are found.

[Bug #19842]
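
As a rough mental model of "M units of work on N native threads", the following toy multiplexes M jobs onto N worker threads. It is an analogy only: `run_m_on_n` is a hypothetical helper and a plain thread pool, not the interpreter's scheduler, which multiplexes Ruby threads rather than queued jobs.

```ruby
# Toy analogy only: M units of work are multiplexed onto N worker threads.
def run_m_on_n(m, n)
  jobs = Queue.new
  m.times { |i| jobs << i }
  jobs.close                      # workers stop once the queue drains
  ran_on = Queue.new              # thread-safe record of [job, worker] pairs

  workers = n.times.map do |worker_id|
    Thread.new do
      while (job = jobs.pop)      # pop returns nil when closed and empty
        ran_on << [job, worker_id]
      end
    end
  end
  workers.each(&:join)

  Array.new(ran_on.size) { ran_on.pop }
end

pairs = run_m_on_n(10, 3)
puts "jobs run: #{pairs.size}, distinct workers used: #{pairs.map(&:last).uniq.size}"
```

To try the real scheduler on the main Ractor, a script would be started with the environment variable from this message set, for example `RUBY_MN_THREADS=1 ruby script.rb` (`script.rb` being a placeholder name).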
2023-10-12 14:47:01 +09:00
Nobuyoshi Nakada 54f1d398d9 Make popcount bit-masks stricter
Each bit run is up to the right shift count, so each mask does not
need more upper bits.
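
The idea can be illustrated with a 32-bit SWAR popcount. This is a sketch under assumptions: `popcount32` is a hypothetical Ruby rendering, and the narrowed masks in the last two steps are chosen here for illustration rather than copied from the patch.

```ruby
# Sketch of a 32-bit popcount by pairwise field summation. After each step a
# field holds a small count, so later masks only need the low bits that such
# a count can occupy.
def popcount32(x)
  x &= 0xffffffff
  x = (x & 0x55555555) + ((x >> 1) & 0x55555555)   # 2-bit fields, each 0..2
  x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # 4-bit fields, each 0..4
  x = (x & 0x0f0f0f0f) + ((x >> 4) & 0x0f0f0f0f)   # 8-bit fields, each 0..8
  x = (x & 0x001f001f) + ((x >> 8) & 0x001f001f)   # 16-bit fields, each 0..16
  (x & 0x0000003f) + ((x >> 16) & 0x0000003f)      # total, 0..32
end

p popcount32(0xffffffff)  # => 32
p popcount32(0b1011)      # => 3
```

In the fourth step a count of at most 16 fits in 5 bits, so `0x001f001f` is enough; in the fifth step a count of at most 32 fits in 6 bits, so `0x3f` is enough.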
2023-10-05 20:03:54 +09:00
Aaron Patterson d3574c117a Move IO#readline to Ruby
This commit moves IO#readline to Ruby.  In order to call C functions,
keyword arguments must be converted to hashes.  Prior to this commit,
code like `io.readline(chomp: true)` would allocate a hash.  This
commit moves the keyword "denaturing" to Ruby, allowing us to send
positional arguments to the C API and avoiding the hash allocation.

Here is an allocation benchmark for the method:

```
x = GC.stat(:total_allocated_objects)
File.open("/usr/share/dict/words") do |f|
  f.readline(chomp: true) until f.eof?
end
p ALLOCATIONS: GC.stat(:total_allocated_objects) - x
```

Before this commit, the output was this:

```
$ make run
./miniruby -I./lib -I. -I.ext/common  -r./arm64-darwin22-fake  ./test.rb
{:ALLOCATIONS=>707939}
```

Now it is this:

```
$ make run
./miniruby -I./lib -I. -I.ext/common  -r./arm64-darwin22-fake  ./test.rb
{:ALLOCATIONS=>471962}
```

[Bug #19890] [ruby-core:114803]
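
The shape of the change can be sketched in plain Ruby. `ToyReader` and `readline_impl` are hypothetical names used for illustration; the private method stands in for the C function, and the sketch only shows the calling convention, not the allocation savings measured above.

```ruby
require 'stringio'

# Toy illustration of keyword "denaturing": the public method accepts the
# keyword and forwards it as a plain positional argument.
class ToyReader
  def initialize(text)
    @io = StringIO.new(text)
  end

  # The Ruby-level method accepts the keyword and unpacks it...
  def readline(sep = $/, chomp: false)
    readline_impl(sep, chomp)
  end

  private

  # ...so the lower layer (standing in for a C function) sees only
  # positional arguments and never needs a keyword hash.
  def readline_impl(sep, chomp)
    line = @io.gets(sep)
    raise EOFError, "end of file reached" if line.nil?
    chomp ? line.chomp(sep) : line
  end
end

reader = ToyReader.new("first\nsecond\n")
puts reader.readline(chomp: true)  # prints "first"
puts reader.readline               # prints "second"
```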
2023-09-28 10:43:45 -07:00
Nobuyoshi Nakada bab01d284c [Feature #19790] Rename BUGREPORT_PATH as CRASH_REPORT 2023-09-25 22:57:28 +09:00
Nobuyoshi Nakada 70e8a08295 Add `--bugreport-path` option
It has precedence over the environment variable `RUBY_BUGREPORT_PATH`.
2023-09-25 22:57:28 +09:00
Nobuyoshi Nakada ac244938e8 Dump backtraces to an arbitrary stream 2023-09-25 22:57:28 +09:00
Peter Zhu f43dac0df2 Add rb_hash_free for the GC to use 2023-09-24 09:07:52 -04:00
Nobuyoshi Nakada 4634405f7c Stop exposing FrozenCore in headers
Revert commit "Directly allocate FrozenCore as an ICLASS",
813a5f4fc4.
2023-09-19 14:08:05 +09:00
Peter Zhu 12102d101a Fix crash in WeakMap during compaction
WeakMap can crash during compaction because the st_insert could allocate
memory.
2023-09-06 14:20:23 -04:00
Peter Zhu 9a8398a18f Introduce rb_gc_remove_weak
During incremental marking, Ruby code can run and deallocate
memory buffers that have been passed to rb_gc_mark_weak,
which can cause use-after-free bugs.
2023-09-05 14:32:15 -04:00
John Hawthorn d89b15cdce Use end of char boundary in start_with?
Previously we used the next character following the found prefix to
determine if the match ended on a broken character.

This had caused surprising behaviour when a valid character was followed
by a UTF-8 continuation byte.

This commit changes the behaviour to instead look for the end of the
last character in the prefix.

[Bug #19784]

Co-authored-by: ywenc <ywenc@github.com>
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2023-09-01 16:23:28 -07:00
Matt Valentine-House 322548180d Prevent rb_gc_mark_values from pinning objects
This is an internal only function not exposed to the C extension API.
Its only use so far is from rb_vm_mark, where it's used to mark the
values in the vm->trap_list.cmd array.

There shouldn't be any reason why these cannot move.

This commit allows them to move by updating their references during the
reference updating step of compaction.

To do this we've introduced another internal function
rb_gc_update_values as a partner to rb_gc_mark_values.

This allows us to refactor rb_gc_mark_values to not pin.
2023-08-31 19:31:18 +01:00
Nobuyoshi Nakada 00ac3a64ba Introduce `at_char_boundary` function 2023-08-26 08:58:02 +09:00
Peter Zhu bfb395c620 Implement weak references in the GC
[Feature #19783]

This commit adds support for weak references in the GC through the
function `rb_gc_mark_weak`. Unlike a strong reference, a weak reference
does not mark the object, but rather lets the GC know that an object
refers to another one. If the child object is freed, the pointer from
the parent object is overwritten with `Qundef`.

Co-Authored-By: Jean Boussier <byroot@ruby-lang.org>
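
The semantics described in this message can be modelled with a toy mark phase. This is not Ruby's GC: `ToyObj`, `toy_gc`, and `UNDEF` (standing in for `Qundef`) are hypothetical names, and the model ignores incremental marking and compaction.

```ruby
# Toy model only: strong edges mark their targets, weak edges do not, and a
# weak edge to an unmarked object is overwritten with UNDEF afterwards.
UNDEF = :undef
ToyObj = Struct.new(:name, :strong, :weak, :marked)

def toy_gc(heap, roots)
  heap.each { |obj| obj.marked = false }

  # Mark phase: follow strong edges only.
  stack = roots.dup
  until stack.empty?
    obj = stack.pop
    next if obj.marked
    obj.marked = true
    stack.concat(obj.strong)
  end

  # Weak edges whose target was not marked are overwritten with UNDEF.
  heap.each do |obj|
    obj.weak.map! { |ref| ref.equal?(UNDEF) || ref.marked ? ref : UNDEF }
  end

  heap.select(&:marked)  # survivors
end

child  = ToyObj.new("child", [], [], false)
parent = ToyObj.new("parent", [], [child], false)
survivors = toy_gc([parent, child], [parent])
p survivors.map(&:name)  # => ["parent"]
p parent.weak            # => [:undef]
```

The child is reachable only through a weak edge, so it is not marked, it does not survive, and the parent's slot is cleared instead of being left dangling.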
2023-08-25 09:01:21 -04:00