Module#name shows up as a top C method callee in lobsters, so it is
probably common enough. It's also easy to substitute thanks to
rb_mod_name() already having no GC yield points.
klass = BasicObject
50_000_000.times { klass.name }
Benchmark 1: /.rubies/post/bin/ruby --yjit mod_name.rb
Time (mean ± σ): 1.433 s ± 0.010 s [User: 1.410 s, System: 0.010 s]
Range (min … max): 1.421 s … 1.449 s 10 runs
Benchmark 2: /.rubies/mstr/bin/ruby --yjit mod_name.rb
Time (mean ± σ): 1.491 s ± 0.012 s [User: 1.468 s, System: 0.010 s]
Range (min … max): 1.470 s … 1.511 s 10 runs
Summary
/.rubies/post/bin/ruby --yjit mod_name.rb ran
1.04 ± 0.01 times faster than /.rubies/mstr/bin/ruby --yjit mod_name.rb
[Feature #20205]
Now that chilled strings no longer appear as frozen, there is no
need to offer an API to check for chilled strings.
We however need to change `rb_check_frozen_internal` to no
longer be a macro, as it needs to check for chilled strings.
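As a rough illustration of the Ruby-visible behavior this relies on (a sketch that assumes the file has no `frozen_string_literal` magic comment; the `literal` helper is a hypothetical name), a string literal does not report itself as frozen and can still be mutated. On Rubies with chilled strings the mutation may emit a deprecation warning when those warnings are enabled, but it does not raise:

```ruby
# Assumes no `# frozen_string_literal: true` magic comment in this file.
# `literal` is a hypothetical helper, used only to return a fresh literal.
def literal
  "chilled?"
end

s = literal
reported_frozen = s.frozen?  # a chilled string does not report as frozen
s << " still mutable"        # may warn on Rubies with chilled strings, never raises

puts "frozen? #{reported_frozen}, value: #{s}"
```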
This `st_table` is used to both mark and pin classes
defined from the C API. But `vm->mark_object_ary` already
does both much more efficiently.
Currently a Ruby process starts with 252 rooted classes,
which uses `7224B` in an `st_table` or `2016B` in an `RArray`.
So a baseline of 5kB saved, but since `mark_object_ary` is
preallocated with `1024` slots but only uses `405` of them,
it's a net `7kB` save.
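The arithmetic behind these figures can be checked with a few lines of Ruby. This is only a sketch: the 8-byte slot size is an assumption (a 64-bit build), and `405` is treated conservatively as the slot count before the 252 classes are added.

```ruby
# Figures come from the message above; value_size is an assumed 64-bit VALUE.
value_size      = 8
rooted_classes  = 252
st_table_bytes  = 7224
slots_allocated = 1024
slots_used      = 405

rarray_bytes    = rooted_classes * value_size    # bytes the same refs take in an array
baseline_saving = st_table_bytes - rarray_bytes  # st_table cost minus array cost

# If the classes fit in the already preallocated slots, the array costs
# nothing extra and the whole st_table is saved.
fits_in_spare = slots_used + rooted_classes <= slots_allocated
net_saving    = fits_in_spare ? st_table_bytes : baseline_saving

puts "array: #{rarray_bytes}B, baseline: #{baseline_saving}B, net: #{net_saving}B"
```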
`vm->mark_object_ary` is also being refactored.
Prior to this change, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.
This has the detrimental effect of marking these references on every
minor GC even though it's a mostly append-only list.
By using a custom TypedData instead, we can avoid having to mark
all the references on minor GC runs.
Additionally, immediate values are now ignored and not appended
to `vm->mark_object_ary`, as they would just waste space.
This frees FL_USER0 on both T_MODULE and T_CLASS.
Note: prior to this, FL_SINGLETON was never set on T_MODULE,
so checking for `FL_SINGLETON` without first checking that
`FL_TYPE` was `T_CLASS` was valid. That's no longer the case.
If the autoload_data has autoload_const and the autoload_data is freed
before the autoload_const, then the autoload_data will leak.
This commit changes it so that when the autoload_data is freed, it will
clear the whole linked list of autoload_const so that the autoload_data
can be safely freed.
1000.times do |i|
str = "foo#{i}".freeze
autoload(:"B#{i}", str)
autoload(:"C#{i}", str)
end
Reports leaked memory with the macOS leaks tool:
12 ruby 0x1006398a4 rb_f_autoload + 96 load.c:1524
11 ruby 0x100639710 rb_mod_autoload + 112 load.c:1460
10 ruby 0x10080a914 rb_autoload_str + 224 variable.c:2666
9 ruby 0x1007c3308 rb_mutex_synchronize + 56 thread_sync.c:637
8 ruby 0x1005acb24 rb_ensure + 312 eval.c:1009
7 ruby 0x10080aac8 autoload_synchronized + 204 variable.c:2630
6 ruby 0x10080f8bc autoload_feature_lookup_or_create + 76 variable.c:2578
5 ruby 0x1005c29a4 rb_data_typed_object_zalloc + 232 gc.c:3186
4 ruby 0x1005c2774 ruby_xcalloc + 32 gc.c:14440
3 ruby 0x1005cddf4 ruby_xcalloc_body + 56 gc.c:12878
2 ruby 0x1005cde7c objspace_xcalloc + 124 gc.c:12871
1 ruby 0x1005c1990 calloc1 + 28 gc.c:1906
0 libsystem_malloc.dylib 0x18b2ebb78 _malloc_zone_calloc_instrumented_or_legacy + 100
When the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
For this public API, the callback is declared to take
`(ID, VALUE, st_data_t)`, but it so happens that using
`(st_data_t, st_data_t, st_data_t)` also
type checks, because the underlying type is identical.
Use it as declared and get rid of some casts.
When a generic instance variable table has a shape, it is marked
movable. If it transitions to too complex, it needs to update its
references, otherwise it may hold incorrect references.
When evacuating generic instance variables, the instance variables exist
in both the array and the ST table. We need to ensure it has switched
to the ST table before performing any operations that can trigger GC
compaction.
The lookup in the table uses the wrong key when converting generic
instance variables to too complex, so the existing entry is never
found and leaks memory when it is overwritten.
When transitioning generic instance variable objects to too complex, we
set the shape first before inserting the new gen_ivtbl. The
st_insert for the new gen_ivtbl could allocate and cause a GC. If that
happens, then it will crash because the object will have a too complex
shape but not yet be backed by a st_table.
This commit changes the order so that the insert happens first before
the new shape is set.
The following script reproduces the issue:
```
o = []
o.instance_variable_set(:@a, 1)
i = 0
o = Object.new
while RubyVM::Shape.shapes_available > 0
o.instance_variable_set(:"@i#{i}", 1)
i += 1
end
ary = 1_000.times.map { [] }
GC.stress = true
ary.each do |o|
o.instance_variable_set(:@a, 1)
o.instance_variable_set(:@b, 1)
end
```
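The ordering problem can be sketched with a toy model. This is not CRuby internals: the Struct, `simulated_gc_check` and both transition methods are hypothetical names, and the check stands in for what a GC triggered by the insert would expect to find.

```ruby
# Toy model only; none of these names exist in CRuby.
toy_object = Struct.new(:too_complex, :table)

# Stand-in for a GC run: a too-complex object must already have a table.
def simulated_gc_check(obj)
  raise "too complex shape without a table" if obj.too_complex && obj.table.nil?
end

# Old order: the shape is set first, then the insert (which may GC) runs.
def transition_shape_first(obj, ivars)
  obj.too_complex = true
  simulated_gc_check(obj)  # GC during the insert sees an inconsistent object
  obj.table = ivars.dup
end

# New order: the table is installed first, the shape is set last.
def transition_insert_first(obj, ivars)
  obj.table = ivars.dup
  simulated_gc_check(obj)  # GC here still sees the old, consistent shape
  obj.too_complex = true
end

old_order_error =
  begin
    transition_shape_first(toy_object.new(false, nil), { a: 1 })
    nil
  rescue RuntimeError => e
    e.message
  end

fixed = toy_object.new(false, nil)
transition_insert_first(fixed, { a: 1 })

puts "old order: #{old_order_error.inspect}, new order ok: #{fixed.too_complex}"
```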
That function is a bit too low level to be called from multiple
places. It's always used in tandem with `rb_shape_set_too_complex`
and both have to know how the object is laid out to update the
`iv_ptr`.
So instead we can provide two higher level functions:
- `rb_obj_copy_ivs_to_hash_table` to prepare a `st_table` from an
arbitrary object.
- `rb_obj_convert_to_too_complex` to assign the new `st_table`
to the old object, and safely free the old `iv_ptr`.
Unfortunately both can't be combined into one, because `rb_obj_copy_ivar`
needs `rb_obj_copy_ivs_to_hash_table` to copy from one object
to another.
It's only used to allocate the table with the right size,
but in some cases we were passing `rb_shape_get_shape_by_id(SHAPE_OBJ_TOO_COMPLEX)`,
whose `next_iv_index` is not well defined.
So overall we're better off just allocating a table the size of the
existing object. It should be close enough in the vast majority of cases,
and that's already a de-optimization path anyway.
Right now the caller of `rb_shape_get_next` needs to
first check if there is capacity left, and if not call
`rb_shape_transition_shape_capa` before it can call `rb_shape_get_next`.
And after each of these calls it needs to check whether it got a
TOO_COMPLEX shape back.
All this logic is duplicated in the interpreter, YJIT and RJIT.
Instead we can have `rb_shape_get_next` do the capacity transition
when needed. The caller can compare the old and new shapes' capacities
to know if resizing is needed. It also only needs to check for
TOO_COMPLEX once.
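The resulting caller-side flow can be sketched with a toy model. This is not CRuby's shape tree: the Struct, `toy_shape_get_next`, `toy_set_ivar` and the capacity-doubling policy are all assumptions made for illustration.

```ruby
# Toy model only; none of these names exist in CRuby.
toy_shape = Struct.new(:iv_count, :capacity, :too_complex)

# Hypothetical stand-in for the reworked `rb_shape_get_next`: it grows the
# capacity itself when the shape is full, and returns a too-complex shape
# when no more shapes are available.
def toy_shape_get_next(shape, shapes_available)
  return shape.class.new(shape.iv_count, shape.capacity, true) unless shapes_available

  capacity = shape.iv_count >= shape.capacity ? shape.capacity * 2 : shape.capacity
  shape.class.new(shape.iv_count + 1, capacity, false)
end

# Caller side: a single TOO_COMPLEX check, then a capacity comparison.
def toy_set_ivar(shape, shapes_available: true)
  next_shape = toy_shape_get_next(shape, shapes_available)
  return [next_shape, :convert_to_too_complex] if next_shape.too_complex

  action = next_shape.capacity > shape.capacity ? :resize : :no_resize
  [next_shape, action]
end

full  = toy_shape.new(3, 3, false)
roomy = toy_shape.new(1, 3, false)

_, full_action   = toy_set_ivar(full)
_, roomy_action  = toy_set_ivar(roomy)
_, capped_action = toy_set_ivar(full, shapes_available: false)

puts [full_action, roomy_action, capped_action].inspect
```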