When generic instance variable has a shape, it is marked movable. If it
it transitions to too complex, it needs to update references otherwise
it may have incorrect references.
When evacuating generic instance variables, the instance variables exist
in both the array and the ST table. We need to ensure it has switched
to the ST table before performing any operations that can trigger GC
compaction.
The lookup in the table is using the wrong key when converting generic
instance variables to too complex, which means that it never looks up
the entry which leaks memory when the entry is overwritten.
When transitioning generic instance variable objects to too complex, we
set the shape first before performing inserting the new gen_ivtbl. The
st_insert for the new gen_ivtbl could allocate and cause a GC. If that
happens, then it will crash because the object will have a too complex
shape but not yet be backed by a st_table.
This commit changes the order so that the insert happens first before
the new shape is set.
The following script reproduces the issue:
```
o = []
o.instance_variable_set(:@a, 1)
i = 0
o = Object.new
while RubyVM::Shape.shapes_available > 0
o.instance_variable_set(:"@i#{i}", 1)
i += 1
end
ary = 1_000.times.map { [] }
GC.stress = true
ary.each do |o|
o.instance_variable_set(:@a, 1)
o.instance_variable_set(:@b, 1)
end
```
That function is a bit too low level to called from multiple
places. It's always used in tandem with `rb_shape_set_too_complex`
and both have to know how the object is laid out to update the
`iv_ptr`.
So instead we can provide two higher level function:
- `rb_obj_copy_ivs_to_hash_table` to prepare a `st_table` from an
arbitrary oject.
- `rb_obj_convert_to_too_complex` to assign the new `st_table`
to the old object, and safely free the old `iv_ptr`.
Unfortunately both can't be combined into one, because `rb_obj_copy_ivar`
need `rb_obj_copy_ivs_to_hash_table` to copy from one object
to another.
It's only used to allocate the table with the right size,
but in some case we were passing `rb_shape_get_shape_by_id(SHAPE_OBJ_TOO_COMPLEX)`
which `next_iv_index` is a bit undefined.
So overall we're better to just allocate a table the size of the existing
object, it should be close enough in the vast majority of cases,
and that's already a de-optimizaton path anyway.
Right now the `rb_shape_get_next` shape caller need to
first check if there is capacity left, and if not call
`rb_shape_transition_shape_capa` before it can call `rb_shape_get_next`.
And on each of these it needs to checks if we got a TOO_COMPLEX
back.
All this logic is duplicated in the interpreter, YJIT and RJIT.
Instead we can have `rb_shape_get_next` do the capacity transition
when needed. The caller can compare the old and new shapes capacity
to know if resizing is needed. It also can check for TOO_COMPLEX
only once.
`remove_shape_recursive` wasn't considering that if we run out of
shapes, it might have to transition to SHAPE_TOO_COMPLEX.
When this happens, we now return with an error and the caller
initiates the evacuation.
On 32-bit systems, we must store the shape ID in the gen_ivtbl to not
lose the shape. If we directly store the ST table into the generic
ivar table, then we lose the shape. This makes it impossible to
determine the shape of the object and whether it is too complex or not.
We weren't taking in to account that objects with generic IV tables
could go "too complex" in the IV set code. This commit takes that in to
account and also ensures FL_EXIVAR is set when a geniv object
transitions to "too complex"
Co-Authored-By: Jean Boussier <byroot@ruby-lang.org>