Right now the `rb_shape_get_next` shape caller need to
first check if there is capacity left, and if not call
`rb_shape_transition_shape_capa` before it can call `rb_shape_get_next`.
And on each of these it needs to checks if we got a TOO_COMPLEX
back.
All this logic is duplicated in the interpreter, YJIT and RJIT.
Instead we can have `rb_shape_get_next` do the capacity transition
when needed. The caller can compare the old and new shapes capacity
to know if resizing is needed. It also can check for TOO_COMPLEX
only once.
When an inline cache misses, it is very likely that the stale shape_id
and the current instance shape_id have a close common ancestor.
For example if the instance variable is sometimes frozen sometimes
not, one of the two shape will be the direct parent of the other.
Another pattern that commonly cause IC misses is "memoization",
in such case the object will have a "base common shape" and then
a number of close descendants.
In addition, when we find a common ancestor, we store it in the
inline cache instead of the current shape. This help prevent the
cache from flip-flopping, ensuring the next lookup will be marginally
faster and more generally avoid writing in memory too much.
However, now that shapes have an ancestors index, we only check
for a few ancestors before falling back to use the index.
So overall this change speeds up what is assumed to be the more common
case, but makes what is assumed to be the less common case a bit slower.
```
compare-ruby: ruby 3.3.0dev (2023-10-26T05:30:17Z master 701ca070b4) [arm64-darwin22]
built-ruby: ruby 3.3.0dev (2023-10-26T09:25:09Z shapes_double_sear.. a723a85235) [arm64-darwin22]
warming up......
| |compare-ruby|built-ruby|
|:------------------------------------|-----------:|---------:|
|vm_ivar_stable_shape | 11.672M| 11.679M|
| | -| 1.00x|
|vm_ivar_memoize_unstable_shape | 7.551M| 10.506M|
| | -| 1.39x|
|vm_ivar_memoize_unstable_shape_miss | 11.591M| 11.624M|
| | -| 1.00x|
|vm_ivar_unstable_undef | 9.037M| 7.981M|
| | 1.13x| -|
|vm_ivar_divergent_shape | 8.034M| 6.657M|
| | 1.21x| -|
|vm_ivar_divergent_shape_imbalanced | 10.471M| 9.231M|
| | 1.13x| -|
```
Co-Authored-By: John Hawthorn <john@hawthorn.email>
`remove_shape_recursive` wasn't considering that if we run out of
shapes, it might have to transition to SHAPE_TOO_COMPLEX.
When this happens, we now return with an error and the caller
initiates the evacuation.
On 32-bit systems, we must store the shape ID in the gen_ivtbl to not
lose the shape. If we directly store the ST table into the generic
ivar table, then we lose the shape. This makes it impossible to
determine the shape of the object and whether it is too complex or not.
Since the check for MAX_SHAPE_ID was done before even checking
if the transition we're looking for even exists, as soon as the
max shape is reached, get_next_shape_internal would always return
`TOO_COMPLEX` regardless of whether the transition we're looking
for already exist or not.
In addition to entirely de-optimize all newly created objects, it
also made an assertion fail in `vm_setivar`:
```
vm_setivar:rb_shape_get_next_iv_shape(rb_shape_get_shape_by_id(source_shape_id), id) == dest_shape
```
When running tests in debug mode, we have tests that try to exhaust the
space used for shapes and the redblack cache. However, this can cause
Out of Memory issues on some machines, so this commit decreases the
cache sizes when RUBY_DEBUG is enabled
There is no longer a limit on the number of IVs you can store.
SHAPE_MAX_NUM_IVS was used to work around the IV10K problem (the well
known problem where setting 10k instance variables in a row would be too
slow). The redblack tree works well at any shape depth, even depths
greater than 80, and solves the IV10K problem.
This is an experimental commit that uses a functional red-black tree to
create an index of the ancestor shapes. It uses an Okasaki style
functional red black tree:
https://www.cs.tufts.edu/comp/150FP/archive/chris-okasaki/redblack99.pdf
This tree is advantageous because:
* It offers O(n log n) insertions and O(n log n) lookups.
* It shares memory with previous "versions" of the tree
When we insert a node in the tree, only the parts of the tree that need
to be rebalanced are newly allocated. Parts of the tree that don't need
to be rebalanced are not reallocated, so "new trees" are able to share
memory with old trees. This is in contrast to a sorted set where we
would have to duplicate the set, and also resort the set on each
insertion.
I've added a new stat to RubyVM.stat so we can understand how the red
black tree increases.
[Feature #19538]
Since that category is not enabled by default, making it a
verbose warning is redundant. Enabling performance warning should
work with the default verbosity level.
During compaction we must fix up shapes on objects who were extended but
then became embedded. `rb_shape_traverse_from_new_root` is supposed to
walk shape trees looking for a matching shape. When a shape has a
"single child" we weren't returning NULL when the edge names didn't
match.
In the case of a single outgoing edge, this patch returns NULL when the
child edge name doesn't match (similar to the case when a shape has a
hash of outgoing edges)
[Feature #19538]
This new `peformance` warning category is disabled by default.
It needs to be specifically enabled via `-W:performance` or `Warning[:performance] = true`
This patch lazily allocates id tables for shape children. If a shape
has only one single child, it tags the child with a bit. When we read
children, if the id table has the bit set, we know it's a single child.
If we need to add more children, then we create a new table and evacuate
the child to the new table.
Co-Authored-By: Matt Valentine-House <matt@eightbitraptor.com>
We can only allocate enough shapes to fit in the shape buffer.
MAX_SHAPE_ID was based on the theoretical maximum number of shapes we
could have, not on the amount of memory we can actually consume. This
commit changes the MAX_SHAPE_ID to be based on the amount of memory
we're allowed to consume.
Co-Authored-By: Jemma Issroff <jemmaissroff@gmail.com>
[Bug #19536]
When objects are moved between size pools, their frozen status is lost
in the shape. This will cause the frozen check to be bypassed when there
is an inline cache. For example, the following script should raise a
FrozenError, but doesn't on Ruby 3.2 and master.
class A
def add_ivars
@a = @b = @c = @d = 1
end
def set_a
@a = 10
end
end
a = A.new
a.add_ivars
a.freeze
b = A.new
b.add_ivars
b.set_a # Set the inline cache in set_a
GC.verify_compaction_references(expand_heap: true, toward: :empty)
a.set_a
This makes the behavior of classes and modules when there are too many instance variables match the behavior of objects with too many instance variables.
Create SHAPE_MAX_NUM_IVS (currently 50) and limit all shapes of
T_OBJECTS to that number of IVs. When a shape with a T_OBJECT has more than 50 IVs, fall back to the
obj_too_complex shape which uses hash lookup for ivs.
Note that a previous version of this commit
78fcc9847a was reverted in
88f2b94065 because it did not account for
non-T_OBJECTS