Introduce Universal Parser mode for the parser.
This commit includes these changes:
* Introduce `UNIVERSAL_PARSER` macro. All of CRuby related functions
are passed via `struct rb_parser_config_struct` when this macro is enabled.
* Add CI task with 'cppflags=-DUNIVERSAL_PARSER' for ubuntu.
Remove !USE_RVARGC code
[Feature #19579]
The Variable Width Allocation feature was turned on by default in Ruby
3.2. Since then, we haven't received bug reports or backports to the
non-Variable Width Allocation code paths, so we assume that nobody is
using it. We also don't plan on maintaining the non-Variable Width
Allocation code, so we are going to remove it.
[Feature #18885]
For now, the optimizations performed are:
- Run a major GC
- Compact the heap
- Promote all surviving objects to oldgen
Other optimizations may follow.
These classes don't belong in gc.c as they're not actually part of the
GC. This commit refactors the code by moving all the code into a
weakmap.c file.
This commit adds rb_gc_mark_and_move which takes a pointer to an object
and marks it during marking phase and updates references during compaction.
This allows for marking and reference updating to be combined into a
single function, which reduces code duplication and prevents bugs if
marking and reference updating goes out of sync.
This commit also implements rb_gc_mark_and_move on iseq as an example.
SIZE_POOL_COUNT is a GC macro, it should belong in gc.h and not shape.h.
SIZE_POOL_COUNT doesn't depend on shape.h so we can have shape.h depend
on gc.h.
Co-Authored-By: Matt Valentine-House <matt@eightbitraptor.com>
When an object becomes "too complex" (in other words it has too many
variations in the shape tree), we transition it to use a "too complex"
shape and use a hash for storing instance variables.
Without this patch, there were rare cases where shape tree growth could
"explode" and cause performance degradation on what would otherwise have
been cached fast paths.
This patch puts a limit on shape tree growth, and gracefully degrades in
the rare case where there could be a factorial growth in the shape tree.
For example:
```ruby
class NG; end
HUGE_NUMBER.times do
NG.new.instance_variable_set(:"@unique_ivar_#{_1}", 1)
end
```
We consider objects to be "too complex" when the object's class has more
than SHAPE_MAX_VARIATIONS (currently 8) leaf nodes in the shape tree and
the object introduces a new variation (a new leaf node) associated with
that class.
For example, new variations on instances of the following class would be
considered "too complex" because those instances create more than 8
leaves in the shape tree:
```ruby
class Foo; end
9.times { Foo.new.instance_variable_set(":@uniq_#{_1}", 1) }
```
However, the following class is *not* too complex because it only has
one leaf in the shape tree:
```ruby
class Foo
def initialize
@a = @b = @c = @d = @e = @f = @g = @h = @i = nil
end
end
9.times { Foo.new }
``
This case is rare, so we don't expect this change to impact performance
of most applications, but it needs to be handled.
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
This commit adds a `capacity` field to shapes, and adds shape
transitions whenever an object's capacity changes. Objects which are
allocated out of a bigger size pool will also make a transition from the
root shape to the shape with the correct capacity for their size pool
when they are allocated.
This commit will allow us to remove numiv from objects completely, and
will also mean we can guarantee that if two objects share shapes, their
IVs are in the same positions (an embedded and extended object cannot
share shapes). This will enable us to implement ivar sets in YJIT using
object shapes.
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Having more size pools will allow us to allocate larger objects
through Variable Width Allocation.
I have attached some benchmark results below.
Discourse:
On Discourse, we don't see much change in response times. We do see
a small reduction in RSS.
Branch RSS: 377.8 MB
Master RSS: 396.3 MB
railsbench:
On railsbench, we don't see a big change in RPS or p99 performance.
We see a small increase in RSS.
Branch RPS: 815.38
Master RPS: 811.73
Branch p99: 1.69 ms
Master p99: 1.68 ms
Branch RSS: 90.6 MB
Master RSS: 89.4 MB
liquid:
We don't see a significant change in liquid performance.
Branch parse & render: 29.041 I/s
Master parse & render: 29.211 I/s
Currently, the number of incremental marking steps is calculated based
on the number of pooled pages available. This means that if we make Ruby
heap pages larger, it would run fewer incremental marking steps (which
would mean each incremental marking step takes longer).
This commit changes incremental marking to run after every
INCREMENTAL_MARK_STEP_ALLOCATIONS number of allocations. This means that
the behaviour of incremental marking remains the same regardless of the
Ruby heap page size.
I've benchmarked against discourse benchmarks and did not get a
significant change in response times beyond the margin of error. This is
expected as this new incremental marking algorithm behaves very
similarly to the previous one.
This commit adds a Ractor cache for every size pool. Previously, all VWA
allocated objects used the slowpath and locked the VM.
On a micro-benchmark that benchmarks String allocation:
VWA turned off:
29.196591 0.889709 30.086300 ( 9.434059)
VWA before this commit:
29.279486 41.477869 70.757355 ( 12.527379)
VWA after this commit:
16.782903 0.557117 17.340020 ( 4.255603)
This commits implements size classes in the GC for the Variable Width
Allocation feature. Unless `USE_RVARGC` compile flag is set, only a
single size class is created, maintaining current behaviour. See the
redmine ticket for more details.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
This commit removes T_PAYLOAD since the new VWA implementation no longer
requires T_PAYLOAD types.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
This commits implements size classes in the GC for the Variable Width
Allocation feature. Unless `USE_RVARGC` compile flag is set, only a
single size class is created, maintaining current behaviour. See the
redmine ticket for more details.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
This commit removes T_PAYLOAD since the new VWA implementation no longer
requires T_PAYLOAD types.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
When a Ractor is removed, the freelist in the Ractor cache is not
returned to the GC, leaving the freelist permanently lost. This commit
recycles the freelist when the Ractor is destroyed, preventing a memory
leak from occurring.
Previously, when an object is first initialized, ROBJECT_EMBED isn't
set. This means that for brand new objects, ROBJECT_NUMIV(obj) is 0 and
ROBJECT_IV_INDEX_TBL(obj) is NULL.
Previously, this combination meant that the inline cache would never be
initialized when setting an ivar on an object for the first time since
iv_index_tbl was NULL, and if it were it would never be used because
ROBJECT_NUMIV was 0. Both cases always fell through to the generic
rb_ivar_set which would then set the ROBJECT_EMBED flag and initialize
the ivar array.
This commit changes rb_class_allocate_instance to set the ROBJECT_EMBED
flag on the object initially and to initialize all members of the
embedded array to Qundef. This allows the inline cache to be set
correctly on first use and to be used on future uses.
This moves rb_class_allocate_instance to gc.c, so that it has access to
newobj_of. This seems appropriate given that there are other allocating
methods in this file (ex. rb_data_object_wrap, rb_imemo_new).
According to MSVC manual (*1), cl.exe can skip including a header file
when that:
- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.
GCC has a similar trick (*2), but it acts more stricter (e. g. there
must be _no tokens_ outside of #ifndef...#endif).
Sun C lacked #pragma once for a looong time. Oracle Developer Studio
12.5 finally implemented it, but we cannot assume such recent version.
This changeset modifies header files so that each of them include
strictly one #ifndef...#endif. I believe this is the most portable way
to trigger compiler optimizations. [Bug #16770]
*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead. This would significantly
speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
Improved readability by reducing the use of macros. Also moved some
part of internal/compilers.h into this file, because it seems to be the
right place for them.