We don't need to remove the write-barrier protected status after
splicing an array. We can simply add it to the rememberset for marking
during the next GC.
The benchmark illustrates the performance impact on minor GC:
```
require "benchmark"
arys = 1_000_000.times.map do
ary = Array.new(50)
ary.insert(1, 3)
ary
end
4.times { GC.start }
puts(Benchmark.measure do
1000.times do
GC.start(full_mark: false)
end
end)
```
This branch:
```
1.309910 0.004342 1.314252 ( 1.314580)
```
Master branch:
```
54.376091 0.219037 54.595128 ( 54.742996)
```
This commit introduces a new instruction `opt_newarray_send` which is
used when there is an array literal followed by either the `hash`,
`min`, or `max` method.
```
[a, b, c].hash
```
Will emit an `opt_newarray_send` instruction. This instruction falls
back to a method call if the "interested" method has been monkey
patched.
Here are some examples of the instructions generated:
```
$ ./miniruby --dump=insns -e '[@a, @b].max'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :max
0009 leave
$ ./miniruby --dump=insns -e '[@a, @b].min'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :min
0009 leave
$ ./miniruby --dump=insns -e '[@a, @b].hash'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
0000 getinstancevariable :@a, <is:0> ( 1)[Li]
0003 getinstancevariable :@b, <is:1>
0006 opt_newarray_send 2, :hash
0009 leave
```
[Feature #18897] [ruby-core:109147]
Co-authored-by: John Hawthorn <jhawthorn@github.com>
Remove !USE_RVARGC code
[Feature #19579]
The Variable Width Allocation feature was turned on by default in Ruby
3.2. Since then, we haven't received bug reports or backports to the
non-Variable Width Allocation code paths, so we assume that nobody is
using it. We also don't plan on maintaining the non-Variable Width
Allocation code, so we are going to remove it.
Sharing an array will cause it to be WB unprotected due to the use
of `RARRAY_PTR`. We don't need to WB unprotect the array because we're
not writing to the buffer of the array.
The following script demonstrates this issue:
```
ary = [1] * 1000
shared = ary[10..20]
puts ObjectSpace.dump(ary)
```
Freeing the memory of a Hash should be done by the garbage collector
and not by array functions. This could potentially leak memory if
ary_recycle_hash was not implemented properly.
Frozen arrays should not move from heap allocated to embedded because
frozen arrays could be shared roots for other (shared) arrays. If the
frozen array moves from heap allocated to embedded it would cause issues
since the shared array would no longer know where to set the pointer
in the shared root.
Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup
to determine whether `<=>` was overridden. The result of the lookup was
cached, but only for the duration of the specific method that
initialized the cmp_opt_data cache structure.
With this method lookup, `[x,y].max` is slower than doing `x > y ?
x : y` even though there's an optimized instruction for "new array max".
(John noticed somebody a proposed micro-optimization based on this fact
in https://github.com/mastodon/mastodon/pull/19903.)
```rb
a, b = 1, 2
Benchmark.ips do |bm|
bm.report('conditional') { a > b ? a : b }
bm.report('method') { [a, b].max }
bm.compare!
end
```
Before:
```
Comparison:
conditional: 22603733.2 i/s
method: 19820412.7 i/s - 1.14x (± 0.00) slower
```
This commit replaces the method lookup with a new CMP basic op, which
gives the examples above equivalent performance.
After:
```
Comparison:
method: 24022466.5 i/s
conditional: 23851094.2 i/s - same-ish: difference falls within
error
```
Relevant benchmarks show an improvement to Array#max and Array#min when
not using the optimized newarray_max instruction as well. They are
noticeably faster for small arrays with the relevant types, and the same
or maybe a touch faster on larger arrays.
```
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_min
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_max
```
The benchmarks added in this commit also look generally improved.
Co-authored-by: John Hawthorn <jhawthorn@github.com>
If auto-compaction is enabled, then we have to set the capacity/shared
immediately after allocating a heap array. If compaction runs before
capacity/shared is set then it could cause the array to be re-embedded,
which can cause crashes.
RARRAY_PTR when called with a transient array detransients the array
before returning its pointer which allocates in the heap.
Because RARRAY_PTR was being used during compaction (when re-embedding
arrays that have moved between size pools) this introduces the
possibility that we can hit a malloc threshold, triggering GC, while in
the middle of compaction.
We should avoid this by using safer functions to get hold of the
pointer. Since we know that the array is not embedded here, we can use
ARY_HEAP_PTR and ARY_EMBED_PTR directly
* Fix Array#[] with ArithmeticSequence with negative steps
Previously, Array#[] when called with an ArithmeticSequence
with a negative step did not handle all cases correctly,
especially cases involving infinite ranges, inverted ranges,
and/or exclusive ends.
Fixes [Bug #18247]
* Add Array#slice tests for ArithmeticSequence with negative step to test_array
Add tests of rb_arithmetic_sequence_beg_len_step C-API function.
* Fix ext/-test-/arith_seq/beg_len_step/depend
* Rename local variables
* Fix a variable name
Co-authored-by: Kenta Murata <3959+mrkn@users.noreply.github.com>
rb_ary_tmp_new suggests that the array is temporary in some way, but
that's not true, it just creates an array that's hidden and not on the
transient heap. This commit renames it to rb_ary_hidden_new.
ary_discard should not be used as it should be handled by the GC. The
only user of ary_discard is rb_ary_product, which doesn't neeed to use
ary_discard.
The RARRAY_LITERAL_FLAG was added in commit
5871ecf956 to improve CoW performance for
array literals by not keeping track of reference counts.
This commit reverts that commit and has an alternate implementation that
is more generic for all frozen arrays. Since frozen arrays cannot be
modified, we don't need to set the RARRAY_SHARED_ROOT_FLAG and we don't
need to do reference counting.