* Remove 1 allocation in Enumerable#each_with_index
Previously, each call to Enumerable#each_with_index allocates 2
objects, one for the counting index, the other an imemo_ifunc passed
to `self.each` as a block.
Use `struct vm_ifunc::data` to hold the counting index directly to
remove 1 allocation.
* [DOC] Brief summary for usages of `struct vm_ifunc`
Enumerable#all?, #any?, #one?, and #none? do not use yielded arguments
as an Array. So they can use rb_block_call2 to omit array allocatoin.
Enumerable#find does not have to immediately accept yielded arguments as
an Array. It can delay array allocation until the predicate block
returns truthy.
(TODO: Enumerable#count and #find_all seem to be optimizable as well.)
assert does not print the bug report, only the file and line number of
the assertion that failed. RUBY_ASSERT prints the full bug report, which
makes it much easier to debug.
In most of case `sort_by` works on primitive type.
Using `qsort_r` with function pointer is much slower than compare data directly.
I implement an intro sort which compare primitive data directly for `sort_by`.
We can even afford an O(n) type check before primitive data sort.
It still go faster.
Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup
to determine whether `<=>` was overridden. The result of the lookup was
cached, but only for the duration of the specific method that
initialized the cmp_opt_data cache structure.
With this method lookup, `[x,y].max` is slower than doing `x > y ?
x : y` even though there's an optimized instruction for "new array max".
(John noticed somebody a proposed micro-optimization based on this fact
in https://github.com/mastodon/mastodon/pull/19903.)
```rb
a, b = 1, 2
Benchmark.ips do |bm|
bm.report('conditional') { a > b ? a : b }
bm.report('method') { [a, b].max }
bm.compare!
end
```
Before:
```
Comparison:
conditional: 22603733.2 i/s
method: 19820412.7 i/s - 1.14x (± 0.00) slower
```
This commit replaces the method lookup with a new CMP basic op, which
gives the examples above equivalent performance.
After:
```
Comparison:
method: 24022466.5 i/s
conditional: 23851094.2 i/s - same-ish: difference falls within
error
```
Relevant benchmarks show an improvement to Array#max and Array#min when
not using the optimized newarray_max instruction as well. They are
noticeably faster for small arrays with the relevant types, and the same
or maybe a touch faster on larger arrays.
```
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_min
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_max
```
The benchmarks added in this commit also look generally improved.
Co-authored-by: John Hawthorn <jhawthorn@github.com>
rb_ary_tmp_new suggests that the array is temporary in some way, but
that's not true, it just creates an array that's hidden and not on the
transient heap. This commit renames it to rb_ary_hidden_new.
Previously, this would work as expected if the enumerable contained
0 or 1 element, and would raise LocalJumpError otherwise. That
inconsistent behavior is likely to lead to bugs.
Fixes [Bug #18635]
This behavior changed in dfb47bbd17,
but only for normal exit, not for early exit. Fix it for early
exit as well.
While here, fix example code in documentation so that it doesn't
indicate that the method returns nil.