Граф коммитов

437 Коммитов

Автор SHA1 Сообщение Дата
Jean Boussier 9e9f1d9301 Precompute embedded string literals hash code
With embedded strings we often have some space left in the slot, which
we can use to store the string Hash code.

It's probably only worth it for string literals, as they are the ones
likely to be used as hash keys.

We chose to store the Hash code right after the string terminator as to
make it easy/fast to compute, and not require one more union in RString.

```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code

|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|
```

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
2024-05-28 07:32:41 +02:00
Aaron Patterson f86fb1eda2 add allocation benchmark 2024-04-15 11:29:48 -07:00
Takashi Kokubun 70de3b170b
Optimize Hash methods with Kernel#hash (#10160) 2024-03-01 11:16:31 -08:00
Jeremy Evans f446d68ba6 Add benchmarks for super and zsuper calls of different types
These show gains from the recent optimization commits:

```
                        arg_splat
            miniruby:   7346039.9 i/s
     miniruby-before:   4692240.8 i/s - 1.57x  slower

                  arg_splat_block
            miniruby:   6539749.6 i/s
     miniruby-before:   4358063.6 i/s - 1.50x  slower

                   splat_kw_splat
            miniruby:   5433641.5 i/s
     miniruby-before:   3851048.6 i/s - 1.41x  slower

             splat_kw_splat_block
            miniruby:   4916137.1 i/s
     miniruby-before:   3477090.1 i/s - 1.41x  slower

                   splat_kw_block
            miniruby:   2912829.5 i/s
     miniruby-before:   2465611.7 i/s - 1.18x  slower

                   arg_splat_post
            miniruby:   2195208.2 i/s
     miniruby-before:   1860204.3 i/s - 1.18x  slower
```

zsuper only speeds up in the post argument case, because
it was already set to use splatarray false in cases where
there were no post arguments.
2024-03-01 07:10:25 -08:00
Alan Wu e4272fd292 Avoid allocation when passing no keywords to anonymous kwrest methods
Thanks to the new semantics from [ruby-core:115808], `**nil` is now
equivalent to `**{}`. Since the only thing one could do with anonymous
keyword rest parameter is to delegate it with `**`, nil is just as good
as an empty hash. Using nil avoids allocating an empty hash.

This is particularly important for `...` methods since they now use
`**kwrest` under the hood after 4f77d8d328. Most calls don't pass
keywords.

    Comparison:
                             fw_no_kw
                    post:   9816800.9 i/s
                     pre:   8570297.0 i/s - 1.15x  slower
2024-02-13 11:05:26 -05:00
Jeremy Evans c20e819e8b Fix crash when passing large keyword splat to method accepting keywords and keyword splat
The following code previously caused a crash:

```ruby
h = {}
1000000.times{|i| h[i.to_s.to_sym] = i}
def f(kw: 1, **kws) end
f(**h)
```

Inside a thread or fiber, the size of the keyword splat could be much smaller
and still cause a crash.

I found this issue while optimizing method calling by reducing implicit
allocations.  Given the following code:

```ruby
def f(kw: , **kws) end
kw = {kw: 1}
f(**kw)
```

The `f(**kw)` call previously allocated two hashes callee side instead of a
single hash.  This is because `setup_parameters_complex` would extract the
keywords from the keyword splat hash to the C stack, to attempt to mirror
the case when literal keywords are passed without a keyword splat.  Then,
`make_rest_kw_hash` would build a new hash based on the extracted keywords
that weren't used for literal keywords.

Switch the implementation so that if a keyword splat is passed, literal keywords
are deleted from the keyword splat hash (or a copy of the hash if the hash is
not mutable).

In addition to avoiding the crash, this new approach is much more
efficient in all cases.  With the included benchmark:

```
                                1
            miniruby:   5247879.9 i/s
     miniruby-before:   2474050.2 i/s - 2.12x  slower

                        1_mutable
            miniruby:   1797036.5 i/s
     miniruby-before:   1239543.3 i/s - 1.45x  slower

                               10
            miniruby:   1094750.1 i/s
     miniruby-before:    365529.6 i/s - 2.99x  slower

                       10_mutable
            miniruby:    407781.7 i/s
     miniruby-before:    225364.0 i/s - 1.81x  slower

                              100
            miniruby:    100992.3 i/s
     miniruby-before:     32703.6 i/s - 3.09x  slower

                      100_mutable
            miniruby:     40092.3 i/s
     miniruby-before:     21266.9 i/s - 1.89x  slower

                             1000
            miniruby:     21694.2 i/s
     miniruby-before:      4949.8 i/s - 4.38x  slower

                     1000_mutable
            miniruby:      5819.5 i/s
     miniruby-before:      2995.0 i/s - 1.94x  slower
```
2024-02-11 22:48:38 -08:00
Takashi Kokubun 76f0eec20f Fix a benchmark to avoid leaving a garbage file 2024-02-08 17:08:23 -08:00
Jeremy Evans 2217e08340
Optimize compilation of large literal arrays
To avoid stack overflow, Ruby splits compilation of large arrays
into smaller arrays, and concatenates the small arrays together.
It previously used newarray/concatarray for this, which is
inefficient.  This switches the compilation to use pushtoarray,
which is much faster. This makes almost all literal arrays only
allocate a single array.

For cases where there is a large amount of static values in the
array, Ruby will statically compile subarrays, and previously
added them using concatarray.  This switches to concattoarray,
avoiding an array allocation for the append.

Keyword splats are also supported in arrays, and ignored if the
keyword splat is empty.  Previously, this used newarraykwsplat and
concatarray.  This still uses newarraykwsplat, but switches to
concattoarray to save an allocation.  So large arrays with keyword
splats can allocate 2 arrays instead of 1.

Previously, for the following array sizes (assuming local variable
access for each element), Ruby allocated the following number of
arrays:

  1000 elements: 7 arrays
 10000 elements: 79 arrays
100000 elements: 781 arrays

With these changes, only a single array is allocated (or 2 for a
large array with a keyword splat.

Results using the included benchmark:

```
                       array_1000
            miniruby:     34770.0 i/s
   ./miniruby-before:     10511.7 i/s - 3.31x  slower

                      array_10000
            miniruby:      4938.8 i/s
   ./miniruby-before:       483.8 i/s - 10.21x  slower

                     array_100000
            miniruby:       727.2 i/s
   ./miniruby-before:         4.1 i/s - 176.98x  slower
```

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2024-01-27 10:16:52 -08:00
Jeremy Evans 42d891be2c Add benchmark for implicit array/hash allocation reduction changes
Benchmark results:

```
named_multi_arg_splat
after:    5344097.6 i/s
before:   3088134.0 i/s - 1.73x  slower

named_post_splat
after:    5401882.3 i/s
before:   2629321.8 i/s - 2.05x  slower

anon_arg_splat
after:   12242780.9 i/s
before:   6845413.2 i/s - 1.79x  slower

anon_arg_kw_splat
after:   11277398.7 i/s
before:   4329509.4 i/s - 2.60x  slower

anon_multi_arg_splat
after:    5132699.5 i/s
before:   3018103.7 i/s - 1.70x  slower

anon_post_splat
after:    5602915.1 i/s
before:   2645185.5 i/s - 2.12x  slower

anon_kw_splat
after:   15403727.3 i/s
before:   6249504.6 i/s - 2.46x  slower

anon_fw_to_named_splat
after:    2985715.3 i/s
before:   2049159.9 i/s - 1.46x  slower

anon_fw_to_named_no_splat
after:    2941030.4 i/s
before:   2100380.0 i/s - 1.40x  slower

fw_to_named_splat
after:    2801008.7 i/s
before:   2012416.4 i/s - 1.39x  slower

fw_to_named_no_splat
after:    2742670.4 i/s
before:   1957707.2 i/s - 1.40x  slower

fw_to_anon_to_named_splat
after:    2309246.6 i/s
before:   1375924.6 i/s - 1.68x  slower

fw_to_anon_to_named_no_splat
after:    2193227.6 i/s
before:   1351184.1 i/s - 1.62x  slower
```
2024-01-24 18:25:55 -08:00
Takashi Kokubun c84237f953
Rewrite Array#each in Ruby using Primitive (#9533) 2024-01-23 20:09:57 +00:00
Takashi Kokubun 27c1dd8634
YJIT: Allow inlining ISEQ calls with a block (#9622)
* YJIT: Allow inlining ISEQ calls with a block

* Leave a TODO comment about u16 inline_block
2024-01-23 19:36:23 +00:00
Jeremy Evans f5a01b0916 Add benchmark for recent optimization to avoid implicit allocations 2023-12-07 11:27:55 -08:00
Jeremy Evans 3081c83169 Support tracing of struct member accessor methods
This follows the same approach used for attr_reader/attr_writer in
2d98593bf5, skipping the checking for
tracing after the first call using the call cache, and clearing the
call cache when tracing is turned on/off.

Fixes [Bug #18886]
2023-12-07 10:29:33 -08:00
Jean Boussier 83c385719d Specialize String#dup
`String#+@` is 2-3 times faster than `String#dup` because it can
directly go through `rb_str_dup` instead of using the generic
much slower `rb_obj_dup`.

This fact led to the existance of the ugly `Performance/UnfreezeString`
rubocop performance rule that encourage users to rewrite the much
more readable and convenient `"foo".dup` into the ugly `(+"foo")`.

Let's make that rubocop rule useless.

```
compare-ruby: ruby 3.3.0dev (2023-11-20T02:02:55Z master 701b0650de) [arm64-darwin22]
last_commit=[ruby/prism] feat: add encoding for IBM865 (https://github.com/ruby/prism/pull/1884)
built-ruby: ruby 3.3.0dev (2023-11-20T12:51:45Z faster-str-lit-dup 6b745bbc5d) [arm64-darwin22]
warming up..

|       |compare-ruby|built-ruby|
|:------|-----------:|---------:|
|uplus  |     16.312M|   16.332M|
|       |           -|     1.00x|
|dup    |      5.912M|   16.329M|
|       |           -|     2.76x|
```
2023-11-20 14:33:20 +01:00
Jean Boussier b92b9e1e9e vm_getivar: assume the cached shape_id like have a common ancestor
When an inline cache misses, it is very likely that the stale shape_id
and the current instance shape_id have a close common ancestor.

For example if the instance variable is sometimes frozen sometimes
not, one of the two shape will be the direct parent of the other.

Another pattern that commonly cause IC misses is "memoization",
in such case the object will have a "base common shape" and then
a number of close descendants.

In addition, when we find a common ancestor, we store it in the
inline cache instead of the current shape. This help prevent the
cache from flip-flopping, ensuring the next lookup will be marginally
faster and more generally avoid writing in memory too much.

However, now that shapes have an ancestors index, we only check
for a few ancestors before falling back to use the index.

So overall this change speeds up what is assumed to be the more common
case, but makes what is assumed to be the less common case a bit slower.

```
compare-ruby: ruby 3.3.0dev (2023-10-26T05:30:17Z master 701ca070b4) [arm64-darwin22]
built-ruby: ruby 3.3.0dev (2023-10-26T09:25:09Z shapes_double_sear.. a723a85235) [arm64-darwin22]
warming up......

|                                     |compare-ruby|built-ruby|
|:------------------------------------|-----------:|---------:|
|vm_ivar_stable_shape                 |     11.672M|   11.679M|
|                                     |           -|     1.00x|
|vm_ivar_memoize_unstable_shape       |      7.551M|   10.506M|
|                                     |           -|     1.39x|
|vm_ivar_memoize_unstable_shape_miss  |     11.591M|   11.624M|
|                                     |           -|     1.00x|
|vm_ivar_unstable_undef               |      9.037M|    7.981M|
|                                     |       1.13x|         -|
|vm_ivar_divergent_shape              |      8.034M|    6.657M|
|                                     |       1.21x|         -|
|vm_ivar_divergent_shape_imbalanced   |     10.471M|    9.231M|
|                                     |       1.13x|         -|
```

Co-Authored-By: John Hawthorn <john@hawthorn.email>
2023-11-03 12:47:43 +01:00
Aaron Patterson 884c3195d9 Update benchmark/vm_ivar_ic_miss.yml
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2023-10-24 10:52:06 -07:00
Aaron Patterson 84e4453436 Use a functional red-black tree for indexing the shapes
This is an experimental commit that uses a functional red-black tree to
create an index of the ancestor shapes.  It uses an Okasaki style
functional red black tree:

  https://www.cs.tufts.edu/comp/150FP/archive/chris-okasaki/redblack99.pdf

This tree is advantageous because:

* It offers O(n log n) insertions and O(n log n) lookups.
* It shares memory with previous "versions" of the tree

When we insert a node in the tree, only the parts of the tree that need
to be rebalanced are newly allocated.  Parts of the tree that don't need
to be rebalanced are not reallocated, so "new trees" are able to share
memory with old trees.  This is in contrast to a sorted set where we
would have to duplicate the set, and also resort the set on each
insertion.

I've added a new stat to RubyVM.stat so we can understand how the red
black tree increases.
2023-10-24 10:52:06 -07:00
Nobuyoshi Nakada ccd18d0557
Clean up temporary file, wc.input [ci skip] 2023-10-24 12:30:10 +09:00
Kouhei Yanagita 769f53eb7e Add benchmarks for Range#reverse_each 2023-10-12 17:34:49 +09:00
Kouhei Yanagita 6ae2996e29
Optimize `Range#count` by using `range_size` if possible 2023-10-05 00:19:55 +09:00
Kouhei Yanagita 91042ec0ae Add benchmarks for Range#bsearch 2023-09-26 17:31:10 +09:00
Kouhei Yanagita 7e350f5310 Optimize Range#bsearch for beginless/endless ranges within Fixnum 2023-09-21 10:30:58 +09:00
Nobuyoshi Nakada b4213a73b8 [Feature #19839] Fix `Range#overlap?` for empty ranges
Empty ranges do not overlap with any range.

Regarding benchmarks, PR#8242 is significantly faster in some cases,
but one of these two cases is a wrong result.

|                           |ActiveSupport| PR#8242|built-ruby|
|:--------------------------|------------:|-------:|---------:|
|(2..3).overlap?(1..1)      |       7.761M| 15.053M|   32.368M|
|                           |            -|   1.94x|     4.17x|
|(2..3).overlap?(2..4)      |      25.720M| 55.070M|   21.981M|
|                           |        1.17x|   2.51x|         -|
|(2..3).overlap?(4..5)      |       7.616M| 15.048M|   21.730M|
|                           |            -|   1.98x|     2.85x|
|(2..3).overlap?(2..1)      |      25.585M| 56.545M|   32.786M|
|                           |            -|   2.21x|     1.28x|
|(2..3).overlap?(0..1)      |       7.554M| 14.755M|   32.545M|
|                           |            -|   1.95x|     4.31x|
|(2..3).overlap?(...1)      |       6.681M|  5.843M|   32.255M|
|                           |        1.14x|       -|     5.52x|
|(2...3).overlap?(..2)      |       6.676M|  5.817M|   21.572M|
|                           |        1.15x|       -|     3.71x|
|(2...3).overlap?(3...)     |       7.392M| 14.755M|   31.805M|
|                           |            -|   2.00x|     4.30x|
|(2..3).overlap?('a'..'d')  |       3.675M|  3.482M|   17.009M|
|                           |        1.06x|       -|     4.89x|
2023-09-16 17:24:21 +09:00
Kouhei Yanagita 7d08dbd015
Optimize Range#bsearch for beginless/endless ranges
On Range#bsearch for endless ranges, we try positions at `begin + 2**i` (i = 0, 1, 2, ...)
to find a point that satisfies a given condition.
Subsequently, we perform binary searching with the interval `[begin, begin + 2**n]`.

However, the interval `[begin + 2**(n-1), begin + 2**n]` is sufficient for binary search
because `begin + 2**(n-1)` does not satisfy the condition.

The same applies to beginless ranges.
2023-09-16 12:10:09 +09:00
Nobuyoshi Nakada 5e79d5a560
Make `rb_str_rindex` return byte index
Leave callers to convert byte index to char index, as well as
`rb_str_index`, so that `rb_str_rpartition` does not need to
re-convert char index to byte index.
2023-07-09 16:39:28 +09:00
Nobuyoshi Nakada ab6eb3786c
Optimize `Regexp#dup` and `Regexp.new(/RE/)`
When copying from another regexp, copy already built `regex_t` instead
of re-compiling its source.
2023-06-09 20:22:30 +09:00
nekoyama32767 87217f26f1
[Feature #19643] Direct primitive compare sort for `Array#sort_by`
In most of case `sort_by` works on primitive type.
Using `qsort_r` with function pointer is much slower than compare data directly.

I implement an intro sort which compare primitive data directly for `sort_by`.
We can even afford an O(n) type check before primitive data sort.
It still go faster.
2023-05-20 19:40:27 +09:00
Jeremy Evans a82a24ed57 Optimize method_missing calls
CALLER_ARG_SPLAT is not necessary for method_missing.  We just need
to unshift the method name into the arguments.

This optimizes all method_missing calls:

* mm(recv) ~9%
* mm(recv, *args) ~215% for args.length == 200
* mm(recv, *args, **kw) ~55% for args.length == 200
* mm(recv, **kw) ~22%
* mm(recv, kw: 1) ~100%

Note that empty argument splats do get slower with this approach,
by about 30-40%.  Other than non-empty argument splats, other
argument splats are faster, with the speedup depending on the
number of arguments.
2023-04-25 08:06:16 -07:00
Jeremy Evans 583e9d24d4 Optimize symproc calls
Similar to the bmethod/send optimization, this avoids using
CALLER_ARG_SPLAT if not necessary.  As long as the receiver argument
can be shifted off, other arguments are passed through as-is.

This optimizes the following types of calls:

* symproc.(recv) ~5%
* symproc.(recv, *args) ~65% for args.length == 200
* symproc.(recv, *args, **kw) ~45% for args.length == 200
* symproc.(recv, **kw) ~30%
* symproc.(recv, kw: 1) ~100%

Note that empty argument splats do get slower with this approach,
by about 2-3%.  This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Other
than non-empty argument splats, other argument splats are faster,
with the speedup depending on the number of arguments.

The following types of calls are not optimized:

* symproc.(*args)
* symproc.(*args, **kw)

This is because the you cannot shift the receiver argument off
without first splatting the arg.
2023-04-25 08:06:16 -07:00
Jeremy Evans 9b4bf02aa8 Optimize send calls
Similar to the bmethod optimization, this avoids using
CALLER_ARG_SPLAT if not necessary.  As long as the method argument
can be shifted off, other arguments are passed through as-is.

This optimizes the following types of calls:

* send(meth, arg) ~5%
* send(meth, *args) ~75% for args.length == 200
* send(meth, *args, **kw) ~50% for args.length == 200
* send(meth, **kw) ~25%
* send(meth, kw: 1) ~115%

Note that empty argument splats do get slower with this approach,
by about 20%.  This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Other
than non-empty argument splats, other argument splats are faster,
with the speedup depending on the number of arguments.

The following types of calls are not optimized:

* send(*args)
* send(*args, **kw)

This is because the you cannot shift the method argument off
without first splatting the arg.
2023-04-25 08:06:16 -07:00
Jeremy Evans af2da6419a Optimize cfunc calls for f(*a) and f(*a, **kw) if kw is empty
This optimizes the following calls:

* ~10-15% for f(*a) when a does not end with a flagged keywords hash
* ~10-15% for f(*a) when a ends with an empty flagged keywords hash
* ~35-40% for f(*a, **kw) if kw is empty

This still copies the array contents to the VM stack, but avoids some
overhead. It would be faster to use the array pointer directly,
but that could cause problems if the array was modified during
the call to the function. You could do that optimization for frozen
arrays, but as splatting frozen arrays is uncommon, and the speedup
is minimal (<5%), it doesn't seem worth it.

The vm_send_cfunc benchmark has been updated to test additional cfunc
call types, and the numbers above were taken from the benchmark results.
2023-04-25 08:06:16 -07:00
Jeremy Evans f6254f77f7 Speed up calling iseq bmethods
Currently, bmethod arguments are copied from the VM stack to the
C stack in vm_call_bmethod, then copied from the C stack to the VM
stack later in invoke_iseq_block_from_c.  This is inefficient.

This adds vm_call_iseq_bmethod and vm_call_noniseq_bmethod.
vm_call_iseq_bmethod is an optimized method that skips stack
copies (though there is one copy to remove the receiver from
the stack), and avoids calling vm_call_bmethod_body,
rb_vm_invoke_bmethod, invoke_block_from_c_proc,
invoke_iseq_block_from_c, and vm_yield_setup_args.

Th vm_call_iseq_bmethod argument handling is similar to the
way normal iseq methods are called, and allows for similar
performance optimizations when using splats or keywords.
However, even in the no argument case it's still significantly
faster.

A benchmark is added for bmethod calling.  In my environment,
it improves bmethod calling performance by 38-59% for simple
bmethod calls, and up to 180% for bmethod calls passing
literal keywords on both sides.

```

./miniruby-iseq-bmethod:  18159792.6 i/s
          ./miniruby-m:  13174419.1 i/s - 1.38x  slower

                   bmethod_simple_1
./miniruby-iseq-bmethod:  15890745.4 i/s
          ./miniruby-m:  10008972.7 i/s - 1.59x  slower

             bmethod_simple_0_splat
./miniruby-iseq-bmethod:  13142804.3 i/s
          ./miniruby-m:  11168595.2 i/s - 1.18x  slower

             bmethod_simple_1_splat
./miniruby-iseq-bmethod:  12375791.0 i/s
          ./miniruby-m:   8491140.1 i/s - 1.46x  slower

                   bmethod_no_splat
./miniruby-iseq-bmethod:  10151258.8 i/s
          ./miniruby-m:   8716664.1 i/s - 1.16x  slower

                    bmethod_0_splat
./miniruby-iseq-bmethod:   8138802.5 i/s
          ./miniruby-m:   7515600.2 i/s - 1.08x  slower

                    bmethod_1_splat
./miniruby-iseq-bmethod:   8028372.7 i/s
          ./miniruby-m:   5947658.6 i/s - 1.35x  slower

                   bmethod_10_splat
./miniruby-iseq-bmethod:   6953514.1 i/s
          ./miniruby-m:   4840132.9 i/s - 1.44x  slower

                  bmethod_100_splat
./miniruby-iseq-bmethod:   5287288.4 i/s
          ./miniruby-m:   2243218.4 i/s - 2.36x  slower

                         bmethod_kw
./miniruby-iseq-bmethod:   8931358.2 i/s
          ./miniruby-m:   3185818.6 i/s - 2.80x  slower

                      bmethod_no_kw
./miniruby-iseq-bmethod:  12281287.4 i/s
          ./miniruby-m:  10041727.9 i/s - 1.22x  slower

                   bmethod_kw_splat
./miniruby-iseq-bmethod:   5618956.8 i/s
          ./miniruby-m:   3657549.5 i/s - 1.54x  slower
```
2023-04-25 08:06:16 -07:00
Takashi Kokubun 66c4dc1592 Remove MJIT-specific benchmarks 2023-03-06 22:36:57 -08:00
John Bampton c43fbe4ebd
Fix spelling (#7405) 2023-02-28 10:05:30 -08:00
Matt Valentine-House 2605615fe6 Benchmark String interpolation across size pools 2023-01-13 10:31:35 -05:00
Takashi Kokubun 509da028c2
Rewrite Kernel#loop in Ruby (#6983)
* Rewrite Kernel#loop in Ruby

* Use enum_for(:loop) { Float::INFINITY }

Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com>

* Limit the scope to rescue StopIteration

Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com>
2022-12-25 21:46:29 -08:00
Nobuyoshi Nakada 8c272f4481 [Feature #18033] Make Time.new parse time strings
`Time.new` now parses strings such as the result of `Time#inspect`
and restricted ISO-8601 formats.
2022-12-16 22:52:59 +09:00
Daniel Colson e69b91fae4 Introduce BOP_CMP for optimized comparison
Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup
to determine whether `<=>` was overridden. The result of the lookup was
cached, but only for the duration of the specific method that
initialized the cmp_opt_data cache structure.

With this method lookup, `[x,y].max` is slower than doing `x > y ?
x : y` even though there's an optimized instruction for "new array max".
(John noticed somebody a proposed micro-optimization based on this fact
in https://github.com/mastodon/mastodon/pull/19903.)

```rb
a, b = 1, 2
Benchmark.ips do |bm|
  bm.report('conditional') { a > b ? a : b }
  bm.report('method') { [a, b].max }
  bm.compare!
end
```

Before:

```
Comparison:
         conditional: 22603733.2 i/s
              method: 19820412.7 i/s - 1.14x  (± 0.00) slower
```

This commit replaces the method lookup with a new CMP basic op, which
gives the examples above equivalent performance.

After:

```
Comparison:
              method: 24022466.5 i/s
         conditional: 23851094.2 i/s - same-ish: difference falls within
error
```

Relevant benchmarks show an improvement to Array#max and Array#min when
not using the optimized newarray_max instruction as well. They are
noticeably faster for small arrays with the relevant types, and the same
or maybe a touch faster on larger arrays.

```
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_min
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_max
```

The benchmarks added in this commit also look generally improved.

Co-authored-by: John Hawthorn <jhawthorn@github.com>
2022-12-06 12:37:23 -08:00
Takashi Kokubun d15d1c01c2
Rename --mjit-min-calls to --mjit-call-threshold (#6731)
for consistency with YJIT
2022-11-14 23:38:52 -08:00
Takashi Kokubun f276d5a7fe
Improve HTML escape benchmarks 2022-11-04 23:54:25 -07:00
S.H c6f439a6a8
Improve performance some `Integer` and `Float` methods [Feature #19085] (#6638)
* Improve some Integer and Float methods

* Using alias and Remove unnecessary code

* Remove commentout code
2022-10-27 09:13:16 -07:00
Samuel Williams 025b8701c0
Add several new methods for getting and setting buffer contents. (#6434) 2022-09-26 18:06:12 +13:00
Jemma Issroff b5c459d57a Adds a benchmark to measure freezing objects 2022-09-22 10:29:43 -07:00
HParker fbaac837cf avoid extra dup and pop in compile_op_asgn2
Co-authored-by: John Hawthorn <jhawthorn@github.com>
2022-09-22 09:47:13 -07:00
Jemma Issroff aecb57ceb0
Fix style on vm_ivar benchmarks (#6379) 2022-09-15 09:39:39 +09:00
Jemma Issroff 513a11b477 Add vm_ivar get, get_unitialized, and lazy_set benchmarks 2022-09-14 13:50:47 -07:00
Jean Boussier cd1724bdde rb_str_concat_literals: use rb_str_buf_append
That's about 1.30x faster.
2022-09-08 15:02:21 +02:00
John Hawthorn 679ef34586 New constant caching insn: opt_getconstant_path
Previously YARV bytecode implemented constant caching by having a pair
of instructions, opt_getinlinecache and opt_setinlinecache, wrapping a
series of getconstant calls (with putobject providing supporting
arguments).

This commit replaces that pattern with a new instruction,
opt_getconstant_path, handling both getting/setting the inline cache and
fetching the constant on a cache miss.

This is implemented by storing the full constant path as a
null-terminated array of IDs inside of the IC structure. idNULL is used
to signal an absolute constant reference.

    $ ./miniruby --dump=insns -e '::Foo::Bar::Baz'
    == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
    0000 opt_getconstant_path                   <ic:0 ::Foo::Bar::Baz>      (   1)[Li]
    0002 leave

The motivation for this is that we had increasingly found the need to
disassemble the instructions between the opt_getinlinecache and
opt_setinlinecache in order to determine the constant we are fetching,
or otherwise store metadata.

This disassembly was done:
* In opt_setinlinecache, to register the IC against the constant names
  it is using for granular invalidation.
* In rb_iseq_free, to unregister the IC from the invalidation table.
* In YJIT to find the position of a opt_getinlinecache instruction to
  invalidate it when the cache is populated
* In YJIT to register the constant names being used for invalidation.

With this change we no longe need disassemly for these (in fact
rb_iseq_each is now unused), as the list of constant names being
referenced is held in the IC. This should also make it possible to make
more optimizations in the future.

This may also reduce the size of iseqs, as previously each segment
required 32 bytes (on 64-bit platforms) for each constant segment. This
implementation only stores one ID per-segment.

There should be no significant performance change between this and the
previous implementation. Previously opt_getinlinecache was a "leaf"
instruction, but it included a jump (almost always to a separate cache
line). Now opt_getconstant_path is a non-leaf (it may
raise/autoload/call const_missing) but it does not jump. These seem to
even out.
2022-09-01 15:20:49 -07:00
Takashi Kokubun 9f3140a42e
Remove mjit_exec benchmarks
Now that mjit_exec doesn't exist, those files feel old. I'll probably
change how I benchmark it when I add benchmarks for it again.
2022-08-21 11:35:40 -07:00
Takashi Kokubun a60507f616
Rename mjit_compile.c to mjit_compiler.c
I'm planning to introduce mjit_compiler.rb, and I want to make this
consistent with it. Consistency with compile.c doesn't seem important
for MJIT anyway.
2022-08-21 11:33:06 -07:00