Граф коммитов

413 Коммитов

Автор SHA1 Сообщение Дата
Nobuyoshi Nakada 5e79d5a560
Make `rb_str_rindex` return byte index
Leave callers to convert byte index to char index, as well as
`rb_str_index`, so that `rb_str_rpartition` does not need to
re-convert char index to byte index.
2023-07-09 16:39:28 +09:00
Nobuyoshi Nakada ab6eb3786c
Optimize `Regexp#dup` and `Regexp.new(/RE/)`
When copying from another regexp, copy already built `regex_t` instead
of re-compiling its source.
2023-06-09 20:22:30 +09:00
nekoyama32767 87217f26f1
[Feature #19643] Direct primitive compare sort for `Array#sort_by`
In most of case `sort_by` works on primitive type.
Using `qsort_r` with function pointer is much slower than compare data directly.

I implement an intro sort which compare primitive data directly for `sort_by`.
We can even afford an O(n) type check before primitive data sort.
It still go faster.
2023-05-20 19:40:27 +09:00
Jeremy Evans a82a24ed57 Optimize method_missing calls
CALLER_ARG_SPLAT is not necessary for method_missing.  We just need
to unshift the method name into the arguments.

This optimizes all method_missing calls:

* mm(recv) ~9%
* mm(recv, *args) ~215% for args.length == 200
* mm(recv, *args, **kw) ~55% for args.length == 200
* mm(recv, **kw) ~22%
* mm(recv, kw: 1) ~100%

Note that empty argument splats do get slower with this approach,
by about 30-40%.  Other than non-empty argument splats, other
argument splats are faster, with the speedup depending on the
number of arguments.
2023-04-25 08:06:16 -07:00
Jeremy Evans 583e9d24d4 Optimize symproc calls
Similar to the bmethod/send optimization, this avoids using
CALLER_ARG_SPLAT if not necessary.  As long as the receiver argument
can be shifted off, other arguments are passed through as-is.

This optimizes the following types of calls:

* symproc.(recv) ~5%
* symproc.(recv, *args) ~65% for args.length == 200
* symproc.(recv, *args, **kw) ~45% for args.length == 200
* symproc.(recv, **kw) ~30%
* symproc.(recv, kw: 1) ~100%

Note that empty argument splats do get slower with this approach,
by about 2-3%.  This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Other
than non-empty argument splats, other argument splats are faster,
with the speedup depending on the number of arguments.

The following types of calls are not optimized:

* symproc.(*args)
* symproc.(*args, **kw)

This is because the you cannot shift the receiver argument off
without first splatting the arg.
2023-04-25 08:06:16 -07:00
Jeremy Evans 9b4bf02aa8 Optimize send calls
Similar to the bmethod optimization, this avoids using
CALLER_ARG_SPLAT if not necessary.  As long as the method argument
can be shifted off, other arguments are passed through as-is.

This optimizes the following types of calls:

* send(meth, arg) ~5%
* send(meth, *args) ~75% for args.length == 200
* send(meth, *args, **kw) ~50% for args.length == 200
* send(meth, **kw) ~25%
* send(meth, kw: 1) ~115%

Note that empty argument splats do get slower with this approach,
by about 20%.  This is probably because iseq argument setup is
slower for empty argument splats than CALLER_SETUP_ARG is. Other
than non-empty argument splats, other argument splats are faster,
with the speedup depending on the number of arguments.

The following types of calls are not optimized:

* send(*args)
* send(*args, **kw)

This is because the you cannot shift the method argument off
without first splatting the arg.
2023-04-25 08:06:16 -07:00
Jeremy Evans af2da6419a Optimize cfunc calls for f(*a) and f(*a, **kw) if kw is empty
This optimizes the following calls:

* ~10-15% for f(*a) when a does not end with a flagged keywords hash
* ~10-15% for f(*a) when a ends with an empty flagged keywords hash
* ~35-40% for f(*a, **kw) if kw is empty

This still copies the array contents to the VM stack, but avoids some
overhead. It would be faster to use the array pointer directly,
but that could cause problems if the array was modified during
the call to the function. You could do that optimization for frozen
arrays, but as splatting frozen arrays is uncommon, and the speedup
is minimal (<5%), it doesn't seem worth it.

The vm_send_cfunc benchmark has been updated to test additional cfunc
call types, and the numbers above were taken from the benchmark results.
2023-04-25 08:06:16 -07:00
Jeremy Evans f6254f77f7 Speed up calling iseq bmethods
Currently, bmethod arguments are copied from the VM stack to the
C stack in vm_call_bmethod, then copied from the C stack to the VM
stack later in invoke_iseq_block_from_c.  This is inefficient.

This adds vm_call_iseq_bmethod and vm_call_noniseq_bmethod.
vm_call_iseq_bmethod is an optimized method that skips stack
copies (though there is one copy to remove the receiver from
the stack), and avoids calling vm_call_bmethod_body,
rb_vm_invoke_bmethod, invoke_block_from_c_proc,
invoke_iseq_block_from_c, and vm_yield_setup_args.

Th vm_call_iseq_bmethod argument handling is similar to the
way normal iseq methods are called, and allows for similar
performance optimizations when using splats or keywords.
However, even in the no argument case it's still significantly
faster.

A benchmark is added for bmethod calling.  In my environment,
it improves bmethod calling performance by 38-59% for simple
bmethod calls, and up to 180% for bmethod calls passing
literal keywords on both sides.

```

./miniruby-iseq-bmethod:  18159792.6 i/s
          ./miniruby-m:  13174419.1 i/s - 1.38x  slower

                   bmethod_simple_1
./miniruby-iseq-bmethod:  15890745.4 i/s
          ./miniruby-m:  10008972.7 i/s - 1.59x  slower

             bmethod_simple_0_splat
./miniruby-iseq-bmethod:  13142804.3 i/s
          ./miniruby-m:  11168595.2 i/s - 1.18x  slower

             bmethod_simple_1_splat
./miniruby-iseq-bmethod:  12375791.0 i/s
          ./miniruby-m:   8491140.1 i/s - 1.46x  slower

                   bmethod_no_splat
./miniruby-iseq-bmethod:  10151258.8 i/s
          ./miniruby-m:   8716664.1 i/s - 1.16x  slower

                    bmethod_0_splat
./miniruby-iseq-bmethod:   8138802.5 i/s
          ./miniruby-m:   7515600.2 i/s - 1.08x  slower

                    bmethod_1_splat
./miniruby-iseq-bmethod:   8028372.7 i/s
          ./miniruby-m:   5947658.6 i/s - 1.35x  slower

                   bmethod_10_splat
./miniruby-iseq-bmethod:   6953514.1 i/s
          ./miniruby-m:   4840132.9 i/s - 1.44x  slower

                  bmethod_100_splat
./miniruby-iseq-bmethod:   5287288.4 i/s
          ./miniruby-m:   2243218.4 i/s - 2.36x  slower

                         bmethod_kw
./miniruby-iseq-bmethod:   8931358.2 i/s
          ./miniruby-m:   3185818.6 i/s - 2.80x  slower

                      bmethod_no_kw
./miniruby-iseq-bmethod:  12281287.4 i/s
          ./miniruby-m:  10041727.9 i/s - 1.22x  slower

                   bmethod_kw_splat
./miniruby-iseq-bmethod:   5618956.8 i/s
          ./miniruby-m:   3657549.5 i/s - 1.54x  slower
```
2023-04-25 08:06:16 -07:00
Takashi Kokubun 66c4dc1592 Remove MJIT-specific benchmarks 2023-03-06 22:36:57 -08:00
John Bampton c43fbe4ebd
Fix spelling (#7405) 2023-02-28 10:05:30 -08:00
Matt Valentine-House 2605615fe6 Benchmark String interpolation across size pools 2023-01-13 10:31:35 -05:00
Takashi Kokubun 509da028c2
Rewrite Kernel#loop in Ruby (#6983)
* Rewrite Kernel#loop in Ruby

* Use enum_for(:loop) { Float::INFINITY }

Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com>

* Limit the scope to rescue StopIteration

Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com>
2022-12-25 21:46:29 -08:00
Nobuyoshi Nakada 8c272f4481 [Feature #18033] Make Time.new parse time strings
`Time.new` now parses strings such as the result of `Time#inspect`
and restricted ISO-8601 formats.
2022-12-16 22:52:59 +09:00
Daniel Colson e69b91fae4 Introduce BOP_CMP for optimized comparison
Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup
to determine whether `<=>` was overridden. The result of the lookup was
cached, but only for the duration of the specific method that
initialized the cmp_opt_data cache structure.

With this method lookup, `[x,y].max` is slower than doing `x > y ?
x : y` even though there's an optimized instruction for "new array max".
(John noticed somebody a proposed micro-optimization based on this fact
in https://github.com/mastodon/mastodon/pull/19903.)

```rb
a, b = 1, 2
Benchmark.ips do |bm|
  bm.report('conditional') { a > b ? a : b }
  bm.report('method') { [a, b].max }
  bm.compare!
end
```

Before:

```
Comparison:
         conditional: 22603733.2 i/s
              method: 19820412.7 i/s - 1.14x  (± 0.00) slower
```

This commit replaces the method lookup with a new CMP basic op, which
gives the examples above equivalent performance.

After:

```
Comparison:
              method: 24022466.5 i/s
         conditional: 23851094.2 i/s - same-ish: difference falls within
error
```

Relevant benchmarks show an improvement to Array#max and Array#min when
not using the optimized newarray_max instruction as well. They are
noticeably faster for small arrays with the relevant types, and the same
or maybe a touch faster on larger arrays.

```
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_min
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_max
```

The benchmarks added in this commit also look generally improved.

Co-authored-by: John Hawthorn <jhawthorn@github.com>
2022-12-06 12:37:23 -08:00
Takashi Kokubun d15d1c01c2
Rename --mjit-min-calls to --mjit-call-threshold (#6731)
for consistency with YJIT
2022-11-14 23:38:52 -08:00
Takashi Kokubun f276d5a7fe
Improve HTML escape benchmarks 2022-11-04 23:54:25 -07:00
S.H c6f439a6a8
Improve performance some `Integer` and `Float` methods [Feature #19085] (#6638)
* Improve some Integer and Float methods

* Using alias and Remove unnecessary code

* Remove commentout code
2022-10-27 09:13:16 -07:00
Samuel Williams 025b8701c0
Add several new methods for getting and setting buffer contents. (#6434) 2022-09-26 18:06:12 +13:00
Jemma Issroff b5c459d57a Adds a benchmark to measure freezing objects 2022-09-22 10:29:43 -07:00
HParker fbaac837cf avoid extra dup and pop in compile_op_asgn2
Co-authored-by: John Hawthorn <jhawthorn@github.com>
2022-09-22 09:47:13 -07:00
Jemma Issroff aecb57ceb0
Fix style on vm_ivar benchmarks (#6379) 2022-09-15 09:39:39 +09:00
Jemma Issroff 513a11b477 Add vm_ivar get, get_unitialized, and lazy_set benchmarks 2022-09-14 13:50:47 -07:00
Jean Boussier cd1724bdde rb_str_concat_literals: use rb_str_buf_append
That's about 1.30x faster.
2022-09-08 15:02:21 +02:00
John Hawthorn 679ef34586 New constant caching insn: opt_getconstant_path
Previously YARV bytecode implemented constant caching by having a pair
of instructions, opt_getinlinecache and opt_setinlinecache, wrapping a
series of getconstant calls (with putobject providing supporting
arguments).

This commit replaces that pattern with a new instruction,
opt_getconstant_path, handling both getting/setting the inline cache and
fetching the constant on a cache miss.

This is implemented by storing the full constant path as a
null-terminated array of IDs inside of the IC structure. idNULL is used
to signal an absolute constant reference.

    $ ./miniruby --dump=insns -e '::Foo::Bar::Baz'
    == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
    0000 opt_getconstant_path                   <ic:0 ::Foo::Bar::Baz>      (   1)[Li]
    0002 leave

The motivation for this is that we had increasingly found the need to
disassemble the instructions between the opt_getinlinecache and
opt_setinlinecache in order to determine the constant we are fetching,
or otherwise store metadata.

This disassembly was done:
* In opt_setinlinecache, to register the IC against the constant names
  it is using for granular invalidation.
* In rb_iseq_free, to unregister the IC from the invalidation table.
* In YJIT to find the position of a opt_getinlinecache instruction to
  invalidate it when the cache is populated
* In YJIT to register the constant names being used for invalidation.

With this change we no longe need disassemly for these (in fact
rb_iseq_each is now unused), as the list of constant names being
referenced is held in the IC. This should also make it possible to make
more optimizations in the future.

This may also reduce the size of iseqs, as previously each segment
required 32 bytes (on 64-bit platforms) for each constant segment. This
implementation only stores one ID per-segment.

There should be no significant performance change between this and the
previous implementation. Previously opt_getinlinecache was a "leaf"
instruction, but it included a jump (almost always to a separate cache
line). Now opt_getconstant_path is a non-leaf (it may
raise/autoload/call const_missing) but it does not jump. These seem to
even out.
2022-09-01 15:20:49 -07:00
Takashi Kokubun 9f3140a42e
Remove mjit_exec benchmarks
Now that mjit_exec doesn't exist, those files feel old. I'll probably
change how I benchmark it when I add benchmarks for it again.
2022-08-21 11:35:40 -07:00
Takashi Kokubun a60507f616
Rename mjit_compile.c to mjit_compiler.c
I'm planning to introduce mjit_compiler.rb, and I want to make this
consistent with it. Consistency with compile.c doesn't seem important
for MJIT anyway.
2022-08-21 11:33:06 -07:00
Takashi Kokubun 485019c2bd
Rename mjit_exec to jit_exec (#6262)
* Rename mjit_exec to jit_exec

* Rename mjit_exec_slowpath to mjit_check_iseq

* Remove mjit_exec references from comments
2022-08-19 23:57:17 -07:00
Takashi Kokubun fc4acf8cae
Make benchmark indentation consistent
Related to https://github.com/Shopify/yjit-bench/pull/109
2022-08-19 14:44:08 -07:00
Jemma Issroff b4539dba7a Added vm setivar benchmark from yjit-bench 2022-08-17 10:26:28 -07:00
John Hawthorn 0608a9a086
Optimize Marshal dump/load for large (> 31-bit) FIXNUM (#6229)
* Optimize Marshal dump of large fixnum

Marshal's FIXNUM type only supports 31-bit fixnums, so on 64-bit
platforms the 63-bit fixnums need to be represented in Marshal's
BIGNUM.

Previously this was done by converting to a bugnum and serializing the
bignum object.

This commit avoids allocating the intermediate bignum object, instead
outputting the T_FIXNUM directly to a Marshal bignum. This maintains the
same representation as the previous implementation, including not using
LINKs for these large fixnums (an artifact of the previous
implementation always allocating a new BIGNUM).

This commit also avoids unnecessary st_lookups on immediate values,
which we know will not be in that table.

* Fastpath for loading FIXNUM from Marshal bignum

* Run update-deps
2022-08-15 16:14:12 -07:00
Jeremy Evans 7922fd65e3 Update multiple assignment benchmarks to include non-literal array cases
This allows them to show the effect of the previous newarray/expandarray
to swap/opt_reverse optimization.  This shows an 35-83% performance
improvement in the four multiple assignment benchmarks that use this
optimization.
2022-08-09 22:19:46 -07:00
Jean Boussier 1cb77f2304 Update IO::Buffer#get_value benchmark
- The method was renamed from `get` to `get_value`
- Comparing to `String#unpack` isn't quite equivalent, `unpack1` is closer.
- Use frozen_string_literal to avoid allocating a format string every time.
- Use `N` format which is equivalent to `:U32` (`uint_32_t` big-endian).
- Disable experimental warnings to not mess up the output.
2022-08-08 15:15:33 +02:00
Jean Boussier 31a5586d1e rb_str_buf_append: add a fast path for ENC_CODERANGE_VALID
If the RHS has valid encoding, and both strings have the same
encoding, we can use the fast path.

However we need to update the LHS coderange.

```
compare-ruby: ruby 3.2.0dev (2022-07-21T14:46:32Z master cdbb9b8555) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-25T07:25:41Z string-concat-vali.. 11a2772bdd) [arm64-darwin21]
warming up...

|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|binary_concat_7bit  |    554.816k|  556.460k|
|                    |           -|     1.00x|
|utf8_concat_7bit    |    556.367k|  555.101k|
|                    |       1.00x|         -|
|utf8_concat_UTF8    |    412.555k|  556.824k|
|                    |           -|     1.35x|
```
2022-07-25 14:18:52 +02:00
Jean Boussier f954c5dae4 string.c: use str_enc_fastpath in TERM_LEN
Not having to fetch the rb_encoding save a significant
amount of time.

Additionally, even when we have to fetch it, we can do
it faster using `ENCODING_GET` rather than `rb_enc_get`.

```
compare-ruby: ruby 3.2.0dev (2022-07-19T08:41:40Z master cb9fd920a3) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-21T11:16:16Z faster-buffer-conc.. 4f001f0748) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_concat_utf8    |    510.580k|  565.600k|
|                      |           -|     1.11x|
|binary_concat_binary  |    512.653k|  571.483k|
|                      |           -|     1.11x|
|utf8_concat_utf8      |    511.396k|  566.879k|
|                      |           -|     1.11x|
```
2022-07-21 15:06:50 +02:00
Jean Boussier 0ae8dbbee0 rb_str_buf_append: fastpath to str_buf_cat
If the LHS is ASCII compatible and the RHS is 7BIT
we can directly concat without being concerned about
anything else.

Benchmark:
```
compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-13T10:13:53Z faster-buffer-conc.. a04c10476d) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_append_utf8    |    385.315k|  573.663k|
|                      |           -|     1.49x|
|binary_append_binary  |    446.579k|  574.898k|
|                      |           -|     1.29x|
|utf8_append_utf8      |    430.936k|  573.394k|
|                      |           -|     1.33x|
```

Note that in the benchmark, the RHS always have a precomputed
coderange. So the benchmark never enter the slowpath of having to
scan the RHS. However it's extremly likely that we'll end
up scanning it anyway in rb_enc_cr_str_buf_cat
2022-07-19 10:41:40 +02:00
Jemma Issroff f375280d5a Add benchmarks for setting / getting ivars on generics 2022-07-15 13:39:02 -07:00
Jemma Issroff c53439294e Fixes ivar benchmarks to not depend on object allocation
Prior to this change, we were measuring object allocation as well
as setting instance variables within ivar benchmarks. With this
change, we now only measure setting instance variables within
ivar benchmarks.
2022-07-15 10:29:42 -04:00
Jean Boussier 906f7cb3e7 vm_opt_ltlt: call rb_str_buf_append directly if RHS is a String
`rb_str_concat` does a lot of type checking we can easily bypass.

```

|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|string_concat  |    362.007k|  398.965k|
|               |           -|     1.10x|
```
2022-07-06 17:25:58 +02:00
Jemma Issroff af425b6d66 Added vm_ivar benchmark for initializing an embedded obj 2022-06-16 08:47:19 -07:00
Takashi Kokubun 790825db44
Update the help message on /benchmark
I wanted to point out there's --output=all.
2022-06-07 21:30:28 -07:00
Samuel Williams 216593f59b Add IO write throughput/locking overhead benchmark. 2022-05-28 15:44:18 +12:00
Kevin Newton 6068da8937 Finer-grained constant cache invalidation (take 2)
This commit reintroduces finer-grained constant cache invalidation.
After 8008fb7 got merged, it was causing issues on token-threaded
builds (such as on Windows).

The issue was that when you're iterating through instruction sequences
and using the translator functions to get back the instruction structs,
you're either using `rb_vm_insn_null_translator` or
`rb_vm_insn_addr2insn2` depending if it's a direct-threading build.
`rb_vm_insn_addr2insn2` does some normalization to always return to
you the non-trace version of whatever instruction you're looking at.
`rb_vm_insn_null_translator` does not do that normalization.

This means that when you're looping through the instructions if you're
trying to do an opcode comparison, it can change depending on the type
of threading that you're using. This can be very confusing. So, this
commit creates a new translator function
`rb_vm_insn_normalizing_translator` to always return the non-trace
version so that opcode comparisons don't have to worry about different
configurations.

[Feature #18589]
2022-04-01 14:48:22 -04:00
Nobuyoshi Nakada 69967ee64e
Revert "Finer-grained inline constant cache invalidation"
This reverts commits for [Feature #18589]:
* 8008fb7352
  "Update formatting per feedback"
* 8f6eaca2e1
  "Delete ID from constant cache table if it becomes empty on ISEQ free"
* 629908586b
  "Finer-grained inline constant cache invalidation"

MSWin builds on AppVeyor have been crashing since the merger.
2022-03-25 20:29:09 +09:00
Kevin Newton 629908586b Finer-grained inline constant cache invalidation
Current behavior - caches depend on a global counter. All constant mutations cause caches to be invalidated.

```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends on global counter
end

foo # populate inline cache
foo # hit inline cache

C = 1 # global counter increments, all caches are invalidated

foo # misses inline cache due to `C = 1`
```

Proposed behavior - caches depend on name components. Only constant mutations with corresponding names will invalidate the cache.

```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends constants named "A" and "B"
end

foo # populate inline cache
foo # hit inline cache

C = 1 # caches that depend on the name "C" are invalidated

foo # hits inline cache because IC only depends on "A" and "B"
```

Examples of breaking the new cache:

```ruby
module C
  # Breaks `foo` cache because "A" constant is set and the cache in foo depends
  # on "A" and "B"
  class A; end
end

B = 1
```

We expect the new cache scheme to be invalidated less often because names aren't frequently reused. With the cache being invalidated less, we can rely on its stability more to keep our constant references fast and reduce the need to throw away generated code in YJIT.
2022-03-24 09:14:38 -07:00
John Hawthorn b13a7c8e36 Constant time class to class ancestor lookup
Previously when checking ancestors, we would walk all the way up the
ancestry chain checking each parent for a matching class or module.

I believe this was especially unfriendly to CPU cache since for each
step we need to check two cache lines (the class and class ext).

This check is used quite often in:
* case statements
* rescue statements
* Calling protected methods
* Class#is_a?
* Module#===
* Module#<=>

I believe it's most common to check a class against a parent class, to
this commit aims to improve that (unfortunately does not help checking
for an included Module).

This is done by storing on each class the number and an array of all
parent classes, in order (BasicObject is at index 0). Using this we can
check whether a class is a subclass of another in constant time since we
know the location to expect it in the hierarchy.
2022-02-23 19:57:42 -08:00
John Hawthorn 2f71f6bb82 Speed up and avoid kwarg hash alloc in Time.now
Previously Time.now was switched to use Time.new as it added support for
the in: argument. Unfortunately because Class#new is a cfunc this
requires always allocating a Hash.

This commit switches Time.now back to using a builtin time_s_now. This
avoids the extra Hash allocation and is about 3x faster.

    $ benchmark-driver -e './ruby;3.1::~/.rubies/ruby-3.1.0/bin/ruby;3.0::~/.rubies/ruby-3.0.2/bin/ruby' benchmark/time_now.yml
    Warming up --------------------------------------
                  Time.now     6.704M i/s -      6.710M times in 1.000814s (149.16ns/i, 328clocks/i)
    Time.now(in: "+09:00")     2.003M i/s -      2.112M times in 1.054330s (499.31ns/i)
    Calculating -------------------------------------
                               ./ruby         3.1         3.0
                  Time.now     7.693M      2.763M      6.394M i/s -     20.113M times in 2.614428s 7.278710s 3.145572s
    Time.now(in: "+09:00")     2.030M      1.260M      1.617M i/s -      6.008M times in 2.960132s 4.769378s 3.716537s

    Comparison:
                               Time.now
                    ./ruby:   7693129.7 i/s
                       3.0:   6394109.2 i/s - 1.20x  slower
                       3.1:   2763282.5 i/s - 2.78x  slower

                 Time.now(in: "+09:00")
                    ./ruby:   2029757.4 i/s
                       3.0:   1616652.3 i/s - 1.26x  slower
                       3.1:   1259776.2 i/s - 1.61x  slower
2022-01-12 12:55:14 -08:00
Takashi Kokubun 1a63468831
Prepare for removing RubyVM::JIT (#5262) 2021-12-13 23:07:46 -08:00
Jeremy Evans b08dacfea3
Optimize dynamic string interpolation for symbol/true/false/nil/0-9
This provides a significant speedup for symbol, true, false,
nil, and 0-9, class/module, and a small speedup in most other cases.

Speedups (using included benchmarks):
:symbol        :: 60%
0-9            :: 50%
Class/Module   :: 50%
nil/true/false :: 20%
integer        :: 10%
[]             :: 10%
""             :: 3%

One reason this approach is faster is it reduces the number of
VM instructions for each interpolated value.

Initial idea, approach, and benchmarks from Eric Wong. I applied
the same approach against the master branch, updating it to handle
the significant internal changes since this was first proposed 4
years ago (such as CALL_INFO/CALL_CACHE -> CALL_DATA). I also
expanded it to optimize true/false/nil/0-9/class/module, and added
handling of missing methods, refined methods, and RUBY_DEBUG.

This renames the tostring insn to anytostring, and adds an
objtostring insn that implements the optimization. This requires
making a few functions non-static, and adding some non-static
functions.

This disables 4 YJIT tests.  Those tests should be reenabled after
YJIT optimizes the new objtostring insn.

Implements [Feature #13715]

Co-authored-by: Eric Wong <e@80x24.org>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Yusuke Endoh <mame@ruby-lang.org>
Co-authored-by: Koichi Sasada <ko1@atdot.net>
2021-11-18 15:10:20 -08:00
Takashi Kokubun 021255f1e7
Skip string allocation in benchmark/time_at.yml
and also drop a weird newline from benchmark/array_sample.yml.
2021-11-14 23:25:25 -08:00
Koichi Sasada f943264565 add benchmark/time_at.yml
```
                                ruby_2_6    ruby_2_7    ruby_3_0      master    modified
                   Time.at(0)    12.362M     11.015M      9.499M      6.615M      9.000M i/s -     32.115M times in 2.597946s 2.915517s 3.380725s 4.854651s 3.568234s
              Time.at(0, 500)     7.542M      7.136M      8.252M      5.707M      5.646M i/s -     20.713M times in 2.746279s 2.902556s 2.510166s 3.629644s 3.668854s
     Time.at(0, in: "+09:00")     1.426M      1.346M      1.565M      1.674M      1.667M i/s -      4.240M times in 2.974049s 3.149753s 2.709416s 2.533043s 2.542853s
```

```
ruby_2_6: ruby 2.6.7p150 (2020-12-09 revision 67888) [x86_64-linux]
ruby_2_7: ruby 2.7.3p140 (2020-12-09 revision 9b884df6dd) [x86_64-linux]
ruby_3_0: ruby 3.0.3p150 (2021-11-06 revision 6d540c1b98) [x86_64-linux]
master: ruby 3.1.0dev (2021-11-13T20:48:57Z master fc456adc6a) [x86_64-linux]
modified: ruby 3.1.0dev (2021-11-15T01:12:51Z mandatory_only_bui.. b0228446db) [x86_64-linux]
```
2021-11-15 15:58:56 +09:00