Граф коммитов

456 Коммитов

Автор SHA1 Сообщение Дата
Jeremy Evans f6254f77f7 Speed up calling iseq bmethods
Currently, bmethod arguments are copied from the VM stack to the
C stack in vm_call_bmethod, then copied from the C stack to the VM
stack later in invoke_iseq_block_from_c.  This is inefficient.

This adds vm_call_iseq_bmethod and vm_call_noniseq_bmethod.
vm_call_iseq_bmethod is an optimized method that skips stack
copies (though there is one copy to remove the receiver from
the stack), and avoids calling vm_call_bmethod_body,
rb_vm_invoke_bmethod, invoke_block_from_c_proc,
invoke_iseq_block_from_c, and vm_yield_setup_args.

Th vm_call_iseq_bmethod argument handling is similar to the
way normal iseq methods are called, and allows for similar
performance optimizations when using splats or keywords.
However, even in the no argument case it's still significantly
faster.

A benchmark is added for bmethod calling.  In my environment,
it improves bmethod calling performance by 38-59% for simple
bmethod calls, and up to 180% for bmethod calls passing
literal keywords on both sides.

```

./miniruby-iseq-bmethod:  18159792.6 i/s
          ./miniruby-m:  13174419.1 i/s - 1.38x  slower

                   bmethod_simple_1
./miniruby-iseq-bmethod:  15890745.4 i/s
          ./miniruby-m:  10008972.7 i/s - 1.59x  slower

             bmethod_simple_0_splat
./miniruby-iseq-bmethod:  13142804.3 i/s
          ./miniruby-m:  11168595.2 i/s - 1.18x  slower

             bmethod_simple_1_splat
./miniruby-iseq-bmethod:  12375791.0 i/s
          ./miniruby-m:   8491140.1 i/s - 1.46x  slower

                   bmethod_no_splat
./miniruby-iseq-bmethod:  10151258.8 i/s
          ./miniruby-m:   8716664.1 i/s - 1.16x  slower

                    bmethod_0_splat
./miniruby-iseq-bmethod:   8138802.5 i/s
          ./miniruby-m:   7515600.2 i/s - 1.08x  slower

                    bmethod_1_splat
./miniruby-iseq-bmethod:   8028372.7 i/s
          ./miniruby-m:   5947658.6 i/s - 1.35x  slower

                   bmethod_10_splat
./miniruby-iseq-bmethod:   6953514.1 i/s
          ./miniruby-m:   4840132.9 i/s - 1.44x  slower

                  bmethod_100_splat
./miniruby-iseq-bmethod:   5287288.4 i/s
          ./miniruby-m:   2243218.4 i/s - 2.36x  slower

                         bmethod_kw
./miniruby-iseq-bmethod:   8931358.2 i/s
          ./miniruby-m:   3185818.6 i/s - 2.80x  slower

                      bmethod_no_kw
./miniruby-iseq-bmethod:  12281287.4 i/s
          ./miniruby-m:  10041727.9 i/s - 1.22x  slower

                   bmethod_kw_splat
./miniruby-iseq-bmethod:   5618956.8 i/s
          ./miniruby-m:   3657549.5 i/s - 1.54x  slower
```
2023-04-25 08:06:16 -07:00
Takashi Kokubun 66c4dc1592 Remove MJIT-specific benchmarks 2023-03-06 22:36:57 -08:00
John Bampton c43fbe4ebd
Fix spelling (#7405) 2023-02-28 10:05:30 -08:00
Matt Valentine-House 2605615fe6 Benchmark String interpolation across size pools 2023-01-13 10:31:35 -05:00
Takashi Kokubun 509da028c2
Rewrite Kernel#loop in Ruby (#6983)
* Rewrite Kernel#loop in Ruby

* Use enum_for(:loop) { Float::INFINITY }

Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com>

* Limit the scope to rescue StopIteration

Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com>
2022-12-25 21:46:29 -08:00
Nobuyoshi Nakada 8c272f4481 [Feature #18033] Make Time.new parse time strings
`Time.new` now parses strings such as the result of `Time#inspect`
and restricted ISO-8601 formats.
2022-12-16 22:52:59 +09:00
Daniel Colson e69b91fae4 Introduce BOP_CMP for optimized comparison
Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup
to determine whether `<=>` was overridden. The result of the lookup was
cached, but only for the duration of the specific method that
initialized the cmp_opt_data cache structure.

With this method lookup, `[x,y].max` is slower than doing `x > y ?
x : y` even though there's an optimized instruction for "new array max".
(John noticed somebody a proposed micro-optimization based on this fact
in https://github.com/mastodon/mastodon/pull/19903.)

```rb
a, b = 1, 2
Benchmark.ips do |bm|
  bm.report('conditional') { a > b ? a : b }
  bm.report('method') { [a, b].max }
  bm.compare!
end
```

Before:

```
Comparison:
         conditional: 22603733.2 i/s
              method: 19820412.7 i/s - 1.14x  (± 0.00) slower
```

This commit replaces the method lookup with a new CMP basic op, which
gives the examples above equivalent performance.

After:

```
Comparison:
              method: 24022466.5 i/s
         conditional: 23851094.2 i/s - same-ish: difference falls within
error
```

Relevant benchmarks show an improvement to Array#max and Array#min when
not using the optimized newarray_max instruction as well. They are
noticeably faster for small arrays with the relevant types, and the same
or maybe a touch faster on larger arrays.

```
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_min
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_max
```

The benchmarks added in this commit also look generally improved.

Co-authored-by: John Hawthorn <jhawthorn@github.com>
2022-12-06 12:37:23 -08:00
Takashi Kokubun d15d1c01c2
Rename --mjit-min-calls to --mjit-call-threshold (#6731)
for consistency with YJIT
2022-11-14 23:38:52 -08:00
Takashi Kokubun f276d5a7fe
Improve HTML escape benchmarks 2022-11-04 23:54:25 -07:00
S.H c6f439a6a8
Improve performance some `Integer` and `Float` methods [Feature #19085] (#6638)
* Improve some Integer and Float methods

* Using alias and Remove unnecessary code

* Remove commentout code
2022-10-27 09:13:16 -07:00
Samuel Williams 025b8701c0
Add several new methods for getting and setting buffer contents. (#6434) 2022-09-26 18:06:12 +13:00
Jemma Issroff b5c459d57a Adds a benchmark to measure freezing objects 2022-09-22 10:29:43 -07:00
HParker fbaac837cf avoid extra dup and pop in compile_op_asgn2
Co-authored-by: John Hawthorn <jhawthorn@github.com>
2022-09-22 09:47:13 -07:00
Jemma Issroff aecb57ceb0
Fix style on vm_ivar benchmarks (#6379) 2022-09-15 09:39:39 +09:00
Jemma Issroff 513a11b477 Add vm_ivar get, get_unitialized, and lazy_set benchmarks 2022-09-14 13:50:47 -07:00
Jean Boussier cd1724bdde rb_str_concat_literals: use rb_str_buf_append
That's about 1.30x faster.
2022-09-08 15:02:21 +02:00
John Hawthorn 679ef34586 New constant caching insn: opt_getconstant_path
Previously YARV bytecode implemented constant caching by having a pair
of instructions, opt_getinlinecache and opt_setinlinecache, wrapping a
series of getconstant calls (with putobject providing supporting
arguments).

This commit replaces that pattern with a new instruction,
opt_getconstant_path, handling both getting/setting the inline cache and
fetching the constant on a cache miss.

This is implemented by storing the full constant path as a
null-terminated array of IDs inside of the IC structure. idNULL is used
to signal an absolute constant reference.

    $ ./miniruby --dump=insns -e '::Foo::Bar::Baz'
    == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
    0000 opt_getconstant_path                   <ic:0 ::Foo::Bar::Baz>      (   1)[Li]
    0002 leave

The motivation for this is that we had increasingly found the need to
disassemble the instructions between the opt_getinlinecache and
opt_setinlinecache in order to determine the constant we are fetching,
or otherwise store metadata.

This disassembly was done:
* In opt_setinlinecache, to register the IC against the constant names
  it is using for granular invalidation.
* In rb_iseq_free, to unregister the IC from the invalidation table.
* In YJIT to find the position of a opt_getinlinecache instruction to
  invalidate it when the cache is populated
* In YJIT to register the constant names being used for invalidation.

With this change we no longe need disassemly for these (in fact
rb_iseq_each is now unused), as the list of constant names being
referenced is held in the IC. This should also make it possible to make
more optimizations in the future.

This may also reduce the size of iseqs, as previously each segment
required 32 bytes (on 64-bit platforms) for each constant segment. This
implementation only stores one ID per-segment.

There should be no significant performance change between this and the
previous implementation. Previously opt_getinlinecache was a "leaf"
instruction, but it included a jump (almost always to a separate cache
line). Now opt_getconstant_path is a non-leaf (it may
raise/autoload/call const_missing) but it does not jump. These seem to
even out.
2022-09-01 15:20:49 -07:00
Takashi Kokubun 9f3140a42e
Remove mjit_exec benchmarks
Now that mjit_exec doesn't exist, those files feel old. I'll probably
change how I benchmark it when I add benchmarks for it again.
2022-08-21 11:35:40 -07:00
Takashi Kokubun a60507f616
Rename mjit_compile.c to mjit_compiler.c
I'm planning to introduce mjit_compiler.rb, and I want to make this
consistent with it. Consistency with compile.c doesn't seem important
for MJIT anyway.
2022-08-21 11:33:06 -07:00
Takashi Kokubun 485019c2bd
Rename mjit_exec to jit_exec (#6262)
* Rename mjit_exec to jit_exec

* Rename mjit_exec_slowpath to mjit_check_iseq

* Remove mjit_exec references from comments
2022-08-19 23:57:17 -07:00
Takashi Kokubun fc4acf8cae
Make benchmark indentation consistent
Related to https://github.com/Shopify/yjit-bench/pull/109
2022-08-19 14:44:08 -07:00
Jemma Issroff b4539dba7a Added vm setivar benchmark from yjit-bench 2022-08-17 10:26:28 -07:00
John Hawthorn 0608a9a086
Optimize Marshal dump/load for large (> 31-bit) FIXNUM (#6229)
* Optimize Marshal dump of large fixnum

Marshal's FIXNUM type only supports 31-bit fixnums, so on 64-bit
platforms the 63-bit fixnums need to be represented in Marshal's
BIGNUM.

Previously this was done by converting to a bugnum and serializing the
bignum object.

This commit avoids allocating the intermediate bignum object, instead
outputting the T_FIXNUM directly to a Marshal bignum. This maintains the
same representation as the previous implementation, including not using
LINKs for these large fixnums (an artifact of the previous
implementation always allocating a new BIGNUM).

This commit also avoids unnecessary st_lookups on immediate values,
which we know will not be in that table.

* Fastpath for loading FIXNUM from Marshal bignum

* Run update-deps
2022-08-15 16:14:12 -07:00
Jeremy Evans 7922fd65e3 Update multiple assignment benchmarks to include non-literal array cases
This allows them to show the effect of the previous newarray/expandarray
to swap/opt_reverse optimization.  This shows an 35-83% performance
improvement in the four multiple assignment benchmarks that use this
optimization.
2022-08-09 22:19:46 -07:00
Jean Boussier 1cb77f2304 Update IO::Buffer#get_value benchmark
- The method was renamed from `get` to `get_value`
- Comparing to `String#unpack` isn't quite equivalent, `unpack1` is closer.
- Use frozen_string_literal to avoid allocating a format string every time.
- Use `N` format which is equivalent to `:U32` (`uint_32_t` big-endian).
- Disable experimental warnings to not mess up the output.
2022-08-08 15:15:33 +02:00
Jean Boussier 31a5586d1e rb_str_buf_append: add a fast path for ENC_CODERANGE_VALID
If the RHS has valid encoding, and both strings have the same
encoding, we can use the fast path.

However we need to update the LHS coderange.

```
compare-ruby: ruby 3.2.0dev (2022-07-21T14:46:32Z master cdbb9b8555) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-25T07:25:41Z string-concat-vali.. 11a2772bdd) [arm64-darwin21]
warming up...

|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|binary_concat_7bit  |    554.816k|  556.460k|
|                    |           -|     1.00x|
|utf8_concat_7bit    |    556.367k|  555.101k|
|                    |       1.00x|         -|
|utf8_concat_UTF8    |    412.555k|  556.824k|
|                    |           -|     1.35x|
```
2022-07-25 14:18:52 +02:00
Jean Boussier f954c5dae4 string.c: use str_enc_fastpath in TERM_LEN
Not having to fetch the rb_encoding save a significant
amount of time.

Additionally, even when we have to fetch it, we can do
it faster using `ENCODING_GET` rather than `rb_enc_get`.

```
compare-ruby: ruby 3.2.0dev (2022-07-19T08:41:40Z master cb9fd920a3) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-21T11:16:16Z faster-buffer-conc.. 4f001f0748) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_concat_utf8    |    510.580k|  565.600k|
|                      |           -|     1.11x|
|binary_concat_binary  |    512.653k|  571.483k|
|                      |           -|     1.11x|
|utf8_concat_utf8      |    511.396k|  566.879k|
|                      |           -|     1.11x|
```
2022-07-21 15:06:50 +02:00
Jean Boussier 0ae8dbbee0 rb_str_buf_append: fastpath to str_buf_cat
If the LHS is ASCII compatible and the RHS is 7BIT
we can directly concat without being concerned about
anything else.

Benchmark:
```
compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-13T10:13:53Z faster-buffer-conc.. a04c10476d) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_append_utf8    |    385.315k|  573.663k|
|                      |           -|     1.49x|
|binary_append_binary  |    446.579k|  574.898k|
|                      |           -|     1.29x|
|utf8_append_utf8      |    430.936k|  573.394k|
|                      |           -|     1.33x|
```

Note that in the benchmark, the RHS always have a precomputed
coderange. So the benchmark never enter the slowpath of having to
scan the RHS. However it's extremly likely that we'll end
up scanning it anyway in rb_enc_cr_str_buf_cat
2022-07-19 10:41:40 +02:00
Jemma Issroff f375280d5a Add benchmarks for setting / getting ivars on generics 2022-07-15 13:39:02 -07:00
Jemma Issroff c53439294e Fixes ivar benchmarks to not depend on object allocation
Prior to this change, we were measuring object allocation as well
as setting instance variables within ivar benchmarks. With this
change, we now only measure setting instance variables within
ivar benchmarks.
2022-07-15 10:29:42 -04:00
Jean Boussier 906f7cb3e7 vm_opt_ltlt: call rb_str_buf_append directly if RHS is a String
`rb_str_concat` does a lot of type checking we can easily bypass.

```

|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|string_concat  |    362.007k|  398.965k|
|               |           -|     1.10x|
```
2022-07-06 17:25:58 +02:00
Jemma Issroff af425b6d66 Added vm_ivar benchmark for initializing an embedded obj 2022-06-16 08:47:19 -07:00
Takashi Kokubun 790825db44
Update the help message on /benchmark
I wanted to point out there's --output=all.
2022-06-07 21:30:28 -07:00
Samuel Williams 216593f59b Add IO write throughput/locking overhead benchmark. 2022-05-28 15:44:18 +12:00
Kevin Newton 6068da8937 Finer-grained constant cache invalidation (take 2)
This commit reintroduces finer-grained constant cache invalidation.
After 8008fb7 got merged, it was causing issues on token-threaded
builds (such as on Windows).

The issue was that when you're iterating through instruction sequences
and using the translator functions to get back the instruction structs,
you're either using `rb_vm_insn_null_translator` or
`rb_vm_insn_addr2insn2` depending if it's a direct-threading build.
`rb_vm_insn_addr2insn2` does some normalization to always return to
you the non-trace version of whatever instruction you're looking at.
`rb_vm_insn_null_translator` does not do that normalization.

This means that when you're looping through the instructions if you're
trying to do an opcode comparison, it can change depending on the type
of threading that you're using. This can be very confusing. So, this
commit creates a new translator function
`rb_vm_insn_normalizing_translator` to always return the non-trace
version so that opcode comparisons don't have to worry about different
configurations.

[Feature #18589]
2022-04-01 14:48:22 -04:00
Nobuyoshi Nakada 69967ee64e
Revert "Finer-grained inline constant cache invalidation"
This reverts commits for [Feature #18589]:
* 8008fb7352
  "Update formatting per feedback"
* 8f6eaca2e1
  "Delete ID from constant cache table if it becomes empty on ISEQ free"
* 629908586b
  "Finer-grained inline constant cache invalidation"

MSWin builds on AppVeyor have been crashing since the merger.
2022-03-25 20:29:09 +09:00
Kevin Newton 629908586b Finer-grained inline constant cache invalidation
Current behavior - caches depend on a global counter. All constant mutations cause caches to be invalidated.

```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends on global counter
end

foo # populate inline cache
foo # hit inline cache

C = 1 # global counter increments, all caches are invalidated

foo # misses inline cache due to `C = 1`
```

Proposed behavior - caches depend on name components. Only constant mutations with corresponding names will invalidate the cache.

```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends constants named "A" and "B"
end

foo # populate inline cache
foo # hit inline cache

C = 1 # caches that depend on the name "C" are invalidated

foo # hits inline cache because IC only depends on "A" and "B"
```

Examples of breaking the new cache:

```ruby
module C
  # Breaks `foo` cache because "A" constant is set and the cache in foo depends
  # on "A" and "B"
  class A; end
end

B = 1
```

We expect the new cache scheme to be invalidated less often because names aren't frequently reused. With the cache being invalidated less, we can rely on its stability more to keep our constant references fast and reduce the need to throw away generated code in YJIT.
2022-03-24 09:14:38 -07:00
John Hawthorn b13a7c8e36 Constant time class to class ancestor lookup
Previously when checking ancestors, we would walk all the way up the
ancestry chain checking each parent for a matching class or module.

I believe this was especially unfriendly to CPU cache since for each
step we need to check two cache lines (the class and class ext).

This check is used quite often in:
* case statements
* rescue statements
* Calling protected methods
* Class#is_a?
* Module#===
* Module#<=>

I believe it's most common to check a class against a parent class, to
this commit aims to improve that (unfortunately does not help checking
for an included Module).

This is done by storing on each class the number and an array of all
parent classes, in order (BasicObject is at index 0). Using this we can
check whether a class is a subclass of another in constant time since we
know the location to expect it in the hierarchy.
2022-02-23 19:57:42 -08:00
John Hawthorn 2f71f6bb82 Speed up and avoid kwarg hash alloc in Time.now
Previously Time.now was switched to use Time.new as it added support for
the in: argument. Unfortunately because Class#new is a cfunc this
requires always allocating a Hash.

This commit switches Time.now back to using a builtin time_s_now. This
avoids the extra Hash allocation and is about 3x faster.

    $ benchmark-driver -e './ruby;3.1::~/.rubies/ruby-3.1.0/bin/ruby;3.0::~/.rubies/ruby-3.0.2/bin/ruby' benchmark/time_now.yml
    Warming up --------------------------------------
                  Time.now     6.704M i/s -      6.710M times in 1.000814s (149.16ns/i, 328clocks/i)
    Time.now(in: "+09:00")     2.003M i/s -      2.112M times in 1.054330s (499.31ns/i)
    Calculating -------------------------------------
                               ./ruby         3.1         3.0
                  Time.now     7.693M      2.763M      6.394M i/s -     20.113M times in 2.614428s 7.278710s 3.145572s
    Time.now(in: "+09:00")     2.030M      1.260M      1.617M i/s -      6.008M times in 2.960132s 4.769378s 3.716537s

    Comparison:
                               Time.now
                    ./ruby:   7693129.7 i/s
                       3.0:   6394109.2 i/s - 1.20x  slower
                       3.1:   2763282.5 i/s - 2.78x  slower

                 Time.now(in: "+09:00")
                    ./ruby:   2029757.4 i/s
                       3.0:   1616652.3 i/s - 1.26x  slower
                       3.1:   1259776.2 i/s - 1.61x  slower
2022-01-12 12:55:14 -08:00
Takashi Kokubun 1a63468831
Prepare for removing RubyVM::JIT (#5262) 2021-12-13 23:07:46 -08:00
Jeremy Evans b08dacfea3
Optimize dynamic string interpolation for symbol/true/false/nil/0-9
This provides a significant speedup for symbol, true, false,
nil, and 0-9, class/module, and a small speedup in most other cases.

Speedups (using included benchmarks):
:symbol        :: 60%
0-9            :: 50%
Class/Module   :: 50%
nil/true/false :: 20%
integer        :: 10%
[]             :: 10%
""             :: 3%

One reason this approach is faster is it reduces the number of
VM instructions for each interpolated value.

Initial idea, approach, and benchmarks from Eric Wong. I applied
the same approach against the master branch, updating it to handle
the significant internal changes since this was first proposed 4
years ago (such as CALL_INFO/CALL_CACHE -> CALL_DATA). I also
expanded it to optimize true/false/nil/0-9/class/module, and added
handling of missing methods, refined methods, and RUBY_DEBUG.

This renames the tostring insn to anytostring, and adds an
objtostring insn that implements the optimization. This requires
making a few functions non-static, and adding some non-static
functions.

This disables 4 YJIT tests.  Those tests should be reenabled after
YJIT optimizes the new objtostring insn.

Implements [Feature #13715]

Co-authored-by: Eric Wong <e@80x24.org>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Yusuke Endoh <mame@ruby-lang.org>
Co-authored-by: Koichi Sasada <ko1@atdot.net>
2021-11-18 15:10:20 -08:00
Takashi Kokubun 021255f1e7
Skip string allocation in benchmark/time_at.yml
and also drop a weird newline from benchmark/array_sample.yml.
2021-11-14 23:25:25 -08:00
Koichi Sasada f943264565 add benchmark/time_at.yml
```
                                ruby_2_6    ruby_2_7    ruby_3_0      master    modified
                   Time.at(0)    12.362M     11.015M      9.499M      6.615M      9.000M i/s -     32.115M times in 2.597946s 2.915517s 3.380725s 4.854651s 3.568234s
              Time.at(0, 500)     7.542M      7.136M      8.252M      5.707M      5.646M i/s -     20.713M times in 2.746279s 2.902556s 2.510166s 3.629644s 3.668854s
     Time.at(0, in: "+09:00")     1.426M      1.346M      1.565M      1.674M      1.667M i/s -      4.240M times in 2.974049s 3.149753s 2.709416s 2.533043s 2.542853s
```

```
ruby_2_6: ruby 2.6.7p150 (2020-12-09 revision 67888) [x86_64-linux]
ruby_2_7: ruby 2.7.3p140 (2020-12-09 revision 9b884df6dd) [x86_64-linux]
ruby_3_0: ruby 3.0.3p150 (2021-11-06 revision 6d540c1b98) [x86_64-linux]
master: ruby 3.1.0dev (2021-11-13T20:48:57Z master fc456adc6a) [x86_64-linux]
modified: ruby 3.1.0dev (2021-11-15T01:12:51Z mandatory_only_bui.. b0228446db) [x86_64-linux]
```
2021-11-15 15:58:56 +09:00
Koichi Sasada dde010c974 add benchmark/array_sample.yml
```
                      ruby_2_6    ruby_2_7    ruby_3_0      master    modified
         ary.sample    32.113M     30.146M     11.162M     10.539M     26.620M i/s -     64.882M times in 2.020428s 2.152296s 5.812981s 6.156398s 2.437325s
      ary.sample(2)     9.420M      8.987M      7.500M      6.973M      7.191M i/s -     25.170M times in 2.672085s 2.800616s 3.355896s 3.609534s 3.500108s
```

```
ruby_2_6: ruby 2.6.7p150 (2020-12-09 revision 67888) [x86_64-linux]
ruby_2_7: ruby 2.7.3p140 (2020-12-09 revision 9b884df6dd) [x86_64-linux]
ruby_3_0: ruby 3.0.3p150 (2021-11-06 revision 6d540c1b98) [x86_64-linux]
master: ruby 3.1.0dev (2021-11-13T20:48:57Z master fc456adc6a) [x86_64-linux]
modified: ruby 3.1.0dev (2021-11-15T01:12:51Z mandatory_only_bui.. b0228446db) [x86_64-linux]
```
2021-11-15 15:58:56 +09:00
Samuel Williams 4b89034218 IO::Buffer for scheduler interface. 2021-11-10 19:21:05 +13:00
Koichi Sasada a7776077be add vm_ivar_of_class_set
benchmark for a class's ivar setter
2021-10-23 01:32:55 +09:00
Koichi Sasada acb23454e5 allow to access ivars of classes/modules
if an ivar of a class/module refer to a shareable object, this ivar
can be read from non-main Ractors.
2021-10-23 01:32:55 +09:00
John Hawthorn bb488a1a7f Use faster any_hash logic in rb_hash
From the documentation of rb_obj_hash:

> Certain core classes such as Integer use built-in hash calculations and
> do not call the #hash method when used as a hash key.

So if you override, say, Integer#hash it won't be used from rb_hash_aref
and similar. This avoids method lookups in many common cases.

This commit uses the same optimization in rb_hash, a method used
internally and in the C API to get the hash value of an object. Usually
this is used to build the hash of an object based on its elements.
Previously it would always do a method lookup for 'hash'.

This is primarily intended to speed up hashing of Arrays and Hashes,
which call rb_hash for each element.

    compare-ruby: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
    built-ruby: ruby 3.1.0dev (2021-09-29T02:13:24Z fast_hash d670bf88b2) [x86_64-linux]
    # Iteration per second (i/s)

    |                 |compare-ruby|built-ruby|
    |:----------------|-----------:|---------:|
    |hash_aref_array  |       1.008|     1.769|
    |                 |           -|     1.76x|
2021-09-30 13:06:53 -07:00
Nobuyoshi Nakada 11fd3fec53
Add benchmarks to create Time instances 2021-09-12 18:44:53 +09:00
Jeremy Evans 2d98593bf5 Support tracing of attr_reader and attr_writer
In vm_call_method_each_type, check for c_call and c_return events before
dispatching to vm_call_ivar and vm_call_attrset.  With this approach, the
call cache will still dispatch directly to those functions, so this
change will only decrease performance for the first (uncached) call, and
even then, the performance decrease is very minimal.

This approach requires that we clear the call caches when tracing is
enabled or disabled.  The approach currently switches all vm_call_ivar
and vm_call_attrset call caches to vm_call_general any time tracing is
enabled or disabled. So it could theoretically result in a slowdown for
code that constantly enables or disables tracing.

This approach does not handle targeted tracepoints, but from my testing,
c_call and c_return events are not supported for targeted tracepoints,
so that shouldn't matter.

This includes a benchmark showing the performance decrease is minimal
if detectable at all.

Fixes [Bug #16383]
Fixes [Bug #10470]

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2021-08-29 07:23:39 -07:00
Nobuyoshi Nakada 9eae8cdefb
Prefer qualified names under Thread 2021-06-29 11:41:10 +09:00
eileencodes b91b3bc771 Add a cache for class variables
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.

GitHub PR #4340.

This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.

The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.

```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]

|         |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar  |      5.681M|   36.980M|
|         |           -|     6.51x|
```

Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.

Benchmark code:

```ruby
require "benchmark/ips"
require_relative "config/environment"

Benchmark.ips do |x|
  x.report "logger" do
    ActiveRecord::Base.logger
  end
end
```

Ruby 3.0 master / Rails 6.1:

```
Warming up --------------------------------------
              logger   155.251k i/100ms
Calculating -------------------------------------
```

Ruby 3.0 with cvar cache /  Rails 6.1:

```
Warming up --------------------------------------
              logger     1.546M i/100ms
Calculating -------------------------------------
              logger     14.857M (± 4.8%) i/s -     74.198M in   5.006202s
```

Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.

Ruby 3.0 master:

```
Warming up --------------------------------------
            1 module     1.231M i/100ms
          30 modules   432.020k i/100ms
         100 modules   145.399k i/100ms
Calculating -------------------------------------
            1 module     12.210M (± 2.1%) i/s -     61.553M in   5.043400s
          30 modules      4.354M (± 2.7%) i/s -     22.033M in   5.063839s
         100 modules      1.434M (± 2.9%) i/s -      7.270M in   5.072531s

Comparison:
            1 module: 12209958.3 i/s
          30 modules:  4354217.8 i/s - 2.80x  (± 0.00) slower
         100 modules:  1434447.3 i/s - 8.51x  (± 0.00) slower
```

Ruby 3.0 with cvar cache:

```
Warming up --------------------------------------
            1 module     1.641M i/100ms
          30 modules     1.655M i/100ms
         100 modules     1.620M i/100ms
Calculating -------------------------------------
            1 module     16.279M (± 3.8%) i/s -     82.038M in   5.046923s
          30 modules     15.891M (± 3.9%) i/s -     79.459M in   5.007958s
         100 modules     16.087M (± 3.6%) i/s -     81.005M in   5.041931s

Comparison:
            1 module: 16279458.0 i/s
         100 modules: 16087484.6 i/s - same-ish: difference falls within error
          30 modules: 15891406.2 i/s - same-ish: difference falls within error
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2021-06-18 10:02:44 -07:00
S.H 3208a5df2d
Improve perfomance for Integer#size method [Feature #17135] (#3476)
* Improve perfomance for Integer#size method [Feature #17135]

* re-run ci

* Let MJIT frame skip work for Integer#size

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2021-06-04 21:57:21 -07:00
S.H 28b481938b
Implemented some NilClass method in Ruby code is faster [Feature #17054] (#3366) 2021-06-02 20:04:56 -07:00
Alan Wu 5ada23ac12 compile.c: Emit send for === calls in when statements
The checkmatch instruction with VM_CHECKMATCH_TYPE_CASE calls
=== without a call cache. Emit a send instruction to make the call
instead. It includes a call cache.

The call cache improves throughput of using when statements to check the
class of a given object. This is useful for say, JSON serialization.

Use of a regular send instead of checkmatch also avoids taking the VM
lock every time, which is good for multi-ractor workloads.

    Calculating -------------------------------------
                             master        post
         vm_case_classes    11.013M     16.172M i/s -      6.000M times in 0.544795s 0.371009s
             vm_case_lit      2.296       2.263 i/s -       1.000 times in 0.435606s 0.441826s
                 vm_case    74.098M     64.338M i/s -      6.000M times in 0.080974s 0.093257s

    Comparison:
                      vm_case_classes
                    post:  16172114.4 i/s
                  master:  11013316.9 i/s - 1.47x  slower

                          vm_case_lit
                  master:         2.3 i/s
                    post:         2.3 i/s - 1.01x  slower

                              vm_case
                  master:  74097858.6 i/s
                    post:  64338333.9 i/s - 1.15x  slower

The vm_case benchmark is a bit slower post patch, possibily due to the
larger instruction sequence. The benchmark dispatches using
opt_case_dispatch so was not running checkmatch and does not make the
=== call post patch.
2021-05-28 12:34:03 -04:00
Aaron Patterson 07f055bb13
Revert "Filling cache values on cvar write"
This reverts commit 08de37f9fa.
This reverts commit e8ae922b62.
2021-05-11 13:31:00 -07:00
eileencodes e8ae922b62 Add a cache for class variables
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.

The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.

```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]

|         |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar  |      5.681M|   36.980M|
|         |           -|     6.51x|
```

Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.

Benchmark code:

```ruby
require "benchmark/ips"
require_relative "config/environment"

Benchmark.ips do |x|
  x.report "logger" do
    ActiveRecord::Base.logger
  end
end
```

Ruby 3.0 master / Rails 6.1:

```
Warming up --------------------------------------
              logger   155.251k i/100ms
Calculating -------------------------------------
```

Ruby 3.0 with cvar cache /  Rails 6.1:

```
Warming up --------------------------------------
              logger     1.546M i/100ms
Calculating -------------------------------------
              logger     14.857M (± 4.8%) i/s -     74.198M in   5.006202s
```

Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.

Ruby 3.0 master:

```
Warming up --------------------------------------
            1 module     1.231M i/100ms
          30 modules   432.020k i/100ms
         100 modules   145.399k i/100ms
Calculating -------------------------------------
            1 module     12.210M (± 2.1%) i/s -     61.553M in   5.043400s
          30 modules      4.354M (± 2.7%) i/s -     22.033M in   5.063839s
         100 modules      1.434M (± 2.9%) i/s -      7.270M in   5.072531s

Comparison:
            1 module: 12209958.3 i/s
          30 modules:  4354217.8 i/s - 2.80x  (± 0.00) slower
         100 modules:  1434447.3 i/s - 8.51x  (± 0.00) slower
```

Ruby 3.0 with cvar cache:

```
Warming up --------------------------------------
            1 module     1.641M i/100ms
          30 modules     1.655M i/100ms
         100 modules     1.620M i/100ms
Calculating -------------------------------------
            1 module     16.279M (± 3.8%) i/s -     82.038M in   5.046923s
          30 modules     15.891M (± 3.9%) i/s -     79.459M in   5.007958s
         100 modules     16.087M (± 3.6%) i/s -     81.005M in   5.041931s

Comparison:
            1 module: 16279458.0 i/s
         100 modules: 16087484.6 i/s - same-ish: difference falls within error
          30 modules: 15891406.2 i/s - same-ish: difference falls within error
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2021-05-11 12:04:27 -07:00
Aaron Patterson 9a6226c61e Eagerly allocate instance variable tables along with object
This allows us to allocate the right size for the object in advance,
meaning that we don't have to pay the cost of ivar table extension
later.  The idea is that if an object type ever became "extended" at
some point, then it is very likely it will become extended again.  So we
may as well allocate the ivar table up front.
2021-05-03 14:11:48 -07:00
wonda-tea-coffee cc5bab80e4 [Doc] Fix a typo s/visilibity/visibility/ 2021-04-25 19:46:37 +12:00
Jeremy Evans 50c54d40a8
Evaluate multiple assignment left hand side before right hand side
In regular assignment, Ruby evaluates the left hand side before
the right hand side.  For example:

```ruby
foo[0] = bar
```

Calls `foo`, then `bar`, then `[]=` on the result of `foo`.

Previously, multiple assignment didn't work this way.  If you did:

```ruby
abc.def, foo[0] = bar, baz
```

Ruby would previously call `bar`, then `baz`, then `abc`, then
`def=` on the result of `abc`, then `foo`, then `[]=` on the
result of `foo`.

This change makes multiple assignment similar to single assignment,
changing the evaluation order of the above multiple assignment code
to calling `abc`, then `foo`, then `bar`, then `baz`, then `def=` on
the result of `abc`, then `[]=` on the result of `foo`.

Implementing this is challenging with the stack-based virtual machine.
We need to keep track of all of the left hand side attribute setter
receivers and setter arguments, and then keep track of the stack level
while handling the assignment processing, so we can issue the
appropriate topn instructions to get the receiver.  Here's an example
of how the multiple assignment is executed, showing the stack and
instructions:

```
self                                      # putself
abc                                       # send
abc, self                                 # putself
abc, foo                                  # send
abc, foo, 0                               # putobject 0
abc, foo, 0, [bar, baz]                   # evaluate RHS
abc, foo, 0, [bar, baz], baz, bar         # expandarray
abc, foo, 0, [bar, baz], baz, bar, abc    # topn 5
abc, foo, 0, [bar, baz], baz, abc, bar    # swap
abc, foo, 0, [bar, baz], baz, def=        # send
abc, foo, 0, [bar, baz], baz              # pop
abc, foo, 0, [bar, baz], baz, foo         # topn 3
abc, foo, 0, [bar, baz], baz, foo, 0      # topn 3
abc, foo, 0, [bar, baz], baz, foo, 0, baz # topn 2
abc, foo, 0, [bar, baz], baz, []=         # send
abc, foo, 0, [bar, baz], baz              # pop
abc, foo, 0, [bar, baz]                   # pop
[bar, baz], foo, 0, [bar, baz]            # setn 3
[bar, baz], foo, 0                        # pop
[bar, baz], foo                           # pop
[bar, baz]                                # pop
```

As multiple assignment must deal with splats, post args, and any level
of nesting, it gets quite a bit more complex than this in non-trivial
cases. To handle this, struct masgn_state is added to keep
track of the overall state of the mass assignment, which stores a linked
list of struct masgn_attrasgn, one for each assigned attribute.

This adds a new optimization that replaces a topn 1/pop instruction
combination with a single swap instruction for multiple assignment
to non-aref attributes.

This new approach isn't compatible with one of the optimizations
previously used, in the case where the multiple assignment return value
was not needed, there was no lhs splat, and one of the left hand side
used an attribute setter.  This removes that optimization. Removing
the optimization allowed for removing the POP_ELEMENT and adjust_stack
functions.

This adds a benchmark to measure how much slower multiple
assignment is with the correct evaluation order.

This benchmark shows:

* 4-9% decrease for attribute sets
* 14-23% decrease for array member sets
* Basically same speed for local variable sets

Importantly, it shows no significant difference between the popped
(where return value of the multiple assignment is not needed) and
!popped (where return value of the multiple assignment is needed)
cases for attribute and array member sets.  This indicates the
previous optimization, which was dropped in the evaluation
order fix and only affected the popped case, is not important to
performance.

Fixes [Bug #4443]
2021-04-21 10:49:19 -07:00
tompng (tomoya ishida) 9f9045123e
st.c: skip all deleted entries [Bug #17779]
Update the start entry skipping all already deleted entries.
Fixes performance issue of `Hash#first` in a certain case.
2021-04-11 19:05:26 +09:00
Nobuyoshi Nakada 382d3a4516
Improve Enumerable#tally performance
Iteration per second (i/s)

|       |compare-ruby|built-ruby|
|:------|-----------:|---------:|
|tally  |      52.814|   114.936|
|       |           -|     2.18x|
2021-03-16 23:06:41 +09:00
Jean Boussier 1041bff3b2 Add a benchmark for RubyVM::InstructionSequence.load_from_binary 2021-03-10 13:44:07 -08:00
Jean Boussier a03653d386 proc.c: make bind_call use existing callable method entry when possible
The most common use case for `bind_call` is to protect from core
methods being redefined, for instance a typical use:

```ruby
UNBOUND_METHOD_MODULE_NAME = Module.instance_method(:name)
def real_mod_name(mod)
  UNBOUND_METHOD_MODULE_NAME.bind_call(mod)
end
```

But it's extremely common that the method wasn't actually redefined.
In such case we can avoid creating a new callable method entry,
and simply delegate to the receiver.

This result in a 1.5-2X speed-up for the fast path, and little to
no impact on the slowpath:

```
compare-ruby: ruby 3.1.0dev (2021-02-05T06:33:00Z master b2674c1fd7) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-02-15T10:35:17Z bind-call-fastpath d687e06615) [x86_64-darwin19]

|          |compare-ruby|built-ruby|
|:---------|-----------:|---------:|
|fastpath  |     11.325M|   16.393M|
|          |           -|     1.45x|
|slowpath  |     10.488M|   10.242M|
|          |       1.02x|         -|
```
2021-03-10 13:43:22 -08:00
S.H efd19badf4
Improve performance some Numeric methods [Feature #17632] (#4190) 2021-02-19 11:11:19 -08:00
Takashi Kokubun a0216b1acf
Do not run File.write while Ractors are running
also make sure all local variables have the __bmdv_ prefix.
2021-02-11 00:25:46 -08:00
Takashi Kokubun 27382eb9fc
Add a benchmark-driver runner for Ractor (#4172)
* Add a benchmark-driver runner for Ractor

* Process.clock_gettime(Process:CLOCK_MONOTONIC) could be slow

in Ruby 3.0 Ractor

* Fetching Time could also be slow

* Fix a comment

* Assert overriding a private method
2021-02-10 21:24:25 -08:00
Nobuyoshi Nakada ad2c7f8a1e
Simple benchmark of Float#to_s 2021-02-10 19:42:00 +09:00
S.H fad7908a5d
Improve performance Float#positive? and Float#negative? [Feature #17614] (#4160) 2021-02-08 20:29:42 -08:00
Takashi Kokubun e1fee7f949
Rename RubyVM::MJIT to RubyVM::JIT
because the name "MJIT" is an internal code name, it's inconsistent with
--jit while they are related to each other, and I want to discourage future
JIT implementation-specific (e.g. MJIT-specific) APIs by this rename.

[Feature #17490]
2021-01-13 22:46:51 -08:00
S.H daec5f9edc
Improve performance some Float methods [Feature #17498] (#4018) 2021-01-01 18:39:07 -08:00
Takashi Kokubun dbb4f19969
Allow inlining Integer#-@ and #~
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_integer.yml --filter '(comp|uminus)'
before --jit: ruby 3.0.0dev (2020-12-23T05:41:44Z master 0dd4896175) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-12-23T06:25:41Z master 8887d78992) +JIT [x86_64-linux]
last_commit=Allow inlining Integer#-@ and #~
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_comp(1)      44.006M      70.417M i/s -     40.000M times in 0.908967s 0.568042s
      mjit_uminus(1)      44.333M      68.422M i/s -     40.000M times in 0.902255s 0.584603s

Comparison:
                     mjit_comp(1)
         after --jit:  70417331.4 i/s
        before --jit:  44005980.4 i/s - 1.60x  slower

                   mjit_uminus(1)
         after --jit:  68422468.8 i/s
        before --jit:  44333371.0 i/s - 1.54x  slower
```
2020-12-22 22:32:19 -08:00
Koichi Sasada dff67c809f fix duplicated name 2020-12-16 10:29:48 +09:00
Benoit Daloze b4ec4a41c2 Guard all accesses to RubyVM::MJIT with defined?(RubyVM::MJIT) &&
* Otherwise those tests, etc cannot run on alternative Ruby implementations.
2020-12-04 16:45:54 +01:00
Alan Wu 68ffc8db08 Set allocator on class creation
Allocating an instance of a class uses the allocator for the class. When
the class has no allocator set, Ruby looks for it in the super class
(see rb_get_alloc_func()).

It's uncommon for classes created from Ruby code to ever have an
allocator set, so it's common during the allocation process to search
all the way to BasicObject from the class with which the allocation is
being performed. This makes creating instances of classes that have
long ancestry chains more expensive than creating instances of classes
have that shorter ancestry chains.

Setting the allocator at class creation time removes the need to perform
a search for the alloctor during allocation.

This is a breaking change for C-extensions that assume that classes
created from Ruby code have no allocator set. Libraries that setup a
class hierarchy in Ruby code and then set the allocator on some parent
class, for example, can experience breakage. This seems like an unusual
use case and hopefully it is rare or non-existent in practice.

Rails has many classes that have upwards of 60 elements in the ancestry
chain and benchmark shows a significant improvement for allocating with
a class that includes 64 modules.

```
pre: ruby 3.0.0dev (2020-11-12T14:39:27Z master 6325866421)
post: ruby 3.0.0dev (2020-11-12T20:15:30Z cut-allocator-lookup)

Comparison:
                  allocate_8_deep
                post:  10336985.6 i/s
                 pre:   8691873.1 i/s - 1.19x  slower

                 allocate_32_deep
                post:  10423181.2 i/s
                 pre:   6264879.1 i/s - 1.66x  slower

                 allocate_64_deep
                post:  10541851.2 i/s
                 pre:   4936321.5 i/s - 2.14x  slower

                allocate_128_deep
                post:  10451505.0 i/s
                 pre:   3031313.5 i/s - 3.45x  slower
```
2020-11-16 17:41:40 -05:00
Aaron Patterson d7581370fd Add a benchmark for polymorphic ivar setting
This benchmark demonstrates the performance of setting an instance
variable when the type of object is constantly changing.  This benchmark
should give us an idea of the performance of ivar setting in a
polymorphic environment
2020-11-09 14:05:41 -08:00
Aaron Patterson eb229994e5 eagerly initialize ivar table when index is small enough
When the inline cache is written, the iv table will contain an entry for
the instance variable.  If we get an inline cache hit, then we know the
iv table must contain a value for the index written to the inline cache.

If the index in the inline cache is larger than the list on the object,
but *smaller* than the iv index table on the class, then we can just
eagerly allocate the iv list to be the same size as the iv index table.

This avoids duplicate work of checking frozen as well as looking up the
index for the particular instance variable name.
2020-11-09 09:44:16 -08:00
Nobuyoshi Nakada 8f9c113f35
Added benchmark of vm_send by variable [ci skip] 2020-10-28 09:47:46 +09:00
eileencodes 637d1cc0c0 Improve the performance of super
This PR improves the performance of `super` calls. While working on some
Rails optimizations jhawthorn discovered that `super` calls were slower
than expected.

The changes here do the following:

1) Adds a check for whether the call frame is not equal to the method
entry iseq. This avoids the `rb_obj_is_kind_of` check on the next line
which is quite slow. If the current call frame is equal to the method
entry we know we can't have an instance eval, etc.
2) Changes `FL_TEST` to `FL_TEST_RAW`. This is safe because we've
already done the check for `T_ICLASS` above.
3) Adds a benchmark for `T_ICLASS` super calls.
4) Note: makes a chage for `method_entry_cref` to use `const`.

On master the benchmarks showed that `super` is 1.76x slower. Our
changes improved the performance so that it is now only 1.36x slower.

Benchmark IPS:

```
Warming up --------------------------------------
               super   244.918k i/100ms
         method call   383.007k i/100ms
Calculating -------------------------------------
               super      2.280M (± 6.7%) i/s -     11.511M in   5.071758s
         method call      3.834M (± 4.9%) i/s -     19.150M in   5.008444s

Comparison:
         method call:  3833648.3 i/s
               super:  2279837.9 i/s - 1.68x  (± 0.00) slower
```

With changes:

```
Warming up --------------------------------------
               super   308.777k i/100ms
         method call   375.051k i/100ms
Calculating -------------------------------------
               super      2.951M (± 5.4%) i/s -     14.821M in   5.039592s
         method call      3.551M (± 4.9%) i/s -     18.002M in   5.081695s

Comparison:
         method call:  3551372.7 i/s
               super:  2950557.9 i/s - 1.20x  (± 0.00) slower
```

Ruby VM benchmarks also showed an improvement:

Existing `vm_super` benchmark`.

```
$ make benchmark ITEM=vm_super

|          |compare-ruby|built-ruby|
|:---------|-----------:|---------:|
|vm_super  |     21.555M|   37.819M|
|          |           -|     1.75x|
```

New `vm_iclass_super` benchmark:

```
$ make benchmark ITEM=vm_iclass_super

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|vm_iclass_super  |      1.669M|    3.683M|
|                 |           -|     2.21x|
```

This is the benchmark script used for the benchmark-ips benchmarks:

```ruby
require "benchmark/ips"

class Foo
  def zuper; end
  def top; end

  last_method = "top"

  ("A".."M").each do |module_name|
    eval <<-EOM
    module #{module_name}
      def zuper; super; end
      def #{module_name.downcase}
        #{last_method}
      end
    end
    prepend #{module_name}
    EOM
    last_method = module_name.downcase
  end
end

foo = Foo.new

Benchmark.ips do |x|
  x.report "super" do
    foo.zuper
  end

  x.report "method call" do
    foo.m
  end

  x.compare!
end
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Co-authored-by: John Hawthorn <john@hawthorn.email>
2020-09-23 11:52:36 -07:00
Jean Boussier 5001cc4716 Optimize ObjectSpace.dump_all
The two main optimization are:
  - buffer writes for improved performance
  - avoid formatting functions when possible

```

|                   |compare-ruby|built-ruby|
|:------------------|-----------:|---------:|
|dump_all_string    |       1.038|   195.925|
|                   |           -|   188.77x|
|dump_all_file      |      33.453|   139.645|
|                   |           -|     4.17x|
|dump_all_dev_null  |      44.030|   278.552|
|                   |           -|     6.33x|
```
2020-09-09 11:11:36 -07:00
Nobuyoshi Nakada 54acb3dd52
Improved Enumerable::Lazy#zip
|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|first_ary           |    290.514k|  296.331k|
|                    |           -|     1.02x|
|first_nonary        |    166.954k|  169.178k|
|                    |           -|     1.01x|
|first_noarg         |    299.547k|  305.358k|
|                    |           -|     1.02x|
|take3_ary           |    129.388k|  188.360k|
|                    |           -|     1.46x|
|take3_nonary        |     90.684k|  112.688k|
|                    |           -|     1.24x|
|take3_noarg         |    131.940k|  189.471k|
|                    |           -|     1.44x|
|chain-first_ary     |    195.913k|  286.194k|
|                    |           -|     1.46x|
|chain-first_nonary  |    127.483k|  168.716k|
|                    |           -|     1.32x|
|chain-first_noarg   |    201.252k|  298.562k|
|                    |           -|     1.48x|
|chain-take3_ary     |    101.189k|  183.188k|
|                    |           -|     1.81x|
|chain-take3_nonary  |     75.381k|  112.301k|
|                    |           -|     1.49x|
|chain-take3_noarg   |    101.483k|  192.148k|
|                    |           -|     1.89x|
|block               |    296.696k|  292.877k|
|                    |       1.01x|         -|
2020-07-23 16:57:26 +09:00
Nobuyoshi Nakada 6b3cff12f6
Improved Enumerable::Lazy#flat_map
|        |compare-ruby|built-ruby|
|:-------|-----------:|---------:|
|num3    |     96.333k|  160.732k|
|        |           -|     1.67x|
|num10   |     96.615k|  159.150k|
|        |           -|     1.65x|
|ary2    |    103.836k|  172.787k|
|        |           -|     1.66x|
|ary10   |    109.249k|  177.252k|
|        |           -|     1.62x|
|ary20   |    106.628k|  177.371k|
|        |           -|     1.66x|
|ary50   |    107.135k|  162.282k|
|        |           -|     1.51x|
|ary100  |    106.513k|  177.626k|
|        |           -|     1.67x|
2020-07-23 16:57:26 +09:00
Kenta Murata b4e784434c
Optimize Array#min (#3324)
The benchmark result is below:

|                |compare-ruby|built-ruby|
|:---------------|-----------:|---------:|
|ary2.min        |     39.105M|   39.442M|
|                |           -|     1.01x|
|ary10.min       |     23.995M|   30.762M|
|                |           -|     1.28x|
|ary100.min      |      6.249M|   10.783M|
|                |           -|     1.73x|
|ary500.min      |      1.408M|    2.714M|
|                |           -|     1.93x|
|ary1000.min     |    828.397k|    1.465M|
|                |           -|     1.77x|
|ary2000.min     |    332.256k|  570.504k|
|                |           -|     1.72x|
|ary3000.min     |    338.079k|  573.868k|
|                |           -|     1.70x|
|ary5000.min     |    168.217k|  286.114k|
|                |           -|     1.70x|
|ary10000.min    |     85.512k|  143.551k|
|                |           -|     1.68x|
|ary20000.min    |     43.264k|   71.935k|
|                |           -|     1.66x|
|ary50000.min    |     17.317k|   29.107k|
|                |           -|     1.68x|
|ary100000.min   |      9.072k|   14.540k|
|                |           -|     1.60x|
|ary1000000.min  |     872.930|    1.436k|
|                |           -|     1.64x|

compare-ruby is 9f4b7fc82e.
2020-07-18 23:45:25 +09:00
Kenta Murata a63f520971
Optimize Array#max (#3325)
The benchmark result is below:

|                |compare-ruby|built-ruby|
|:---------------|-----------:|---------:|
|ary2.max        |     38.837M|   40.830M|
|                |           -|     1.05x|
|ary10.max       |     23.035M|   32.626M|
|                |           -|     1.42x|
|ary100.max      |      5.490M|   11.020M|
|                |           -|     2.01x|
|ary500.max      |      1.324M|    2.679M|
|                |           -|     2.02x|
|ary1000.max     |    699.167k|    1.403M|
|                |           -|     2.01x|
|ary2000.max     |    284.321k|  570.446k|
|                |           -|     2.01x|
|ary3000.max     |    282.613k|  571.683k|
|                |           -|     2.02x|
|ary5000.max     |    145.120k|  285.546k|
|                |           -|     1.97x|
|ary10000.max    |     72.102k|  142.831k|
|                |           -|     1.98x|
|ary20000.max    |     36.065k|   72.077k|
|                |           -|     2.00x|
|ary50000.max    |     14.343k|   29.139k|
|                |           -|     2.03x|
|ary100000.max   |      7.586k|   14.472k|
|                |           -|     1.91x|
|ary1000000.max  |     726.915|    1.495k|
|                |           -|     2.06x|
2020-07-18 23:45:00 +09:00
Takashi Kokubun 167d139487
Inline builtin struct aref
We don't do this for aset because it might raise a FrozenError.

```
$ benchmark-driver -v --rbenv 'before;after;before --jit;after --jit' benchmark/mjit_struct_aref.yml --repeat-count=4
before: ruby 2.8.0dev (2020-07-06T01:47:11Z master d94ef7c6b6) [x86_64-linux]
after: ruby 2.8.0dev (2020-07-06T07:11:51Z master 85425168f4) [x86_64-linux]
last_commit=Inline builtin struct aref
before --jit: ruby 2.8.0dev (2020-07-06T01:47:11Z master d94ef7c6b6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-07-06T07:11:51Z master 85425168f4) +JIT [x86_64-linux]
last_commit=Inline builtin struct aref
Calculating -------------------------------------
                             before       after  before --jit  after --jit
mjit_struct_aref(struct)    34.783M     34.810M       48.321M      58.378M i/s -     40.000M times in 1.149996s 1.149097s 0.827794s 0.685192s

Comparison:
             mjit_struct_aref(struct)
             after --jit:  58377836.7 i/s
            before --jit:  48321205.7 i/s - 1.21x  slower
                   after:  34809935.5 i/s - 1.68x  slower
                  before:  34782736.5 i/s - 1.68x  slower
```
2020-07-06 00:14:00 -07:00
Takashi Kokubun 24fa37d87a
Make Kernel#then, #yield_self, #frozen? builtin (#3283)
* Make Kernel#then, #yield_self, #frozen? builtin

* Fix test_jit
2020-07-03 18:02:43 -07:00
Takashi Kokubun f3a0d7a203
Rewrite Kernel#tap with Ruby (#3281)
* Rewrite Kernel#tap with Ruby

This was good for VM too, but of course my intention is to unblock JIT's inlining of a block over yield
(inlining invokeyield has not been committed though).

* Fix test_settracefunc

About the :tap deletions, the :tap events are actually traced (we already have a TracePoint test for builtin methods),
but it's filtered out by tp.path == "xyzzy" (it became "<internal:kernel>"). We could trace tp.path == "<internal:kernel>"
cases too, but the lineno is impacted by kernel.rb changes and I didn't want to make it fragile for kernel.rb lineno changes.
2020-07-03 09:52:35 -07:00
Takashi Kokubun 0703e01471
Mark some Integer methods as inline (#3264) 2020-06-27 10:07:47 -07:00
Vladimir Dementyev 6770d8f1b0 Add pattern matching with arrays benchmark 2020-06-27 13:51:03 +09:00
Takashi Kokubun 7982dc1dfd
Decide JIT-ed insn based on cached cfunc
for opt_* insns.

opt_eq handles rb_obj_equal inside opt_eq, and all other cfunc is
handled by opt_send_without_block. Therefore we can't decide which insn
should be generated by checking whether it's cfunc cc or not.

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-26T05:21:43Z master 9dbc2294a6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-26T06:30:18Z master 75cece1b0b) +JIT [x86_64-linux]
last_commit=Decide JIT-ed insn based on cached cfunc
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_nil?(1)      73.878M      74.021M i/s -     40.000M times in 0.541432s 0.540391s
         mjit_not(1)      72.635M      74.601M i/s -     40.000M times in 0.550702s 0.536187s
     mjit_eq(1, nil)       7.331M       7.445M i/s -      8.000M times in 1.091211s 1.074596s
     mjit_eq(nil, 1)      49.450M      64.711M i/s -      8.000M times in 0.161781s 0.123627s

Comparison:
                     mjit_nil?(1)
         after --jit:  74020528.4 i/s
        before --jit:  73878185.9 i/s - 1.00x  slower

                      mjit_not(1)
         after --jit:  74600882.0 i/s
        before --jit:  72634507.6 i/s - 1.03x  slower

                  mjit_eq(1, nil)
         after --jit:   7444657.4 i/s
        before --jit:   7331304.3 i/s - 1.02x  slower

                  mjit_eq(nil, 1)
         after --jit:  64710790.6 i/s
        before --jit:  49449507.4 i/s - 1.31x  slower
```
2020-06-25 23:33:08 -07:00
Takashi Kokubun 946e5cc668
Annotate Kernel#class as inline (#3250)
```
$ benchmark-driver -v --rbenv 'before;after;before --jit;after --jit' benchmark/mjit_class.yml --repeat-count=4
before: ruby 2.8.0dev (2020-06-23T07:09:54Z master 37a2e48d76) [x86_64-linux]
after: ruby 2.8.0dev (2020-06-23T17:29:56Z inline-class 0ff147c007) [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-06-23T07:09:54Z master 37a2e48d76) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-23T17:29:56Z inline-class 0ff147c007) +JIT [x86_64-linux]
Calculating -------------------------------------
                         before       after  before --jit  after --jit
    mjit_class(self)    39.219M     40.060M       53.502M      69.202M i/s -     40.000M times in 1.019915s 0.998495s 0.747631s 0.578021s
       mjit_class(1)    39.567M     41.242M       52.100M      68.895M i/s -     40.000M times in 1.010935s 0.969885s 0.767749s 0.580591s

Comparison:
                 mjit_class(self)
         after --jit:  69201690.7 i/s
        before --jit:  53502336.4 i/s - 1.29x  slower
               after:  40060289.1 i/s - 1.73x  slower
              before:  39218939.2 i/s - 1.76x  slower

                    mjit_class(1)
         after --jit:  68895358.6 i/s
        before --jit:  52100353.0 i/s - 1.32x  slower
               after:  41241993.6 i/s - 1.67x  slower
              before:  39567314.0 i/s - 1.74x  slower
```
2020-06-23 23:49:03 -07:00
Takashi Kokubun 78352fb52e
Compile opt_send for opt_* only when cc has ISeq
because opt_nil/opt_not/opt_eq populates cc even when it doesn't
fallback to opt_send_without_block because of vm_method_cfunc_is.

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-22T08:11:24Z master d231b8f95b) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-22T08:53:27Z master e1125879ed) +JIT [x86_64-linux]
last_commit=Compile opt_send for opt_* only when cc has ISeq
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_nil?(1)      54.106M      73.693M i/s -     40.000M times in 0.739288s 0.542795s
         mjit_not(1)      53.398M      74.477M i/s -     40.000M times in 0.749090s 0.537075s
     mjit_eq(1, nil)       7.427M       6.497M i/s -      8.000M times in 1.077136s 1.231326s

Comparison:
                     mjit_nil?(1)
         after --jit:  73692594.3 i/s
        before --jit:  54106108.4 i/s - 1.36x  slower

                      mjit_not(1)
         after --jit:  74477487.9 i/s
        before --jit:  53398125.0 i/s - 1.39x  slower

                  mjit_eq(1, nil)
        before --jit:   7427105.9 i/s
         after --jit:   6497063.0 i/s - 1.14x  slower
```

Actually opt_eq becomes slower by this. Maybe it's indeed using
opt_send_without_block, but I'll approach that one in another commit.
2020-06-22 02:08:21 -07:00
Takashi Kokubun 4c5780e51e
Share warmup logic across MJIT benchmarks 2020-06-22 00:54:27 -07:00
Takashi Kokubun faf93e4545
The RUBYOPT= comment is no longer needed 2020-06-22 00:20:30 -07:00
Takashi Kokubun 8838600c1e
Stop relying on `make benchmark`'s `-I$(srcdir)/benchmark/lib`
These days I don't use `make benchmark`. The YAML files should be
executable with bare `benchmark-driver` CLI without passing
`RUBYOPT=-Ibenchmark/lib`.
2020-06-22 00:17:10 -07:00
Takashi Kokubun 7561db8c00
Introduce Primitive.attr! to annotate 'inline' (#3242)
[Feature #15589]
2020-06-20 17:13:03 -07:00
Takashi Kokubun 95b0fed371
Make Integer#zero? a separated method and builtin (#3226)
A prerequisite to fix https://bugs.ruby-lang.org/issues/15589 with JIT.
This commit alone doesn't make a significant difference yet, but I thought
this commit should be committed independently.

This method override was discussed in [Misc #16961].
2020-06-20 14:55:09 -07:00
Ryuta Kamizono 9d24ddbb53 Fix `make benchmark` example
`make benchmark ARGS=../benchmark/erb_render.yml` does not work.

```
% make benchmark ARGS=../benchmark/erb_render.yml
/Users/kamipo/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::/Users/kamipo/.rbenv/shims/ruby --disable=gems -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
	            ../benchmark/erb_render.yml 
Traceback (most recent call last):
	6: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `<main>'
	5: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `flat_map'
	4: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `each'
	3: from ./benchmark/benchmark-driver/exe/benchmark-driver:122:in `block in <main>'
	2: from /Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `load_file'
	1: from /Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `open'
/Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `initialize': No such file or directory @ rb_sysopen - ../benchmark/erb_render.yml (Errno::ENOENT)
make: *** [benchmark] Error 1

% make benchmark ARGS=benchmark/erb_render.yml
/Users/kamipo/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::/Users/kamipo/.rbenv/shims/ruby --disable=gems -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
	            benchmark/erb_render.yml 
Calculating -------------------------------------
                     compare-ruby  built-ruby 
          erb_render     825.454k    783.664k i/s -      1.500M times in 1.817181s 1.914086s

Comparison:
                       erb_render
        compare-ruby:    825454.4 i/s 
          built-ruby:    783663.8 i/s - 1.05x  slower

```
2020-06-07 10:33:14 +09:00
卜部昌平 d4015cfee3 add benchmark for different block handlers 2020-06-03 16:13:47 +09:00
Nobuyoshi Nakada 02cb643ddb
Added String#split benchmark for empty regexp
|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|re_chars-1     |    169.230k|  973.855k|
|               |           -|     5.75x|
|re_chars-10    |     25.536k|  107.598k|
|               |           -|     4.21x|
|re_chars-100   |      2.621k|   11.207k|
|               |           -|     4.28x|
|re_chars-1000  |     259.098|    1.133k|
|               |           -|     4.37x|
2020-05-12 22:59:58 +09:00