Граф коммитов

401 Коммитов

Автор SHA1 Сообщение Дата
Aaron Patterson 07f055bb13
Revert "Filling cache values on cvar write"
This reverts commit 08de37f9fa.
This reverts commit e8ae922b62.
2021-05-11 13:31:00 -07:00
eileencodes e8ae922b62 Add a cache for class variables
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.

The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.

```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]

|         |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar  |      5.681M|   36.980M|
|         |           -|     6.51x|
```

Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.

Benchmark code:

```ruby
require "benchmark/ips"
require_relative "config/environment"

Benchmark.ips do |x|
  x.report "logger" do
    ActiveRecord::Base.logger
  end
end
```

Ruby 3.0 master / Rails 6.1:

```
Warming up --------------------------------------
              logger   155.251k i/100ms
Calculating -------------------------------------
```

Ruby 3.0 with cvar cache /  Rails 6.1:

```
Warming up --------------------------------------
              logger     1.546M i/100ms
Calculating -------------------------------------
              logger     14.857M (± 4.8%) i/s -     74.198M in   5.006202s
```

Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.

Ruby 3.0 master:

```
Warming up --------------------------------------
            1 module     1.231M i/100ms
          30 modules   432.020k i/100ms
         100 modules   145.399k i/100ms
Calculating -------------------------------------
            1 module     12.210M (± 2.1%) i/s -     61.553M in   5.043400s
          30 modules      4.354M (± 2.7%) i/s -     22.033M in   5.063839s
         100 modules      1.434M (± 2.9%) i/s -      7.270M in   5.072531s

Comparison:
            1 module: 12209958.3 i/s
          30 modules:  4354217.8 i/s - 2.80x  (± 0.00) slower
         100 modules:  1434447.3 i/s - 8.51x  (± 0.00) slower
```

Ruby 3.0 with cvar cache:

```
Warming up --------------------------------------
            1 module     1.641M i/100ms
          30 modules     1.655M i/100ms
         100 modules     1.620M i/100ms
Calculating -------------------------------------
            1 module     16.279M (± 3.8%) i/s -     82.038M in   5.046923s
          30 modules     15.891M (± 3.9%) i/s -     79.459M in   5.007958s
         100 modules     16.087M (± 3.6%) i/s -     81.005M in   5.041931s

Comparison:
            1 module: 16279458.0 i/s
         100 modules: 16087484.6 i/s - same-ish: difference falls within error
          30 modules: 15891406.2 i/s - same-ish: difference falls within error
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2021-05-11 12:04:27 -07:00
Aaron Patterson 9a6226c61e Eagerly allocate instance variable tables along with object
This allows us to allocate the right size for the object in advance,
meaning that we don't have to pay the cost of ivar table extension
later.  The idea is that if an object type ever became "extended" at
some point, then it is very likely it will become extended again.  So we
may as well allocate the ivar table up front.
2021-05-03 14:11:48 -07:00
wonda-tea-coffee cc5bab80e4 [Doc] Fix a typo s/visilibity/visibility/ 2021-04-25 19:46:37 +12:00
Jeremy Evans 50c54d40a8
Evaluate multiple assignment left hand side before right hand side
In regular assignment, Ruby evaluates the left hand side before
the right hand side.  For example:

```ruby
foo[0] = bar
```

Calls `foo`, then `bar`, then `[]=` on the result of `foo`.

Previously, multiple assignment didn't work this way.  If you did:

```ruby
abc.def, foo[0] = bar, baz
```

Ruby would previously call `bar`, then `baz`, then `abc`, then
`def=` on the result of `abc`, then `foo`, then `[]=` on the
result of `foo`.

This change makes multiple assignment similar to single assignment,
changing the evaluation order of the above multiple assignment code
to calling `abc`, then `foo`, then `bar`, then `baz`, then `def=` on
the result of `abc`, then `[]=` on the result of `foo`.

Implementing this is challenging with the stack-based virtual machine.
We need to keep track of all of the left hand side attribute setter
receivers and setter arguments, and then keep track of the stack level
while handling the assignment processing, so we can issue the
appropriate topn instructions to get the receiver.  Here's an example
of how the multiple assignment is executed, showing the stack and
instructions:

```
self                                      # putself
abc                                       # send
abc, self                                 # putself
abc, foo                                  # send
abc, foo, 0                               # putobject 0
abc, foo, 0, [bar, baz]                   # evaluate RHS
abc, foo, 0, [bar, baz], baz, bar         # expandarray
abc, foo, 0, [bar, baz], baz, bar, abc    # topn 5
abc, foo, 0, [bar, baz], baz, abc, bar    # swap
abc, foo, 0, [bar, baz], baz, def=        # send
abc, foo, 0, [bar, baz], baz              # pop
abc, foo, 0, [bar, baz], baz, foo         # topn 3
abc, foo, 0, [bar, baz], baz, foo, 0      # topn 3
abc, foo, 0, [bar, baz], baz, foo, 0, baz # topn 2
abc, foo, 0, [bar, baz], baz, []=         # send
abc, foo, 0, [bar, baz], baz              # pop
abc, foo, 0, [bar, baz]                   # pop
[bar, baz], foo, 0, [bar, baz]            # setn 3
[bar, baz], foo, 0                        # pop
[bar, baz], foo                           # pop
[bar, baz]                                # pop
```

As multiple assignment must deal with splats, post args, and any level
of nesting, it gets quite a bit more complex than this in non-trivial
cases. To handle this, struct masgn_state is added to keep
track of the overall state of the mass assignment, which stores a linked
list of struct masgn_attrasgn, one for each assigned attribute.

This adds a new optimization that replaces a topn 1/pop instruction
combination with a single swap instruction for multiple assignment
to non-aref attributes.

This new approach isn't compatible with one of the optimizations
previously used, in the case where the multiple assignment return value
was not needed, there was no lhs splat, and one of the left hand side
used an attribute setter.  This removes that optimization. Removing
the optimization allowed for removing the POP_ELEMENT and adjust_stack
functions.

This adds a benchmark to measure how much slower multiple
assignment is with the correct evaluation order.

This benchmark shows:

* 4-9% decrease for attribute sets
* 14-23% decrease for array member sets
* Basically same speed for local variable sets

Importantly, it shows no significant difference between the popped
(where return value of the multiple assignment is not needed) and
!popped (where return value of the multiple assignment is needed)
cases for attribute and array member sets.  This indicates the
previous optimization, which was dropped in the evaluation
order fix and only affected the popped case, is not important to
performance.

Fixes [Bug #4443]
2021-04-21 10:49:19 -07:00
tompng (tomoya ishida) 9f9045123e
st.c: skip all deleted entries [Bug #17779]
Update the start entry skipping all already deleted entries.
Fixes performance issue of `Hash#first` in a certain case.
2021-04-11 19:05:26 +09:00
Nobuyoshi Nakada 382d3a4516
Improve Enumerable#tally performance
Iteration per second (i/s)

|       |compare-ruby|built-ruby|
|:------|-----------:|---------:|
|tally  |      52.814|   114.936|
|       |           -|     2.18x|
2021-03-16 23:06:41 +09:00
Jean Boussier 1041bff3b2 Add a benchmark for RubyVM::InstructionSequence.load_from_binary 2021-03-10 13:44:07 -08:00
Jean Boussier a03653d386 proc.c: make bind_call use existing callable method entry when possible
The most common use case for `bind_call` is to protect from core
methods being redefined, for instance a typical use:

```ruby
UNBOUND_METHOD_MODULE_NAME = Module.instance_method(:name)
def real_mod_name(mod)
  UNBOUND_METHOD_MODULE_NAME.bind_call(mod)
end
```

But it's extremely common that the method wasn't actually redefined.
In such case we can avoid creating a new callable method entry,
and simply delegate to the receiver.

This result in a 1.5-2X speed-up for the fast path, and little to
no impact on the slowpath:

```
compare-ruby: ruby 3.1.0dev (2021-02-05T06:33:00Z master b2674c1fd7) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-02-15T10:35:17Z bind-call-fastpath d687e06615) [x86_64-darwin19]

|          |compare-ruby|built-ruby|
|:---------|-----------:|---------:|
|fastpath  |     11.325M|   16.393M|
|          |           -|     1.45x|
|slowpath  |     10.488M|   10.242M|
|          |       1.02x|         -|
```
2021-03-10 13:43:22 -08:00
S.H efd19badf4
Improve performance some Numeric methods [Feature #17632] (#4190) 2021-02-19 11:11:19 -08:00
Takashi Kokubun a0216b1acf
Do not run File.write while Ractors are running
also make sure all local variables have the __bmdv_ prefix.
2021-02-11 00:25:46 -08:00
Takashi Kokubun 27382eb9fc
Add a benchmark-driver runner for Ractor (#4172)
* Add a benchmark-driver runner for Ractor

* Process.clock_gettime(Process:CLOCK_MONOTONIC) could be slow

in Ruby 3.0 Ractor

* Fetching Time could also be slow

* Fix a comment

* Assert overriding a private method
2021-02-10 21:24:25 -08:00
Nobuyoshi Nakada ad2c7f8a1e
Simple benchmark of Float#to_s 2021-02-10 19:42:00 +09:00
S.H fad7908a5d
Improve performance Float#positive? and Float#negative? [Feature #17614] (#4160) 2021-02-08 20:29:42 -08:00
Takashi Kokubun e1fee7f949
Rename RubyVM::MJIT to RubyVM::JIT
because the name "MJIT" is an internal code name, it's inconsistent with
--jit while they are related to each other, and I want to discourage future
JIT implementation-specific (e.g. MJIT-specific) APIs by this rename.

[Feature #17490]
2021-01-13 22:46:51 -08:00
S.H daec5f9edc
Improve performance some Float methods [Feature #17498] (#4018) 2021-01-01 18:39:07 -08:00
Takashi Kokubun dbb4f19969
Allow inlining Integer#-@ and #~
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_integer.yml --filter '(comp|uminus)'
before --jit: ruby 3.0.0dev (2020-12-23T05:41:44Z master 0dd4896175) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-12-23T06:25:41Z master 8887d78992) +JIT [x86_64-linux]
last_commit=Allow inlining Integer#-@ and #~
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_comp(1)      44.006M      70.417M i/s -     40.000M times in 0.908967s 0.568042s
      mjit_uminus(1)      44.333M      68.422M i/s -     40.000M times in 0.902255s 0.584603s

Comparison:
                     mjit_comp(1)
         after --jit:  70417331.4 i/s
        before --jit:  44005980.4 i/s - 1.60x  slower

                   mjit_uminus(1)
         after --jit:  68422468.8 i/s
        before --jit:  44333371.0 i/s - 1.54x  slower
```
2020-12-22 22:32:19 -08:00
Koichi Sasada dff67c809f fix duplicated name 2020-12-16 10:29:48 +09:00
Benoit Daloze b4ec4a41c2 Guard all accesses to RubyVM::MJIT with defined?(RubyVM::MJIT) &&
* Otherwise those tests, etc cannot run on alternative Ruby implementations.
2020-12-04 16:45:54 +01:00
Alan Wu 68ffc8db08 Set allocator on class creation
Allocating an instance of a class uses the allocator for the class. When
the class has no allocator set, Ruby looks for it in the super class
(see rb_get_alloc_func()).

It's uncommon for classes created from Ruby code to ever have an
allocator set, so it's common during the allocation process to search
all the way to BasicObject from the class with which the allocation is
being performed. This makes creating instances of classes that have
long ancestry chains more expensive than creating instances of classes
have that shorter ancestry chains.

Setting the allocator at class creation time removes the need to perform
a search for the alloctor during allocation.

This is a breaking change for C-extensions that assume that classes
created from Ruby code have no allocator set. Libraries that setup a
class hierarchy in Ruby code and then set the allocator on some parent
class, for example, can experience breakage. This seems like an unusual
use case and hopefully it is rare or non-existent in practice.

Rails has many classes that have upwards of 60 elements in the ancestry
chain and benchmark shows a significant improvement for allocating with
a class that includes 64 modules.

```
pre: ruby 3.0.0dev (2020-11-12T14:39:27Z master 6325866421)
post: ruby 3.0.0dev (2020-11-12T20:15:30Z cut-allocator-lookup)

Comparison:
                  allocate_8_deep
                post:  10336985.6 i/s
                 pre:   8691873.1 i/s - 1.19x  slower

                 allocate_32_deep
                post:  10423181.2 i/s
                 pre:   6264879.1 i/s - 1.66x  slower

                 allocate_64_deep
                post:  10541851.2 i/s
                 pre:   4936321.5 i/s - 2.14x  slower

                allocate_128_deep
                post:  10451505.0 i/s
                 pre:   3031313.5 i/s - 3.45x  slower
```
2020-11-16 17:41:40 -05:00
Aaron Patterson d7581370fd Add a benchmark for polymorphic ivar setting
This benchmark demonstrates the performance of setting an instance
variable when the type of object is constantly changing.  This benchmark
should give us an idea of the performance of ivar setting in a
polymorphic environment
2020-11-09 14:05:41 -08:00
Aaron Patterson eb229994e5 eagerly initialize ivar table when index is small enough
When the inline cache is written, the iv table will contain an entry for
the instance variable.  If we get an inline cache hit, then we know the
iv table must contain a value for the index written to the inline cache.

If the index in the inline cache is larger than the list on the object,
but *smaller* than the iv index table on the class, then we can just
eagerly allocate the iv list to be the same size as the iv index table.

This avoids duplicate work of checking frozen as well as looking up the
index for the particular instance variable name.
2020-11-09 09:44:16 -08:00
Nobuyoshi Nakada 8f9c113f35
Added benchmark of vm_send by variable [ci skip] 2020-10-28 09:47:46 +09:00
eileencodes 637d1cc0c0 Improve the performance of super
This PR improves the performance of `super` calls. While working on some
Rails optimizations jhawthorn discovered that `super` calls were slower
than expected.

The changes here do the following:

1) Adds a check for whether the call frame is not equal to the method
entry iseq. This avoids the `rb_obj_is_kind_of` check on the next line
which is quite slow. If the current call frame is equal to the method
entry we know we can't have an instance eval, etc.
2) Changes `FL_TEST` to `FL_TEST_RAW`. This is safe because we've
already done the check for `T_ICLASS` above.
3) Adds a benchmark for `T_ICLASS` super calls.
4) Note: makes a chage for `method_entry_cref` to use `const`.

On master the benchmarks showed that `super` is 1.76x slower. Our
changes improved the performance so that it is now only 1.36x slower.

Benchmark IPS:

```
Warming up --------------------------------------
               super   244.918k i/100ms
         method call   383.007k i/100ms
Calculating -------------------------------------
               super      2.280M (± 6.7%) i/s -     11.511M in   5.071758s
         method call      3.834M (± 4.9%) i/s -     19.150M in   5.008444s

Comparison:
         method call:  3833648.3 i/s
               super:  2279837.9 i/s - 1.68x  (± 0.00) slower
```

With changes:

```
Warming up --------------------------------------
               super   308.777k i/100ms
         method call   375.051k i/100ms
Calculating -------------------------------------
               super      2.951M (± 5.4%) i/s -     14.821M in   5.039592s
         method call      3.551M (± 4.9%) i/s -     18.002M in   5.081695s

Comparison:
         method call:  3551372.7 i/s
               super:  2950557.9 i/s - 1.20x  (± 0.00) slower
```

Ruby VM benchmarks also showed an improvement:

Existing `vm_super` benchmark`.

```
$ make benchmark ITEM=vm_super

|          |compare-ruby|built-ruby|
|:---------|-----------:|---------:|
|vm_super  |     21.555M|   37.819M|
|          |           -|     1.75x|
```

New `vm_iclass_super` benchmark:

```
$ make benchmark ITEM=vm_iclass_super

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|vm_iclass_super  |      1.669M|    3.683M|
|                 |           -|     2.21x|
```

This is the benchmark script used for the benchmark-ips benchmarks:

```ruby
require "benchmark/ips"

class Foo
  def zuper; end
  def top; end

  last_method = "top"

  ("A".."M").each do |module_name|
    eval <<-EOM
    module #{module_name}
      def zuper; super; end
      def #{module_name.downcase}
        #{last_method}
      end
    end
    prepend #{module_name}
    EOM
    last_method = module_name.downcase
  end
end

foo = Foo.new

Benchmark.ips do |x|
  x.report "super" do
    foo.zuper
  end

  x.report "method call" do
    foo.m
  end

  x.compare!
end
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Co-authored-by: John Hawthorn <john@hawthorn.email>
2020-09-23 11:52:36 -07:00
Jean Boussier 5001cc4716 Optimize ObjectSpace.dump_all
The two main optimization are:
  - buffer writes for improved performance
  - avoid formatting functions when possible

```

|                   |compare-ruby|built-ruby|
|:------------------|-----------:|---------:|
|dump_all_string    |       1.038|   195.925|
|                   |           -|   188.77x|
|dump_all_file      |      33.453|   139.645|
|                   |           -|     4.17x|
|dump_all_dev_null  |      44.030|   278.552|
|                   |           -|     6.33x|
```
2020-09-09 11:11:36 -07:00
Nobuyoshi Nakada 54acb3dd52
Improved Enumerable::Lazy#zip
|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|first_ary           |    290.514k|  296.331k|
|                    |           -|     1.02x|
|first_nonary        |    166.954k|  169.178k|
|                    |           -|     1.01x|
|first_noarg         |    299.547k|  305.358k|
|                    |           -|     1.02x|
|take3_ary           |    129.388k|  188.360k|
|                    |           -|     1.46x|
|take3_nonary        |     90.684k|  112.688k|
|                    |           -|     1.24x|
|take3_noarg         |    131.940k|  189.471k|
|                    |           -|     1.44x|
|chain-first_ary     |    195.913k|  286.194k|
|                    |           -|     1.46x|
|chain-first_nonary  |    127.483k|  168.716k|
|                    |           -|     1.32x|
|chain-first_noarg   |    201.252k|  298.562k|
|                    |           -|     1.48x|
|chain-take3_ary     |    101.189k|  183.188k|
|                    |           -|     1.81x|
|chain-take3_nonary  |     75.381k|  112.301k|
|                    |           -|     1.49x|
|chain-take3_noarg   |    101.483k|  192.148k|
|                    |           -|     1.89x|
|block               |    296.696k|  292.877k|
|                    |       1.01x|         -|
2020-07-23 16:57:26 +09:00
Nobuyoshi Nakada 6b3cff12f6
Improved Enumerable::Lazy#flat_map
|        |compare-ruby|built-ruby|
|:-------|-----------:|---------:|
|num3    |     96.333k|  160.732k|
|        |           -|     1.67x|
|num10   |     96.615k|  159.150k|
|        |           -|     1.65x|
|ary2    |    103.836k|  172.787k|
|        |           -|     1.66x|
|ary10   |    109.249k|  177.252k|
|        |           -|     1.62x|
|ary20   |    106.628k|  177.371k|
|        |           -|     1.66x|
|ary50   |    107.135k|  162.282k|
|        |           -|     1.51x|
|ary100  |    106.513k|  177.626k|
|        |           -|     1.67x|
2020-07-23 16:57:26 +09:00
Kenta Murata b4e784434c
Optimize Array#min (#3324)
The benchmark result is below:

|                |compare-ruby|built-ruby|
|:---------------|-----------:|---------:|
|ary2.min        |     39.105M|   39.442M|
|                |           -|     1.01x|
|ary10.min       |     23.995M|   30.762M|
|                |           -|     1.28x|
|ary100.min      |      6.249M|   10.783M|
|                |           -|     1.73x|
|ary500.min      |      1.408M|    2.714M|
|                |           -|     1.93x|
|ary1000.min     |    828.397k|    1.465M|
|                |           -|     1.77x|
|ary2000.min     |    332.256k|  570.504k|
|                |           -|     1.72x|
|ary3000.min     |    338.079k|  573.868k|
|                |           -|     1.70x|
|ary5000.min     |    168.217k|  286.114k|
|                |           -|     1.70x|
|ary10000.min    |     85.512k|  143.551k|
|                |           -|     1.68x|
|ary20000.min    |     43.264k|   71.935k|
|                |           -|     1.66x|
|ary50000.min    |     17.317k|   29.107k|
|                |           -|     1.68x|
|ary100000.min   |      9.072k|   14.540k|
|                |           -|     1.60x|
|ary1000000.min  |     872.930|    1.436k|
|                |           -|     1.64x|

compare-ruby is 9f4b7fc82e.
2020-07-18 23:45:25 +09:00
Kenta Murata a63f520971
Optimize Array#max (#3325)
The benchmark result is below:

|                |compare-ruby|built-ruby|
|:---------------|-----------:|---------:|
|ary2.max        |     38.837M|   40.830M|
|                |           -|     1.05x|
|ary10.max       |     23.035M|   32.626M|
|                |           -|     1.42x|
|ary100.max      |      5.490M|   11.020M|
|                |           -|     2.01x|
|ary500.max      |      1.324M|    2.679M|
|                |           -|     2.02x|
|ary1000.max     |    699.167k|    1.403M|
|                |           -|     2.01x|
|ary2000.max     |    284.321k|  570.446k|
|                |           -|     2.01x|
|ary3000.max     |    282.613k|  571.683k|
|                |           -|     2.02x|
|ary5000.max     |    145.120k|  285.546k|
|                |           -|     1.97x|
|ary10000.max    |     72.102k|  142.831k|
|                |           -|     1.98x|
|ary20000.max    |     36.065k|   72.077k|
|                |           -|     2.00x|
|ary50000.max    |     14.343k|   29.139k|
|                |           -|     2.03x|
|ary100000.max   |      7.586k|   14.472k|
|                |           -|     1.91x|
|ary1000000.max  |     726.915|    1.495k|
|                |           -|     2.06x|
2020-07-18 23:45:00 +09:00
Takashi Kokubun 167d139487
Inline builtin struct aref
We don't do this for aset because it might raise a FrozenError.

```
$ benchmark-driver -v --rbenv 'before;after;before --jit;after --jit' benchmark/mjit_struct_aref.yml --repeat-count=4
before: ruby 2.8.0dev (2020-07-06T01:47:11Z master d94ef7c6b6) [x86_64-linux]
after: ruby 2.8.0dev (2020-07-06T07:11:51Z master 85425168f4) [x86_64-linux]
last_commit=Inline builtin struct aref
before --jit: ruby 2.8.0dev (2020-07-06T01:47:11Z master d94ef7c6b6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-07-06T07:11:51Z master 85425168f4) +JIT [x86_64-linux]
last_commit=Inline builtin struct aref
Calculating -------------------------------------
                             before       after  before --jit  after --jit
mjit_struct_aref(struct)    34.783M     34.810M       48.321M      58.378M i/s -     40.000M times in 1.149996s 1.149097s 0.827794s 0.685192s

Comparison:
             mjit_struct_aref(struct)
             after --jit:  58377836.7 i/s
            before --jit:  48321205.7 i/s - 1.21x  slower
                   after:  34809935.5 i/s - 1.68x  slower
                  before:  34782736.5 i/s - 1.68x  slower
```
2020-07-06 00:14:00 -07:00
Takashi Kokubun 24fa37d87a
Make Kernel#then, #yield_self, #frozen? builtin (#3283)
* Make Kernel#then, #yield_self, #frozen? builtin

* Fix test_jit
2020-07-03 18:02:43 -07:00
Takashi Kokubun f3a0d7a203
Rewrite Kernel#tap with Ruby (#3281)
* Rewrite Kernel#tap with Ruby

This was good for VM too, but of course my intention is to unblock JIT's inlining of a block over yield
(inlining invokeyield has not been committed though).

* Fix test_settracefunc

About the :tap deletions, the :tap events are actually traced (we already have a TracePoint test for builtin methods),
but it's filtered out by tp.path == "xyzzy" (it became "<internal:kernel>"). We could trace tp.path == "<internal:kernel>"
cases too, but the lineno is impacted by kernel.rb changes and I didn't want to make it fragile for kernel.rb lineno changes.
2020-07-03 09:52:35 -07:00
Takashi Kokubun 0703e01471
Mark some Integer methods as inline (#3264) 2020-06-27 10:07:47 -07:00
Vladimir Dementyev 6770d8f1b0 Add pattern matching with arrays benchmark 2020-06-27 13:51:03 +09:00
Takashi Kokubun 7982dc1dfd
Decide JIT-ed insn based on cached cfunc
for opt_* insns.

opt_eq handles rb_obj_equal inside opt_eq, and all other cfunc is
handled by opt_send_without_block. Therefore we can't decide which insn
should be generated by checking whether it's cfunc cc or not.

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-26T05:21:43Z master 9dbc2294a6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-26T06:30:18Z master 75cece1b0b) +JIT [x86_64-linux]
last_commit=Decide JIT-ed insn based on cached cfunc
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_nil?(1)      73.878M      74.021M i/s -     40.000M times in 0.541432s 0.540391s
         mjit_not(1)      72.635M      74.601M i/s -     40.000M times in 0.550702s 0.536187s
     mjit_eq(1, nil)       7.331M       7.445M i/s -      8.000M times in 1.091211s 1.074596s
     mjit_eq(nil, 1)      49.450M      64.711M i/s -      8.000M times in 0.161781s 0.123627s

Comparison:
                     mjit_nil?(1)
         after --jit:  74020528.4 i/s
        before --jit:  73878185.9 i/s - 1.00x  slower

                      mjit_not(1)
         after --jit:  74600882.0 i/s
        before --jit:  72634507.6 i/s - 1.03x  slower

                  mjit_eq(1, nil)
         after --jit:   7444657.4 i/s
        before --jit:   7331304.3 i/s - 1.02x  slower

                  mjit_eq(nil, 1)
         after --jit:  64710790.6 i/s
        before --jit:  49449507.4 i/s - 1.31x  slower
```
2020-06-25 23:33:08 -07:00
Takashi Kokubun 946e5cc668
Annotate Kernel#class as inline (#3250)
```
$ benchmark-driver -v --rbenv 'before;after;before --jit;after --jit' benchmark/mjit_class.yml --repeat-count=4
before: ruby 2.8.0dev (2020-06-23T07:09:54Z master 37a2e48d76) [x86_64-linux]
after: ruby 2.8.0dev (2020-06-23T17:29:56Z inline-class 0ff147c007) [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-06-23T07:09:54Z master 37a2e48d76) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-23T17:29:56Z inline-class 0ff147c007) +JIT [x86_64-linux]
Calculating -------------------------------------
                         before       after  before --jit  after --jit
    mjit_class(self)    39.219M     40.060M       53.502M      69.202M i/s -     40.000M times in 1.019915s 0.998495s 0.747631s 0.578021s
       mjit_class(1)    39.567M     41.242M       52.100M      68.895M i/s -     40.000M times in 1.010935s 0.969885s 0.767749s 0.580591s

Comparison:
                 mjit_class(self)
         after --jit:  69201690.7 i/s
        before --jit:  53502336.4 i/s - 1.29x  slower
               after:  40060289.1 i/s - 1.73x  slower
              before:  39218939.2 i/s - 1.76x  slower

                    mjit_class(1)
         after --jit:  68895358.6 i/s
        before --jit:  52100353.0 i/s - 1.32x  slower
               after:  41241993.6 i/s - 1.67x  slower
              before:  39567314.0 i/s - 1.74x  slower
```
2020-06-23 23:49:03 -07:00
Takashi Kokubun 78352fb52e
Compile opt_send for opt_* only when cc has ISeq
because opt_nil/opt_not/opt_eq populates cc even when it doesn't
fallback to opt_send_without_block because of vm_method_cfunc_is.

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-22T08:11:24Z master d231b8f95b) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-22T08:53:27Z master e1125879ed) +JIT [x86_64-linux]
last_commit=Compile opt_send for opt_* only when cc has ISeq
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_nil?(1)      54.106M      73.693M i/s -     40.000M times in 0.739288s 0.542795s
         mjit_not(1)      53.398M      74.477M i/s -     40.000M times in 0.749090s 0.537075s
     mjit_eq(1, nil)       7.427M       6.497M i/s -      8.000M times in 1.077136s 1.231326s

Comparison:
                     mjit_nil?(1)
         after --jit:  73692594.3 i/s
        before --jit:  54106108.4 i/s - 1.36x  slower

                      mjit_not(1)
         after --jit:  74477487.9 i/s
        before --jit:  53398125.0 i/s - 1.39x  slower

                  mjit_eq(1, nil)
        before --jit:   7427105.9 i/s
         after --jit:   6497063.0 i/s - 1.14x  slower
```

Actually opt_eq becomes slower by this. Maybe it's indeed using
opt_send_without_block, but I'll approach that one in another commit.
2020-06-22 02:08:21 -07:00
Takashi Kokubun 4c5780e51e
Share warmup logic across MJIT benchmarks 2020-06-22 00:54:27 -07:00
Takashi Kokubun faf93e4545
The RUBYOPT= comment is no longer needed 2020-06-22 00:20:30 -07:00
Takashi Kokubun 8838600c1e
Stop relying on `make benchmark`'s `-I$(srcdir)/benchmark/lib`
These days I don't use `make benchmark`. The YAML files should be
executable with bare `benchmark-driver` CLI without passing
`RUBYOPT=-Ibenchmark/lib`.
2020-06-22 00:17:10 -07:00
Takashi Kokubun 7561db8c00
Introduce Primitive.attr! to annotate 'inline' (#3242)
[Feature #15589]
2020-06-20 17:13:03 -07:00
Takashi Kokubun 95b0fed371
Make Integer#zero? a separated method and builtin (#3226)
A prerequisite to fix https://bugs.ruby-lang.org/issues/15589 with JIT.
This commit alone doesn't make a significant difference yet, but I thought
this commit should be committed independently.

This method override was discussed in [Misc #16961].
2020-06-20 14:55:09 -07:00
Ryuta Kamizono 9d24ddbb53 Fix `make benchmark` example
`make benchmark ARGS=../benchmark/erb_render.yml` does not work.

```
% make benchmark ARGS=../benchmark/erb_render.yml
/Users/kamipo/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::/Users/kamipo/.rbenv/shims/ruby --disable=gems -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
	            ../benchmark/erb_render.yml 
Traceback (most recent call last):
	6: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `<main>'
	5: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `flat_map'
	4: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `each'
	3: from ./benchmark/benchmark-driver/exe/benchmark-driver:122:in `block in <main>'
	2: from /Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `load_file'
	1: from /Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `open'
/Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `initialize': No such file or directory @ rb_sysopen - ../benchmark/erb_render.yml (Errno::ENOENT)
make: *** [benchmark] Error 1

% make benchmark ARGS=benchmark/erb_render.yml
/Users/kamipo/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::/Users/kamipo/.rbenv/shims/ruby --disable=gems -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
	            benchmark/erb_render.yml 
Calculating -------------------------------------
                     compare-ruby  built-ruby 
          erb_render     825.454k    783.664k i/s -      1.500M times in 1.817181s 1.914086s

Comparison:
                       erb_render
        compare-ruby:    825454.4 i/s 
          built-ruby:    783663.8 i/s - 1.05x  slower

```
2020-06-07 10:33:14 +09:00
卜部昌平 d4015cfee3 add benchmark for different block handlers 2020-06-03 16:13:47 +09:00
Nobuyoshi Nakada 02cb643ddb
Added String#split benchmark for empty regexp
|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|re_chars-1     |    169.230k|  973.855k|
|               |           -|     5.75x|
|re_chars-10    |     25.536k|  107.598k|
|               |           -|     4.21x|
|re_chars-100   |      2.621k|   11.207k|
|               |           -|     4.28x|
|re_chars-1000  |     259.098|    1.133k|
|               |           -|     4.37x|
2020-05-12 22:59:58 +09:00
Nobuyoshi Nakada 693f7ab315 Optimize String#split
Optimized `String#split` with `/ /` (single space regexp) as
simple string splitting.  [ruby-core:98272]

|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|re_space-1     |    432.786k|    1.539M|
|               |           -|     3.56x|
|re_space-10    |     76.231k|  191.547k|
|               |           -|     2.51x|
|re_space-100   |      8.152k|   19.557k|
|               |           -|     2.40x|
|re_space-1000  |     837.405|    2.022k|
|               |           -|     2.41x|

ruby-core:98272: https://bugs.ruby-lang.org/issues/15771#change-85511
2020-05-12 19:58:58 +09:00
S.H 17083011ee
support builtin for Kernel#Float
# Iteration per second (i/s)

|             |compare-ruby|built-ruby|
|:------------|-----------:|---------:|
|float        |     30.395M|   38.314M|
|             |           -|     1.26x|
|float_true   |      3.833M|   27.322M|
|             |           -|     7.13x|
|float_false  |      4.182M|   24.938M|
|             |           -|     5.96x|
2020-04-22 09:49:13 +09:00
Takashi Kokubun f883d6219e
Unify vm benchmark prefixes to vm_ (#3028)
The vm1_ prefix and vm2_ had had special meaning until
820ad9cb1d and
12068aa4e9. AFAIK there's no special
meaning in vm3_ prefix.

As they have confused people (like "In `benchmark` what is difference
between `vm1_`, `vm2_` and `vm3_`"), I'd like to remove the obsoleted
prefix as we obviated that two years ago.
2020-04-13 21:37:42 -07:00
Takashi Kokubun 310ef9f40b
Make vm_call_cfunc_with_frame a fastpath (#3027)
when there's no need to call CALLER_SETUP_ARG and CALLER_REMOVE_EMPTY_KW_SPLAT
(i.e. !rb_splat_or_kwargs_p(ci) && !calling->kw_splat).

Micro benchmark:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark/vm_send_cfunc.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
                         before       after
       vm_send_cfunc    69.585M     88.724M i/s -    100.000M times in 1.437097s 1.127096s

Comparison:
                    vm_send_cfunc
               after:  88723605.2 i/s
              before:  69584737.1 i/s - 1.28x  slower
```

Optcarrot:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark.yml --repeat-count=12 --output=all
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
                                       before                 after
Optcarrot Lan_Master.nes    50.76119601545175     42.73858236484051 fps
                            50.76388649761503     51.04211379912850
                            50.80930672252514     51.39455790755538
                            50.90236000778749     51.75656936556145
                            51.01744746340430     51.86875277356489
                            51.06495279015112     51.88692482485558
                            51.07785337168974     51.93429603190578
                            51.20163525187862     51.95768145071314
                            51.34671771913112     52.45577266040274
                            51.35918340835583     52.53163888762858
                            51.46641337418146     52.62172484121034
                            51.50835463462257     52.85064021113239
```
2020-04-13 20:32:59 -07:00
Takashi Kokubun b9d3ceee8f
Unwrap vm_call_cfunc indirection on JIT
for VM_METHOD_TYPE_CFUNC.

This has been known to decrease optcarrot fps:

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark.yml --repeat-count=24 --output=all
before --jit: ruby 2.8.0dev (2020-04-13T16:25:13Z master fb40495cd9) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-13T23:23:11Z mjit-inline-c bdcd06d159) +JIT [x86_64-linux]
Calculating -------------------------------------
                                 before --jit           after --jit
Optcarrot Lan_Master.nes    66.38132676191719     67.41369177299630 fps
                            69.42728743772243     68.90327567263054
                            72.16028300263211     69.62605130880686
                            72.46631319102777     70.48818243767207
                            73.37078877002490     70.79522887347566
                            73.69422431217367     70.99021920193194
                            74.01471487018695     74.69931965402584
                            75.48685183295630     74.86714575949016
                            75.54445264507932     75.97864419721677
                            77.28089738169756     76.48908637569581
                            78.04183397891302     76.54320932488021
                            78.36807984096562     76.59407262898067
                            78.92898762543574     77.31316743361343
                            78.93576483233765     77.97153484180480
                            79.13754917503078     77.98478782102325
                            79.62648945850653     78.02263322726446
                            79.86334213878064     78.26333724045934
                            80.05100635898518     78.60056756355614
                            80.26186843769584     78.91082645644468
                            80.34205717020330     79.01226659142263
                            80.62286066044338     79.32733939423721
                            80.95883033058557     79.63793060542024
                            80.97376819251613     79.73108936622778
                            81.23050939202896     80.18280109433088
```

and I deleted this capability in an early stage of YARV-MJIT development:
0ab130feee

I suspect either of the following things could be the cause:

* Directly calling vm_call_cfunc requires more optimization effort in GCC,
  resulting in 30ms-ish compilation time increase for such methods and
  decreasing the number of methods compiled in a benchmarked period.

* Code size increase => icache miss hit

These hypotheses could be verified by some methodologies. However, I'd
like to introduce this regardless of the result because this blocks
inlining C method's definition.

I may revert this commit when I give up to implement inlining C method
definition, which requires this change.

Microbenchmark-wise, this gives slight performance improvement:

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_send_cfunc.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-04-13T16:25:13Z master fb40495cd9) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-13T23:23:11Z mjit-inline-c bdcd06d159) +JIT [x86_64-linux]
Calculating -------------------------------------
                     before --jit  after --jit
     mjit_send_cfunc      41.961M      56.489M i/s -    100.000M times in 2.383143s 1.770244s

Comparison:
                  mjit_send_cfunc
         after --jit:  56489372.5 i/s
        before --jit:  41961388.1 i/s - 1.35x  slower
```
2020-04-13 16:45:05 -07:00
Takashi Kokubun 151f8be40d
Make JIT-ed leave insn leaf
to eliminate sp / pc moves by cancelling JIT execution on interrupts.

$ benchmark-driver benchmark.yml -v --rbenv 'before --jit;after --jit' --repeat-count=12 --output=all
before --jit: ruby 2.8.0dev (2020-04-01T03:48:56Z master 5a81562dfe) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-01T04:58:01Z master 39beb26a27) +JIT [x86_64-linux]
Calculating -------------------------------------
                                 before --jit           after --jit
Optcarrot Lan_Master.nes    75.06409603894944     76.06422026555558 fps
                            75.12025067279242     78.48161731616810
                            77.42020273492177     79.78958240950033
                            79.07253675128945     79.88645902325614
                            79.99179109732327     80.33743931749331
                            80.07633091008627     80.53790081529166
                            80.15450942667547     80.99048270668010
                            80.48372803283709     81.70497146081003
                            80.57410149187352     82.79494539467382
                            81.80449157081202     82.85797792223954
                            82.24629397834902     83.00603891515506
                            82.63708148686703     83.23221006969828

$ benchmark-driver -v --rbenv 'before;before --jit;after --jit' benchmark/mjit_leave.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-01T03:48:56Z master 5a81562dfe) [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-04-01T03:48:56Z master 5a81562dfe) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-01T04:58:01Z master 39beb26a27) +JIT [x86_64-linux]
Calculating -------------------------------------
                         before  before --jit  after --jit
          mjit_leave   106.656M       82.786M      91.635M i/s -    200.000M times in 1.875183s 2.415881s 2.182569s

Comparison:
                       mjit_leave
              before: 106656239.9 i/s
         after --jit:  91635143.7 i/s - 1.16x  slower
        before --jit:  82785537.2 i/s - 1.29x  slower
2020-03-31 22:10:16 -07:00
Takashi Kokubun dad110d068
Remove an unused pragma
It originally had a string literal, but it no longer has one.
2020-03-30 23:30:08 -07:00
Takashi Kokubun b736ea63bd
Optimize exivar access on JIT-ed getivar
JIT support of dd723771c1.

$ benchmark-driver -v --rbenv 'before;before --jit;after --jit' benchmark/mjit_exivar.yml --repeat-count=4
before: ruby 2.8.0dev (2020-03-30T12:32:26Z master e5db3da9d3) [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-03-30T12:32:26Z master e5db3da9d3) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-03-31T05:57:24Z mjit-exivar 128625baec) +JIT [x86_64-linux]
Calculating -------------------------------------
                         before  before --jit  after --jit
         mjit_exivar    57.944M       53.579M      54.471M i/s -    200.000M times in 3.451588s 3.732772s 3.671687s

Comparison:
                      mjit_exivar
              before:  57944345.1 i/s
         after --jit:  54470876.7 i/s - 1.06x  slower
        before --jit:  53579483.4 i/s - 1.08x  slower
2020-03-30 23:16:35 -07:00
Jeremy Evans d2c41b1bff Reduce allocations for keyword argument hashes
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side.  Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.

This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side.  Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side.  On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash.  If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it.  So this commit
should make it so that a maximum of a single hash is allocated
during method calls.

To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable.  If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.

In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.

On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled.  Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not.  Then, unless the callinfo flag is set,
the hash needs to be duplicated.  The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again.  If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can including duplicating the
positional splat array if one was passed.  CALLER_SETUP_ARG
and a couple other places needs to be modified to handle
similar issues for other types of calls.

This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.

Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:

  def kw(a: 1) a end
  def kws(**kw) kw end
  h = {a: 1}

  kw(a: 1)       # about same
  kw(**h)        # 2.37x faster
  kws(a: 1)      # 1.30x faster
  kws(**h)       # 2.19x faster
  kw(a: 1, **h)  # 1.03x slower
  kw(**h, **h)   # about same
  kws(a: 1, **h) # 1.16x faster
  kws(**h, **h)  # 1.14x faster
2020-03-17 12:09:43 -07:00
S.H 290d608637
support builtin for Kernel#clone 2020-03-17 19:37:07 +09:00
Nobuyoshi Nakada 5e897227ff
Added more benchmarks for String 2020-02-29 15:42:24 +09:00
Nobuyoshi Nakada 05229cef45
Improve `String#slice!` performance
Instead of searching twice to extract and to delete, extract and
delete the found position at the first search.

This makes faster nearly twice, for regexps and strings.

|              |compare-ruby|built-ruby|
|:-------------|-----------:|---------:|
|regexp-short  |      2.143M|    3.918M|
|regexp-long   |    105.162k|  205.410k|
|string-short  |      3.789M|    7.964M|
|string-long   |      1.301M|    2.457M|
2020-01-31 17:12:05 +09:00
Kazuhiro NISHIYAMA c90fc55a1f Drop executable bit of *.{yml,h,mk.tmpl} 2020-01-22 16:04:38 +09:00
Lourens Naudé 40c57ad4a1 Let execution context local storage be an ID table 2020-01-11 14:40:36 +13:00
Lourens Naudé 592d7ceeeb Speeds up fallback to Hash#default_proc in rb_hash_aref by removing a method call 2020-01-08 18:09:52 +09:00
David Rodríguez f48655d04d Remove unneeded exec bits from some files
I noticed that some files in rubygems were executable, and I could think
of no reason why they should be.

In general, I think ruby files should never have the executable bit set
unless they include a shebang, so I run the following command over the
whole repo:

```bash
find . -name '*.rb' -type f -executable -exec bash -c 'grep -L "^#!" $1 || chmod -x $1' _ {} \;
```
2019-11-09 21:36:30 +09:00
Nobuyoshi Nakada 8390057d1e
Benchmark for [Feature #16155] 2019-10-22 22:49:48 +09:00
Dylan Thacker-Smith b970259044 Stop making a redundant hash copy in Hash#dup (#2489)
* Stop making a redundant hash copy in Hash#dup

It was making a copy of the hash without rehashing, then created an
extra copy of the hash to do the rehashing.  Since rehashing creates
a new copy already, this change just uses that rehashing to make
the copy.

[Bug #16121]

* Remove redundant Check_Type after to_hash

* Fix freeing and clearing destination hash in Hash#initialize_copy

The code was assuming the state of the destination hash based on the
source hash for clearing any existing table on it. If these don't match,
then that can cause the old table to be leaked. This can be seen by
compiling hash.c with `#define HASH_DEBUG 1` and running the following
script, which will crash from a debug assertion.

```ruby
h = 9.times.map { |i| [i, i] }.to_h
h.send(:initialize_copy, {})
```

* Remove dead code paths in rb_hash_initialize_copy

Given that `RHASH_ST_TABLE_P(h)` is defined as `(!RHASH_AR_TABLE_P(h))`
it shouldn't be possible for a hash to be neither of these, so there
is no need for the removed `else if` blocks.

* Share implementation between Hash#replace and Hash#initialize_copy

This also fixes key rehashing for small hashes backed by an array
table for Hash#replace.  This used to be done consistently in ruby
2.5.x, but stopped being done for small arrays in ruby 2.6.x.

This also bring optimization improvements that were done for
Hash#initialize_copy to Hash#replace.

* Add the Hash#dup benchmark
2019-10-21 17:29:21 +09:00
Dylan Thacker-Smith a1fda16b23 Optimize Array#flatten and flatten! for already flattened arrays (#2495)
* Optimize Array#flatten and flatten! for already flattened arrays
* Add benchmark for Array#flatten and Array#flatten!

[Bug #16119]
2019-09-28 01:24:24 +09:00
Takashi Kokubun 41e3c204fd
Reduce ISeq size of mjit_exec benchmark
to avoid unwanted memory pressure
2019-09-26 22:13:31 +09:00
Takashi Kokubun 4a4c502825
Add special runner to benchmark mjit_exec
I wanted to dynamically generate benchmark cases to test various number
of methods. Thus I added a dedicated runner of benchmark-driver.
2019-09-26 16:34:40 +09:00
Takashi Kokubun b414999370
Add a benchmark for JIT-ed code dispatch 2019-09-21 16:09:52 +09:00
卜部昌平 d74fa8e55c reuse cc->call
I noticed that in case of cache misshit, re-calculated cc->me can
be the same method entry than the pevious one.  That is an okay
situation but can't we partially reuse the cache, because cc->call
should still be valid then?

One thing that has to be special-cased is when the method entry
gets amended by some refinements.  That happens behind-the-scene
of call cache mechanism.  We have to check if cc->me->def points to
the previously saved one.

Calculating -------------------------------------
                          trunk        ours
vm2_poly_same_method     1.534M      2.025M i/s -      6.000M times in 3.910203s 2.962752s

Comparison:
             vm2_poly_same_method
                ours:   2025143.9 i/s
               trunk:   1534447.2 i/s - 1.32x  slower
2019-09-19 15:18:10 +09:00
Takashi Kokubun ff462bc6c3
Add a benchmark for opt_regexpmatch2
vm2_regexp was for opt_regexpmatch1.
2019-09-02 13:46:33 +09:00
Nobuyoshi Nakada 07e42e88d9
Close created files [ci skip] 2019-08-10 11:26:23 +09:00
Masato Ohba 6902824729
Fix typo in comment [ci skip]
s/Thtread/Thread
2019-08-10 09:35:28 +09:00
Yaw Boakye 6bb3618f28
n+1 to include n in range
Python's range stop right before n, which means factL never returns the correct result.

Closes: https://github.com/ruby/ruby/pull/1982
2019-08-05 09:04:32 +09:00
Yusuke Endoh 086ffe72c7 Revert "Revert "Add a specialized instruction for `.nil?` calls""
This reverts commit a0980f2446.

Retry for macOS Mojave.
2019-08-02 23:25:38 +09:00
Yusuke Endoh a0980f2446 Revert "Add a specialized instruction for `.nil?` calls"
This reverts commit 9faef3113f.

It seemed to cause a failure on macOS Mojave, though I'm unsure how.
https://rubyci.org/logs/rubyci.s3.amazonaws.com/osx1014/ruby-master/log/20190802T034503Z.fail.html.gz

This tentative revert is to check if the issue is actually caused by the
change or not.
2019-08-02 15:03:34 +09:00
Aaron Patterson 9faef3113f
Add a specialized instruction for `.nil?` calls
This commit adds a specialized instruction for called to `.nil?`.  It is
about 27% faster than master in the case where the object is nil or not
nil.  In the case where an object implements `nil?`, I think it may be
slightly slower.  Here is a benchmark:

```ruby
require "benchmark/ips"

class Niller
  def nil?; true; end
end

not_nil = Object.new
xnil = nil
niller = Niller.new

Benchmark.ips do |x|
  x.report("nil?")    { xnil.nil? }
  x.report("not nil") { not_nil.nil? }
  x.report("niller")   { niller.nil? }
end
```

On Ruby master:

```
[aaron@TC ~/g/ruby (master)]$ ./ruby compil.rb
Warming up --------------------------------------
                nil?   429.195k i/100ms
             not nil   437.889k i/100ms
              niller   437.935k i/100ms
Calculating -------------------------------------
                nil?     20.166M (± 8.1%) i/s -    100.002M in   5.002794s
             not nil     20.046M (± 7.6%) i/s -     99.839M in   5.020086s
              niller     22.467M (± 6.1%) i/s -    112.111M in   5.013817s
[aaron@TC ~/g/ruby (master)]$ ./ruby compil.rb
Warming up --------------------------------------
                nil?   449.660k i/100ms
             not nil   433.836k i/100ms
              niller   443.073k i/100ms
Calculating -------------------------------------
                nil?     19.997M (± 8.8%) i/s -     99.375M in   5.020458s
             not nil     20.529M (± 7.0%) i/s -    102.385M in   5.020689s
              niller     21.796M (± 8.0%) i/s -    108.110M in   5.002300s
[aaron@TC ~/g/ruby (master)]$ ./ruby compil.rb
Warming up --------------------------------------
                nil?   402.119k i/100ms
             not nil   438.968k i/100ms
              niller   398.226k i/100ms
Calculating -------------------------------------
                nil?     20.050M (±12.2%) i/s -     98.519M in   5.008817s
             not nil     20.614M (± 8.0%) i/s -    102.280M in   5.004531s
              niller     22.223M (± 8.8%) i/s -    110.309M in   5.013106s

```

On this branch:

```
[aaron@TC ~/g/ruby (specialized-nilp)]$ ./ruby compil.rb
Warming up --------------------------------------
                nil?   468.371k i/100ms
             not nil   456.517k i/100ms
              niller   454.981k i/100ms
Calculating -------------------------------------
                nil?     27.849M (± 7.8%) i/s -    138.169M in   5.001730s
             not nil     26.417M (± 8.7%) i/s -    131.020M in   5.011674s
              niller     21.561M (± 7.5%) i/s -    107.376M in   5.018113s
[aaron@TC ~/g/ruby (specialized-nilp)]$ ./ruby compil.rb
Warming up --------------------------------------
                nil?   477.259k i/100ms
             not nil   428.712k i/100ms
              niller   446.109k i/100ms
Calculating -------------------------------------
                nil?     28.071M (± 7.3%) i/s -    139.837M in   5.016590s
             not nil     25.789M (±12.9%) i/s -    126.470M in   5.011144s
              niller     20.002M (±12.2%) i/s -     98.144M in   5.001737s
[aaron@TC ~/g/ruby (specialized-nilp)]$ ./ruby compil.rb
Warming up --------------------------------------
                nil?   467.676k i/100ms
             not nil   445.791k i/100ms
              niller   415.024k i/100ms
Calculating -------------------------------------
                nil?     26.907M (± 8.0%) i/s -    133.755M in   5.013915s
             not nil     25.319M (± 7.9%) i/s -    125.713M in   5.007758s
              niller     19.569M (±11.8%) i/s -     96.286M in   5.008533s
```

Co-Authored-By: Ashe Connor <kivikakk@github.com>
2019-07-31 16:21:25 -07:00
Takashi Kokubun 1392b821b9
Explain what's benchmark/lib/load.rb [ci skip]
I'm actually not using this, but ko1 is.
2019-07-20 15:33:55 +09:00
Samuel Williams 56fcf98849
Add note about setting `vm.max_map_count` for Linux. 2019-07-18 20:54:55 +12:00
Samuel Williams 9b28eefeb2
Add benchmark to help diagnose performance regression.
See https://bugs.ruby-lang.org/issues/16009 for more details.
2019-07-18 11:13:49 +12:00
Nobuyoshi Nakada ae599db22f
* remove trailing spaces. 2019-07-12 17:57:28 +09:00
Samuel Williams 012e954b47
Improved fiber benchmarks. Increase number of iterations. 2019-07-12 11:56:51 +12:00
Hiroshi SHIBATA e8a2521abe
Adjust memory_status.rb under the tool directory. 2019-07-02 21:39:22 +09:00
Jeremy Evans 11c311e36f Use realpath(3) instead of custom realpath implementation if available
This approach is simpler than the previous approach which tries to
emulate realpath(3).  It also performs much better on both Linux and
OpenBSD on the included benchmarks.

By using realpath(3), we can better integrate with system security
features such as OpenBSD's unveil(2) system call.

This does not use realpath(3) on Windows even if it exists, as the
approach for checking for absolute paths does not work for drive
letters.  This can be fixed without too much difficultly, though until
Windows defines realpath(3), there is no need to do so.

For File.realdirpath, where the last element of the path is not
required to exist, fallback to the previous approach, as realpath(3)
on most operating systems requires the whole path be valid (per POSIX),
and the operating systems where this isn't true either plan to conform
to POSIX or may change to conform to POSIX in the future.

glibc realpath(3) does not handle /path/to/file.rb/../other_file.rb
paths, returning ENOTDIR in that case.  Fallback to the previous code
if realpath(3) returns ENOTDIR.

glibc doesn't like realpath(3) usage for paths like /dev/fd/5,
returning ENOENT even though the path may appear to exist in the
filesystem.  If ENOENT is returned and the path exists, then fall
back to the default approach.
2019-07-01 11:46:30 -07:00
Samuel Williams 3fd83cb6fc Improve benchmarks and tests for threads. 2019-06-19 20:39:10 +12:00
Takashi Kokubun caa90202c9
Make sure to suppress .irbrc on benchmark
By the way, this is already improved by nobu:

```
$ benchmark-driver benchmark/irb_exec.yml --rbenv '2.6.3;2.7.0-preview1;before;after' -v
2.6.3: ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux]
2.7.0-preview1: ruby 2.7.0preview1 (2019-05-31 trunk c55db6aa27) [x86_64-linux]
before: ruby 2.7.0dev (2019-06-10T21:13:14+09:00 master 973fd18f11) [x86_64-linux]
after: ruby 2.7.0dev (2019-06-10T21:18:56+09:00 master 976c689ad4) [x86_64-linux]
Calculating -------------------------------------
                          2.6.3  2.7.0-preview1      before       after
            irb_exec     11.868           5.872       6.297      10.278 i/s -      30.000 times in 2.527776s 5.108997s 4.764167s 2.918821s

Comparison:
                         irb_exec
               2.6.3:        11.9 i/s
               after:        10.3 i/s - 1.15x  slower
              before:         6.3 i/s - 1.88x  slower
      2.7.0-preview1:         5.9 i/s - 2.02x  slower

```
2019-06-10 22:04:52 +09:00
Takashi Kokubun 973fd18f11
Add a benchmark of irb boot time
```
$ benchmark-driver benchmark/irb_exec.yml --rbenv '2.6.3;2.7.0-preview1'
Calculating -------------------------------------
                          2.6.3  2.7.0-preview1
            irb_exec     11.844           5.171 i/s -      30.000 times in 2.532887s 5.801960s

Comparison:
                         irb_exec
               2.6.3:        11.8 i/s
      2.7.0-preview1:         5.2 i/s - 2.29x  slower
```
2019-06-10 21:13:14 +09:00
Takashi Kokubun 0a29dc87e6
Optimize CGI.escapeHTML by reducing buffer extension
and switch-case branches.

Buffer allocation optimization using `ALLOCA_N` would be the main
benefit of patch. It eliminates the O(N) buffer extensions.

It also reduces the number of branches using escape table like
https://mattn.kaoriya.net/software/lang/c/20160817011915.htm.

Closes: https://github.com/ruby/ruby/pull/2226

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Yasuhiro MATSUMOTO <mattn.jp@gmail.com>
2019-06-05 21:07:04 +09:00
Takashi Kokubun 71b14affc6
Revert "Optimize CGI.escapeHTML by reducing buffer extension"
This reverts commit 8d81e59aa7.

`ALLOCA_N` does not check stack overflow unlike ALLOCV. I'll fix it and
re-commit it again.
2019-06-05 11:00:54 +09:00
Takashi Kokubun 8d81e59aa7
Optimize CGI.escapeHTML by reducing buffer extension
and switch-case branches.

Buffer allocation optimization using `ALLOCA_N` would be the main
benefit of patch. It eliminates the O(N) buffer extensions.

It also reduces the number of branches using escape table like
https://mattn.kaoriya.net/software/lang/c/20160817011915.htm.

Closes: https://github.com/ruby/ruby/pull/2226

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Yasuhiro MATSUMOTO <mattn.jp@gmail.com>
2019-06-05 10:08:55 +09:00
Takashi Kokubun f630359d9b
Add a benchmark using IRB::Color
I heard actually this part would not be a bottleneck for rendering
because writing anything to terminal takes way longer time anyway, but I
thought this benchmark script might be useful for benchmarking Ruby
itself.
2019-06-01 20:07:50 +09:00
Lourens Naudé a47f598d77
Reduce ONIG_NREGION from 10 to 4: power of 2 and testing revealed most pattern matches are less than or equal to 4 results
Closes: https://github.com/ruby/ruby/pull/2135
2019-05-07 21:58:55 +09:00
Nobuyoshi Nakada 77440e949b
Improve performance of case-conversion methods 2019-05-03 23:59:18 +09:00
nobu e1eb54b99d string.c: improve splitting into chars
* string.c (rb_str_split_m): improve splitting into chars by an
  empty string, without a regexp.

    Comparison:
                           to_chars-1
              built-ruby:   1273527.6 i/s
            compare-ruby:    189423.3 i/s - 6.72x  slower

                          to_chars-10
              built-ruby:    120993.5 i/s
            compare-ruby:     37075.8 i/s - 3.26x  slower

                         to_chars-100
              built-ruby:     15646.4 i/s
            compare-ruby:      4012.1 i/s - 3.90x  slower

                        to_chars-1000
              built-ruby:      1295.1 i/s
            compare-ruby:       408.5 i/s - 3.17x  slower

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67582 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-04-17 05:34:46 +00:00
eregon f2d6338574 benchmark/app_aobench.rb: complete commented code to write the image to a file
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66900 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-21 12:31:29 +00:00
eregon 32e259f097 benchmark/app_aobench.rb: remove extra printf arguments
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66899 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-21 12:31:20 +00:00
eregon dfe6918604 benchmark/app_aobench.rb: move `srand(0)` at the top
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66898 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-21 12:31:10 +00:00
mame 8a294e0f20 benchmark/app_aobench.rb: add `srand(0)`
To prevent noise for benchmark result.  Just for the case.
[Bug #15552]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66893 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-21 07:29:00 +00:00
mame be3a0c7042 benchmark/app_aobench.rb: add a magic comment
To support the change of default encoding.
It had not worked correctly since 2.0 :-)
[Bug #15552]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66892 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-21 07:13:32 +00:00
nobu 72ad092960 Time.strptime benchmarks
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66743 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-07 02:05:21 +00:00
mrkn d41fb6ce0b benchmark/range_last.yml: remove needless prelude
[ci skip]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66740 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-06 15:46:21 +00:00
mrkn bf6584b5d0 benchmark/range_last.yml: add benchmark cases
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66733 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2019-01-06 00:46:35 +00:00