Граф коммитов

1090 Коммитов

Автор SHA1 Сообщение Дата
Maxime Chevalier-Boisvert c9feb72b65 Implement opt_mod as call to interpreter function (#29) 2021-10-20 18:19:35 -04:00
Maxime Chevalier-Boisvert e2c1d69331 Implement opt_eq by calling interpreter function (#28) 2021-10-20 18:19:35 -04:00
Maxime Chevalier-Boisvert e5f8b41786 Implement send with alias method (#23)
* Implement send with alias method

* Add alias_method tests
2021-10-20 18:19:34 -04:00
Alan Wu 36134f7d29 Implement calls to methods with simple optional params
* Implement calls to methods with simple optional params

* Remove unnecessary MJIT_STATIC

See comment for MJIT_STATIC. I added it not knowing whether it's
required because the function next to it has it. Don't use it and wait
for problems to come up instead.

* Better naming, some comments

* Count bailing on kw only iseqs

On railsbench:
```
opt_send_without_block exit reasons:
                  bmethod      59729 (27.7%)
         optimized_method      59137 (27.5%)
      iseq_complex_callee      41362 (19.2%)
             alias_method      33346 (15.5%)
      callsite_not_simple      19170 ( 8.9%)
       iseq_only_keywords       1300 ( 0.6%)
                 kw_splat       1299 ( 0.6%)
    cfunc_ruby_array_varg         18 ( 0.0%)
```
2021-10-20 18:19:34 -04:00
Alan Wu b626dd7211 YJIT: Fancier opt_getinlinecache
Make sure `opt_getinlinecache` is in a block all on its own, and
invalidate it from the interpreter when `opt_setinlinecache`.
It will recompile with a filled cache the second time around.
This lets YJIT runs well when the IC for constant is cold.
2021-10-20 18:19:33 -04:00
Alan Wu 5d834bcf9f YJIT: lazy polymorphic getinstancevariable
Lazily compile out a chain of checks for different known classes and
whether `self` embeds its ivars or not.

* Remove trailing whitespaces

* Get proper addresss in Capstone disassembly

* Lowercase address in Capstone disassembly

Capstone uses lowercase for jump targets in generated listings. Let's
match it.

* Use the same successor in getivar guard chains

Cuts down on duplication

* Address reviews

* Fix copypasta error

* Add a comment
2021-10-20 18:19:31 -04:00
Maxime Chevalier-Boisvert 9d8cc01b75 WIP JIT-to-JIT returns 2021-10-20 18:19:28 -04:00
Nobuyoshi Nakada 8d6dbecc80
Remove useless casts 2021-10-19 17:09:32 +09:00
Nobuyoshi Nakada ec021e469d
Get rid of type-punning cast 2021-10-19 17:08:25 +09:00
S.H dc9112cf10
Using NIL_P macro instead of `== Qnil` 2021-10-03 22:34:45 +09:00
Jeremy Evans 9b04909a85 Introduce rb_vm_call_with_refinements to DRY up a few calls 2021-10-01 08:12:46 -09:00
Jeremy Evans 1f5f8a187a
Make Array#min/max optimization respect refined methods
Pass in ec to vm_opt_newarray_{max,min}. Avoids having to
call GET_EC inside the functions, for better performance.

While here, add a test for Array#min/max being redefined to
test_optimization.rb.

Fixes [Bug #18180]
2021-09-30 15:18:14 -07:00
Nobuyoshi Nakada 70624ae43d
Extract hook macro for attributes 2021-09-19 16:27:47 +09:00
Nobuyoshi Nakada cd829bb078 Remove printf family from the mjit header
Linking printf family functions makes mjit objects to link
unnecessary code.
2021-09-11 08:41:32 +09:00
Jeremy Evans 2d98593bf5 Support tracing of attr_reader and attr_writer
In vm_call_method_each_type, check for c_call and c_return events before
dispatching to vm_call_ivar and vm_call_attrset.  With this approach, the
call cache will still dispatch directly to those functions, so this
change will only decrease performance for the first (uncached) call, and
even then, the performance decrease is very minimal.

This approach requires that we clear the call caches when tracing is
enabled or disabled.  The approach currently switches all vm_call_ivar
and vm_call_attrset call caches to vm_call_general any time tracing is
enabled or disabled. So it could theoretically result in a slowdown for
code that constantly enables or disables tracing.

This approach does not handle targeted tracepoints, but from my testing,
c_call and c_return events are not supported for targeted tracepoints,
so that shouldn't matter.

This includes a benchmark showing the performance decrease is minimal
if detectable at all.

Fixes [Bug #16383]
Fixes [Bug #10470]

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2021-08-29 07:23:39 -07:00
Nobuyoshi Nakada a0a8f2abf5 Get rid of type-punning pointer casts [Bug #18062] 2021-08-11 12:07:44 +09:00
S.H 378e8cdad6
Using RBOOL macro 2021-08-02 12:06:44 +09:00
Nobuyoshi Nakada 0700ee0e94
Refactor class variable cache functions
Extracted repeated code as update_classvariable_cache.  When cvc
table is not set in getclassvariable, an empty table was created
but it has no id and would cause [BUG], so made the code same as
setclassvariable.
2021-06-23 10:55:22 +09:00
eileencodes b91b3bc771 Add a cache for class variables
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.

GitHub PR #4340.

This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.

The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.

```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]

|         |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar  |      5.681M|   36.980M|
|         |           -|     6.51x|
```

Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.

Benchmark code:

```ruby
require "benchmark/ips"
require_relative "config/environment"

Benchmark.ips do |x|
  x.report "logger" do
    ActiveRecord::Base.logger
  end
end
```

Ruby 3.0 master / Rails 6.1:

```
Warming up --------------------------------------
              logger   155.251k i/100ms
Calculating -------------------------------------
```

Ruby 3.0 with cvar cache /  Rails 6.1:

```
Warming up --------------------------------------
              logger     1.546M i/100ms
Calculating -------------------------------------
              logger     14.857M (± 4.8%) i/s -     74.198M in   5.006202s
```

Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.

Ruby 3.0 master:

```
Warming up --------------------------------------
            1 module     1.231M i/100ms
          30 modules   432.020k i/100ms
         100 modules   145.399k i/100ms
Calculating -------------------------------------
            1 module     12.210M (± 2.1%) i/s -     61.553M in   5.043400s
          30 modules      4.354M (± 2.7%) i/s -     22.033M in   5.063839s
         100 modules      1.434M (± 2.9%) i/s -      7.270M in   5.072531s

Comparison:
            1 module: 12209958.3 i/s
          30 modules:  4354217.8 i/s - 2.80x  (± 0.00) slower
         100 modules:  1434447.3 i/s - 8.51x  (± 0.00) slower
```

Ruby 3.0 with cvar cache:

```
Warming up --------------------------------------
            1 module     1.641M i/100ms
          30 modules     1.655M i/100ms
         100 modules     1.620M i/100ms
Calculating -------------------------------------
            1 module     16.279M (± 3.8%) i/s -     82.038M in   5.046923s
          30 modules     15.891M (± 3.9%) i/s -     79.459M in   5.007958s
         100 modules     16.087M (± 3.6%) i/s -     81.005M in   5.041931s

Comparison:
            1 module: 16279458.0 i/s
         100 modules: 16087484.6 i/s - same-ish: difference falls within error
          30 modules: 15891406.2 i/s - same-ish: difference falls within error
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2021-06-18 10:02:44 -07:00
Aaron Patterson 07f055bb13
Revert "Filling cache values on cvar write"
This reverts commit 08de37f9fa.
This reverts commit e8ae922b62.
2021-05-11 13:31:00 -07:00
eileencodes 08de37f9fa Filling cache values on cvar write
Instead of on read. Once it's in the inline cache we never have to make
one again. We want to eventually put the value into the cache, and the
best opportunity to do that is when you write the value.
2021-05-11 12:04:27 -07:00
eileencodes e8ae922b62 Add a cache for class variables
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.

The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.

```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]

|         |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar  |      5.681M|   36.980M|
|         |           -|     6.51x|
```

Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.

Benchmark code:

```ruby
require "benchmark/ips"
require_relative "config/environment"

Benchmark.ips do |x|
  x.report "logger" do
    ActiveRecord::Base.logger
  end
end
```

Ruby 3.0 master / Rails 6.1:

```
Warming up --------------------------------------
              logger   155.251k i/100ms
Calculating -------------------------------------
```

Ruby 3.0 with cvar cache /  Rails 6.1:

```
Warming up --------------------------------------
              logger     1.546M i/100ms
Calculating -------------------------------------
              logger     14.857M (± 4.8%) i/s -     74.198M in   5.006202s
```

Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.

Ruby 3.0 master:

```
Warming up --------------------------------------
            1 module     1.231M i/100ms
          30 modules   432.020k i/100ms
         100 modules   145.399k i/100ms
Calculating -------------------------------------
            1 module     12.210M (± 2.1%) i/s -     61.553M in   5.043400s
          30 modules      4.354M (± 2.7%) i/s -     22.033M in   5.063839s
         100 modules      1.434M (± 2.9%) i/s -      7.270M in   5.072531s

Comparison:
            1 module: 12209958.3 i/s
          30 modules:  4354217.8 i/s - 2.80x  (± 0.00) slower
         100 modules:  1434447.3 i/s - 8.51x  (± 0.00) slower
```

Ruby 3.0 with cvar cache:

```
Warming up --------------------------------------
            1 module     1.641M i/100ms
          30 modules     1.655M i/100ms
         100 modules     1.620M i/100ms
Calculating -------------------------------------
            1 module     16.279M (± 3.8%) i/s -     82.038M in   5.046923s
          30 modules     15.891M (± 3.9%) i/s -     79.459M in   5.007958s
         100 modules     16.087M (± 3.6%) i/s -     81.005M in   5.041931s

Comparison:
            1 module: 16279458.0 i/s
         100 modules: 16087484.6 i/s - same-ish: difference falls within error
          30 modules: 15891406.2 i/s - same-ish: difference falls within error
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2021-05-11 12:04:27 -07:00
Nobuyoshi Nakada 0bbab1e515
Protoized old pre-ANSI K&R style declarations and definitions 2021-05-07 00:04:36 +09:00
Alan Wu dee58d7ae7
Add back checks for empty kw splat with tests (#4405)
This reverts commit a224ce8150.
Turns out the checks are needed to handle splatting an array with an
empty ruby2 keywords hash.
2021-04-23 22:17:20 -04:00
Jeremy Evans 1f2b5c6dfe Remove part of comment that is no longer accurate
In Ruby 2.7, empty keyword splats could be added back for backwards
compatibility.  However, that stopped in Ruby 3.0.
2021-04-23 16:44:01 -07:00
Alan Wu a224ce8150
Remove unnecessary checks for empty kw splat
These two checks are surrounded by an if that ensures the
call site is not a kw splat call site.
2021-04-23 19:37:03 -04:00
Koichi Sasada ecfa8dcdba fix return from orphan Proc in lambda
A "return" statement in a Proc in a lambda like:
  `lambda{ proc{ return }.call }`
should return outer lambda block. However, the inner Proc can become
orphan Proc from the lambda block. This "return" escape outer-scope
like method, but this behavior was decieded as a bug.
[Bug #17105]

This patch raises LocalJumpError by checking the proc is orphan or
not from lambda blocks before escaping by "return".

Most of tests are written by Jeremy Evans
https://github.com/ruby/ruby/pull/4223
2021-04-02 09:25:33 +09:00
Aaron Patterson 04a814931a return bool instead of VALUE 2021-03-17 10:55:37 -07:00
Aaron Patterson ea817c60fc Refactor vm_defined to return a boolean
We just need this function to return whether or not the thing we're
looking for is defined.  If it's defined, return something true,
otherwise false.
2021-03-17 10:55:37 -07:00
Aaron Patterson c3971bea33 Stop calling `rb_iseq_defined_string` in vm_defined
We already have access to the string from the iseqs, so we can stop
calling this function.
2021-03-17 10:55:37 -07:00
John Hawthorn 9d0ae387c8 Remove DEFINED_IVAR2 from enum
This version of defined? doesn't seem to be possible to emit anymore.
2021-03-10 09:38:20 -08:00
Koichi Sasada 813fe4c256 opt_equality_by_mid for rb_equal_opt
This patch improves the performance of sequential and parallel
execution of rb_equal() (and rb_eql()).
[Bug #17497]

rb_equal_opt (and rb_eql_opt) does not have own cd and it waste
a time to initialize cd. This patch introduces opt_equality_by_mid()
to check equality without cd.

Furthermore, current master uses "static" cd on rb_equal_opt
(and rb_eql_opt) and it hurts CPU caches on multi-thread execution.
Now they are gone so there are no bottleneck on parallel execution.
2021-02-13 11:51:33 +09:00
Koichi Sasada 1ecda21366 global call-cache cache table for rb_funcall*
rb_funcall* (rb_funcall(), rb_funcallv(), ...) functions invokes
Ruby's method with given receiver. Ruby 2.7 introduced inline method
cache with static memory area. However, Ruby 3.0 reimplemented the
method cache data structures and the inline cache was removed.

Without inline cache, rb_funcall* searched methods everytime.
Most of cases per-Class Method Cache (pCMC) will be helped but
pCMC requires VM-wide locking and it hurts performance on
multi-Ractor execution, especially all Ractors calls methods
with rb_funcall*.

This patch introduced Global Call-Cache Cache Table (gccct) for
rb_funcall*. Call-Cache was introduced from Ruby 3.0 to manage
method cache entry atomically and gccct enables method-caching
without VM-wide locking. This table solves the performance issue
on multi-ractor execution.
[Bug #17497]

Ruby-level method invocation does not use gccct because it has
inline-method-cache and the table size is limited. Basically
rb_funcall* is not used frequently, so 1023 entries can be enough.
We will revisit the table size if it is not enough.
2021-01-29 16:22:12 +09:00
Nobuyoshi Nakada 2bd8098c6d
Suppress the warning for the invalid method_explorer case 2021-01-17 18:07:33 +09:00
Nobuyoshi Nakada 85b5d4c8bf Revert "[Bug #11213] let defined?(super) call respond_to_missing?"
This reverts commit fac2498e02 for
now, due to [Bug #17509], the breakage in the case `super` is
called in `respond_to?`.
2021-01-13 18:11:46 +09:00
Koichi Sasada 55e52c19e7 simplify assertion
searched_cme is used only this line so the variable is not needed.
2021-01-07 23:44:25 +09:00
Takashi Kokubun 7a3322a0fd
Fix broken JIT of getinlinecache
e7fc353f04 reverted vm_ic_hit_p's signature change made in 53babf35ef,
which broke JIT compilation of getinlinecache.

To make sure it doesn't happen again, I separated vm_inlined_ic_hit_p to
make the intention clear.
2021-01-04 13:09:08 -08:00
Koichi Sasada e7fc353f04 enable constant cache on ractors
constant cache `IC` is accessed by non-atomic manner and there are
thread-safety issues, so Ruby 3.0 disables to use const cache on
non-main ractors.

This patch enables it by introducing `imemo_constcache` and allocates
it by every re-fill of const cache like `imemo_callcache`.
[Bug #17510]

Now `IC` only has one entry `IC::entry` and it points to
`iseq_inline_constant_cache_entry`, managed by T_IMEMO object.

`IC` is atomic data structure so `rb_mjit_before_vm_ic_update()` and
`rb_mjit_after_vm_ic_update()` is not needed.
2021-01-05 02:27:58 +09:00
Nobuyoshi Nakada 292230cbf9 Fixed leaked global symbols 2020-12-26 09:39:53 +09:00
Koichi Sasada 02d9524cda separate rb_ractor_pub from rb_ractor_t
separate some fields from rb_ractor_t to rb_ractor_pub and put it
at the beggining of rb_ractor_t and declare it in vm_core.h so
vm_core.h can access rb_ractor_pub fields.

Now rb_ec_ractor_hooks() is a complete inline function and no
MJIT related issue.
2020-12-22 00:03:00 +09:00
Koichi Sasada a2950369bd TracePoint.new(&block) should be ractor-local
TracePoint should be ractor-local because the Proc can violate the
Ractor-safe.
2020-12-22 00:03:00 +09:00
Koichi Sasada dca6752fec Introduce Ractor::IsolationError
Ractor has several restrictions to keep each ractor being isolated
and some operation such as `CONST="foo"` in non-main ractor raises
an exception. This kind of operation raises an error but there is
confusion (some code raises RuntimeError and some code raises
NameError).

To make clear we introduce Ractor::IsolationError which is raised
when the isolation between ractors is violated.
2020-12-21 22:29:05 +09:00
Nobuyoshi Nakada 8148f88b92
ALWAYS_INLINE implies inline 2020-12-19 18:04:47 +09:00
Takashi Kokubun 52b1716c78
Fix vm_search_invokeblock
call_ needs to be vm_invokeblock_i, and flags is also not empty.
2020-12-19 00:27:44 -08:00
Takashi Kokubun 8ec8f37566
discourage inlining for vm_sendish()
reversing 9213771817 only for JIT, because it made JIT slower.

$ benchmark-driver -v --rbenv 'before;after;before --jit;after --jit' --repeat-count=36 --alternate --output=all benchmark.yml
before: ruby 3.0.0dev (2020-12-19T07:38:17Z master a139318538) [x86_64-linux]
after: ruby 3.0.0dev (2020-12-19T07:52:01Z master ce9faaeff5) [x86_64-linux]
last_commit=discourage inlining for vm_sendish()
before --jit: ruby 3.0.0dev (2020-12-19T07:38:17Z master a139318538) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-12-19T07:52:01Z master ce9faaeff5) +JIT [x86_64-linux]
last_commit=discourage inlining for vm_sendish()
Calculating -------------------------------------
                                       before                 after          before --jit           after --jit
Optcarrot Lan_Master.nes    42.83365858987760     42.68912456143848     76.50136803552716     65.74704713379785 fps
                            42.87724738609940     42.89045158177300     79.72624911659534     81.26221749201044
                            43.34963955708526     42.95431841174180     80.18085951039328     82.86458983313545
                            43.56786038452823     43.57563008888242     80.45933051716041     83.09150550702445
                            43.83219269706004     43.60748924115331     80.67164125046142     83.39458202043882
                            43.99035062888973     43.62050459554573     80.93204435712701     83.56303651352751
                            44.25176047881120     44.04822899344536     81.15051082548314     83.58166141398522
                            44.41978060794512     44.06521657912991     81.35651907376140     83.80036752456826
                            44.46864790591856     44.09325484326153     81.53456531520031     83.87502933718609
                            45.54712020644544     44.70693952869038     81.97738413452767     83.95818356402224
                            45.84292299382878     44.77704345873913     82.35118338199700     83.95966387450966
                            45.89411137280815     45.41425773286726     83.01052538434648     84.12812994632024
                            45.93130099197283     46.16884439916935     83.50833510120576     84.26276094927231
                            46.13648038236674     46.66645417860622     84.88757531920830     85.41732546800056
                            46.74873798919658     46.71790568883760     84.90953097036886     85.56340808970482
                            47.11273577214855     46.74581938882115     84.93196765297411     85.57603396455576
                            47.17870777128640     46.82414166607185     84.97178445888456     86.63510466280221
                            47.19338055580042     46.83645774240446     85.43536447262163     86.74129103462393
                            47.25761413477774     46.86834469505590     85.59822430471097     86.85376073363715
                            47.53327847102834     46.90228589364909     85.76446609620548     87.26108400015282
                            47.64308771617673     47.02814519551055     85.79904863600991     87.72293541243303
                            47.80286861846863     47.44672838168050     85.88640862064263     87.86803587836525
                            47.86455937950740     47.65301489003541     85.88750199172448     88.16881051171814
                            47.90065455321760     47.73425082354376     85.94295700508701     88.71267004066843
                            47.90727961241468     47.86377917424705     85.94674546805844     88.77726627283683
                            47.93243954623904     47.88720812998766     86.51872778134982     88.78993962536994
                            47.95062952008558     47.88774830879015     86.63116771614249     88.88085054889298
                            47.95097849989396     47.89825669442417     86.77387990931732     89.72021826461126
                            48.04730571166697     47.89981045730949     86.95084011077047     89.75804193954582
                            48.08042611622322     48.03246661737583     87.87239147980547     90.05949240088842
                            48.08999523258601     48.15253490344558     88.31289344498016     90.36439442190294
                            48.25670456430854     48.26904755214532     88.33999433286937     90.54253266759406
                            48.25947200597002     48.41894159956091     88.35502296938638     90.72591894564106
                            48.30826210577268     48.43125201523194     88.58311746582939     90.77173035874087
                            48.31514124187375     48.53932287546499     88.89099681179805     91.07747476133886
                            48.44349281318267     48.58969411593706     89.34043973691581     91.08545627378257
2020-12-19 00:03:46 -08:00
Koichi Sasada 9213771817 encourage inlining for vm_sendish()
Some tunings.
* add `inline` for vm_sendish()
* pass enum instead of func ptr to vm_sendish()
* reorder initial order of `calling` struct.
* add ALWAYS_INLINE for vm_search_method_fastpath()
* call vm_search_method_fastpath() from vm_sendish()
2020-12-17 16:53:41 +09:00
Takashi Kokubun 53babf35ef
Inline getconstant on JIT (#3906)
* Inline getconstant on JIT

* Support USE_MJIT=0
2020-12-16 06:24:07 -08:00
Koichi Sasada 98dca85573 tuning vm_setivar_slowpath() more.
specify inline/noinline code for is_attr condition.
2020-12-16 14:20:20 +09:00
Koichi Sasada 1e11c12a06 remove unused function 2020-12-16 13:30:34 +09:00
Koichi Sasada 5499651ee7 tuning ivar set
* make rb_init_iv_list() simple
* introduce vm_setivar_slowpath() for cache miss cases

../clean/miniruby is 647ee6f091.

Calculating -------------------------------------
                      ./miniruby  ../clean/miniruby  ../ruby_2_7/miniruby
         vm_ivar_init     7.388M             6.814M                5.771M i/s -     30.000M times in 4.060420s 4.402534s 5.198781s
vm_ivar_init_subclass     2.158M             2.147M                1.974M i/s -      3.000M times in 1.390328s 1.397587s 1.519951s
          vm_ivar_set   128.607M            97.931M              140.668M i/s -     30.000M times in 0.233269s 0.306338s 0.213268s
              vm_ivar   144.315M           151.722M              117.734M i/s -     30.000M times in 0.207879s 0.197730s 0.254811s

Comparison:
                      vm_ivar_init
           ./miniruby:   7388398.8 i/s
    ../clean/miniruby:   6814257.1 i/s - 1.08x  slower
 ../ruby_2_7/miniruby:   5770583.9 i/s - 1.28x  slower

             vm_ivar_init_subclass
           ./miniruby:   2157763.6 i/s
    ../clean/miniruby:   2146557.0 i/s - 1.01x  slower
 ../ruby_2_7/miniruby:   1973747.9 i/s - 1.09x  slower

                       vm_ivar_set
 ../ruby_2_7/miniruby: 140668063.8 i/s
           ./miniruby: 128606912.1 i/s - 1.09x  slower
    ../clean/miniruby:  97931027.8 i/s - 1.44x  slower

                           vm_ivar
    ../clean/miniruby: 151722121.9 i/s
           ./miniruby: 144314526.5 i/s - 1.05x  slower
 ../ruby_2_7/miniruby: 117734305.5 i/s - 1.29x  slower
2020-12-16 13:06:13 +09:00
Koichi Sasada 171f0431e7 fix typo 2020-12-16 10:36:23 +09:00
Koichi Sasada 72a73691bd add several debug counters
add cc_found_in_ccs (renamed from cc_found_ccs), cc_not_found_in_ccs,
call0_public, call0_other debug counters to measure more details.

also it contains several modification.
2020-12-15 13:29:30 +09:00
Koichi Sasada aa6287cd26 fix inline method cache sync bug
`cd` is passed to method call functions to method invocation
functions, but `cd` can be manipulated by other ractors simultaneously
so it contains thread-safety issue.

To solve this issue, this patch stores `ci` and found `cc` to `calling`
and stops to pass `cd`.
2020-12-15 13:29:30 +09:00
Koichi Sasada 7060d6b721 fix condition and add another debug counter
mc_inline_miss_same_def is added to check same method or not.
Also the mc_inline_miss_same_cc calculation was fixed.
2020-12-14 18:38:40 +09:00
Koichi Sasada da3be76cb0 add debug counters to survey the IMC miss 2020-12-14 17:56:34 +09:00
Koichi Sasada fa63052be1 create ccs with 0 capa
ccs is created with default 4 capa, but it is too much, so start
with 0 and extend to 1, 2, 4, 8, ...
2020-12-14 11:57:46 +09:00
Koichi Sasada d741c77b5f fix ivar with shareable objects issue
Instance variables of sharable objects are accessible only from
main ractor, so we need to check it correctly.
2020-12-12 06:19:18 +09:00
Jeremy Evans 01b7d5acc7 Remove the uninitialized instance variable verbose mode warning
This speeds up all instance variable access, even when not in
verbose mode.  Uninitialized instance variable warnings were
rarely helpful, and resulted in slower code if you wanted to
avoid warnings when run in verbose mode.

Implements [Feature #17055]
2020-12-10 10:16:05 -08:00
Koichi Sasada 182fb73c40 rb_ext_ractor_safe() to declare ractor-safe ext
C extensions can violate the ractor-safety, so only ractor-safe
C extensions (C methods) can run on non-main ractors.
rb_ext_ractor_safe(true) declares that the successive
defined methods are ractor-safe. Otherwiwze, defined methods
checked they are invoked in main ractor and raise an error
if invoked at non-main ractors.

[Feature #17307]
2020-12-01 15:44:18 +09:00
Takashi Kokubun 8ce1711c25
Revert "Set VM_FRAME_FLAG_FINISH at once on MJIT"
This reverts commit 4d2c8edca6.

Unfortunately this seems to cause several issues:
https://github.com/ruby/ruby/runs/1462188376?check_suite_focus=true
http://ci.rvm.jp/results/trunk-mjit-wait@phosphorus-docker/3272802
2020-11-26 22:41:15 -08:00
Takashi Kokubun 4d2c8edca6
Set VM_FRAME_FLAG_FINISH at once on MJIT
Performance is probably improved?

$ benchmark-driver -v --rbenv 'before --jit;after --jit' --repeat-count=12 --alternate --output=all benchmark.yml
before --jit: ruby 3.0.0dev (2020-11-27T04:37:47Z master 69e77e81dc) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-11-27T05:28:19Z master df6b05c6dd) +JIT [x86_64-linux]
last_commit=Set VM_FRAME_FLAG_FINISH at once
Calculating -------------------------------------
                                 before --jit           after --jit
Optcarrot Lan_Master.nes    80.89292998533379     82.19497327502751 fps
                            80.93130641142331     85.13943315260148
                            81.06214830270119     87.43757879797808
                            82.29172808453910     87.89942441487113
                            84.61206450455929     87.91309779491075
                            85.44545883567997     87.98026086648694
                            86.02923132404449     88.03081060383973
                            86.07411817365879     88.14650206137341
                            86.34348799602836     88.32791633649961
                            87.90257338977324     88.57599644892220
                            88.58006509876580     88.67426384743277
                            89.26611118140011     88.81669430874207

This should have no bad impact on VM because this function is ALWAYS_INLINE.
2020-11-26 21:32:14 -08:00
Alan Wu e0944bde91 Prefer rb_module_new() over rb_define_module_id()
rb_define_module_id() doesn't do anything with its parameter so
it's a bit confusing.
2020-11-25 17:05:06 -05:00
Nobuyoshi Nakada fac2498e02 [Bug #11213] let defined?(super) call respond_to_missing? 2020-11-20 16:04:45 +09:00
Alan Wu 68ffc8db08 Set allocator on class creation
Allocating an instance of a class uses the allocator for the class. When
the class has no allocator set, Ruby looks for it in the super class
(see rb_get_alloc_func()).

It's uncommon for classes created from Ruby code to ever have an
allocator set, so it's common during the allocation process to search
all the way to BasicObject from the class with which the allocation is
being performed. This makes creating instances of classes that have
long ancestry chains more expensive than creating instances of classes
have that shorter ancestry chains.

Setting the allocator at class creation time removes the need to perform
a search for the alloctor during allocation.

This is a breaking change for C-extensions that assume that classes
created from Ruby code have no allocator set. Libraries that setup a
class hierarchy in Ruby code and then set the allocator on some parent
class, for example, can experience breakage. This seems like an unusual
use case and hopefully it is rare or non-existent in practice.

Rails has many classes that have upwards of 60 elements in the ancestry
chain and benchmark shows a significant improvement for allocating with
a class that includes 64 modules.

```
pre: ruby 3.0.0dev (2020-11-12T14:39:27Z master 6325866421)
post: ruby 3.0.0dev (2020-11-12T20:15:30Z cut-allocator-lookup)

Comparison:
                  allocate_8_deep
                post:  10336985.6 i/s
                 pre:   8691873.1 i/s - 1.19x  slower

                 allocate_32_deep
                post:  10423181.2 i/s
                 pre:   6264879.1 i/s - 1.66x  slower

                 allocate_64_deep
                post:  10541851.2 i/s
                 pre:   4936321.5 i/s - 2.14x  slower

                allocate_128_deep
                post:  10451505.0 i/s
                 pre:   3031313.5 i/s - 3.45x  slower
```
2020-11-16 17:41:40 -05:00
Jeremy Evans ce9beb9d20 Improve error message when subclassing non-Class
Fixes [Bug #14726]
2020-11-13 07:06:13 -08:00
Aaron Patterson 4219cb7adb Add debug counter for ivar inline cache misses that could hit
This commit adds a debug counter for the case where the inline cache
*missed* but the ivar index table has an entry for that ivar.  This is a
case where a polymorphic cache could help
2020-11-09 14:05:41 -08:00
Aaron Patterson f259906eab Avoid slow path ivar setting
If the ivar index table exists, we can avoid the slowest path for
setting ivars.
2020-11-09 14:05:41 -08:00
Aaron Patterson 2324584d46 Update vm_insnhelper.c
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2020-11-09 09:44:16 -08:00
Aaron Patterson 5582c5a232 Remove iv table size check
iv tables cannot shrink.  If the inline cache was ever set, then there
must be an entry for the instance variable in the iv table.  Just set
the iv list on the object to be equal to the iv index table size, then
set the iv.
2020-11-09 09:44:16 -08:00
Aaron Patterson eb229994e5 eagerly initialize ivar table when index is small enough
When the inline cache is written, the iv table will contain an entry for
the instance variable.  If we get an inline cache hit, then we know the
iv table must contain a value for the index written to the inline cache.

If the index in the inline cache is larger than the list on the object,
but *smaller* than the iv index table on the class, then we can just
eagerly allocate the iv list to be the same size as the iv index table.

This avoids duplicate work of checking frozen as well as looking up the
index for the particular instance variable name.
2020-11-09 09:44:16 -08:00
Koichi Sasada 5d97bdc2dc Ractor.make_shareable(a_proc)
Ractor.make_shareable() supports Proc object if
(1) a Proc only read outer local variables (no assignments)
(2) read outer local variables are shareable.

Read local variables are stored in a snapshot, so after making
shareable Proc, any assignments are not affeect like that:

```ruby
a = 1
pr = Ractor.make_shareable(Proc.new{p a})
pr.call #=> 1
a = 2
pr.call #=> 1 # `a = 2` doesn't affect
```

[Feature #17284]
2020-10-30 03:12:09 +09:00
Jeremy Evans ff2276ef8c Fix bootstrap-test error in previous commit 2020-10-25 15:56:10 -07:00
Koichi Sasada f6661f5085 sync RClass::ext::iv_index_tbl
iv_index_tbl manages instance variable indexes (ID -> index).
This data structure should be synchronized with other ractors
so introduce some VM locks.

This patch also introduced atomic ivar cache used by
set/getinlinecache instructions. To make updating ivar cache (IVC),
we changed iv_index_tbl data structure to manage (ID -> entry)
and an entry points serial and index. IVC points to this entry so
that cache update becomes atomically.
2020-10-17 08:18:04 +09:00
Alan Wu 0d17cdd0ac Abort on system stack overflow during GC
Buggy native extensions could have mark functions that cause stack
overflow. When a stack overflow happens during GC, Ruby used to recover
by raising an exception, which runs the interpreter. It's not safe to
run the interpreter during GC since the GC is in an inconsistent state.
This could cause object allocation during GC, for example.

Instead of running the interpreter and potentially causing a crash down
the line, fail fast and abort.
2020-10-16 10:24:12 -04:00
Koichi Sasada fad97f1f96 sync generic_ivtbl
generic_ivtbl is a process global table to maintain instance variables
for non T_OBJECT/T_CLASS/... objects. So we need to protect them
for multi-Ractor exection.

Hint: we can make them Ractor local for unshareable objects, but
      now it is premature optimization.
2020-10-14 16:36:55 +09:00
eileencodes 8dd9a23693 Make minor improvements to super
The changes here include:

* Using `FL_TEST_RAW` instead of `FL_TEST` in the first check in
`vm_search_super_method`. While the profile showed us spending a fair
amount of time here, the subsequent benchmarks didn't show much
improvement when adding this. Regardless, we know this does less work
than `FL_TEST` and we know that `FL_TEST_RAW` is safe due to the
previous check so it's a small but accurate optimization.
* Set `mid` only once. Both `vm_ci_new_runtime` and `vm_ci_mid` were
getting the `original_id` for the method entry. We can do this once
and pass the variable to the 2 callers that need it. This also doesn't
have a huge performance improvement but cleans up the code a bit.

Benchmark:

```
|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|vm_iclass_super  |      3.540M|    3.940M|
|                 |           -|     1.11x|
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2020-10-01 10:11:02 -07:00
Koichi Sasada caaa36b4e6 prohibi method call by defined_method in other racotrs
We can not call a non-isolated Proc in multiple ractors.
2020-09-25 20:37:38 +09:00
eileencodes 637d1cc0c0 Improve the performance of super
This PR improves the performance of `super` calls. While working on some
Rails optimizations jhawthorn discovered that `super` calls were slower
than expected.

The changes here do the following:

1) Adds a check for whether the call frame is not equal to the method
entry iseq. This avoids the `rb_obj_is_kind_of` check on the next line
which is quite slow. If the current call frame is equal to the method
entry we know we can't have an instance eval, etc.
2) Changes `FL_TEST` to `FL_TEST_RAW`. This is safe because we've
already done the check for `T_ICLASS` above.
3) Adds a benchmark for `T_ICLASS` super calls.
4) Note: makes a chage for `method_entry_cref` to use `const`.

On master the benchmarks showed that `super` is 1.76x slower. Our
changes improved the performance so that it is now only 1.36x slower.

Benchmark IPS:

```
Warming up --------------------------------------
               super   244.918k i/100ms
         method call   383.007k i/100ms
Calculating -------------------------------------
               super      2.280M (± 6.7%) i/s -     11.511M in   5.071758s
         method call      3.834M (± 4.9%) i/s -     19.150M in   5.008444s

Comparison:
         method call:  3833648.3 i/s
               super:  2279837.9 i/s - 1.68x  (± 0.00) slower
```

With changes:

```
Warming up --------------------------------------
               super   308.777k i/100ms
         method call   375.051k i/100ms
Calculating -------------------------------------
               super      2.951M (± 5.4%) i/s -     14.821M in   5.039592s
         method call      3.551M (± 4.9%) i/s -     18.002M in   5.081695s

Comparison:
         method call:  3551372.7 i/s
               super:  2950557.9 i/s - 1.20x  (± 0.00) slower
```

Ruby VM benchmarks also showed an improvement:

Existing `vm_super` benchmark`.

```
$ make benchmark ITEM=vm_super

|          |compare-ruby|built-ruby|
|:---------|-----------:|---------:|
|vm_super  |     21.555M|   37.819M|
|          |           -|     1.75x|
```

New `vm_iclass_super` benchmark:

```
$ make benchmark ITEM=vm_iclass_super

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|vm_iclass_super  |      1.669M|    3.683M|
|                 |           -|     2.21x|
```

This is the benchmark script used for the benchmark-ips benchmarks:

```ruby
require "benchmark/ips"

class Foo
  def zuper; end
  def top; end

  last_method = "top"

  ("A".."M").each do |module_name|
    eval <<-EOM
    module #{module_name}
      def zuper; super; end
      def #{module_name.downcase}
        #{last_method}
      end
    end
    prepend #{module_name}
    EOM
    last_method = module_name.downcase
  end
end

foo = Foo.new

Benchmark.ips do |x|
  x.report "super" do
    foo.zuper
  end

  x.report "method call" do
    foo.m
  end

  x.compare!
end
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Co-authored-by: John Hawthorn <john@hawthorn.email>
2020-09-23 11:52:36 -07:00
Jeremy Evans 179384a668 Revert "Prevent SystemStackError when calling super in module with activated refinement"
This reverts commit eeef16e190.

This also reverts the spec change.

Preventing the SystemStackError would be nice, but there is valid
code that the fix breaks, and it is probably more common than cases
that cause the SystemStackError.

Fixes [Bug #17182]
2020-09-22 12:04:48 -07:00
Benoit Daloze 9b535f3ff7 Interpolated strings are no longer frozen with frozen-string-literal: true
* Remove freezestring instruction since this was the only usage for it.
* [Feature #17104]
2020-09-15 21:32:35 +02:00
Koichi Sasada 79df14c04b Introduce Ractor mechanism for parallel execution
This commit introduces Ractor mechanism to run Ruby program in
parallel. See doc/ractor.md for more details about Ractor.
See ticket [Feature #17100] to see the implementation details
and discussions.

[Feature #17100]

This commit does not complete the implementation. You can find
many bugs on using Ractor. Also the specification will be changed
so that this feature is experimental. You will see a warning when
you make the first Ractor with `Ractor.new`.

I hope this feature can help programmers from thread-safety issues.
2020-09-03 21:11:06 +09:00
Alan Wu 4c3f0597de Remove the pc argument of vm_trace()
This makes the binary 272 bytes smaller on -O3 GCC 10.2.0.
2020-09-01 22:02:29 -04:00
Jeremy Evans c60aaed185
Fix Method#super_method for aliased methods
Previously, Method#super_method looked at the called_id to
determine the method id to use, but that isn't correct for
aliased methods, because the super target depends on the
original method id, not the called_id.

Additionally, aliases can reference methods defined in other
classes and modules, and super lookup needs to start in the
super of the defined class in such cases.

This adds tests for Method#super_method for both types of
aliases, one that uses VM_METHOD_TYPE_ALIAS and another that
does not.  Both check that the results for calling super
methods return the expected values.

To find the defined class for alias methods, add an rb_ prefix
to find_defined_class_by_owner in vm_insnhelper.c and make it
non-static, so that it can be called from method_super_method
in proc.c.

This bug was original discovered while researching [Bug #11189].

Fixes [Bug #17130]
2020-08-27 08:37:03 -07:00
Jeremy Evans eeef16e190 Prevent SystemStackError when calling super in module with activated refinement
Without this, if a refinement defines a method that calls super and
includes a module with a module that calls super and has a activated
refinement at the point super is called, the module method super call
will end up calling back into the refinement method, creating a loop.

Fixes [Bug #17007]
2020-07-27 08:18:11 -07:00
Nobuyoshi Nakada a82252df42
Fixed another typo 2020-07-10 12:48:47 +09:00
Nobuyoshi Nakada 234f8eee33
Fixed typos 2020-07-10 12:32:48 +09:00
卜部昌平 e4ee992099 vm_push_frame_debug_counter_inc: use branches
Ko1 doesn't like previous code.
2020-07-10 12:23:41 +09:00
卜部昌平 0e276dc458 vm_push_frame: move assignments around
Struct assignment using a compound literal is more readable than before,
to me at least.  It seems compilers reorder assignments anyways.
Neither speedup nor slowdown is observed on my machine.
2020-07-10 12:23:41 +09:00
卜部昌平 4b8170ce80 vm_push_frame: move assertions out of the function
These assertions are purely static.  Ned not be checked on-the-fly.
2020-07-10 12:23:41 +09:00
卜部昌平 1d93705d6a vm_push_frame: hoist out debug codes
Made it a bit readable.
2020-07-10 12:23:41 +09:00
卜部昌平 db7f3496dd nobody uses the return value of vm_push_frame
Surprised to see such a waste of time in this super duper hot path.
2020-07-10 12:23:41 +09:00
Koichi Sasada a0f12a0258
Use ID instead of GENTRY for gvars. (#3278)
Use ID instead of GENTRY for gvars.

Global variables are compiled into GENTRY (a pointer to struct
rb_global_entry). This patch replace this GENTRY to ID and
make the code simple.

We need to search GENTRY from ID every time (st_lookup), so
additional overhead will be introduced.
However, the performance of accessing global variables is not
important now a day and this simplicity helps Ractor development.
2020-07-03 16:56:44 +09:00
Nobuyoshi Nakada 52ef2477e4
Extracted METHOD_ENTRY_CACHEABLE macro 2020-06-30 19:12:02 +09:00
卜部昌平 1bf0d36171 vm_getivar: do not goto into a branch
I'm not necessarily against every goto in general, but jumping into a
branch is definitely a bad idea.  Better refactor.
2020-06-29 11:05:41 +09:00
Takashi Kokubun 7982dc1dfd
Decide JIT-ed insn based on cached cfunc
for opt_* insns.

opt_eq handles rb_obj_equal inside opt_eq, and all other cfunc is
handled by opt_send_without_block. Therefore we can't decide which insn
should be generated by checking whether it's cfunc cc or not.

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-26T05:21:43Z master 9dbc2294a6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-26T06:30:18Z master 75cece1b0b) +JIT [x86_64-linux]
last_commit=Decide JIT-ed insn based on cached cfunc
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_nil?(1)      73.878M      74.021M i/s -     40.000M times in 0.541432s 0.540391s
         mjit_not(1)      72.635M      74.601M i/s -     40.000M times in 0.550702s 0.536187s
     mjit_eq(1, nil)       7.331M       7.445M i/s -      8.000M times in 1.091211s 1.074596s
     mjit_eq(nil, 1)      49.450M      64.711M i/s -      8.000M times in 0.161781s 0.123627s

Comparison:
                     mjit_nil?(1)
         after --jit:  74020528.4 i/s
        before --jit:  73878185.9 i/s - 1.00x  slower

                      mjit_not(1)
         after --jit:  74600882.0 i/s
        before --jit:  72634507.6 i/s - 1.03x  slower

                  mjit_eq(1, nil)
         after --jit:   7444657.4 i/s
        before --jit:   7331304.3 i/s - 1.02x  slower

                  mjit_eq(nil, 1)
         after --jit:  64710790.6 i/s
        before --jit:  49449507.4 i/s - 1.31x  slower
```
2020-06-25 23:33:08 -07:00
Takashi Kokubun d9f608b686
Verify builtin inline annotation with VM_CHECK_MODE (#3244)
* Verify builtin inline annotation with VM_CHECK_MODE

* Remove static to fix the link issue on MJIT
2020-06-21 10:27:04 -07:00
Takashi Kokubun 426db4cd90
Fix -Wmaybe-uninitialized at vm_invoke_block 2020-06-21 00:34:39 -07:00
Nobuyoshi Nakada ccb7a4b9f2
Replaced accessors of `Struct` with `invokebuiltin` 2020-06-17 08:18:46 +09:00
Nobuyoshi Nakada 318d52e820
Revert "Replaced accessors of `Struct` with `invokebuiltin`"
This reverts commit 19cabe8b09,
which didn't support tool/lib/iseq_loader_checker.rb.
2020-06-16 18:44:58 +09:00
Nobuyoshi Nakada 19cabe8b09
Replaced accessors of `Struct` with `invokebuiltin` 2020-06-16 18:24:02 +09:00
卜部昌平 5648976c3c vm_call_method: avoid marking on-stack object
This callcache is on stack, must not be GCed.  However its contents are
copied from other materials, which can be an ordinal object.  Should
set a flag to make sure it is properly skipped by the GC.
2020-06-10 10:22:39 +09:00
卜部昌平 1016cff4ff rb_eql_opt,rb_equal_opt: purge stale cc
When on USE_EMBED_CI, cd is stored statically.  Previous use could cache
stale cd->cc, which could have already been GCed.  Need flush them.
2020-06-09 09:52:46 +09:00
卜部昌平 ffe58b9c8b vm_ccs_push: do not cache non-heap entries
Entires not GC-able must be considered to be volatile.  Not eligible for
later use.
2020-06-09 09:52:46 +09:00
卜部昌平 e1e84fbb4f VM_CI_NEW_ID: USE_EMBED_CI could be false
It was a wrong idea to assume CIs are always embedded.
2020-06-09 09:52:46 +09:00
卜部昌平 324038c66e eliminate C99 compound literals
Ko1 prefers variables be assgined, instead of bare literals in function
arguments.
2020-06-09 09:52:46 +09:00
卜部昌平 4fbe86d0e2 vm_call_method: use struct assignment
This further reduces the generated binary of vm_call_method from 566
bytes to 545 bytes on my machine, according to nm(1).
2020-06-09 09:52:46 +09:00
卜部昌平 46728557c1 rb_vm_call0: on-stack call info
This changeset reduces the generated binary of rb_vm_call0 from 281
bytes to 211 bytes on my machine.  Should reduce GC pressure as well.
2020-06-09 09:52:46 +09:00
卜部昌平 db406daa60 vm_yield_setup_args: refactor use macro 2020-06-09 09:52:46 +09:00
卜部昌平 367263c3dd vm_call_method: no call vm_cc_fill
This changeset reduces the generated binary of vm_call_method from 600
bytes to 566 bytes on my machine, accroding to nm(1).
2020-06-09 09:52:46 +09:00
卜部昌平 fb3f1f95e8 vm_call_refined: no call vm_cc_fill
This changeset reduces the generated binary of vm_call_method_each_type
from 2,442 bytes to 2,378 bytes on my machine, accroding to nm(1).
2020-06-09 09:52:46 +09:00
卜部昌平 be5dfdd8a2 vm_call_zsuper: no call vm_cc_fill
This changeset reduces the generated binary of vm_call_method_each_type
from 2,522 bytes to 2,442 bytes on my machine, accroding to nm(1).
2020-06-09 09:52:46 +09:00
卜部昌平 dbbde61cef vm_call_method_missing_body: on-stack call info
This changeset reduces the generated binary of
vm_call_method_missing_body from 604 bytes to 532 bytes on my machine.
Should reduce GC pressure as well.
2020-06-09 09:52:46 +09:00
卜部昌平 9c287f8caa vm_call_symbol: on-stack call info
This changeset reduces the generated binary of vm_call_symbol from 808
bytes to 798 bytes on my machine.  Should reduce GC pressure as well.
2020-06-09 09:52:46 +09:00
卜部昌平 62b471bd44 vm_call_alias: no call vm_cc_fill
This changeset reduces the generated binary of vm_call_alias from 188
bytes to 149 bytes on my machine, accroding to nm(1).
2020-06-09 09:52:46 +09:00
卜部昌平 97f456374d rb_eql_opt: fully static call data
This changeset reduces the generated binary of rb_eql_opt from 86 bytes to
20 bytes on my machine, according to nm(1).
2020-06-09 09:52:46 +09:00
卜部昌平 3da9c51973 rb_vm_search_method_slowpath: skip vm_empty_cc
Now that vm_empty_cc is statically allocated outside of the object
space.  It shall not be GCed.  Here, because vm_search_cc can return
that.  Must not be blindly passed to RB_OBJ_WRITE, unless assertions
fail on RGENGC_CHECK_MODE, like this:

-- C level backtrace information
-------------------------------------------
ruby(rb_print_backtrace+0x19) [0x5555557fd579] vm_dump.c:757
ruby(rb_vm_bugreport+0x151) [0x5555557fd6f1] vm_dump.c:955
ruby(rb_bug+0x1d6) [0x5555558d6396] error.c:660
ruby(check_rvalue_consistency_force+0x707) [0x5555555adb97] gc.c:1289
ruby(check_rvalue_consistency+0x1a) [0x555555598a0a] gc.c:1305
ruby(RVALUE_OLD_P+0x15) [0x5555555975d5] gc.c:1382
ruby(rb_gc_writebarrier+0x9f) [0x55555559753f] gc.c:6882
ruby(rb_obj_written+0x3a) [0x5555557a025a] include/ruby/internal/rgengc.h:180
ruby(rb_obj_write+0x41) [0x5555557a1a81] include/ruby/internal/rgengc.h:195
ruby(rb_vm_search_method_slowpath+0x5a) [0x5555557a125a] vm_insnhelper.c:1603
ruby(vm_search_method_fastpath+0x197) [0x5555557d8027] vm_insnhelper.c:1638
ruby(vm_search_method+0xea) [0x5555557d7d2a] vm_insnhelper.c:1650
ruby(vm_search_method_wrap+0x29) [0x5555557dbaf9] vm_insnhelper.c:4091
ruby(vm_sendish+0xa9) [0x5555557dba39] vm_insnhelper.c:4143
ruby(vm_exec_core+0xe357) [0x5555557b0757] insns.def:801
ruby(rb_vm_exec+0x12c) [0x5555557d17cc] vm.c:1942
ruby(invoke_block+0xea) [0x5555557f42fa] vm.c:1058
ruby(invoke_iseq_block_from_c+0x16e) [0x5555557f3eae] vm.c:1130
ruby(invoke_block_from_c_bh) vm.c:1148
ruby(vm_yield+0x71) [0x5555557f3c41] vm.c:1193
ruby(rb_yield_0+0x25) [0x5555557ca615] vm_eval.c:1141
ruby(rb_yield_1+0x27) [0x5555557ca5c7] vm_eval.c:1147
ruby(rb_yield+0x34) [0x5555557ca654] vm_eval.c:1157
ruby(rb_ary_collect+0xb0) [0x555555828320] array.c:3186
ruby(call_cfunc_0+0x29) [0x5555557f0f39] vm_insnhelper.c:2385
ruby(vm_call_cfunc_with_frame+0x278) [0x5555557eca98] vm_insnhelper.c:2553
ruby(vm_sendish+0xd0) [0x5555557dba60] vm_insnhelper.c:4146
ruby(vm_exec_core+0xe0f8) [0x5555557b04f8] insns.def:782
ruby(rb_vm_exec+0x12c) [0x5555557d17cc] vm.c:1942
ruby(invoke_block+0xea) [0x5555557f42fa] vm.c:1058
ruby(invoke_iseq_block_from_c+0x16e) [0x5555557f3eae] vm.c:1130
ruby(invoke_block_from_c_bh) vm.c:1148
ruby(vm_yield+0x71) [0x5555557f3c41] vm.c:1193
ruby(rb_yield_0+0x25) [0x5555557ca615] vm_eval.c:1141
ruby(rb_yield_1+0x27) [0x5555557ca5c7] vm_eval.c:1147
ruby(rb_yield+0x34) [0x5555557ca654] vm_eval.c:1157
ruby(rb_ary_each+0xa5) [0x55555581c795] array.c:2242
ruby(call_cfunc_0+0x29) [0x5555557f0f39] vm_insnhelper.c:2385
ruby(vm_call_cfunc_with_frame+0x278) [0x5555557eca98] vm_insnhelper.c:2553
ruby(vm_sendish+0xd0) [0x5555557dba60] vm_insnhelper.c:4146
ruby(vm_exec_core+0xe0f8) [0x5555557b04f8] insns.def:782
ruby(rb_vm_exec+0x12c) [0x5555557d17cc] vm.c:1942
ruby(invoke_block+0xea) [0x5555557f42fa] vm.c:1058
ruby(invoke_iseq_block_from_c+0x16e) [0x5555557f3eae] vm.c:1130
ruby(invoke_block_from_c_bh) vm.c:1148
ruby(vm_yield+0x71) [0x5555557f3c41] vm.c:1193
ruby(rb_yield_0+0x25) [0x5555557ca615] vm_eval.c:1141
ruby(rb_yield_1+0x27) [0x5555557ca5c7] vm_eval.c:1147
ruby(rb_yield+0x34) [0x5555557ca654] vm_eval.c:1157
ruby(rb_ary_each+0xa5) [0x55555581c795] array.c:2242
ruby(call_cfunc_0+0x29) [0x5555557f0f39] vm_insnhelper.c:2385
ruby(vm_call_cfunc_with_frame+0x278) [0x5555557eca98] vm_insnhelper.c:2553
ruby(vm_sendish+0xd0) [0x5555557dba60] vm_insnhelper.c:4146
ruby(vm_exec_core+0xe0f8) [0x5555557b04f8] insns.def:782
ruby(rb_vm_exec+0x19f) [0x5555557d183f] vm.c:1951
ruby(rb_iseq_eval+0x30) [0x5555557d2530] vm.c:2190
ruby(load_iseq_eval+0xd6) [0x5555555fa7e6] load.c:592
ruby(require_internal+0x25e) [0x5555555f7f5e] load.c:1022
ruby(rb_require_string+0x27) [0x5555555f74e7] load.c:1094
ruby(rb_f_require_relative+0x5f) [0x5555555f758f] load.c:837
ruby(call_cfunc_1+0x30) [0x5555557f0f70] vm_insnhelper.c:2391
ruby(vm_call_cfunc_with_frame+0x278) [0x5555557eca98] vm_insnhelper.c:2553
ruby(vm_call_cfunc+0xad) [0x5555557e521d] vm_insnhelper.c:2574
ruby(vm_call_method_each_type+0xc7) [0x5555557e4af7] vm_insnhelper.c:3040
ruby(vm_call_method+0x19c) [0x5555557e45dc] vm_insnhelper.c:3144
ruby(vm_call_general+0x2d) [0x5555557c8c3d] vm_insnhelper.c:3176
ruby(vm_sendish+0xd0) [0x5555557dba60] vm_insnhelper.c:4146
ruby(vm_exec_core+0xe357) [0x5555557b0757] insns.def:801
ruby(rb_vm_exec+0x12c) [0x5555557d17cc] vm.c:1942
ruby(rb_iseq_eval_main+0x30) [0x5555557d2670] vm.c:2201
ruby(rb_ec_exec_node+0x16b) [0x55555557e39b] eval.c:296
ruby(ruby_run_node+0x72) [0x55555557e1f2] eval.c:354
ruby(main+0x78) [0x55555557a5d8] main.c:50
2020-06-09 09:52:46 +09:00
卜部昌平 8f3d4090f0 rb_equal_opt: fully static call data
This changeset reduces the generated binary of rb_equal_opt from 129 bytes
to 17 bytes on my machine, according to nm(1).
2020-06-09 09:52:46 +09:00
卜部昌平 3928c151a6 vm_search_method_fastpath: avoid rb_vm_empty_cc()
This is such a hot path that it's worth eliminating a function call.  Use
the static variable directly instead.
2020-06-09 09:52:46 +09:00
卜部昌平 877238f2d3 check_cfunc: add assertions
For debug.  Must not change generated binary unless VM_ASSERT is on.
2020-06-09 09:52:46 +09:00
Nobuyoshi Nakada 184f78314e Properly resolve refinements in defined? on private call [Bug #16932] 2020-06-04 02:12:57 +09:00
Nobuyoshi Nakada 8340c773e5 Properly resolve refinements in defined? on method call [Bug #16932] 2020-06-04 02:12:57 +09:00
卜部昌平 de5e0f7c06 vm_invoke_proc_block: reduce recursion
According to nobu recursion can be longer than my expectation.  Limit
them here.
2020-06-03 16:13:47 +09:00
卜部昌平 b61e82eac9 vm_call_symbol: check stack overflow
VM stack could overflow here.  The condition is when a symbol is passed
to a block-taking method via &variable, and that symbol has never been
used for actual method names (thus yielding that results in calling
method_missing), and the VM stack is full (no single word left).  This
is a once-in-a-blue-moon event.  Yet there is a very tiny room of stack
overflow.  We need to check that.
2020-06-03 16:13:47 +09:00
卜部昌平 ba20e6080d vm_invoke_block: remove auto qualifier
Was (harmless but) redundant.
2020-06-03 16:13:47 +09:00
卜部昌平 6302b96368 vm_insnhelper.c: add space [ci skip]
Just cosmetic change to improve readability.
2020-06-03 16:13:47 +09:00
卜部昌平 054c2fdfdf vm_invoke_symbol_block: reduce MEMCPY
This commit changes the number of calls of MEMCPY from...

                           | send  | &:sym
  -------------------------|-------|-------
   Symbol already interned | once  | twice
   Symbol not pinned yet   | none  | once

to:

                           | send  | &:sym
  -------------------------|-------|-------
   Symbol already interned | once  | none
   Symbol not pinned yet   | twice | once

So it sacrifices exceptional situation for normal path.
2020-06-03 16:13:47 +09:00
卜部昌平 36322942db vm_invoke_symbol_block: call vm_call_opt_send
Symbol#to_proc and Object#send are closely related each other.  Why not
share their implementations.  By doing so we can skip recursive call of
vm_exec(), which could benefit for speed.
2020-06-03 16:13:47 +09:00
卜部昌平 973883aaa1 vm_invoke_block: force indirect jump
This changeset slightly speeds up on my machine.

Calculating -------------------------------------
                                       before                 after
Optcarrot Lan_Master.nes    38.33488426546287     40.89825082589147 fps
                            40.91288557922081     41.48687465359386
                            40.96591995270991     41.98499064664184
                            41.20461943032173     43.67314690779162
                            42.38344888176518     44.02777536251875
                            43.43563728880915     44.88695892714136
                            43.88082889062643     45.11226186242523
2020-06-03 16:13:47 +09:00
卜部昌平 796f9edae0 vm_invoke_block: insertion of unused args
This makes it possible for vm_invoke_block to pass its passed arguments
verbatimly to calling functions.  Because they are tail-called the
function calls can be strength-recuced into indirect jumps, which is a
huge win.
2020-06-03 16:13:47 +09:00
卜部昌平 ec87a58d55 vm_invoke_block: eliminate goto
Use recursion for better readability.
2020-06-03 16:13:47 +09:00
卜部昌平 11c70b3162 vm_invoke_block: move logics around
Moved block handler -> captured block conversion from vm_invokeblock to
each vm_invoke_*_block functions.
2020-06-03 16:13:47 +09:00
Nobuyoshi Nakada d05f04d27d
Fixed `defined?` against protected method call
Protected methods are restricted to be called according to the
class/module in where it is defined, not the actual receiver's
class.  [Bug #16931]
2020-06-02 19:01:56 +09:00
卜部昌平 40ced763b4 vm_insnhelper.c: merge opt_eq_func / opt_eql_func
These two function were almost identical, except in case of
T_STRING/T_FLOAT. Why not merge them into one, and let the difference be
handled in normal method calls (slowpath).  This does not improve
runtime performance for me, but at least reduces for instance rb_eql_opt
from 653 bytes to 86 bytes on my machine, according to nm(1).
2020-06-02 12:38:01 +09:00
Takashi Kokubun 9d71373c23
Mark vm_stackoverflow as NOINLINE COLDFUNC on JIT
to reduce code size and improve locality of hot code.
2020-05-26 23:24:58 -07:00
Jeremy Evans ad729a1d11 Fix origin iclass pointer for modules
If a module has an origin, and that module is included in another
module or class, previously the iclass created for the module had
an origin pointer to the module's origin instead of the iclass's
origin.

Setting the origin pointer correctly requires using a stack, since
the origin iclass is not created until after the iclass itself.
Use a hidden ruby array to implement that stack.

Correctly assigning the origin pointers in the iclass caused a
use-after-free in GC.  If a module with an origin is included
in a class, the iclass shares a method table with the module
and the iclass origin shares a method table with module origin.

Mark iclass origin with a flag that notes that even though the
iclass is an origin, it shares a method table, so the method table
should not be garbage collected.  The shared method table will be
garbage collected when the module origin is garbage collected.
I've tested that this does not introduce a memory leak.

This change caused a VM assertion failure, which was traced to callable
method entries using the incorrect defined_class.  Update
rb_vm_check_redefinition_opt_method and find_defined_class_by_owner
to treat iclass origins different than class origins to avoid this
issue.

This also includes a fix for Module#included_modules to skip
iclasses with origins.

Fixes [Bug #16736]
2020-05-22 20:31:23 -07:00
Koichi Sasada cbd45af2a9 fix memory leak of ccs
rb_callable_method_entry() creates ccs entry in cc_tbl, but this
code overwrite by insert newly created ccs and overwrote ccs never
freed.
[Bug #16900]
2020-05-22 11:23:20 +09:00
卜部昌平 15e977349e more on NULL versus functions
Function pointers are not void*.  See also
115fec062c
ce4ea956d2
8427fca49b
2020-05-11 16:47:25 +09:00
卜部昌平 9e41a75255 sed -i 's|ruby/impl|ruby/internal|'
To fix build failures.
2020-05-11 09:24:08 +09:00
卜部昌平 d7f4d732c1 sed -i s|ruby/3|ruby/impl|g
This shall fix compile errors.
2020-05-11 09:24:08 +09:00
Nobuyoshi Nakada cfe0e660f4
Disable -Wswitch warning when VM_CHECK_MODE 2020-05-03 00:15:56 +09:00
Takashi Kokubun 79f3403be0
Invalidate fastpath when calling attr_reader by super
The same bug as 8355a99883 existed in attr_reader too.
2020-04-14 23:49:29 -07:00
Takashi Kokubun 8355a99883
Invalidate fastpath when calling attr_writer by super
We started to use fastpath on invokesuper when a method is not refinements
since 5c27681813, but we shouldn't have used fastpath for attr_writer either.

`cc->aux_.attr_index` is for an actual receiver class, while we store
its superclass in `cc->klass` and therefore there's no way to properly
invalidate attr_writer's inline cache when it's called by super.

[Bug #16785]

I suspect the same bug also exists in attr_reader. I'll address that in
another commit.
2020-04-14 23:32:13 -07:00
Takashi Kokubun 310ef9f40b
Make vm_call_cfunc_with_frame a fastpath (#3027)
when there's no need to call CALLER_SETUP_ARG and CALLER_REMOVE_EMPTY_KW_SPLAT
(i.e. !rb_splat_or_kwargs_p(ci) && !calling->kw_splat).

Micro benchmark:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark/vm_send_cfunc.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
                         before       after
       vm_send_cfunc    69.585M     88.724M i/s -    100.000M times in 1.437097s 1.127096s

Comparison:
                    vm_send_cfunc
               after:  88723605.2 i/s
              before:  69584737.1 i/s - 1.28x  slower
```

Optcarrot:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark.yml --repeat-count=12 --output=all
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
                                       before                 after
Optcarrot Lan_Master.nes    50.76119601545175     42.73858236484051 fps
                            50.76388649761503     51.04211379912850
                            50.80930672252514     51.39455790755538
                            50.90236000778749     51.75656936556145
                            51.01744746340430     51.86875277356489
                            51.06495279015112     51.88692482485558
                            51.07785337168974     51.93429603190578
                            51.20163525187862     51.95768145071314
                            51.34671771913112     52.45577266040274
                            51.35918340835583     52.53163888762858
                            51.46641337418146     52.62172484121034
                            51.50835463462257     52.85064021113239
```
2020-04-13 20:32:59 -07:00
Takashi Kokubun 5c27681813
Enable fastpath on invokesuper (#3021)
Fastpath has not been used for invokesuper since it has set vm_call_super_method on every invocation.
Because it seems to be blocked only by refinements, try enabling fastpath on invokesuper when cme is not for refinements.

While this patch itself should be helpful for VM performance, a part of this patch's motivation is to unblock inlining invokesuper on JIT. 

$ benchmark-driver -v --rbenv 'before;after' benchmark/vm2_super.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-11T15:19:58Z master a01bda5949) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-12T02:00:08Z invokesuper-fastpath c171984ee3) [x86_64-linux]
Calculating -------------------------------------
                         before       after
           vm2_super    20.031M     32.860M i/s -      6.000M times in 0.299534s 0.182593s

Comparison:
                        vm2_super
               after:  32859885.2 i/s
              before:  20031097.3 i/s - 1.64x  slower
2020-04-11 20:45:22 -07:00
Jeremy Evans 900e83b501 Turn class variable warnings into exceptions
This changes the following warnings:

* warning: class variable access from toplevel
* warning: class variable @foo of D is overtaken by C

into RuntimeErrors.  Handle defined?(@@foo) at toplevel
by returning nil instead of raising an exception (the previous
behavior warned before returning nil when defined? was used).

Refactor the specs to avoid the warnings even in older versions.
The specs were checking for the warnings, but the purpose of
the related specs as evidenced from their description is to
test for behavior, not for warnings.

Fixes [Bug #14541]
2020-04-10 00:29:05 -07:00
卜部昌平 9e6e39c351
Merge pull request #2991 from shyouhei/ruby.h
Split ruby.h
2020-04-08 13:28:13 +09:00
Jeremy Evans d2c41b1bff Reduce allocations for keyword argument hashes
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side.  Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.

This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side.  Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side.  On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash.  If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it.  So this commit
should make it so that a maximum of a single hash is allocated
during method calls.

To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable.  If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.

In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.

On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled.  Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not.  Then, unless the callinfo flag is set,
the hash needs to be duplicated.  The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again.  If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can including duplicating the
positional splat array if one was passed.  CALLER_SETUP_ARG
and a couple other places needs to be modified to handle
similar issues for other types of calls.

This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.

Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:

  def kw(a: 1) a end
  def kws(**kw) kw end
  h = {a: 1}

  kw(a: 1)       # about same
  kw(**h)        # 2.37x faster
  kws(a: 1)      # 1.30x faster
  kws(**h)       # 2.19x faster
  kw(a: 1, **h)  # 1.03x slower
  kw(**h, **h)   # about same
  kws(a: 1, **h) # 1.16x faster
  kws(**h, **h)  # 1.14x faster
2020-03-17 12:09:43 -07:00
卜部昌平 f12b9a3338 %p is for void *
See also
35eb12c063
6f5eb28507
687308cf0d
b6a2d63eb3
2020-03-04 12:30:42 +09:00
Koichi Sasada 91de0daaa2 method_missing_reason should be set.
send() has special method launcher in VM and it has special
method_missing caller. This path doesn't set
ec->method_missing_reason which is used at exception creation,
so setup this information. Without this setting, NoMethodError
exception becomes NameError.

This patch will fix:
http://ci.rvm.jp/results/trunk-random1@phosphorus-docker/2761643
2020-03-03 02:44:02 +09:00
Koichi Sasada 18674aef0d check imemo_type
check imemo_type to debug
http://ci.rvm.jp/results/trunk-vm-asserts@silicon-docker/2744755
2020-02-27 10:50:20 +09:00