Commit graph

198 commits

Author SHA1 Message Date
卜部昌平 acd8ee8dbc tool/prelude.c.tmpl: use RubyVM::CEscape
Do not repeat yourself.
2020-08-11 16:51:07 +09:00
卜部昌平 b0eb5aa344 RubyVM::CEscape#rstring2cstr: do not escape '
A single quote "is representable either by itself or by the escape
sequence", according to ISO/IEC 9899 (checked all versions).  So this is
not a bug fix.  But the generated output is a bit more readable without
backslashes.
2020-08-11 16:51:07 +09:00
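The escaping rule described in this commit can be sketched as follows. This is a minimal illustration, not the actual `RubyVM::CEscape#rstring2cstr` implementation: the helper name `to_c_string_literal` is hypothetical, and it handles only a handful of characters.

```ruby
# Hypothetical helper (not RubyVM::CEscape): turn a Ruby string into the text
# of a C string literal. Only a few characters are handled; everything else,
# including the single quote, is copied through unchanged.
def to_c_string_literal(str)
  body = str.each_char.map do |ch|
    case ch
    when '"'  then '\\"'   # a double quote must be escaped inside "..."
    when '\\' then '\\\\'  # a backslash must be escaped
    when "\n" then '\\n'
    when "\t" then '\\t'
    else ch                # a single quote may appear as itself in a C string literal
    end
  end.join
  %("#{body}")
end

puts to_c_string_literal("it's")
puts to_c_string_literal('say "hi"')
```

Leaving `'` alone changes nothing for the C compiler; it only makes the generated source easier to read.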
卜部昌平 1fb4e28002 skip inlining cexpr! that are not attr! inline
Requested by ko1.
2020-07-16 11:49:09 +09:00
卜部昌平 8d3a084572 _mjit_compile_invokebuiltin: sp_inc can be negative
It was my mistake to assume sp_inc was positive.  The real criterion is that the
calculated sp is non-negative.  We have to assert that.
2020-07-14 13:15:06 +09:00
卜部昌平 927fe2422f mk_builtin_loader.rb: STACK_ADDR_FROM_TOP unusable
Stacks are emulated in MJIT, so we must not touch the original VM stack.

See also http://ci.rvm.jp/results/trunk-mjit-wait@silicon-docker/3061353
2020-07-13 12:30:43 +09:00
卜部昌平 7e536b3be2 builtin.h: avoid copy&paste
Instead of doubling the invokebuiltin logic here and there, use the same
insns.def definition for both MJIT/non-JIT situations.
2020-07-13 08:56:18 +09:00
卜部昌平 9721f477c7 inline Primitive.cexpr!
We can obtain the verbatim source code of Primitive.cexpr!.  Why not
paste that content into the JITed program.
2020-07-13 08:56:18 +09:00
卜部昌平 f66e0212ef precalc invokebuiltin destinations
Noticed that struct rb_builtin_function is a purely compile-time
constant.  MJIT can eliminate some runtime calculations by statically
generating a dedicated C code generator for each builtin function.
2020-07-13 08:56:18 +09:00
Takashi Kokubun 7fa3c71bec
Make sure vm_call_cfunc uses inlined cc
which is checked by the first guard. When JIT-inlined cc and operand
cd->cc are different, the JIT-ed code might wrongly dispatch cd->cc even
while class check is done with another cc inlined by JIT.

This fixes SEGV on railsbench.
2020-07-10 00:44:02 -07:00
Takashi Kokubun e4f7eee009
Check ROBJECT_EMBED on guards-merged ivar access
Fix CI failure like
http://ci.rvm.jp/results/trunk-mjit-wait@silicon-docker/3043247
introduced by a69dd699ee
2020-07-04 16:02:46 -07:00
Takashi Kokubun a69dd699ee
Merge ivar guards on JIT (#3284)
when an ISeq has multiple ivar accesses.
2020-07-03 17:52:52 -07:00
Koichi Sasada a0f12a0258
Use ID instead of GENTRY for gvars. (#3278)
Use ID instead of GENTRY for gvars.

Global variables are compiled into GENTRY (a pointer to struct
rb_global_entry). This patch replaces GENTRY with ID and
makes the code simpler.

We need to search for the GENTRY from the ID every time (st_lookup), so
additional overhead will be introduced.
However, the performance of accessing global variables is not
important nowadays, and this simplicity helps Ractor development.
2020-07-03 16:56:44 +09:00
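The trade-off described in this commit can be modeled roughly as below. This is a sketch under assumptions, not CRuby's implementation: `GlobalEntry`, `GlobalTable`, `Insn`, `run_old`, and `run_new` are hypothetical names, and a Ruby `Hash` stands in for the `st_table` that `st_lookup` searches.

```ruby
# Hypothetical model of the two operand styles; none of these names exist in CRuby.
GlobalEntry = Struct.new(:id, :value)

class GlobalTable
  def initialize
    @entries = {} # ID -> entry, standing in for the global variable table
  end

  def entry_for(id)
    @entries[id] ||= GlobalEntry.new(id, nil)
  end

  def get(id)
    entry_for(id).value
  end

  def set(id, value)
    entry_for(id).value = value
  end
end

Insn = Struct.new(:name, :operand)

# Before: the operand is the entry itself, so no lookup happens at run time.
def run_old(insn)
  insn.operand.value
end

# After: the operand is only the ID, so every access pays for one table lookup.
def run_new(insn, table)
  table.get(insn.operand)
end

table = GlobalTable.new
table.set(:$counter, 1)
puts run_old(Insn.new(:getglobal, table.entry_for(:$counter)))
puts run_new(Insn.new(:getglobal, :$counter), table)
```

In the second style the instruction no longer holds a pointer into shared mutable state, which is the simplification the commit message says helps Ractor development.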
Takashi Kokubun 40b40523dc
Show what's inlined first in "JIT inline" log
and add a debug log
2020-06-25 23:50:19 -07:00
Takashi Kokubun 7982dc1dfd
Decide JIT-ed insn based on cached cfunc
for opt_* insns.

opt_eq handles rb_obj_equal inside opt_eq, and all other cfunc is
handled by opt_send_without_block. Therefore we can't decide which insn
should be generated by checking whether it's cfunc cc or not.

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-26T05:21:43Z master 9dbc2294a6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-26T06:30:18Z master 75cece1b0b) +JIT [x86_64-linux]
last_commit=Decide JIT-ed insn based on cached cfunc
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_nil?(1)      73.878M      74.021M i/s -     40.000M times in 0.541432s 0.540391s
         mjit_not(1)      72.635M      74.601M i/s -     40.000M times in 0.550702s 0.536187s
     mjit_eq(1, nil)       7.331M       7.445M i/s -      8.000M times in 1.091211s 1.074596s
     mjit_eq(nil, 1)      49.450M      64.711M i/s -      8.000M times in 0.161781s 0.123627s

Comparison:
                     mjit_nil?(1)
         after --jit:  74020528.4 i/s
        before --jit:  73878185.9 i/s - 1.00x  slower

                      mjit_not(1)
         after --jit:  74600882.0 i/s
        before --jit:  72634507.6 i/s - 1.03x  slower

                  mjit_eq(1, nil)
         after --jit:   7444657.4 i/s
        before --jit:   7331304.3 i/s - 1.02x  slower

                  mjit_eq(nil, 1)
         after --jit:  64710790.6 i/s
        before --jit:  49449507.4 i/s - 1.31x  slower
```
2020-06-25 23:33:08 -07:00
Takashi Kokubun 37a2e48d76
Avoid generating opt_send with cfunc cc with JIT
only for opt_nil_p and opt_not.

While vm_method_cfunc_is is used for opt_eq too, many fast paths of it
don't call it. So if it's populated, it should generate opt_send,
regardless of cfunc or not. And again, opt_neq isn't relevant due to the
difference in operands.
So opt_nil_p and opt_not are the only variants using vm_method_cfunc_is
like they use.

```
$ benchmark-driver -v --rbenv 'before2 --jit::ruby --jit;before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before2 --jit: ruby 2.8.0dev (2020-06-22T08:37:37Z master 3238641750) +JIT [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-06-23T01:01:24Z master 9ce2066209) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-23T06:58:37Z master 17e9df3157) +JIT [x86_64-linux]
last_commit=Avoid generating opt_send with cfunc cc with JIT
Calculating -------------------------------------
                     before2 --jit  before --jit  after --jit
        mjit_nil?(1)       54.204M       75.536M      75.031M i/s -     40.000M times in 0.737947s 0.529548s 0.533110s
         mjit_not(1)       53.822M       70.921M      71.920M i/s -     40.000M times in 0.743195s 0.564007s 0.556171s
     mjit_eq(1, nil)        7.367M        6.496M       7.331M i/s -      8.000M times in 1.085882s 1.231470s 1.091327s

Comparison:
                     mjit_nil?(1)
        before --jit:  75536059.3 i/s
         after --jit:  75031409.4 i/s - 1.01x  slower
       before2 --jit:  54204431.6 i/s - 1.39x  slower

                      mjit_not(1)
         after --jit:  71920324.1 i/s
        before --jit:  70921063.1 i/s - 1.01x  slower
       before2 --jit:  53821697.6 i/s - 1.34x  slower

                  mjit_eq(1, nil)
       before2 --jit:   7367280.0 i/s
         after --jit:   7330527.4 i/s - 1.01x  slower
        before --jit:   6496302.8 i/s - 1.13x  slower
```
2020-06-23 00:09:54 -07:00
Takashi Kokubun 78352fb52e
Compile opt_send for opt_* only when cc has ISeq
because opt_nil/opt_not/opt_eq populates cc even when it doesn't
fall back to opt_send_without_block because of vm_method_cfunc_is.

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-22T08:11:24Z master d231b8f95b) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-22T08:53:27Z master e1125879ed) +JIT [x86_64-linux]
last_commit=Compile opt_send for opt_* only when cc has ISeq
Calculating -------------------------------------
                     before --jit  after --jit
        mjit_nil?(1)      54.106M      73.693M i/s -     40.000M times in 0.739288s 0.542795s
         mjit_not(1)      53.398M      74.477M i/s -     40.000M times in 0.749090s 0.537075s
     mjit_eq(1, nil)       7.427M       6.497M i/s -      8.000M times in 1.077136s 1.231326s

Comparison:
                     mjit_nil?(1)
         after --jit:  73692594.3 i/s
        before --jit:  54106108.4 i/s - 1.36x  slower

                      mjit_not(1)
         after --jit:  74477487.9 i/s
        before --jit:  53398125.0 i/s - 1.39x  slower

                  mjit_eq(1, nil)
        before --jit:   7427105.9 i/s
         after --jit:   6497063.0 i/s - 1.14x  slower
```

Actually opt_eq becomes slower by this. Maybe it's indeed using
opt_send_without_block, but I'll approach that one in another commit.
2020-06-22 02:08:21 -07:00
Takashi Kokubun d9f608b686
Verify builtin inline annotation with VM_CHECK_MODE (#3244)
* Verify builtin inline annotation with VM_CHECK_MODE

* Remove static to fix the link issue on MJIT
2020-06-21 10:27:04 -07:00
Takashi Kokubun 7561db8c00
Introduce Primitive.attr! to annotate 'inline' (#3242)
[Feature #15589]
2020-06-20 17:13:03 -07:00
Takashi Kokubun e544a3a23c
Remove obsoleted opt_call_c_function insn (#3232)
* Remove obsoleted opt_call_c_function insn

* Keep opt_call_c_function with DEFINE_INSN_IF
2020-06-17 09:16:01 -07:00
Takashi Kokubun 0bd025ad69
Add a debug_counter for JIT cancel on leave
2020-05-28 22:45:35 -07:00
Takashi Kokubun b16a2aa938
Reduce code size for rb_class_of
by inlining only hot path.

=== mame/optcarrot ===

$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark.yml --repeat-count=24 --output=all
before --jit: ruby 2.8.0dev (2020-05-18T05:21:31Z master 0e5a58b6bf) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-05-18T06:12:04Z master 0e3d71a8d1) +JIT [x86_64-linux]
last_commit=Reduce code size for rb_class_of
Calculating -------------------------------------
                                 before --jit           after --jit
Optcarrot Lan_Master.nes    71.62880463568773     70.95730063273503 fps
                            71.73973684273152     71.98447841929851
                            75.03923801841310     75.54262519509039
                            75.16300287174957     77.64029272984344
                            75.16834828625935     78.67861469580785
                            75.17670723726911     78.81879353707393
                            75.67637908020630     79.18188850392886
                            76.19843953215396     79.66484891814478
                            77.28166716118808     79.80278072861037
                            77.38509903325165     80.05859292679696
                            78.12693418455953     80.34624804808006
                            78.73654441746730     80.66326571254345
                            79.25387513454415     80.69760605740196
                            79.44137881689524     81.32053489212245
                            79.50497657368358     81.50250852553751
                            79.62401328582868     82.27544931834611
                            79.79178811723664     82.67455264522741
                            81.20275352937418     82.93857260493297
                            81.57027048640776     83.15019118788184
                            81.63373188649095     83.20728816044721
                            81.93420437766426     83.25027576772972
                            82.05716136357167     83.27072145898173
                            82.21070805525066     83.36008265822194
                            82.56924063784872     83.36112268888493

=== benchmark-driver/sinatra ===

[rps]
before: 13143.49 rps
after: 13505.70 rps

[inlined rb_class_of size]
before: 11.5K
after: 3.8K

(calculated by `dwarftree --die inlined_subroutine --flat --merge --show-size`)
2020-05-17 23:38:19 -07:00
Takashi Kokubun a5073c053f
Always correct sp on leave cancel
Even if local stack optimization is not used and values are written to
VM stack, the stack pointer itself may not be moved properly. So this
should be always moved on JIT cancellation.

By the way it's hard to write a test for this because if we try to
generate an interrupt, it will be a method call and it consumes the
interrupt by itself on popping a frame.
2020-05-06 20:26:03 -07:00
Takashi Kokubun f5ddbba9a2
Include unit id in a function name of an inlined method
I'm trying to make it possible to include all JIT-ed code in a single C
file. This is needed to guarantee uniqueness of all function names
2020-04-30 23:08:13 -07:00
Takashi Kokubun 04e56958e6
Make sure newarraykwsplat accesses a correct index
on stack when local_stack_p is enabled.

This fixes `RB_FL_TEST_RAW:"RB_FL_ABLE(obj)"` assertion failure
on power_assert's test with JIT enabled.
2020-04-18 01:41:50 -07:00
Takashi Kokubun 310ef9f40b
Make vm_call_cfunc_with_frame a fastpath (#3027)
when there's no need to call CALLER_SETUP_ARG and CALLER_REMOVE_EMPTY_KW_SPLAT
(i.e. !rb_splat_or_kwargs_p(ci) && !calling->kw_splat).

Micro benchmark:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark/vm_send_cfunc.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
                         before       after
       vm_send_cfunc    69.585M     88.724M i/s -    100.000M times in 1.437097s 1.127096s

Comparison:
                    vm_send_cfunc
               after:  88723605.2 i/s
              before:  69584737.1 i/s - 1.28x  slower
```

Optcarrot:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark.yml --repeat-count=12 --output=all
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
                                       before                 after
Optcarrot Lan_Master.nes    50.76119601545175     42.73858236484051 fps
                            50.76388649761503     51.04211379912850
                            50.80930672252514     51.39455790755538
                            50.90236000778749     51.75656936556145
                            51.01744746340430     51.86875277356489
                            51.06495279015112     51.88692482485558
                            51.07785337168974     51.93429603190578
                            51.20163525187862     51.95768145071314
                            51.34671771913112     52.45577266040274
                            51.35918340835583     52.53163888762858
                            51.46641337418146     52.62172484121034
                            51.50835463462257     52.85064021113239
```
2020-04-13 20:32:59 -07:00
Takashi Kokubun b9d3ceee8f
Unwrap vm_call_cfunc indirection on JIT
for VM_METHOD_TYPE_CFUNC.

This has been known to decrease optcarrot fps:

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark.yml --repeat-count=24 --output=all
before --jit: ruby 2.8.0dev (2020-04-13T16:25:13Z master fb40495cd9) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-13T23:23:11Z mjit-inline-c bdcd06d159) +JIT [x86_64-linux]
Calculating -------------------------------------
                                 before --jit           after --jit
Optcarrot Lan_Master.nes    66.38132676191719     67.41369177299630 fps
                            69.42728743772243     68.90327567263054
                            72.16028300263211     69.62605130880686
                            72.46631319102777     70.48818243767207
                            73.37078877002490     70.79522887347566
                            73.69422431217367     70.99021920193194
                            74.01471487018695     74.69931965402584
                            75.48685183295630     74.86714575949016
                            75.54445264507932     75.97864419721677
                            77.28089738169756     76.48908637569581
                            78.04183397891302     76.54320932488021
                            78.36807984096562     76.59407262898067
                            78.92898762543574     77.31316743361343
                            78.93576483233765     77.97153484180480
                            79.13754917503078     77.98478782102325
                            79.62648945850653     78.02263322726446
                            79.86334213878064     78.26333724045934
                            80.05100635898518     78.60056756355614
                            80.26186843769584     78.91082645644468
                            80.34205717020330     79.01226659142263
                            80.62286066044338     79.32733939423721
                            80.95883033058557     79.63793060542024
                            80.97376819251613     79.73108936622778
                            81.23050939202896     80.18280109433088
```

and I deleted this capability in an early stage of YARV-MJIT development:
0ab130feee

I suspect either of the following things could be the cause:

* Directly calling vm_call_cfunc requires more optimization effort in GCC,
  resulting in 30ms-ish compilation time increase for such methods and
  decreasing the number of methods compiled in a benchmarked period.

* Code size increase => icache miss hit

These hypotheses could be verified by some methodologies. However, I'd
like to introduce this regardless of the result because this blocks
inlining C method's definition.

I may revert this commit if I give up on implementing inlining of C method
definitions, which requires this change.

Microbenchmark-wise, this gives slight performance improvement:

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_send_cfunc.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-04-13T16:25:13Z master fb40495cd9) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-13T23:23:11Z mjit-inline-c bdcd06d159) +JIT [x86_64-linux]
Calculating -------------------------------------
                     before --jit  after --jit
     mjit_send_cfunc      41.961M      56.489M i/s -    100.000M times in 2.383143s 1.770244s

Comparison:
                  mjit_send_cfunc
         after --jit:  56489372.5 i/s
        before --jit:  41961388.1 i/s - 1.35x  slower
```
2020-04-13 16:45:05 -07:00
Takashi Kokubun b66d7d9be5
Remove unused variable stack_size
_mjit_compile_send.erb doesn't use _mjit_compile_insn_body.erb
2020-04-06 02:00:23 -07:00
Takashi Kokubun 3194cd36e2
Delay definition of pc_moved_p
to unify the duplicated declarations and to make sure it's not used
until set properly.

Also changed it from legacy TRUE/FALSE to stdbool.
2020-04-06 01:55:18 -07:00
Takashi Kokubun 928bb17770
Fix -Wshorten-64-to-32 in 4f802828f4
2020-04-06 01:50:12 -07:00
Takashi Kokubun 4f802828f4
Refactor `argc` in mjit_compile_send
using sp_inc_of_sendish for consistency and to make it easier to
understand
2020-04-06 01:42:32 -07:00
Takashi Kokubun 1a33845215
Update outdated comments in mjit_compile_send
and simplify `v` variable references a little.

There's no CALL_METHOD anymore, and the original code lives in
vm_sendish instead of insns.def now.
2020-04-06 01:31:11 -07:00
Takashi Kokubun f984975c4d
Collapse `if` conditions to decrease indentation
in mjit_compile_send to clarify it's not that deeply branched.
2020-04-06 00:45:43 -07:00
Nobuyoshi Nakada ec03d13742
Fallback if Pathname#relative_path_from fails
It can fail due to different prefixes, e.g., drive letters or UNC
paths on DOSish platform.
2020-04-05 11:58:31 +09:00
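The fallback pattern this commit describes might look like the sketch below. The helper name `relative_or_original` is hypothetical and the code is not taken from the commit; it relies only on the standard library behavior that `Pathname#relative_path_from` raises `ArgumentError` when the two paths have different prefixes.

```ruby
require "pathname"

# Hypothetical helper: prefer a relative path, but keep the original path
# when no relative path can be computed.
def relative_or_original(path, base)
  Pathname.new(path).relative_path_from(Pathname.new(base)).to_s
rescue ArgumentError
  # Raised for different prefixes, e.g. different drive letters or UNC paths
  # on DOSish platforms, or an absolute path against a relative base.
  path.to_s
end

puts relative_or_original("/usr/lib/ruby", "/usr")
puts relative_or_original("/usr/lib/ruby", "some/relative/dir")
```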
Takashi Kokubun 151f8be40d
Make JIT-ed leave insn leaf
to eliminate sp / pc moves by cancelling JIT execution on interrupts.

$ benchmark-driver benchmark.yml -v --rbenv 'before --jit;after --jit' --repeat-count=12 --output=all
before --jit: ruby 2.8.0dev (2020-04-01T03:48:56Z master 5a81562dfe) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-01T04:58:01Z master 39beb26a27) +JIT [x86_64-linux]
Calculating -------------------------------------
                                 before --jit           after --jit
Optcarrot Lan_Master.nes    75.06409603894944     76.06422026555558 fps
                            75.12025067279242     78.48161731616810
                            77.42020273492177     79.78958240950033
                            79.07253675128945     79.88645902325614
                            79.99179109732327     80.33743931749331
                            80.07633091008627     80.53790081529166
                            80.15450942667547     80.99048270668010
                            80.48372803283709     81.70497146081003
                            80.57410149187352     82.79494539467382
                            81.80449157081202     82.85797792223954
                            82.24629397834902     83.00603891515506
                            82.63708148686703     83.23221006969828

$ benchmark-driver -v --rbenv 'before;before --jit;after --jit' benchmark/mjit_leave.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-01T03:48:56Z master 5a81562dfe) [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-04-01T03:48:56Z master 5a81562dfe) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-01T04:58:01Z master 39beb26a27) +JIT [x86_64-linux]
Calculating -------------------------------------
                         before  before --jit  after --jit
          mjit_leave   106.656M       82.786M      91.635M i/s -    200.000M times in 1.875183s 2.415881s 2.182569s

Comparison:
                       mjit_leave
              before: 106656239.9 i/s
         after --jit:  91635143.7 i/s - 1.16x  slower
        before --jit:  82785537.2 i/s - 1.29x  slower
2020-03-31 22:10:16 -07:00
Takashi Kokubun b736ea63bd
Optimize exivar access on JIT-ed getivar
JIT support of dd723771c1.

$ benchmark-driver -v --rbenv 'before;before --jit;after --jit' benchmark/mjit_exivar.yml --repeat-count=4
before: ruby 2.8.0dev (2020-03-30T12:32:26Z master e5db3da9d3) [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-03-30T12:32:26Z master e5db3da9d3) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-03-31T05:57:24Z mjit-exivar 128625baec) +JIT [x86_64-linux]
Calculating -------------------------------------
                         before  before --jit  after --jit
         mjit_exivar    57.944M       53.579M      54.471M i/s -    200.000M times in 3.451588s 3.732772s 3.671687s

Comparison:
                      mjit_exivar
              before:  57944345.1 i/s
         after --jit:  54470876.7 i/s - 1.06x  slower
        before --jit:  53579483.4 i/s - 1.08x  slower
2020-03-30 23:16:35 -07:00
Takashi Kokubun 0cd7be99e9
Avoid referring to an old value of realloc
OpenBSD RubyCI has failed with SEGV since 4bcd5981e8.
https://rubyci.org/logs/rubyci.s3.amazonaws.com/openbsd-current/ruby-master/log/20200312T223005Z.fail.html.gz

This was because `status->cc_entries` could be stale after `realloc` call
for inlined iseqs.
2020-03-12 22:51:34 -07:00
Takashi Kokubun da4b97a0e3
Pin and inline cme in JIT-ed method calls
```
$ benchmark-driver benchmark.yml -v --rbenv 'before --jit;after --jit' --repeat-count=12 --output=all
before --jit: ruby 2.8.0dev (2020-03-11T07:43:12Z master e89ebdcb87) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-03-11T07:54:18Z master 143776a0da) +JIT [x86_64-linux]
Calculating -------------------------------------
                                 before --jit           after --jit
Optcarrot Lan_Master.nes    73.86976729561439     77.20184819316513 fps
                            74.46997176460742     78.43493030231805
                            77.59686308754307     78.55714131655935
                            78.53693921126656     79.08984255596820
                            80.10158944910573     79.17751731838183
                            80.12254974411167     79.60853122429181
                            80.28678655204945     79.74674066871896
                            80.38690681095379     79.90624544440300
                            80.79223498756919     80.57881084206193
                            80.82857188422419     80.70677614429169
                            81.06447745878245     81.03868541295149
                            81.21620802278490     82.16354660940607
```
2020-03-11 00:59:34 -07:00
Takashi Kokubun 9511b4c8fa
Optimize away call data refs in JIT-ed method calls
According to ko1, `cd->cc != cc` was for GC.compact guard.
As we pin cc by rb_gc_mark(), we don't need the check.

```
$ benchmark-driver benchmark.yml -v --rbenv 'before --jit;after --jit' --repeat-count=12 --output=all
before --jit: ruby 2.8.0dev (2020-03-11T05:36:48Z master da6948753e) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-03-11T06:26:34Z master 36b20b8b4a) +JIT [x86_64-linux]
Calculating -------------------------------------
                                 before --jit           after --jit
Optcarrot Lan_Master.nes    74.03480698689405     71.63404803273507 fps
                            74.15085286586992     73.43923328104295
                            75.51738277744781     75.75465268365384
                            76.24922600109410     76.74071607861318
                            76.45513422802325     77.47521029238116
                            76.86617230739330     78.14759496269018
                            77.71509137131933     79.14051571125866
                            77.72839157096146     79.35884822673313
                            78.25218904561633     79.92538876408051
                            78.72521071333249     79.98075556706726
                            78.79950460165091     80.51747831497875
                            79.43884960720381     80.97973166525254
```
2020-03-10 23:29:50 -07:00
Takashi Kokubun aa3a7d6d74
Remove an unnecessary TODO comment
Fixing 4bcd5981e8/mjit.c (L338)
should be the right solution for this. We may not be able to free the cc immediately.

Plus, we're not copying cc but just holding references to be marked. cc
should be GC-ed once jit_unit is freed.
2020-03-10 01:33:38 -07:00
Takashi Kokubun 4bcd5981e8
Capture inlined iseq's cc entries in root iseq's
jit_unit to avoid marking wrong cc entries when inlined iseq is compiled
multiple times, resolving the TODO added by daf7c48d88.

This obviates pseudo jit_unit in inlined iseq introduced by 7ec2359374
and fixes memory leak of the adhoc unit.
2020-03-10 00:53:35 -07:00
Takashi Kokubun 69f377a3d6
Internalize rb_mjit_unit definition again
Fixed a TODO in b9007b6c54
2020-02-26 00:27:29 -08:00
Koichi Sasada b9007b6c54 Introduce disposable call-cache.
This patch contains several ideas:

(1) Disposable inline method cache (IMC) for race-free inline method cache
    * Make the call-cache (CC) an RVALUE (GC target object) and allocate a new
      CC on cache miss.
    * This technique allows race-free access from parallel processing
      elements, like RCU.
(2) Introduce per-Class method cache (pCMC)
    * Instead of a fixed-size global method cache (GMC), pCMC allows a flexible
      cache size.
    * Caching CCs reduces CC allocation and allows sharing a CC's fast path
      between call sites with the same call-info (CI).
(3) Invalidate an inline method cache by invalidating corresponding method
    entries (MEs)
    * Instead of using class serials, we set an "invalidated" flag on the method
      entry itself to represent cache invalidation.
    * Compared with using class serials, the impact of method modification
      (add/overwrite/delete) is small.
    * Updating class serials invalidates all method caches of the class and
      its sub-classes.
    * The proposed approach invalidates the method cache of only one ME.
See [Feature #16614] for more details.
2020-02-22 09:58:59 +09:00
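Ideas (1) and (3) above can be sketched together as a toy model. This is an illustration under assumptions, not CRuby's data structures: `MethodEntry`, `CallCache`, `MethodTable`, and `CallSite` are hypothetical names, and the per-class method cache of idea (2) is left out.

```ruby
# Hypothetical toy model; the real CC and ME objects live in C inside the VM.
MethodEntry = Struct.new(:name, :body, :invalidated)
CallCache   = Struct.new(:klass, :me)

class MethodTable
  def initialize
    @entries = {}
  end

  # Redefining a method flags only the old entry; other entries stay valid.
  def define(klass, name, &body)
    old = @entries[[klass, name]]
    old.invalidated = true if old
    @entries[[klass, name]] = MethodEntry.new(name, body, false)
  end

  def lookup(klass, name)
    @entries.fetch([klass, name])
  end
end

class CallSite
  attr_reader :misses

  def initialize(table, name)
    @table = table
    @name = name
    @cc = nil
    @misses = 0
  end

  def call(klass, *args)
    cc = @cc
    unless cc && cc.klass == klass && !cc.me.invalidated
      @misses += 1
      # On a miss, allocate a fresh cache object instead of mutating the old one.
      cc = @cc = CallCache.new(klass, @table.lookup(klass, @name))
    end
    cc.me.body.call(*args)
  end
end

table = MethodTable.new
table.define(:Foo, :hello) { "v1" }
table.define(:Foo, :other) { "other" }
hello = CallSite.new(table, :hello)
other = CallSite.new(table, :other)
hello.call(:Foo)
other.call(:Foo)
table.define(:Foo, :hello) { "v2" } # invalidates only the old :hello entry
puts hello.call(:Foo)               # misses once more, then runs the new body
other.call(:Foo)                    # still a hit
puts "hello misses: #{hello.misses}, other misses: #{other.misses}"
```

With class serials, redefining `hello` would also have invalidated the cache at the `other` call site, because both belong to the same class.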
Koichi Sasada f2286925f0 VALUE size packed callinfo (ci).
Now, rb_call_info describes how to call the method with a tuple of
(mid, orig_argc, flags, kwarg). In most cases, kwarg == NULL and
mid+argc+flags only requires 64 bits. So this patch packs
rb_call_info into a VALUE (1 word) in such cases. If we cannot
represent it in a VALUE, then we use imemo_callinfo, which contains
the conventional callinfo (rb_callinfo, renamed from rb_call_info).

iseq->body->ci_kw_size is removed because every callinfo is VALUE-sized
(a packed ci or a pointer to imemo_callinfo).

To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).

struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.

rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
are temporarily removed because cd->ci should be marked.
2020-02-22 09:58:59 +09:00
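The packing idea can be illustrated with plain integer arithmetic. The sketch below is not the VM's code: the field widths (1 tag bit, 15 bits of argc, 16 bits of flags, 32 bits of mid) and the helper names `pack_ci`, `ci_mid`, `ci_flag`, and `ci_argc` are assumptions chosen so that the fields add up to 64 bits.

```ruby
# Hypothetical field layout, low bits first: tag | argc | flag | mid.
TAG_BITS  = 1
ARGC_BITS = 15
FLAG_BITS = 16
MID_BITS  = 32

ARGC_SHIFT = TAG_BITS
FLAG_SHIFT = ARGC_SHIFT + ARGC_BITS
MID_SHIFT  = FLAG_SHIFT + FLAG_BITS

# Returns a single-word integer, or nil when the call info does not fit
# (a kwarg is present or a field is too wide) and a heap object is needed.
def pack_ci(mid, flag, argc, kwarg = nil)
  return nil unless kwarg.nil?
  return nil if mid >= (1 << MID_BITS)
  return nil if flag >= (1 << FLAG_BITS)
  return nil if argc >= (1 << ARGC_BITS)
  (mid << MID_SHIFT) | (flag << FLAG_SHIFT) | (argc << ARGC_SHIFT) | 1
end

def ci_mid(ci)
  (ci >> MID_SHIFT) & ((1 << MID_BITS) - 1)
end

def ci_flag(ci)
  (ci >> FLAG_SHIFT) & ((1 << FLAG_BITS) - 1)
end

def ci_argc(ci)
  (ci >> ARGC_SHIFT) & ((1 << ARGC_BITS) - 1)
end

ci = pack_ci(1234, 0x14, 2)
puts format("packed: 0x%016x", ci)
puts [ci_mid(ci), ci_flag(ci), ci_argc(ci)].inspect
```

The low tag bit is what would let a reader tell a packed word apart from an aligned pointer to the fallback object.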
Takashi Kokubun c4794ed73a
Avoid jumping to a wrong destination
when the next insn is already compiled by former branches.
2020-02-18 23:19:06 -08:00
Aaron Patterson 7dbbba38a0
Make sure we don't push MOVED or NONE on the stack
2019-12-11 11:07:15 -08:00
Aaron Patterson 2c8d186c6e
Introduce an "Inline IVAR cache" struct
This commit introduces an "inline ivar cache" struct.  The reason we
need this is so compaction can differentiate between an ivar cache and a
regular inline cache.  Regular inline caches contain references to
`VALUE` and ivar caches just contain references to the ivar index.  With
this new struct we can easily update references for inline caches (but
not inline ivar caches, as they just contain an int).
2019-12-05 13:37:02 -08:00
Dylan Thacker-Smith ac112f2b5d Avoid top-level search for nested constant reference from nil in defined?
Fixes [Bug #16332]

Constant access was changed to no longer allow top-level constant access
through `nil`, but `defined?` wasn't changed at the same time to stay
consistent.

Use a separate defined type to distinguish between a constant
referenced from the current lexical scope and one referenced from
another namespace.
2019-11-13 15:36:58 +09:00
Koichi Sasada 46acd0075d support builtin features with Ruby and C.
Support loading builtin features written in Ruby, which are implemented
with C builtin functions.
[Feature #16254]

Several features:

(1) Load .rb file at boottime with native binary.

Now, prelude.rb is loaded at boot time. However, this file is embedded
in the interpreter in text format and we need to compile it.
This patch adds a feature to load it from a binary format.

(2) __builtin_func() in Ruby calls func() written in C.

In a Ruby file, we can write `__builtin_func()` like a method call.
However, this is not a method call but special syntax to call
a function `func()` written in C. C functions should be defined
in the file (same compile unit) which loads this .rb file.

Functions (`func` in above example) should be defined with
  (a) 1st parameter: rb_execution_context_t *ec
  (b) rest parameters (0 to 15).
  (c) VALUE return type.
These requirements are very similar to those for functions used by
rb_define_method(); however, `rb_execution_context_t *ec`
is a new requirement.

(3) automatic C code generation from .rb files.

tool/mk_builtin_loader.rb creates C code to load the .rb files
needed by the miniruby and ruby commands. This script is run by
BASERUBY, so *.rb should be written in BASERUBY-compatible
syntax. This script loads a .rb file, finds all __builtin_
prefixed method calls, and generates a part of the C code to export
functions.

tool/mk_builtin_binary.rb creates C code which contains
the binary-compiled Ruby files needed by the ruby command.
2019-11-08 09:09:29 +09:00
卜部昌平 d45a013a1a extend rb_call_cache
Prior to this changeset, the majority of inline cache misses resulted
in the same method entry when rb_callable_method_entry() resolved
a method search.  Let's not call the function in the first place in
such situations.

In doing so we extend the struct rb_call_cache from 44 bytes (in
case of 64 bit machine) to 64 bytes, and fill the gap with
secondary class serial(s).  The call cache's class serials now behave
as an LRU cache.

Calculating -------------------------------------
                           ours         2.7         2.6
vm2_poly_same_method     2.339M      1.744M      1.369M i/s - 6.000M times in 2.565086s 3.441329s 4.381386s

Comparison:
             vm2_poly_same_method
                ours:   2339103.0 i/s
                 2.7:   1743512.3 i/s - 1.34x  slower
                 2.6:   1369429.8 i/s - 1.71x  slower
2019-11-07 17:41:30 +09:00
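The LRU behavior of the class serials can be sketched as follows. This is a toy model under assumptions, not the layout of `struct rb_call_cache`: the class name `SerialCache`, its `capacity` parameter, and the status symbols it returns are all hypothetical.

```ruby
# Hypothetical model: one cached method entry plus a short,
# most-recently-used-first list of class serials known to resolve to it.
class SerialCache
  attr_reader :serials, :method_entry

  def initialize(capacity)
    @capacity = capacity
    @serials = []
    @method_entry = nil
  end

  # The block performs the slow method search and runs only on a serial miss.
  def fetch(serial)
    if @serials.include?(serial)
      @serials.delete(serial)
      @serials.unshift(serial) # refresh recency
      return [:hit, @method_entry]
    end

    me = yield
    if me.equal?(@method_entry)
      # A different class resolved to the same method entry:
      # remember its serial too, evicting the least recently used one.
      @serials.unshift(serial)
      @serials.pop if @serials.size > @capacity
      [:extended, me]
    else
      @serials = [serial]
      @method_entry = me
      [:replaced, me]
    end
  end
end

shared_me = Object.new
cache = SerialCache.new(3)
p cache.fetch(1) { shared_me }.first
p cache.fetch(2) { shared_me }.first
p cache.fetch(1) { raise "slow path must not run" }.first
p cache.serials
```

A polymorphic call site whose receivers share one inherited method, as the benchmark name `vm2_poly_same_method` suggests, then keeps hitting the cache instead of repeating the method search.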
Alan Wu 89e7997622 Combine call info and cache to speed up method invocation
To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.

Combining the structures also saves 8 bytes per call site as the current
layout uses two separate pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.

This change improves the Optcarrot benchmark by at least 3%. For more
details, see the attached bugs.ruby-lang.org ticket.

Complications:
 - A new instruction attribute `comptime_sp_inc` is introduced to
 calculate SP increase at compile time without using call caches. At
 compile time, a `TS_CALLDATA` operand points to a call info struct, but
 at runtime, the same operand points to a call data struct. Instructions
 that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
 - MJIT code for copying call cache becomes slightly more complicated.
 - This changes the bytecode format, which might break existing tools.

[Misc #16258]
2019-10-24 18:03:42 +09:00