A single quote "is representable either by itself or by the escape
sequence", according to ISO/IEC 9899 (checked all versions). So this is
not a bug fix. But the generated output is a bit readable without
backslashes.
Noticed that struct rb_builtin_function is a purely compile-time
constant. MJIT can eliminate some runtime calculations by statically
generate dedicated C code generator for each builtin functions.
which is checked by the first guard. When JIT-inlined cc and operand
cd->cc are different, the JIT-ed code might wrongly dispatch cd->cc even
while class check is done with another cc inlined by JIT.
This fixes SEGV on railsbench.
Use ID instead of GENTRY for gvars.
Global variables are compiled into GENTRY (a pointer to struct
rb_global_entry). This patch replace this GENTRY to ID and
make the code simple.
We need to search GENTRY from ID every time (st_lookup), so
additional overhead will be introduced.
However, the performance of accessing global variables is not
important now a day and this simplicity helps Ractor development.
for opt_* insns.
opt_eq handles rb_obj_equal inside opt_eq, and all other cfunc is
handled by opt_send_without_block. Therefore we can't decide which insn
should be generated by checking whether it's cfunc cc or not.
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-26T05:21:43Z master 9dbc2294a6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-26T06:30:18Z master 75cece1b0b) +JIT [x86_64-linux]
last_commit=Decide JIT-ed insn based on cached cfunc
Calculating -------------------------------------
before --jit after --jit
mjit_nil?(1) 73.878M 74.021M i/s - 40.000M times in 0.541432s 0.540391s
mjit_not(1) 72.635M 74.601M i/s - 40.000M times in 0.550702s 0.536187s
mjit_eq(1, nil) 7.331M 7.445M i/s - 8.000M times in 1.091211s 1.074596s
mjit_eq(nil, 1) 49.450M 64.711M i/s - 8.000M times in 0.161781s 0.123627s
Comparison:
mjit_nil?(1)
after --jit: 74020528.4 i/s
before --jit: 73878185.9 i/s - 1.00x slower
mjit_not(1)
after --jit: 74600882.0 i/s
before --jit: 72634507.6 i/s - 1.03x slower
mjit_eq(1, nil)
after --jit: 7444657.4 i/s
before --jit: 7331304.3 i/s - 1.02x slower
mjit_eq(nil, 1)
after --jit: 64710790.6 i/s
before --jit: 49449507.4 i/s - 1.31x slower
```
only for opt_nil_p and opt_not.
While vm_method_cfunc_is is used for opt_eq too, many fast paths of it
don't call it. So if it's populated, it should generate opt_send,
regardless of cfunc or not. And again, opt_neq isn't relevant due to the
difference in operands.
So opt_nil_p and opt_not are the only variants using vm_method_cfunc_is
like they use.
```
$ benchmark-driver -v --rbenv 'before2 --jit::ruby --jit;before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before2 --jit: ruby 2.8.0dev (2020-06-22T08:37:37Z master 3238641750) +JIT [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-06-23T01:01:24Z master 9ce2066209) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-23T06:58:37Z master 17e9df3157) +JIT [x86_64-linux]
last_commit=Avoid generating opt_send with cfunc cc with JIT
Calculating -------------------------------------
before2 --jit before --jit after --jit
mjit_nil?(1) 54.204M 75.536M 75.031M i/s - 40.000M times in 0.737947s 0.529548s 0.533110s
mjit_not(1) 53.822M 70.921M 71.920M i/s - 40.000M times in 0.743195s 0.564007s 0.556171s
mjit_eq(1, nil) 7.367M 6.496M 7.331M i/s - 8.000M times in 1.085882s 1.231470s 1.091327s
Comparison:
mjit_nil?(1)
before --jit: 75536059.3 i/s
after --jit: 75031409.4 i/s - 1.01x slower
before2 --jit: 54204431.6 i/s - 1.39x slower
mjit_not(1)
after --jit: 71920324.1 i/s
before --jit: 70921063.1 i/s - 1.01x slower
before2 --jit: 53821697.6 i/s - 1.34x slower
mjit_eq(1, nil)
before2 --jit: 7367280.0 i/s
after --jit: 7330527.4 i/s - 1.01x slower
before --jit: 6496302.8 i/s - 1.13x slower
```
because opt_nil/opt_not/opt_eq populates cc even when it doesn't
fallback to opt_send_without_block because of vm_method_cfunc_is.
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-22T08:11:24Z master d231b8f95b) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-22T08:53:27Z master e1125879ed) +JIT [x86_64-linux]
last_commit=Compile opt_send for opt_* only when cc has ISeq
Calculating -------------------------------------
before --jit after --jit
mjit_nil?(1) 54.106M 73.693M i/s - 40.000M times in 0.739288s 0.542795s
mjit_not(1) 53.398M 74.477M i/s - 40.000M times in 0.749090s 0.537075s
mjit_eq(1, nil) 7.427M 6.497M i/s - 8.000M times in 1.077136s 1.231326s
Comparison:
mjit_nil?(1)
after --jit: 73692594.3 i/s
before --jit: 54106108.4 i/s - 1.36x slower
mjit_not(1)
after --jit: 74477487.9 i/s
before --jit: 53398125.0 i/s - 1.39x slower
mjit_eq(1, nil)
before --jit: 7427105.9 i/s
after --jit: 6497063.0 i/s - 1.14x slower
```
Actually opt_eq becomes slower by this. Maybe it's indeed using
opt_send_without_block, but I'll approach that one in another commit.
Even if local stack optimization is not used and values are written to
VM stack, the stack pointer itself may not be moved properly. So this
should be always moved on JIT cancellation.
By the way it's hard to write a test for this because if we try to
generate an interrupt, it will be a method call and it consumes the
interrupt by itself on popping a frame.
for VM_METHOD_TYPE_CFUNC.
This has been known to decrease optcarrot fps:
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark.yml --repeat-count=24 --output=all
before --jit: ruby 2.8.0dev (2020-04-13T16:25:13Z master fb40495cd9) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-13T23:23:11Z mjit-inline-c bdcd06d159) +JIT [x86_64-linux]
Calculating -------------------------------------
before --jit after --jit
Optcarrot Lan_Master.nes 66.38132676191719 67.41369177299630 fps
69.42728743772243 68.90327567263054
72.16028300263211 69.62605130880686
72.46631319102777 70.48818243767207
73.37078877002490 70.79522887347566
73.69422431217367 70.99021920193194
74.01471487018695 74.69931965402584
75.48685183295630 74.86714575949016
75.54445264507932 75.97864419721677
77.28089738169756 76.48908637569581
78.04183397891302 76.54320932488021
78.36807984096562 76.59407262898067
78.92898762543574 77.31316743361343
78.93576483233765 77.97153484180480
79.13754917503078 77.98478782102325
79.62648945850653 78.02263322726446
79.86334213878064 78.26333724045934
80.05100635898518 78.60056756355614
80.26186843769584 78.91082645644468
80.34205717020330 79.01226659142263
80.62286066044338 79.32733939423721
80.95883033058557 79.63793060542024
80.97376819251613 79.73108936622778
81.23050939202896 80.18280109433088
```
and I deleted this capability in an early stage of YARV-MJIT development:
0ab130feee
I suspect either of the following things could be the cause:
* Directly calling vm_call_cfunc requires more optimization effort in GCC,
resulting in 30ms-ish compilation time increase for such methods and
decreasing the number of methods compiled in a benchmarked period.
* Code size increase => icache miss hit
These hypotheses could be verified by some methodologies. However, I'd
like to introduce this regardless of the result because this blocks
inlining C method's definition.
I may revert this commit when I give up to implement inlining C method
definition, which requires this change.
Microbenchmark-wise, this gives slight performance improvement:
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_send_cfunc.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-04-13T16:25:13Z master fb40495cd9) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-13T23:23:11Z mjit-inline-c bdcd06d159) +JIT [x86_64-linux]
Calculating -------------------------------------
before --jit after --jit
mjit_send_cfunc 41.961M 56.489M i/s - 100.000M times in 2.383143s 1.770244s
Comparison:
mjit_send_cfunc
after --jit: 56489372.5 i/s
before --jit: 41961388.1 i/s - 1.35x slower
```
Fixing 4bcd5981e8/mjit.c (L338)
should be the right solution for this. We may not be able to free the cc immediately.
Plus, we're not copying cc but just holding references to be marked. cc
should be GC-ed once jit_unit is freed.
jit_unit to avoid marking wrong cc entries when inlined iseq is compiled
multiple times, resolving the TODO added by daf7c48d88.
This obviates pseudo jit_unit in inlined iseq introduced by 7ec2359374
and fixes memory leak of the adhoc unit.
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
Now, rb_call_info contains how to call the method with tuple of
(mid, orig_argc, flags, kwarg). Most of cases, kwarg == NULL and
mid+argc+flags only requires 64bits. So this patch packed
rb_call_info to VALUE (1 word) on such cases. If we can not
represent it in VALUE, then use imemo_callinfo which contains
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because all of callinfo is VALUE
size (packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
is temporary removed because cd->ci should be marked.
This commit introduces an "inline ivar cache" struct. The reason we
need this is so compaction can differentiate from an ivar cache and a
regular inline cache. Regular inline caches contain references to
`VALUE` and ivar caches just contain references to the ivar index. With
this new struct we can easily update references for inline caches (but
not inline var caches as they just contain an int)
Fixes [Bug #16332]
Constant access was changed to no longer allow top-level constant access
through `nil`, but `defined?` wasn't changed at the same time to stay
consistent.
Use a separate defined type to distinguish between a constant
referenced from the current lexical scope and one referenced from
another namespace.
Support loading builtin features written in Ruby, which implement
with C builtin functions.
[Feature #16254]
Several features:
(1) Load .rb file at boottime with native binary.
Now, prelude.rb is loaded at boottime. However, this file is contained
into the interpreter as a text format and we need to compile it.
This patch contains a feature to load from binary format.
(2) __builtin_func() in Ruby call func() written in C.
In Ruby file, we can write `__builtin_func()` like method call.
However this is not a method call, but special syntax to call
a function `func()` written in C. C functions should be defined
in a file (same compile unit) which load this .rb file.
Functions (`func` in above example) should be defined with
(a) 1st parameter: rb_execution_context_t *ec
(b) rest parameters (0 to 15).
(c) VALUE return type.
This is very similar requirements for functions used by
rb_define_method(), however `rb_execution_context_t *ec`
is new requirement.
(3) automatic C code generation from .rb files.
tool/mk_builtin_loader.rb creates a C code to load .rb files
needed by miniruby and ruby command. This script is run by
BASERUBY, so *.rb should be written in BASERUBY compatbile
syntax. This script load a .rb file and find all of __builtin_
prefix method calls, and generate a part of C code to export
functions.
tool/mk_builtin_binary.rb creates a C code which contains
binary compiled Ruby files needed by ruby command.
Prior to this changeset, majority of inline cache mishits resulted
into the same method entry when rb_callable_method_entry() resolves
a method search. Let's not call the function at the first place on
such situations.
In doing so we extend the struct rb_call_cache from 44 bytes (in
case of 64 bit machine) to 64 bytes, and fill the gap with
secondary class serial(s). Call cache's class serials now behavies
as a LRU cache.
Calculating -------------------------------------
ours 2.7 2.6
vm2_poly_same_method 2.339M 1.744M 1.369M i/s - 6.000M times in 2.565086s 3.441329s 4.381386s
Comparison:
vm2_poly_same_method
ours: 2339103.0 i/s
2.7: 1743512.3 i/s - 1.34x slower
2.6: 1369429.8 i/s - 1.71x slower
To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.
Combining the structures also saves 8 bytes per call site as the current
layout uses separate two pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.
This change improves the Optcarrot benchmark at least 3%. For more
details, see attached bugs.ruby-lang.org ticket.
Complications:
- A new instruction attribute `comptime_sp_inc` is introduced to
calculate SP increase at compile time without using call caches. At
compile time, a `TS_CALLDATA` operand points to a call info struct, but
at runtime, the same operand points to a call data struct. Instruction
that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.
[Misc #16258]