This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
Now, rb_call_info contains how to call the method with tuple of
(mid, orig_argc, flags, kwarg). Most of cases, kwarg == NULL and
mid+argc+flags only requires 64bits. So this patch packed
rb_call_info to VALUE (1 word) on such cases. If we can not
represent it in VALUE, then use imemo_callinfo which contains
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because all of callinfo is VALUE
size (packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
is temporary removed because cd->ci should be marked.
This behavior was deprecated in 2.7 and scheduled to be removed
in 3.0.
Calling yield in a class definition outside a method is now a
SyntaxError instead of a LocalJumpError, as well.
We were inefficient in cases where there are a lot of duplicates due to
the use of linear search. Use a hash table instead.
These cases are not that rare in the wild.
[Feature #16505]
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead. This would significantly
speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
On USE_LAZY_LOAD=1, the iseq should be loaded. So rb_iseq_check()
is needed. Furthermore, now lazy loading with builtin_function_table
is not supported, so it should cancel lazy loading.
`foo(*rest, post, **empty_kw)` is compiled like
`foo(*rest + [post, **empty_kw])`, and `**empty_kw` is removed by
"newarraykwsplat" instruction.
However, the method call still has a flag of KW_SPLAT, so "post" is
considered as a keyword hash, which caused a segfault.
Note that the flag cannot be removed if "empty_kw" is not always empty.
This change fixes the issue by compiling arguments with "newarray"
instead of "newarraykwsplat".
[Bug #16442]
Now, C functions written by __builtin_cexpr!(code) and others are
named as "__builtin_inline#{n}". However, it is difficult to know
what the function is. This patch rename them into
"__builtin_foo_#{lineno}" when cexpr! is in 'foo' method.
%p is for void *. Becuase fprintf is a function with variadic arguments
automatic cast from any pointer to void * does not work. We have to be
explicit.
(This is the second try of 036bc1da6c6c9b0fa9b7f5968d897a9554dd770e.)
If iseq is GC'ed, the pointer of iseq may be reused, which may hide a
deprecation warning of keyword argument change.
http://ci.rvm.jp/results/trunk-test1@phosphorus-docker/2474221
```
1) Failure:
TestKeywordArguments#test_explicit_super_kwsplat [/tmp/ruby/v2/src/trunk-test1/test/ruby/test_keyword.rb:549]:
--- expected
+++ actual
@@ -1 +1 @@
-/The keyword argument is passed as the last hash parameter.* for `m'/m
+""
```
This change ad-hocly adds iseq_unique_id for each iseq, and use it
instead of iseq pointer. This covers the case where caller is GC'ed.
Still, the case where callee is GC'ed, is not covered.
But anyway, it is very rare that iseq is GC'ed. Even when it occurs, it
just hides some warnings. It's no big deal.
This commit introduces an "inline ivar cache" struct. The reason we
need this is so compaction can differentiate from an ivar cache and a
regular inline cache. Regular inline caches contain references to
`VALUE` and ivar caches just contain references to the ivar index. With
this new struct we can easily update references for inline caches (but
not inline var caches as they just contain an int)
jump-jump optimization ignores the event flags of the jump instruction
being skipped, which leads to overlook of line events.
This changeset stops the wrong optimization when coverage measurement is
neabled and when the jump instruction has any event flag.
Note that this issue is not only for coverage but also for TracePoint,
and this change does not fix TracePoint.
However, fixing it fundamentally is tough (which requires revamp of
the compiler). This issue is critical in terms of coverage measurement,
but minor for TracePoint (ko1 said), so we here choose a stopgap
measurement.
[Bug #15980] [Bug #16397]
Note for backporters: this changeset can be viewed by `git diff -w`.
rename __builtin_inline!(code) to __builtin_cstmt(code).
Also this commit introduce the following inlining C code features.
* __builtin_cstmt!(STMT)
(renamed from __builtin_inline!)
Define a function which run STMT implicitly and call this function at
evatuation time. Note that you need to return some value in STMT.
If there is a local variables (includes method parameters), you can
read these values.
static VALUE func(ec, self) {
VALUE x = ...;
STMT
}
Usage:
def double a
# a is readable from C code.
__builtin_cstmt! 'return INT2FIX(FIX2INT(a) * 2);'
end
* __builtin_cexpr!(EXPR)
Define a function which invoke EXPR implicitly like `__builtin_cstmt!`.
Different from cstmt!, which compiled with `return EXPR;`.
(`return` and `;` are added implicitly)
static VALUE func(ec, self) {
VALUE x = ...;
return EXPPR;
}
Usage:
def double a
__builtin_cexpr! 'INT2FIX(FIX2INT(a) * 2)'
end
* __builtin_cconst!(EXPR)
Define a function which invoke EXPR implicitly like cexpr!.
However, the function is called once at compile time, not evaluated time.
Any local variables are not accessible (because there is no local variable
at compile time).
Usage:
GCC = __builtin_cconst! '__GNUC__'
* __builtin_cinit!(STMT)
STMT are writtein in auto-generated code.
This code does not return any value.
Usage:
__builtin_cinit! '#include <zlib.h>'
def no_compression?
__builtin_cconst! 'Z_NO_COMPRESSION ? Qtrue : Qfalse'
end
These functions are used from within a compilation unit so we can
make them static, for better binary size. This changeset reduces
the size of generated ruby binary from 26,590,128 bytes to
26,584,472 bytes on my macihne.
opt_invokebuiltin_delegate and opt_invokebuiltin_delegate_leave
invokes builtin functions with same parameters of the method.
This technique eliminate stack push operations. However, delegation
parameters should be completely same as given parameters.
(e.g. `def foo(a, b, c) __builtin_foo(a, b, c)` is okay, but
__builtin_foo(b, c) is not allowed)
This patch relaxes this restriction. ISeq has a local variables
table which includes parameters. For example, the method defined
as `def foo(a, b, c) x=y=nil`, then local variables table contains
[a, b, c, x, y]. If calling builtin-function with arguments which
are sub-array of the lvar table, use opt_invokebuiltin_delegate
instruction with start index. For example, `__builtin_foo(b, c)`,
`__builtin_bar(c, x, y)` is okay, and so on.
Fixes [Bug #16332]
Constant access was changed to no longer allow top-level constant access
through `nil`, but `defined?` wasn't changed at the same time to stay
consistent.
Use a separate defined type to distinguish between a constant
referenced from the current lexical scope and one referenced from
another namespace.
Add an experimental `__builtin_inline!(c_expression)` special intrinsic
which run a C code snippet.
In `c_expression`, you can access the following variables:
* ec (rb_execution_context_t *)
* self (const VALUE)
* local variables (const VALUE)
Not that you can read these variables, but you can not write them.
You need to return from this expression and return value will be a
result of __builtin_inline!().
Examples:
`def foo(x) __builtin_inline!('return rb_p(x);'); end` calls `p(x)`.
`def double(x) __builtin_inline!('return INT2NUM(NUM2INT(x) * 2);')`
returns x*2.
Support loading builtin features written in Ruby, which implement
with C builtin functions.
[Feature #16254]
Several features:
(1) Load .rb file at boottime with native binary.
Now, prelude.rb is loaded at boottime. However, this file is contained
into the interpreter as a text format and we need to compile it.
This patch contains a feature to load from binary format.
(2) __builtin_func() in Ruby call func() written in C.
In Ruby file, we can write `__builtin_func()` like method call.
However this is not a method call, but special syntax to call
a function `func()` written in C. C functions should be defined
in a file (same compile unit) which load this .rb file.
Functions (`func` in above example) should be defined with
(a) 1st parameter: rb_execution_context_t *ec
(b) rest parameters (0 to 15).
(c) VALUE return type.
This is very similar requirements for functions used by
rb_define_method(), however `rb_execution_context_t *ec`
is new requirement.
(3) automatic C code generation from .rb files.
tool/mk_builtin_loader.rb creates a C code to load .rb files
needed by miniruby and ruby command. This script is run by
BASERUBY, so *.rb should be written in BASERUBY compatbile
syntax. This script load a .rb file and find all of __builtin_
prefix method calls, and generate a part of C code to export
functions.
tool/mk_builtin_binary.rb creates a C code which contains
binary compiled Ruby files needed by ruby command.
Get rid of these redundant and useless warnings.
```
$ ruby -e 'def bar(a) a; end; def foo(...) bar(...) end; foo({})'
-e:1: warning: The last argument is used as the keyword parameter
-e:1: warning: for `foo' defined here
-e:1: warning: The keyword argument is passed as the last hash parameter
-e:1: warning: for `bar' defined here
```
To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.
Combining the structures also saves 8 bytes per call site as the current
layout uses separate two pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.
This change improves the Optcarrot benchmark at least 3%. For more
details, see attached bugs.ruby-lang.org ticket.
Complications:
- A new instruction attribute `comptime_sp_inc` is introduced to
calculate SP increase at compile time without using call caches. At
compile time, a `TS_CALLDATA` operand points to a call info struct, but
at runtime, the same operand points to a call data struct. Instruction
that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.
[Misc #16258]