This commit adds rb_gc_mark_and_move, which takes a pointer to an object,
marks it during the marking phase, and updates references during compaction.
This allows marking and reference updating to be combined into a
single function, which reduces code duplication and prevents the bugs
that occur when marking and reference updating go out of sync.
This commit also implements rb_gc_mark_and_move for iseq as an example.
With this change, we're storing the instance variable name in an inline cache
on setinstancevariable instructions. This allows us to check the inline
cache to count the instance variables set in `initialize` and gives us an
estimate of the instance variable capacity for an object.
For the purpose of estimating the number of instance variables required
for an object, we're assuming that all initialize methods will call
`super`.
This change allows us to estimate the number of instance variables
required without disassembling instruction sequences.
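As an illustration (the class names below are hypothetical, not part of the change), the estimate comes from the setinstancevariable sites seen in `initialize`, and the `super` assumption means a subclass's estimate includes its parent's sites:
```ruby
class Point
  def initialize(x, y)
    @x = x # 1st setinstancevariable site
    @y = y # 2nd setinstancevariable site
  end
end

class Point3D < Point
  def initialize(x, y, z)
    super(x, y) # assumed to run, contributing the parent's two sites
    @z = z      # 3rd setinstancevariable site
  end
end

# Point instances are estimated to need room for 2 ivars, Point3D for 3.
```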
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Commit dba61f4 fixes a crash when GC'ing an iseq that failed to compile.
However, if we turn on RGENGC_CHECK_MODE then rb_iseq_memsize crashes
since it cannot handle an iseq without is_entries.
If there is a compilation error, is_entries may not be allocated, but
ic_size could be greater than 0. If we don't have a buffer to iterate
over, just return early. Otherwise the GC could segfault.
[Bug #19173]
We can loosely predict the number of instance variable sets on a class based on
the number of instance variable set instructions in the initialize method. This should give
us a more accurate estimate to use for initial size pool allocation,
which should in turn give us more cache hits.
This patch pushes dummy frames when loading code, for profiling purposes.
The following methods push a dummy frame:
* `Kernel#require`
* `Kernel#load`
* `RubyVM::InstructionSequence.compile_file`
* `RubyVM::InstructionSequence.load_from_binary`
https://bugs.ruby-lang.org/issues/18559
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects. Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness"). Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree. Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.
For example:
```ruby
class Foo
  def initialize
    # Starts with shape id 0
    @a = 1 # transitions to shape id 1
    @b = 1 # transitions to shape id 2
  end
end

class Bar
  def initialize
    # Starts with shape id 0
    @a = 1 # transitions to shape id 1
    @b = 1 # transitions to shape id 2
  end
end

foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```
Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.
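As a complementary sketch (the shape IDs here are illustrative), setting the same instance variables in a different order produces a different shape:
```ruby
class Baz
  def initialize
    # Starts with shape id 0
    @b = 1 # transitions to a new shape id, not the "@a first" shape
    @a = 1 # transitions to yet another shape id
  end
end

baz = Baz.new # `baz` does not share a shape with `foo` or `bar`
```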
This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.
This commit also adds some methods for debugging shapes on objects. See
`RubyVM::Shape` for more details.
For more context on Object Shapes, see [Feature: #18776]
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Previously YARV bytecode implemented constant caching by having a pair
of instructions, opt_getinlinecache and opt_setinlinecache, wrapping a
series of getconstant calls (with putobject providing supporting
arguments).
This commit replaces that pattern with a new instruction,
opt_getconstant_path, handling both getting/setting the inline cache and
fetching the constant on a cache miss.
This is implemented by storing the full constant path as a
null-terminated array of IDs inside of the IC structure. idNULL is used
to signal an absolute constant reference.
```
$ ./miniruby --dump=insns -e '::Foo::Bar::Baz'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
0000 opt_getconstant_path <ic:0 ::Foo::Bar::Baz> ( 1)[Li]
0002 leave
```
The motivation for this is that we had increasingly found the need to
disassemble the instructions between the opt_getinlinecache and
opt_setinlinecache in order to determine the constant we are fetching,
or otherwise store metadata.
This disassembly was done:
* In opt_setinlinecache, to register the IC against the constant names
it is using for granular invalidation.
* In rb_iseq_free, to unregister the IC from the invalidation table.
* In YJIT to find the position of an opt_getinlinecache instruction to
invalidate it when the cache is populated.
* In YJIT to register the constant names being used for invalidation.
With this change we no longer need disassembly for these (in fact
rb_iseq_each is now unused), as the list of constant names being
referenced is held in the IC. This should also make more optimizations
possible in the future.
This may also reduce the size of iseqs, as previously each constant
segment required 32 bytes (on 64-bit platforms). This implementation
only stores one ID per segment.
There should be no significant performance change between this and the
previous implementation. Previously opt_getinlinecache was a "leaf"
instruction, but it included a jump (almost always to a separate cache
line). Now opt_getconstant_path is a non-leaf (it may
raise/autoload/call const_missing) but it does not jump. These seem to
even out.
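A small sketch for observing the new instruction (the constants do not need to exist in order to compile): both a relative and an absolute constant path compile to a single opt_getconstant_path, with the leading :: encoded via idNULL in the stored ID list rather than by a separate instruction:
```ruby
puts RubyVM::InstructionSequence.compile("Foo::Bar::Baz").disasm
puts RubyVM::InstructionSequence.compile("::Foo::Bar::Baz").disasm
```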
catch_except_p is a field that exists for MJIT. In the process of
rewriting MJIT in Ruby, I added an API to convert 1/0 of _Bool to
true/false, and it seemed confusing and hard to maintain if you
don't use _Bool for *_p fields.
* Simplify around `USE_YJIT` macro
- Use only the `USE_YJIT` macro instead of `YJIT_BUILD`.
- An intermediate macro `YJIT_SUPPORTED_P` is no longer used.
* Bail out if YJIT is enabled on unsupported platforms
rb_ary_tmp_new suggests that the array is temporary in some way, but
that's not true: it just creates an array that's hidden and not on the
transient heap. This commit renames it to rb_ary_hidden_new.
We need to dump relative offsets for inline storage entries so that
loading iseqs as an array works as well. This commit also has some
minor refactoring to make computing relative ISE information easier.
This should fix the iseq dump / load as array tests we're seeing fail in
CI.
Co-Authored-By: John Hawthorn <john@hawthorn.email>
This commit adds a bitfield to the iseq body that stores offsets inside
the iseq buffer that contain values we need to mark. We can use this
bitfield to mark objects instead of disassembling the instructions.
This commit also groups inline storage entries and adds a counter for
each entry. This allows us to iterate and mark each entry without
disassembling instructions
Since we have a bitfield and grouped inline caches, we can mark all
VALUE objects associated with instructions without actually
disassembling the instructions at mark time.
[Feature #18875] [ruby-core:109042]
In December 2021, we opened an [issue] to solicit feedback regarding the
porting of the YJIT codebase from C99 to Rust. There were some
reservations, but this project was given the go ahead by Ruby core
developers and Matz. Since then, we have successfully completed the port
of YJIT to Rust.
The new Rust version of YJIT has reached parity with the C version, in
that it passes all the CRuby tests, is able to run all of the YJIT
benchmarks, and performs similarly to the C version (because it works
the same way and largely generates the same machine code). We've even
incorporated some design improvements, such as a more fine-grained
constant invalidation mechanism which we expect will make a big
difference in Ruby on Rails applications.
Because we want to be careful, YJIT is guarded behind a configure
option:
```shell
./configure --enable-yjit # Build YJIT in release mode
./configure --enable-yjit=dev # Build YJIT in dev/debug mode
```
By default, YJIT does not get compiled and cargo/rustc is not required.
If YJIT is built in dev mode, then `cargo` is used to fetch development
dependencies, but when building in release, `cargo` is not required,
only `rustc`. At the moment YJIT requires Rust 1.60.0 or newer.
The YJIT command-line options remain mostly unchanged, and more details
about the build process are documented in `doc/yjit/yjit.md`.
The CI tests have been updated and do not take any more resources than
before.
The development history of the Rust port is available at the following
commit for interested parties:
1fd9573d8b
Our hope is that Rust YJIT will be compiled and included as a part of
system packages and compiled binaries of the Ruby 3.2 release. We do not
anticipate any major problems as Rust is well supported on every
platform which YJIT supports, but to make sure that this process works
smoothly, we would like to reach out to those who take care of building
system packages before the 3.2 release is shipped and resolve any
issues that may come up.
[issue]: https://bugs.ruby-lang.org/issues/18481
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Co-authored-by: Noah Gibbs <the.codefolio.guy@gmail.com>
Co-authored-by: Kevin Newton <kddnewton@gmail.com>
This commit reintroduces finer-grained constant cache invalidation.
After 8008fb7 got merged, it was causing issues on token-threaded
builds (such as on Windows).
The issue was that when you're iterating through instruction sequences
and using the translator functions to get back the instruction structs,
you're either using `rb_vm_insn_null_translator` or
`rb_vm_insn_addr2insn2` depending on whether it's a direct-threading build.
`rb_vm_insn_addr2insn2` does some normalization to always return to
you the non-trace version of whatever instruction you're looking at.
`rb_vm_insn_null_translator` does not do that normalization.
This means that when you're looping through the instructions and trying
to do an opcode comparison, the result can change depending on the type
of threading that you're using. This can be very confusing. So, this
commit creates a new translator function
`rb_vm_insn_normalizing_translator` to always return the non-trace
version so that opcode comparisons don't have to worry about different
configurations.
[Feature #18589]
This reverts commits for [Feature #18589]:
* 8008fb7352
"Update formatting per feedback"
* 8f6eaca2e1
"Delete ID from constant cache table if it becomes empty on ISEQ free"
* 629908586b
"Finer-grained inline constant cache invalidation"
MSWin builds on AppVeyor have been crashing since the merge.
Current behavior - caches depend on a global counter. All constant mutations cause caches to be invalidated.
```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends on global counter
end

foo # populate inline cache
foo # hit inline cache

C = 1 # global counter increments, all caches are invalidated

foo # misses inline cache due to `C = 1`
```
Proposed behavior - caches depend on name components. Only constant mutations with corresponding names will invalidate the cache.
```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends on constants named "A" and "B"
end

foo # populate inline cache
foo # hit inline cache

C = 1 # caches that depend on the name "C" are invalidated

foo # hits inline cache because the IC only depends on "A" and "B"
```
Examples of breaking the new cache:
```ruby
module C
  # Breaks `foo` cache because "A" constant is set and the cache in foo depends
  # on "A" and "B"
  class A; end
end

B = 1
```
We expect the new cache scheme to be invalidated less often because names aren't frequently reused. With the cache being invalidated less, we can rely on its stability more to keep our constant references fast and reduce the need to throw away generated code in YJIT.
Use ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.
It frees `rb_hook_list_t` itself if needed. To recognize the
need, this patch introduces the `rb_hook_list_t::is_local` flag.
This patch is a successor of https://github.com/ruby/ruby/pull/4652
The main impetus for this change is to fix [Bug #13392]. Previously, we
fired the "return" TracePoint event after popping the stack frame for
the block running as method (BMETHOD). This gave undesirable source
location outputs, as the return event normally fires right before the
frame goes away.
The iseq for each block can run both as a block and as a method. To
accommodate that, this commit makes vm_trace() fire call/return events for
instructions that have b_call/b_return events attached when the iseq is
running as a BMETHOD. The logic for rewriting to "trace_*" instruction
is tweaked so that when the user listens to call/return events,
instructions with b_call/b_return become trace variants.
To continue to provide the return value for non-local returns done using
the "return" or "break" keyword inside BMETHODs, the stack unwinding
code is tweaked. b_return events now provide the same return value as
return events for these non-local cases. A pre-existing test deemed not
providing a return value for these b_return events as a limitation.
This commit removes the checks for call/return TracePoint events that
happen when calling into BMETHODs when no TracePoints are active.
Technically, migrating just the return event is enough to fix the bug,
but migrating both call and return removes our reliance on
`VM_FRAME_FLAG_FINISH` and re-entering the interpreter when the caller
is already in the interpreter.
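A minimal Ruby sketch of the behavior this targets (class and method names are hypothetical): a block running as a method via define_method fires :return with the expected source location and return value:
```ruby
class Widget
  define_method(:size) { 42 } # a BMETHOD: a block running as a method
end

events = []
tp = TracePoint.new(:return) { |t| events << [t.method_id, t.return_value, t.lineno] }
tp.enable { Widget.new.size }
pp events # expected to include [:size, 42, <line of the block above>]
```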
Compared with C methods, a built-in method written in Ruby is
slower if only mandatory parameters are given, because it needs to
check the arguments and fill in default values for optional and keyword
parameters (C methods can check the number of parameters with `argc`,
so there is no overhead). Passing only mandatory arguments is common
(optional arguments are the exception in many cases), so it is important
to provide a fast path for such common cases.
`Primitive.mandatory_only?` is a special builtin function used with
an `if` expression like this:
```ruby
def self.at(time, subsec = false, unit = :microsecond, in: nil)
  if Primitive.mandatory_only?
    Primitive.time_s_at1(time)
  else
    Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
  end
end
```
and it makes two ISeqs:
```
# (1)
def self.at(time, subsec = false, unit = :microsecond, in: nil)
  Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
end

# (2)
def self.at(time)
  Primitive.time_s_at1(time)
end
```
and (2) is pointed to by (1). Note that `Primitive.mandatory_only?`
should be used only in the condition of an `if` statement, and the
`if` statement should make up the entire method body (you cannot
put any expression before or after the `if` statement).
A method entry with `mandatory_only?` (`Time.at` in the above case)
is marked as `iseq_overload`. When the method is dispatched with only
mandatory arguments (`Time.at(0)` for example), another method entry
with ISeq (2) is made as the mandatory-only method entry, and it
is cached in an inline method cache.
The idea is similar to the one discussed in https://bugs.ruby-lang.org/issues/16254,
but it only checks mandatory parameters, because in many cases
only mandatory parameters are given. If we find other cases (optional
or keyword parameters used frequently enough that it hurts performance),
we can extend the feature.
`RubyVM.keep_script_lines` enables keeping the script lines
for each ISeq and AST. This feature is for debugger/REPL
support.
```ruby
RubyVM.keep_script_lines = true

eval("def foo = nil\ndef bar = nil")

pp RubyVM::InstructionSequence.of(method(:foo)).script_lines
```
This change fixes some cases where YJIT fails to fire tracing events.
Most of the situations YJIT did not handle correctly involve enabling
tracing while running inside generated code.
A new operation to invalidate all generated code is added, which uses
patching to make generated code exit at the next VM instruction
boundary. A new routine called `jit_prepare_routine_call()` is
introduced to facilitate this and should be used when generating code
that could allocate, or could otherwise use `RB_VM_LOCK_ENTER()`.
The `c_return` event is fired in the middle of an instruction as opposed
to at an instruction boundary, so it requires special handling. C method
call return points are patched to go to a function which does everything
the interpreter does, including firing the `c_return` event. The
generated code for C method calls normally does not fire the event.
Invalidated code should not change after patching so the exits are not
clobbered. A new variable is introduced to track the region of code that
should not change.
* Tie lifetime of uJIT blocks to iseqs
Blocks weren't being freed when iseqs were collected.
* Add rb_dary. Use it for method dependency table
* Keep track of blocks per iseq
Remove global version_tbl
* Block version bookkeeping fix
* dary -> darray
* free ujit_blocks
* comment about size of ujit_blocks
Insert generated addresses into st_table for mapping native code
addresses back to info about VM instructions. Export `encoded_insn_data`
to do this. Also some style fixes.
In vm_call_method_each_type, check for c_call and c_return events before
dispatching to vm_call_ivar and vm_call_attrset. With this approach, the
call cache will still dispatch directly to those functions, so this
change will only decrease performance for the first (uncached) call, and
even then, the performance decrease is very minimal.
This approach requires that we clear the call caches when tracing is
enabled or disabled. The approach currently switches all vm_call_ivar
and vm_call_attrset call caches to vm_call_general any time tracing is
enabled or disabled. So it could theoretically result in a slowdown for
code that constantly enables or disables tracing.
This approach does not handle targeted tracepoints, but from my testing,
c_call and c_return events are not supported for targeted tracepoints,
so that shouldn't matter.
This includes a benchmark showing the performance decrease is minimal
if detectable at all.
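A sketch of the fixed behavior (the class name is hypothetical): :c_call/:c_return now fire for attribute reads and writes that the VM optimizes as vm_call_ivar / vm_call_attrset:
```ruby
class Config
  attr_accessor :mode # C-implemented accessors, optimized as vm_call_ivar/vm_call_attrset
end

events = []
tp = TracePoint.new(:c_call, :c_return) { |t| events << [t.event, t.method_id] }
cfg = Config.new
tp.enable do
  cfg.mode = :fast
  cfg.mode
end
pp events # expected to include c_call/c_return pairs for :mode= and :mode
```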
Fixes [Bug #16383]
Fixes [Bug #10470]
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
This changes Thread::Backtrace::Location#absolute_path to return
nil for methods/procs defined in eval. If the realpath of an iseq
is nil, that indicates it was defined in eval, in which case you
cannot use RubyVM::AbstractSyntaxTree.of.
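A sketch of the resulting behavior: a method defined in eval has no realpath on its iseq, so its backtrace location's #absolute_path is nil:
```ruby
eval(<<~RUBY)
  def defined_in_eval
    caller_locations(0, 1).first
  end
RUBY

loc = defined_in_eval
p loc.path          # an eval label such as "(eval)"
p loc.absolute_path # => nil after this change
```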
Fixes [Bug #16983]
Co-authored-by: Koichi Sasada <ko1@atdot.net>
This broke coverage CI
```
1) Failure:
TestRequire#test_load_syntax_error [/home/runner/work/actions/actions/ruby/test/ruby/test_require.rb:228]:
Exception(SyntaxError) with message matches to /unexpected/.
[SyntaxError] exception expected, not #<TypeError: no implicit conversion of false into Integer>.
```
https://github.com/ruby/actions/runs/2914743968?check_suite_focus=true
RubyVM::AST.of(Thread::Backtrace::Location) returns a node that
corresponds to the location. Typically, the node is a method call, but
not always.
This change also includes iseq dump/load support of node_ids for each
instruction.
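A hedged usage sketch (assumes the script is run from a file, since the source is re-parsed to find the node for the saved node_id):
```ruby
def node_of_caller
  RubyVM::AbstractSyntaxTree.of(caller_locations(1, 1).first)
end

node = node_of_caller
pp node&.type # typically the node for the `node_of_caller` call site
```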
This merges `rb_ast_body_t#line_count` and `#script_lines` into a single field.
Fortunately `line_count == RARRAY_LEN(script_lines)` was always
satisfied. When script_lines is saved, it has an array of lines, and
when not saved, it has a Fixnum that represents the old line_count.
... then, new_insn_core extracts nd_line(node).
Also, if a macro "EXPERIMENTAL_ISEQ_NODE_ID" is defined, this changeset
keeps nd_node_id(node) for each instruction. This is intended for
TypeProf to identify what AST::Node corresponds to each instruction.
This patch was originally authored by @yui-knk for showing in which column a
NoMethodError occurred.
https://github.com/ruby/ruby/compare/master...yui-knk:feature/node_id
Co-Authored-By: Yuichiro Kaneko <yui-knk@ruby-lang.org>
We can take advantage of fstrings to de-duplicate the defined strings.
This means we don't need to keep the list of defined strings on the VM
(or register them as mark objects)
The constant cache `IC` is accessed in a non-atomic manner and there are
thread-safety issues, so Ruby 3.0 disables the const cache on
non-main Ractors.
This patch enables it by introducing `imemo_constcache` and allocating
one on every refill of the const cache, like `imemo_callcache`.
[Bug #17510]
Now `IC` only has one entry, `IC::entry`, and it points to an
`iseq_inline_constant_cache_entry` managed as a T_IMEMO object.
`IC` is now an atomic data structure, so `rb_mjit_before_vm_ic_update()` and
`rb_mjit_after_vm_ic_update()` are no longer needed.
This switches the internal function from rb_parser_compile_file_path
to rb_parser_load_file, which is the same internal method that
Kernel#load uses.
Fixes [Bug #17308]
If two or more tracepoints are enabled with the same target and with
different target lines, only the last line is activated.
This patch fixes this issue by retaining existing trace instructions.
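A sketch of the fixed behavior: two TracePoints targeting the same method but different lines should both fire:
```ruby
line_a = __LINE__ + 2
def traced
  a = 1
  b = 2
  a + b
end

hits = []
tp1 = TracePoint.new(:line) { |tp| hits << tp.lineno }
tp2 = TracePoint.new(:line) { |tp| hits << tp.lineno }
tp1.enable(target: method(:traced), target_line: line_a)
tp2.enable(target: method(:traced), target_line: line_a + 1)
traced
tp1.disable
tp2.disable
p hits.sort # both target lines are expected to appear after this fix
```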
[Bug #17302]
iv_index_tbl manages instance variable indexes (ID -> index).
This data structure should be synchronized with other ractors,
so some VM locks are introduced.
This patch also introduces an atomic ivar cache used by the
set/getinstancevariable instructions. To make ivar cache (IVC) updates
possible, we changed the iv_index_tbl data structure to manage (ID -> entry),
where an entry points to a serial and an index. The IVC points to this entry
so that cache updates become atomic.
Use ID instead of GENTRY for gvars.
Global variables are compiled into GENTRY (a pointer to struct
rb_global_entry). This patch replaces the GENTRY with an ID and
makes the code simpler.
We need to look up the GENTRY from the ID every time (st_lookup), so
additional overhead is introduced.
However, the performance of accessing global variables is not
important nowadays, and this simplicity helps Ractor development.
If a :return event is specified for an opt_invokebuiltin_delegate_leave
and leave combination, the instructions should become
opt_invokebuiltin_delegate
trace_return
instructions. To make that happen, the opt_invokebuiltin_delegate_leave
instruction is changed to opt_invokebuiltin_delegate even if
it is not an event target instruction.
CIs are created on-the-fly, which increases GC pressure. However, they
include no references to other objects, and those on-the-fly CIs tend to
be short-lived. Why not skip allocating them? In doing so we need
to add a flag that denotes that the CI object does not reside inside the objspace.
When compiling with `CPDEBUG >= 2`, `rb_iseq_disasm` segfaults if this
table has not been created. Also, `ibf_load_iseq_each` calls
`rb_iseq_insns_info_encode_positions`.
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side. Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.
This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side. Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side. On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash. If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it. So this commit
should make it so that a maximum of a single hash is allocated
during method calls.
To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable. If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.
In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.
On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled. Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not. Then, unless the callinfo flag is set,
the hash needs to be duplicated. The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again. If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can include duplicating the
positional splat array if one was passed. CALLER_SETUP_ARG
and a couple of other places need to be modified to handle
similar issues for other types of calls.
This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.
Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:
```ruby
def kw(a: 1) a end
def kws(**kw) kw end
h = {a: 1}

kw(a: 1)       # about same
kw(**h)        # 2.37x faster
kws(a: 1)      # 1.30x faster
kws(**h)       # 2.19x faster
kw(a: 1, **h)  # 1.03x slower
kw(**h, **h)   # about same
kws(a: 1, **h) # 1.16x faster
kws(**h, **h)  # 1.14x faster
```
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Make the call cache (CC) an RVALUE (a GC target object) and allocate a new
CC on a cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of the fixed-size global method cache (GMC), pCMC allows a flexible
cache size.
* Caching CCs reduces CC allocation and allows sharing a CC's fast path
between call sites with the same call info (CI).
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set an "invalidated" flag on the method
entry itself to represent cache invalidation.
* Compared with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidates all method caches of the class and
its sub-classes.
* The proposed approach invalidates the method cache of only one ME.
See [Feature #16614] for more details.
Now, rb_call_info contains how to call the method as a tuple of
(mid, orig_argc, flags, kwarg). In most cases, kwarg == NULL and
mid+argc+flags only require 64 bits, so this patch packs
rb_call_info into a VALUE (1 word) in such cases. If it cannot be
represented in a VALUE, an imemo_callinfo is used, which contains a
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because every callinfo is now VALUE
size (a packed CI or a pointer to an imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
are temporarily removed because cd->ci should be marked.
Let me quote ISO/IEC 9899:2018 section 6.5.15:
> Constraints
>
> The first operand shall have scalar type.
> One of the following shall hold for the second and third operands:
> — both operands have arithmetic type;
> — both operands have the same structure or union type;
> — both operands have void type;
(snip)
Here, `*option` is a const struct rb_compile_option_struct. OTOH
`COMPILE_OPTION_DEFAULT` is a struct rb_compile_option_struct, without
const. These two are _not_ the "same structure or union type". Hence
the expression renders undefined behaviour. COMPILE_OPTION_DEFAULT is
not a const because `RubyVM::InstructionSequence.compile_option=`
touches its internals on-the-fly. There is no way to meet the
constraints quoted above.
Using a ternary operator here was a mistake in the first place. Let's
just replace it with a normal `if` statement.
Saves committers' daily life by avoiding #include-ing everything from
internal.h and making each file include only what it needs instead. This
would significantly speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are the win32-related headers, which tend not to be
self-contained (those headers have inclusion order dependencies).
(This is the second try of 036bc1da6c6c9b0fa9b7f5968d897a9554dd770e.)
If iseq is GC'ed, the pointer of iseq may be reused, which may hide a
deprecation warning of keyword argument change.
http://ci.rvm.jp/results/trunk-test1@phosphorus-docker/2474221
```
1) Failure:
TestKeywordArguments#test_explicit_super_kwsplat [/tmp/ruby/v2/src/trunk-test1/test/ruby/test_keyword.rb:549]:
--- expected
+++ actual
@@ -1 +1 @@
-/The keyword argument is passed as the last hash parameter.* for `m'/m
+""
```
This change adds an ad-hoc iseq_unique_id to each iseq and uses it
instead of the iseq pointer. This covers the case where the caller is GC'ed.
Still, the case where the callee is GC'ed is not covered.
But anyway, it is very rare that iseq is GC'ed. Even when it occurs, it
just hides some warnings. It's no big deal.
This commit introduces an "inline ivar cache" struct. The reason we
need this is so compaction can differentiate between an ivar cache and a
regular inline cache. Regular inline caches contain references to
`VALUE`s, and ivar caches just contain the ivar index. With
this new struct we can easily update references for inline caches (but
not inline ivar caches, as they just contain an int).
These functions are used from within a compilation unit so we can
make them static, for better binary size. This changeset reduces
the size of the generated ruby binary from 26,590,128 bytes to
26,584,472 bytes on my machine.
We incorrectly assumed that the `file` argument should be the file name, which
caused https://github.com/scoutapp/scout_apm_ruby/issues/307 because the
exception backtrace did not contain the correct path. This documentation
clarifies the role of the different arguments and provides extra
examples.
This removes the security features added by $SAFE = 1, and warns for access
or modification of $SAFE from Ruby-level, as well as warning when calling
all public C functions related to $SAFE.
This modifies some internal functions that took a safe level argument
to no longer take the argument.
rb_require_safe now warns, rb_require_string has been added as a
version that takes a VALUE and does not warn.
One public C function that still takes a safe level argument and that
this doesn't warn for is rb_eval_cmd. We may want to consider
adding an alternative method that does not take a safe level argument,
and warn for rb_eval_cmd.
Looking at the list of symbols inside of libruby-static.a, I found
hundreds of functions that are defined, but used from nowhere.
There can be reasons for each of them (e.g. some functions are
specific to some platform, some are useful when debugging, etc).
However it seems the functions deleted here exist for no reason.
This changeset reduces the size of the ruby binary from 26,671,456
bytes to 26,592,864 bytes on my machine.
Fixes [Bug #16332]
Constant access was changed to no longer allow top-level constant access
through `nil`, but `defined?` wasn't changed at the same time to stay
consistent.
Use a separate defined type to distinguish between a constant
referenced from the current lexical scope and one referenced from
another namespace.
Support loading builtin features written in Ruby, which are implemented
with C builtin functions.
[Feature #16254]
Several features:
(1) Load .rb files at boot time from the native binary.
Currently, prelude.rb is loaded at boot time. However, this file is embedded
in the interpreter in text format and we need to compile it.
This patch contains a feature to load it from a binary format.
(2) __builtin_func() in Ruby calls func() written in C.
In a Ruby file, we can write `__builtin_func()` like a method call.
However, this is not a method call, but special syntax to call
a function `func()` written in C. C functions should be defined
in a file (the same compile unit) which loads this .rb file.
Functions (`func` in the above example) should be defined with:
(a) 1st parameter: rb_execution_context_t *ec
(b) rest parameters (0 to 15).
(c) VALUE return type.
These requirements are very similar to those for functions used with
rb_define_method(); however, `rb_execution_context_t *ec`
is a new requirement.
(3) Automatic C code generation from .rb files.
tool/mk_builtin_loader.rb creates C code to load the .rb files
needed by miniruby and the ruby command. This script is run by
BASERUBY, so the *.rb files should be written in BASERUBY-compatible
syntax. This script loads a .rb file, finds all __builtin_-prefixed
method calls, and generates part of the C code to export the
functions.
tool/mk_builtin_binary.rb creates C code which contains the
binary-compiled Ruby files needed by the ruby command.
We need to ensure that labels are pinned while disassembling. If the
compactor runs during disassembly, references to these labels could go
bad, so this commit just ensures that the labels can't move until we're
done.
To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.
Combining the structures also saves 8 bytes per call site as the current
layout uses two separate pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.
This change improves the Optcarrot benchmark at least 3%. For more
details, see attached bugs.ruby-lang.org ticket.
Complications:
- A new instruction attribute `comptime_sp_inc` is introduced to
calculate SP increase at compile time without using call caches. At
compile time, a `TS_CALLDATA` operand points to a call info struct, but
at runtime, the same operand points to a call data struct. Instructions
that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.
[Misc #16258]