Compare with the C methods, A built-in methods written in Ruby is
slower if only mandatory parameters are given because it needs to
check the argumens and fill default values for optional and keyword
parameters (C methods can check the number of parameters with `argc`,
so there are no overhead). Passing mandatory arguments are common
(optional arguments are exceptional, in many cases) so it is important
to provide the fast path for such common cases.
`Primitive.mandatory_only?` is a special builtin function used with
`if` expression like that:
```ruby
def self.at(time, subsec = false, unit = :microsecond, in: nil)
if Primitive.mandatory_only?
Primitive.time_s_at1(time)
else
Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
end
end
```
and it makes two ISeq,
```
def self.at(time, subsec = false, unit = :microsecond, in: nil)
Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
end
def self.at(time)
Primitive.time_s_at1(time)
end
```
and (2) is pointed by (1). Note that `Primitive.mandatory_only?`
should be used only in a condition of an `if` statement and the
`if` statement should be equal to the methdo body (you can not
put any expression before and after the `if` statement).
A method entry with `mandatory_only?` (`Time.at` on the above case)
is marked as `iseq_overload`. When the method will be dispatch only
with mandatory arguments (`Time.at(0)` for example), make another
method entry with ISeq (2) as mandatory only method entry and it
will be cached in an inline method cache.
The idea is similar discussed in https://bugs.ruby-lang.org/issues/16254
but it only checks mandatory parameters or more, because many cases
only mandatory parameters are given. If we find other cases (optional
or keyword parameters are used frequently and it hurts performance),
we can extend the feature.
`RubyVM.keep_script_lines` enables to keep script lines
for each ISeq and AST. This feature is for debugger/REPL
support.
```ruby
RubyVM.keep_script_lines = true
RubyVM::keep_script_lines = true
eval("def foo = nil\ndef bar = nil")
pp RubyVM::InstructionSequence.of(method(:foo)).script_lines
```
When YJIT make calls to routines without reconstructing interpreter
state through jit_prepare_routine_call(), it relies on the routine to
never allocate, raise, and push/pop control frames. Comment about this
on the routines that YJTI calls.
This is probably something we should dynamically verify on debug builds.
It's hard to statically verify this as it requires verifying all
functions in the call tree. Maybe something to look at in the future.
Make sure `opt_getinlinecache` is in a block all on its own, and
invalidate it from the interpreter when `opt_setinlinecache`.
It will recompile with a filled cache the second time around.
This lets YJIT runs well when the IC for constant is cold.
* Tie lifetime of uJIT blocks to iseqs
Blocks weren't being freed when iseqs are collected.
* Add rb_dary. Use it for method dependency table
* Keep track of blocks per iseq
Remove global version_tbl
* Block version bookkeeping fix
* dary -> darray
* free ujit_blocks
* comment about size of ujit_blocks
This fixes issues with paths being loaded twice in certain cases
when symlinks are used.
It took me multiple attempts to get this working. My original
attempt tried to convert paths to realpaths before adding them
to $LOADED_FEATURES. Unfortunately, this doesn't work well
with the loaded feature index, which is based off load paths
and not realpaths. While I was able to get require working, I'm
fairly sure the loaded feature index was not being used as
expected, which would have significant performance implications.
Additionally, I was never able to get that approach working with
autoload when autoloading a non-realpath file. It also broke
some specs.
This takes a more conservative approach. Directly before loading the
file, if the file with the same realpath has been required, the
loading of the file is skipped. The realpaths are stored as
fstrings in a hidden hash.
When rebuilding the loaded feature index, the hash of realpaths
is also rebuilt. I'm guessing this makes rebuilding process
slower, but I don think that is a hot path. In general, modifying
loaded features is only done when reloading, and that tends to be
in non-production environments.
Change test_require_with_loaded_features_pop test to use 30 threads
and 300 iterations, instead of 4 threads and 1000 iterations.
I saw only sporadic failures with 4/1000, but consistent failures
30/300 threads. These failures were due to the fact that the
concurrent deletions from $LOADED_FEATURES in other threads can
result in rb_ary_entry returning nil when rebuilding the loaded
features index.
To avoid concurrency issues when rebuilding the loaded features
index, the building of the index itself is left alone, and
afterwards, a separate loop is done on a copy of the loaded feature
snapshot in order to rebuild the realpaths hash.
Fixes [Bug #17885]
This fixes issues with paths being loaded twice in certain cases
when symlinks are used.
It took me multiple attempts to get this working. My original
attempt tried to convert paths to realpaths before adding them
to $LOADED_FEATURES. Unfortunately, this doesn't work well
with the loaded feature index, which is based off load paths
and not realpaths. While I was able to get require working, I'm
fairly sure the loaded feature index was not being used as
expected, which would have significant performance implications.
Additionally, I was never able to get that approach working with
autoload when autoloading a non-realpath file. It also broke
some specs.
This takes a more conservative approach. Directly before loading the
file, if the file with the same realpath has been required, the
loading of the file is skipped. The realpaths are stored as
fstrings in a hidden hash.
When rebuilding the loaded feature index, the hash of realpaths
is also rebuilt. I'm guessing this makes rebuilding process
slower, but I don think that is a hot path. In general, modifying
loaded features is only done when reloading, and that tends to be
in non-production environments.
Change test_require_with_loaded_features_pop test to use 30 threads
and 300 iterations, instead of 4 threads and 1000 iterations.
I saw only sporadic failures with 4/1000, but consistent failures
30/300 threads. These failures were due to the fact that the
concurrent deletions from $LOADED_FEATURES in other threads can
result in rb_ary_entry returning nil when rebuilding the loaded
features index.
To avoid concurrency issues when rebuilding the loaded features
index, the building of the index itself is left alone, and
afterwards, a separate loop is done on a copy of the loaded feature
snapshot in order to rebuild the realpaths hash.
Fixes [Bug #17885]
The altstack memory of a thread may be free'ed even after the VM is
destructed. After that, GC is no longer available, so calling xfree
may lead to a segfault.
This changeset uses the bare free function to free the altstack memory
instead of xfree. [Bug #18126]
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
We can take advantage of fstrings to de-duplicate the defined strings.
This means we don't need to keep the list of defined strings on the VM
(or register them as mark objects)
rb_funcall* (rb_funcall(), rb_funcallv(), ...) functions invokes
Ruby's method with given receiver. Ruby 2.7 introduced inline method
cache with static memory area. However, Ruby 3.0 reimplemented the
method cache data structures and the inline cache was removed.
Without inline cache, rb_funcall* searched methods everytime.
Most of cases per-Class Method Cache (pCMC) will be helped but
pCMC requires VM-wide locking and it hurts performance on
multi-Ractor execution, especially all Ractors calls methods
with rb_funcall*.
This patch introduced Global Call-Cache Cache Table (gccct) for
rb_funcall*. Call-Cache was introduced from Ruby 3.0 to manage
method cache entry atomically and gccct enables method-caching
without VM-wide locking. This table solves the performance issue
on multi-ractor execution.
[Bug #17497]
Ruby-level method invocation does not use gccct because it has
inline-method-cache and the table size is limited. Basically
rb_funcall* is not used frequently, so 1023 entries can be enough.
We will revisit the table size if it is not enough.
constant cache `IC` is accessed by non-atomic manner and there are
thread-safety issues, so Ruby 3.0 disables to use const cache on
non-main ractors.
This patch enables it by introducing `imemo_constcache` and allocates
it by every re-fill of const cache like `imemo_callcache`.
[Bug #17510]
Now `IC` only has one entry `IC::entry` and it points to
`iseq_inline_constant_cache_entry`, managed by T_IMEMO object.
`IC` is atomic data structure so `rb_mjit_before_vm_ic_update()` and
`rb_mjit_after_vm_ic_update()` is not needed.
separate some fields from rb_ractor_t to rb_ractor_pub and put it
at the beggining of rb_ractor_t and declare it in vm_core.h so
vm_core.h can access rb_ractor_pub fields.
Now rb_ec_ractor_hooks() is a complete inline function and no
MJIT related issue.
Ractor has several restrictions to keep each ractor being isolated
and some operation such as `CONST="foo"` in non-main ractor raises
an exception. This kind of operation raises an error but there is
confusion (some code raises RuntimeError and some code raises
NameError).
To make clear we introduce Ractor::IsolationError which is raised
when the isolation between ractors is violated.
`cd` is passed to method call functions to method invocation
functions, but `cd` can be manipulated by other ractors simultaneously
so it contains thread-safety issue.
To solve this issue, this patch stores `ci` and found `cc` to `calling`
and stops to pass `cd`.
On windows, MJIT doesn't work without this patch because of
the declaration of ruby_single_main_ractor. This patch fix this
issue and move the definition of it from ractor.c to vm.c to locate
near place of ruby_current_vm_ptr.
ruby_multi_ractor was a flag that indicates the interpreter doesn't
make any additional ractors (single ractor mode).
Instead of boolean flag, ruby_single_main_ractor pointer is introduced
which keeps main ractor's pointer if single ractor mode. If additional
ractors are created, ruby_single_main_ractor becomes NULL.
C extensions can violate the ractor-safety, so only ractor-safe
C extensions (C methods) can run on non-main ractors.
rb_ext_ractor_safe(true) declares that the successive
defined methods are ractor-safe. Otherwiwze, defined methods
checked they are invoked in main ractor and raise an error
if invoked at non-main ractors.
[Feature #17307]
The timer function used on windows system set timer interrupt
flag of current main ractor's executing ec and thread can detect
the end of time slice. However, to set all ec->interrupt_flag for
all running ractors, it is requires to synchronize with other ractors.
However, timer thread can not acquire the ractor-wide lock because
of some limitation.
To solve this issue, this patch introduces USE_VM_CLOCK compile option
to introduce rb_vm_t::clock. This clock will be incremented by the
timer thread and each thread can check the incrementing by comparison
with previous checked clock. At last, on windows platform this patch
introduces some overhead, but I think there is no critical performance
issue because of this modification.
Ractor.make_shareable() supports Proc object if
(1) a Proc only read outer local variables (no assignments)
(2) read outer local variables are shareable.
Read local variables are stored in a snapshot, so after making
shareable Proc, any assignments are not affeect like that:
```ruby
a = 1
pr = Ractor.make_shareable(Proc.new{p a})
pr.call #=> 1
a = 2
pr.call #=> 1 # `a = 2` doesn't affect
```
[Feature #17284]
Setting this to true disables the deadlock detector. It should
only be used in cases where the deadlock could be broken via some
external means, such as via a signal.
Now that $SAFE is no longer used, replace the safe_level_ VM flag
with ignore_deadlock for storing the setting.
Fixes [Bug #13768]
To access TLS, it is faster to use language TLS specifier instead
of using pthread_get/setspecific functions.
Original proposal is: Use native thread locals. #3665
iv_index_tbl manages instance variable indexes (ID -> index).
This data structure should be synchronized with other ractors
so introduce some VM locks.
This patch also introduced atomic ivar cache used by
set/getinlinecache instructions. To make updating ivar cache (IVC),
we changed iv_index_tbl data structure to manage (ID -> entry)
and an entry points serial and index. IVC points to this entry so
that cache update becomes atomically.
(1) recorded_lock_rec > current_lock_rec should not be occurred
on rb_ec_vm_lock_rec_release().
(2) should be release VM lock at EXEC_TAG(), not POP_TAG().
(3) some refactoring.
If a ractor getting a VM lock (monitor) raises an exception,
unlock can be skipped. To release VM lock correctly on exception
(or other jumps with JUMP_TAG), EC_POP_TAG() releases VM lock.
This commit introduces Ractor mechanism to run Ruby program in
parallel. See doc/ractor.md for more details about Ractor.
See ticket [Feature #17100] to see the implementation details
and discussions.
[Feature #17100]
This commit does not complete the implementation. You can find
many bugs on using Ractor. Also the specification will be changed
so that this feature is experimental. You will see a warning when
you make the first Ractor with `Ractor.new`.
I hope this feature can help programmers from thread-safety issues.
If the thread for the current EC has been killed, don't check
the VM ptr for the EC (which gets it via the thread), as that will
have already been freed.
Fixes [Bug #16907]
According to MSVC manual (*1), cl.exe can skip including a header file
when that:
- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.
GCC has a similar trick (*2), but it acts more stricter (e. g. there
must be _no tokens_ outside of #ifndef...#endif).
Sun C lacked #pragma once for a looong time. Oracle Developer Studio
12.5 finally implemented it, but we cannot assume such recent version.
This changeset modifies header files so that each of them include
strictly one #ifndef...#endif. I believe this is the most portable way
to trigger compiler optimizations. [Bug #16770]
*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
A new (not-initialized-yet) pthread attempts to allocate sigaltstack by
using xmalloc. It may cause GC, but because the thread is not
initialized yet, ruby_native_thread_p() returns false, which leads to
"[FATAL] failed to allocate memory" and exit.
In fact, we can observe the error message in the log of OpenBSD CI:
https://rubyci.org/logs/rubyci.s3.amazonaws.com/openbsd-current/ruby-master/log/20200306T083005Z.log.html.gz
This changeset allocates sigaltstack before pthread is created.
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
Now, rb_call_info contains how to call the method with tuple of
(mid, orig_argc, flags, kwarg). Most of cases, kwarg == NULL and
mid+argc+flags only requires 64bits. So this patch packed
rb_call_info to VALUE (1 word) on such cases. If we can not
represent it in VALUE, then use imemo_callinfo which contains
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because all of callinfo is VALUE
size (packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
is temporary removed because cd->ci should be marked.
This removes the warnings added in 2.7, and changes the behavior
so that a final positional hash is not treated as keywords or
vice-versa.
To handle the arg_setup_block splat case correctly with keyword
arguments, we need to check if we are taking a keyword hash.
That case didn't have a test, but it affects real-world code,
so add a test for it.
This removes rb_empty_keyword_given_p() and related code, as
that is not needed in Ruby 3. The empty keyword case is the
same as the no keyword case in Ruby 3.
This changes rb_scan_args to implement keyword argument
separation for C functions when the : character is used.
For backwards compatibility, it returns a duped hash.
This is a bad idea for performance, but not duping the hash
breaks at least Enumerator::ArithmeticSequence#inspect.
Instead of having RB_PASS_CALLED_KEYWORDS be a number,
simplify the code by just making it be rb_keyword_given_p().
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead. This would significantly
speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
Before this commit, Kernel#lambda can't tell the difference between a
directly passed literal block and one passed with an ampersand.
A block passed with an ampersand is semantically speaking already a
non-lambda proc. When Kernel#lambda receives a non-lambda proc, it
should simply return it.
Implementation wise, when the VM calls a method with a literal block, it
places the code for the block on the calling control frame and passes a
pointer (block handler) to the callee. Before this commit, the VM
forwards block arguments by simply forwarding the block handler, which
leaves the slot for block code unused when a control frame forwards its
block argument. I use the vacant space to indicate that a frame has
forwarded its block argument and inspect that in Kernel#lambda to detect
forwarded blocks.
This is a very ad-hoc solution and relies *heavily* on the way block
passing works in the VM. However, it's the most self-contained solution
I have.
[Bug #15620]
(This is the second try of 036bc1da6c6c9b0fa9b7f5968d897a9554dd770e.)
If iseq is GC'ed, the pointer of iseq may be reused, which may hide a
deprecation warning of keyword argument change.
http://ci.rvm.jp/results/trunk-test1@phosphorus-docker/2474221
```
1) Failure:
TestKeywordArguments#test_explicit_super_kwsplat [/tmp/ruby/v2/src/trunk-test1/test/ruby/test_keyword.rb:549]:
--- expected
+++ actual
@@ -1 +1 @@
-/The keyword argument is passed as the last hash parameter.* for `m'/m
+""
```
This change ad-hocly adds iseq_unique_id for each iseq, and use it
instead of iseq pointer. This covers the case where caller is GC'ed.
Still, the case where callee is GC'ed, is not covered.
But anyway, it is very rare that iseq is GC'ed. Even when it occurs, it
just hides some warnings. It's no big deal.
If iseq is GC'ed, the pointer of iseq may be reused, which may hide a
deprecation warning of keyword argument change.
http://ci.rvm.jp/results/trunk-test1@phosphorus-docker/2474221
```
1) Failure:
TestKeywordArguments#test_explicit_super_kwsplat [/tmp/ruby/v2/src/trunk-test1/test/ruby/test_keyword.rb:549]:
--- expected
+++ actual
@@ -1 +1 @@
-/The keyword argument is passed as the last hash parameter.* for `m'/m
+""
```
This change ad-hocly adds iseq_unique_id for each iseq, and use it
instead of iseq pointer. This covers the case where caller is GC'ed.
Still, the case where callee is GC'ed, is not covered.
But anyway, it is very rare that iseq is GC'ed. Even when it occurs, it
just hides some warnings. It's no big deal.
This commit introduces an "inline ivar cache" struct. The reason we
need this is so compaction can differentiate from an ivar cache and a
regular inline cache. Regular inline caches contain references to
`VALUE` and ivar caches just contain references to the ivar index. With
this new struct we can easily update references for inline caches (but
not inline var caches as they just contain an int)
These functions are used from within a compilation unit so we can
make them static, for better binary size. This changeset reduces
the size of generated ruby binary from 26,590,128 bytes to
26,584,472 bytes on my macihne.
This removes the security features added by $SAFE = 1, and warns for access
or modification of $SAFE from Ruby-level, as well as warning when calling
all public C functions related to $SAFE.
This modifies some internal functions that took a safe level argument
to no longer take the argument.
rb_require_safe now warns, rb_require_string has been added as a
version that takes a VALUE and does not warn.
One public C function that still takes a safe level argument and that
this doesn't warn for is rb_eval_cmd. We may want to consider
adding an alternative method that does not take a safe level argument,
and warn for rb_eval_cmd.
Looking at the list of symbols inside of libruby-static.a, I found
hundreds of functions that are defined, but used from nowhere.
There can be reasons for each of them (e.g. some functions are
specific to some platform, some are useful when debugging, etc).
However it seems the functions deleted here exist for no reason.
This changeset reduces the size of ruby binary from 26,671,456
bytes to 26,592,864 bytes on my machine.
Add an experimental `__builtin_inline!(c_expression)` special intrinsic
which run a C code snippet.
In `c_expression`, you can access the following variables:
* ec (rb_execution_context_t *)
* self (const VALUE)
* local variables (const VALUE)
Not that you can read these variables, but you can not write them.
You need to return from this expression and return value will be a
result of __builtin_inline!().
Examples:
`def foo(x) __builtin_inline!('return rb_p(x);'); end` calls `p(x)`.
`def double(x) __builtin_inline!('return INT2NUM(NUM2INT(x) * 2);')`
returns x*2.
Support loading builtin features written in Ruby, which implement
with C builtin functions.
[Feature #16254]
Several features:
(1) Load .rb file at boottime with native binary.
Now, prelude.rb is loaded at boottime. However, this file is contained
into the interpreter as a text format and we need to compile it.
This patch contains a feature to load from binary format.
(2) __builtin_func() in Ruby call func() written in C.
In Ruby file, we can write `__builtin_func()` like method call.
However this is not a method call, but special syntax to call
a function `func()` written in C. C functions should be defined
in a file (same compile unit) which load this .rb file.
Functions (`func` in above example) should be defined with
(a) 1st parameter: rb_execution_context_t *ec
(b) rest parameters (0 to 15).
(c) VALUE return type.
This is very similar requirements for functions used by
rb_define_method(), however `rb_execution_context_t *ec`
is new requirement.
(3) automatic C code generation from .rb files.
tool/mk_builtin_loader.rb creates a C code to load .rb files
needed by miniruby and ruby command. This script is run by
BASERUBY, so *.rb should be written in BASERUBY compatbile
syntax. This script load a .rb file and find all of __builtin_
prefix method calls, and generate a part of C code to export
functions.
tool/mk_builtin_binary.rb creates a C code which contains
binary compiled Ruby files needed by ruby command.
To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.
Combining the structures also saves 8 bytes per call site as the current
layout uses separate two pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.
This change improves the Optcarrot benchmark at least 3%. For more
details, see attached bugs.ruby-lang.org ticket.
Complications:
- A new instruction attribute `comptime_sp_inc` is introduced to
calculate SP increase at compile time without using call caches. At
compile time, a `TS_CALLDATA` operand points to a call info struct, but
at runtime, the same operand points to a call data struct. Instruction
that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.
[Misc #16258]
On Android, a signal handler that is not SIG_DFL is set by default for
SIGSEGV. Ruby's install_sighandler inserts Ruby's handler only when the
signal has no handler, so it does not insert Ruby's SEGV report handler,
which caused some test failures.
This changeset forces to install Ruby's handler for some fatal signals
(sigbus, sigsegv, and sigill). They keep the original handlers, and
call them when the interpreter receives the signals.
Just refactoring.
The name "rb_bug_context" is completely unclear for me.
(Can you see that "context" means "machine register context"?)
The context is available only when a fatal signal (sigbus, sigsegv, or
sigill) is received; in fact, the function is used only for fatal
signals. So, I think the name should be changed.
ko1 cannot remember why he introduced the function. And it is not used.
After it is removed, the argument "base_block" of
rb_iseq_compile_with_option is always zero.
This reverts commits: 10d6a3aca78ba48c1b85fba8627dc1dd883de5ba6c6a25feca167e6b48f17cb96d41a53207979278595b3c4fdd1521f7cf89c11c5e69accf336082033632a812c0f56506be0d86427a3219 .
The reason for the revert is that we observe ABA problem around
inline method cache. When a cache misshits, we search for a
method entry. And if the entry is identical to what was cached
before, we reuse the cache. But the commits we are reverting here
introduced situations where a method entry is freed, then the
identical memory region is used for another method entry. An
inline method cache cannot detect that ABA.
Here is a code that reproduce such situation:
```ruby
require 'prime'
class << Integer
alias org_sqrt sqrt
def sqrt(n)
raise
end
GC.stress = true
Prime.each(7*37){} rescue nil # <- Here we populate CC
class << Object.new; end
# These adjacent remove-then-alias maneuver
# frees a method entry, then immediately
# reuses it for another.
remove_method :sqrt
alias sqrt org_sqrt
end
Prime.each(7*37).to_a # <- SEGV
```
Most (if not all) of the fields of rb_method_definition_t are never
meant to be modified once after they are stored. Marking them const
makes it possible for compilers to warn on unintended modifications.
This approach uses a flag bit on the final hash object in the regular splat,
as opposed to a previous approach that used a VM frame flag. The hash flag
approach is less invasive, and handles some cases that the VM frame flag
approach does not, such as saving the argument splat array and splatting it
later:
ruby2_keywords def foo(*args)
@args = args
bar
end
def bar
baz(*@args)
end
def baz(*args, **kw)
[args, kw]
end
foo(a:1) #=> [[], {a: 1}]
foo({a: 1}, **{}) #=> [[{a: 1}], {}]
foo({a: 1}) #=> 2.7: [[], {a: 1}] # and warning
foo({a: 1}) #=> 3.0: [[{a: 1}], {}]
It doesn't handle some cases that the VM frame flag handles, such as when
the final hash object is replaced using Hash#merge, but those cases are
probably less common and are unlikely to properly support keyword
argument separation.
Use ruby2_keywords to handle argument delegation in the delegate library.
The built-in version operates on a buffer of 5 words, much smaller than
the size of jmp_buf defined in libc.
Note, powerpc requires 5 words, while arm and x86_64 just require 3.
Also add keyword argument separation warnings for Class#new and Method#call.
To allow for keyword argument to required positional hash converstion in
cfuncs, add a vm frame flag indicating the cfunc was called with an empty
keyword hash (which was removed before calling the cfunc). The cfunc can
check this frame flag and add back an empty hash if it is passing its
arguments to another Ruby method. Add rb_empty_keyword_given_p function
for checking if called with an empty keyword hash, and
rb_add_empty_keyword for adding back an empty hash to argv.
All of this empty keyword argument support is only for 2.7. It will be
removed in 3.0 as Ruby 3 will not convert empty keyword arguments to
required positional hash arguments. Comment all of the relevent code
to make it obvious this is expected to be removed.
Add rb_funcallv_kw as an public C-API function, just like rb_funcallv
but with a keyword flag. This is used by rb_obj_call_init (internals
of Class#new). This also required expected call_type enum with
CALL_FCALL_KW, similar to the recent addition of CALL_PUBLIC_KW.
Add rb_vm_call_kw as a internal function, used by call_method_data
(internals of Method#call and UnboundMethod#bind_call). Add tests
for UnboundMethod#bind_call keyword handling.
The kw_splat flag is whether the original call passes keyword or not.
Some types of methods (e.g., bmethod and sym_proc) drops the
information. This change tries to propagate the flag to the final
callee, as far as I can.
This syntax means the method should be treated as a method that
uses keyword arguments, but no specific keyword arguments are
supported, and therefore calling the method with keyword arguments
will raise an ArgumentError. It is still allowed to double splat
an empty hash when calling the method, as that does not pass
any keyword arguments.
After 5e86b005c0, I now think ANYARGS is
dangerous and should be extinct. This commit deletes ANYARGS from
rb_thread_create, which seems very safe to do.
After 5e86b005c0, I now think ANYARGS is
dangerous and should be extinct. This commit deletes ANYARGS from
rb_ensure, which also revealed many arity / type mismatches.
After 5e86b005c0, I now think ANYARGS is
dangerous and should be extinct. This commit deletes ANYARGS from
struct vm_ifunc, but in doing so we also have to decouple the usage
of this struct in compile.c, which (I think) is an abuse of ANYARGS.
If `vm_stack` is left dangling in a forked process, the gc attempts to scan
it, but it is invalid and will cause a segfault. Therefore, we clear it
before forking.
In order to simplify this, `rb_ec_clear_vm_stack` was introduced.
Before this commit, classes and modules would be registered with the
VM's `defined_module_hash`. The key was the ID of the class, but that
meant that it was possible for hash collisions to occur. The compactor
doesn't allow classes in the `defined_module_hash` to move, but if there
is a conflict, then it's possible a class would be removed from the hash
and not get pined.
This commit changes the key / value of the hash just to be the class
itself, thus preventing movement.
For some reason symbols (or classes) are being overridden in trunk
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67598 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This commit adds the new method `GC.compact` and compacting GC support.
Please see this issue for caveats:
https://bugs.ruby-lang.org/issues/15626
[Feature #15626]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67576 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Because hard to specify commits related to r67479 only.
So please commit again.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67499 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This commit adds the new method `GC.compact` and compacting GC support.
Please see this issue for caveats:
https://bugs.ruby-lang.org/issues/15626
[Feature #15626]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67479 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
cfp->bp was (re-)introduced by Kokubun san, but VM doesn't use it
because I (ko1) want to remove it in a future. But using it make
leave instruction fast because of sp consisntency check.
So now VM uses cfp->bp.
To use cfp->bp, I checked the value and I found that it is not a
"initial value of sp" but a "initial value of ep". Fix this problem
and fix all bp references (this is why bp is renamed to bp_).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67342 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
zlib and bignum both contain unblocking functions which are
async-signal-safe and do not require spawning additional
threads.
We can execute those functions directly in signal handlers
without incurring overhead of extra threads, so provide C-API
users the ability to deal with that. Other C-API users may
have similar need.
This flexible API can supercede existing uses of
rb_thread_call_without_gvl and rb_thread_call_without_gvl2 by
introducing a flags argument to control behavior.
Note: this API is NOT finalized. It needs approval from other
committers. I prefer shorter name than previous
rb_thread_call_without_gvl* functions because my eyes requires
big fonts.
[Bug #15499]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66712 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ruby.c (process_options): script_compiled events are missed on
command line -e or specified file. this commit fix it.
[Bug #15471]
This patch should be backport to Ruby 2.6 branch.
* vm_core.h (rb_exec_event_hook_script_compiled): introduce utility
function to invoke a script_compiled event.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66595 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* iseq.c: before this patch, RubyVM::InstructionSequence.of(src) (ISeq in
short) returns different ISeq (wrapper) objects point to one ISeq internal
object. This patch changes this behavior to cache created ISeq (wrapper)
objects and return same ISeq object for an internal ISeq object.
* iseq.h (ISEQ_EXECUTABLE_P): introduced to check executable ISeq objects.
* iseq.h (ISEQ_COMPILE_DATA_ALLOC): reordr setting flag line to avoid
ISEQ_USE_COMPILE_DATA but compiled_data == NULL case.
* vm_core.h (rb_iseq_t): introduce `rb_iseq_t::wrapper` and
`rb_iseq_t::aux::exec`. Move `rb_iseq_t::local_hooks` to
`rb_iseq_t::aux::exec::local_hooks`.
* test/ruby/test_iseq.rb: add ISeq.of() tests.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66246 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
postponed_job is safe to use in signal handlers, but is not
thread-safe for MJIT. Implement a workqueue for MJIT
thread-safety.
[Bug #15316]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66100 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_trace.c: `TracePoint#enable(target_line:)` is supported.
This option enables a hook only at specified target_line.
target_line should be combination with target and :line event.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66008 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_trace.c (rb_tracepoint_enable_for_target): support targetting
TracePoint. [Feature #15289]
Tragetting TracePoint is only enabled on specified method, proc
and so on, example: `tp.enable(target: code)`.
`code` should be consisted of InstructionSeuqnece (iseq)
(RubyVM::InstructionSeuqnece.of(code) should not return nil)
If code is a tree of iseq, TracePoint is enabled on all of
iseqs in a tree.
Enabled tragetting TracePoints can not enabled again with
and without target.
* vm_core.h (rb_iseq_t): introduce `rb_iseq_t::local_hooks`
to store local hooks.
`rb_iseq_t::aux::trace_events` is renamed to
`global_trace_events` to contrast with `local_hooks`.
* vm_core.h (rb_hook_list_t): add `rb_hook_list_t::running`
to represent how many Threads/Fibers are used this list.
If this field is 0, nobody using this hooks and we can
delete it.
This is why we can remove code from cont.c.
* vm_core.h (rb_vm_t): because of above change, we can eliminate
`rb_vm_t::trace_running` field.
Also renamed from `rb_vm_t::event_hooks` to `global_hooks`.
* vm_core.h, vm.c (ruby_vm_event_enabled_global_flags): renamed
from `ruby_vm_event_enabled_flags.
* vm_core.h, vm.c (ruby_vm_event_local_num): added to count
enabled targetting TracePoints.
* vm_core.h, vm_trace.c (rb_exec_event_hooks): accepts
hook list.
* vm_core.h (rb_vm_global_hooks): added for convinience.
* method.h (rb_method_bmethod_t): added to maintain Proc
and `rb_hook_list_t` for bmethod (defined by define_method).
* prelude.rb (TracePoint#enable): extracet a keyword parameter
(because it is easy than writing in C).
It calls `TracePoint#__enable` internal method written in C.
* vm_insnhelper.c (vm_trace): check also iseq->local_hooks.
* vm.c (invoke_bmethod): check def->body.bmethod.hooks.
* vm.c (hook_before_rewind): check iseq->local_hooks
and def->body.bmethod.hooks before rewind by exception.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66003 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
We already use "static inline" heavily and there should be no
penalty for modern compilers; this adds type-checking, too.
This will make future changes easier-to-review.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65775 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_core.h: remove `rb_execution_context_t::passed_bmethod_me`
and fix functions to pass the `me` directly.
`passed_bmethod_me` was used to make bmethod (methods defined by
`defined_method`). `rb_vm_invoke_bmethod` invoke `Proc` with `me`
information as method frame (`lambda` frame, actually).
If the proc call is not bmethod call, `passed_bmethod_me` should
be NULL. However, there is a bug which passes wrong `me` for
normal block call.
http://ci.rvm.jp/results/trunk-asserts@silicon-docker/1449470
This is because wrong `me` was remained in `passed_bmethod_me`
(and used incorrectly it after collected by GC).
We need to clear `passed_bmethod_me` just after bmethod call,
but clearing is not enough.
To solve this issue, I removed `passed_bmethod_me` and pass `me`
information as a function parameter of `rb_vm_invoke_bmethod`,
`invoke_block_from_c_proc` and `invoke_iseq_block_from_c` in vm.c.
* vm.c (invoke_iseq_block_from_c): the number of parameters is too
long so that I try to specify `ALWAYS_INLINE`.
* vm.c (invoke_block_from_c_proc): ditto.
* vm_insnhelper.c (vm_yield_with_cfunc): now there are no pathes
to use bmethod here.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65636 b2dd03c8-39d4-4d8f-98ff-823fe69b080e