This change fixes some cases where YJIT fails to fire tracing events.
Most of the situations YJIT did not handle correctly involve enabling
tracing while running inside generated code.
A new operation to invalidate all generated code is added, which uses
patching to make generated code exit at the next VM instruction
boundary. A new routine called `jit_prepare_routine_call()` is
introduced to facilitate this and should be used when generating code
that could allocate, or could otherwise use `RB_VM_LOCK_ENTER()`.
The `c_return` event is fired in the middle of an instruction as opposed
to at an instruction boundary, so it requires special handling. C method
call return points are patched to go to a function that does everything
the interpreter does, including firing the `c_return` event. The
generated code for C method calls normally does not fire the event.
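For reference, here is a minimal Ruby-level sketch of the event in question
(the traced method is arbitrary); with this change, the event must also fire
when the surrounding code was compiled by YJIT:
```ruby
# :c_return fires when a C-implemented method (here Array#sum) returns.
tp = TracePoint.new(:c_return) { |t| puts "c_return: #{t.method_id}" }
tp.enable do
  [1, 2, 3].sum   # => prints "c_return: sum"
end
```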
Invalidated code should not change after patching, so the exits are not
clobbered. A new variable is introduced to track the region of code that
should not change.
The FIXME is there so we remember to investigate why insns clears the
temporary array. Is this necessary? If it's not we can remove it from
both.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Lazily compile out a chain of checks for different known classes and
whether `self` embeds its ivars or not.
* Remove trailing whitespaces
* Get proper address in Capstone disassembly
* Lowercase address in Capstone disassembly
Capstone uses lowercase for jump targets in generated listings. Let's
match it.
* Use the same successor in getivar guard chains
Cuts down on duplication
* Address reviews
* Fix copypasta error
* Add a comment
* ujit: implement opt_getinlinecache
Aggressively bet that writes to constants don't happen and invalidate
all opt_getinlinecache blocks on any and all constant writes (a short
disassembly sketch follows below).
Use alignment padding on block_t to track this assumption. No change to
sizeof(block_t).
* Fix compile warnings when not RUBY_DEBUG
* Fix reversed condition
* Switch to st_table to keep track of assumptions
Co-authored-by: Aaron Patterson <aaron.patterson@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
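As a Ruby-level illustration of which instruction the constant-write bet
covers (output shape is approximate and varies by Ruby version), a cached
constant read compiles to an opt_getinlinecache/getconstant/opt_setinlinecache
sequence:
```ruby
# Constant name is arbitrary; inspect the bytecode for a constant lookup.
puts RubyVM::InstructionSequence.compile("SomeConst").disasm
# On Rubies of this vintage, the listing includes opt_getinlinecache ...
# getconstant :SomeConst ... opt_setinlinecache.
```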
* uJIT: don't compile tailcalls
* Don't compile calls to protected methods
We need to generate extra code to check whether the call goes through if
we want to support these (see the Ruby-level sketch of that check after
this list).
* Fix copy pasta
* Update blockids in branches
* Update dependencies
* Tie lifetime of uJIT blocks to iseqs
Blocks weren't being freed when iseqs were collected.
* Add rb_dary. Use it for method dependency table
* Keep track of blocks per iseq
Remove global version_tbl
* Block version bookkeeping fix
* dary -> darray
* free ujit_blocks
* comment about size of ujit_blocks
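Regarding the protected-method item above, here is a plain-Ruby sketch
(class and methods are illustrative) of the check that generated code would
have to perform: the call is only allowed when the caller's `self` is a kind
of the class that defines the protected method.
```ruby
class Point
  def initialize(x)
    @x = x
  end

  def greater_than?(other)
    x > other.x          # allowed: self is a Point inside this method
  end

  protected

  def x
    @x
  end
end

Point.new(2).greater_than?(Point.new(1))  # => true
Point.new(2).x                            # => NoMethodError (protected)
```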
For `test-bundled-gems`, build `debug.so` with extconf.rb and the
`make` command directly, because `rake-compiler` assumes Ruby is
installed (but `test-bundled-gems` can run without installation).
Formerly, TypeProf was tested with the latest RBS code during
`make test-bundled-gems`. However, when a new version of rbs is
released, and if it is incompatible with TypeProf,
`make test-bundled-gems` starts failing, which is annoying.
With this change, TypeProf is tested against the bundled version of RBS.
- Change Unicode version to 13.0.0
- Change Emoji version to 13.0
- Adjust to moved locations of emoji-data.txt and emoji-variation-sequences.txt
by splitting these files from $(UNICODE_EMOJI_FILES) and putting them into
a new group $(UNICODE_UCD_EMOJI_FILES)
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up the ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows that, with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the clearer the speed increase. I.e., if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
  x.report "logger" do
    ActiveRecord::Base.logger
  end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly, we ran a benchmark to demonstrate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
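The module-count benchmark is not reproduced verbatim here; the following is
a hypothetical sketch of that kind of setup (class name, module count, and
report label are illustrative), showing why chain length matters: without
the cache, every @@cvar read walks the whole ancestor chain.
```ruby
require "benchmark/ips"

class Cvar
  @@cvar = :ok

  def self.read
    @@cvar
  end
end

30.times { Cvar.include(Module.new) }  # pad the ancestor chain

Benchmark.ips do |x|
  x.report("30 modules") { Cvar.read }
end
```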
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
by a race condition among multiple Ractors.
Atomically incrementing body->total_calls may have its own cost, so for
now we intentionally leave total_calls unreliable. So we accept that an
ISeq may never be pushed when you use multiple Ractors. However, if a
single ccan node is enqueued twice, get_from_list loops infinitely, so
this patch takes care of that situation.
Rational literals are those integers suffixed with `r`. They tend to
be a part of more complex expressions like `123/456r`, but in theory
they can live alone. When such "bare" rational literals are passed to
a case/when branch, we have to take care of them. Fixes [Bug #17854]
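A minimal reproduction of the affected construct (values are arbitrary):
```ruby
# A "bare" rational literal used directly as a `when` condition.
def describe(x)
  case x
  when 42r then "rational forty-two"
  when 1   then "one"
  else "something else"
  end
end

describe(42r)  # => "rational forty-two"
```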
* Add a benchmark-driver runner for Ractor
* Process.clock_gettime(Process::CLOCK_MONOTONIC) could be slow
in a Ruby 3.0 Ractor
* Fetching Time could also be slow
* Fix a comment
* Assert overriding a private method
* Rename `rb_scheduler` to `rb_fiber_scheduler`.
* Use public interface if available.
* Use `rb_check_funcall` where possible.
* Don't use `unblock` unless the fiber was non-blocking.
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Partially reversing a4f3e1762a like 21df4dce53.
We usually run them through make check, which has the dependency; test-all and test-spec without the dependency are useful for running only individual tests.
run, runruby, ... accept RUNOPT and RUNOPT0 variables to pass
some command-line arguments, like this:
$(BTESTRUBY) $(RUNOPT0) $(TESTRUN_SCRIPT) $(RUNOPT)
RUNOPT0 holds options for the Ruby interpreter (-w, -v, ...), and
RUNOPT holds options for the script (ARGV/ARGF).
Accessing theap needs complicated synchronization, which reduces
performance in multi-Ractor mode. So simply stop using theap in
multi-Ractor mode. In the future, theap should be replaced with a
more clever memory strategy.
To manage ractor-local data for C extensions, the following APIs
are defined.
* rb_ractor_local_storage_value_newkey
* rb_ractor_local_storage_value
* rb_ractor_local_storage_value_set
* rb_ractor_local_storage_ptr_newkey
* rb_ractor_local_storage_ptr
* rb_ractor_local_storage_ptr_set
First, you need to create a storage key with
rb_ractor_local_storage_(value|ptr)_newkey().
For ptr storage, it accepts the type of the storage, which specifies
how to mark and how to free the data along with the Ractor's lifetime.
rb_ractor_local_storage_value/set are used to access a VALUE,
and rb_ractor_local_storage_ptr/set are used to access a pointer.
random.c uses this API.
* memory_view.c: remove a reference in view->obj at rb_memory_view_release
* memory_view.c: keep references of memory-view-exported objects
* Update common.mk
* memory_view.c: Use st_update
To make some kinds of Ractor-related extensions possible, some
functions should be exposed.
* include/ruby/thread_native.h
* rb_native_mutex_*
* rb_native_cond_*
* include/ruby/ractor.h
* RB_OBJ_SHAREABLE_P(obj)
* rb_ractor_shareable_p(obj)
* rb_ractor_std*()
* rb_cRactor
Also, remove ractor_pub.h and rename srcdir/ractor.h to
srcdir/ractor_core.h (to avoid a conflict with include/ruby/ractor.h).
Introduce a new method, Ractor.make_shareable(obj), which tries to make
obj a shareable object. The protocol is:
(1) If obj is shareable, it is shareable.
(2) If obj is not a shareable object but can become shareable by
    being frozen, then freeze obj. If obj has reachable objects (rs),
    do rs.each{|o| Ractor.make_shareable(o)} recursively
    (the recursion is not at the Ruby level, but at the C level).
(3) Otherwise, raise Ractor::Error. For now, a T_DATA object is not a
    shareable object even if it is frozen.
If the method finishes without error, the given obj is marked as
a shareable object.
To allow making a frozen T_DATA object shareable, set
`RUBY_TYPED_FROZEN_SHAREABLE` in type->flags. By default, this flag is
not set, which means user-defined T_DATA objects are not allowed to
become shareable objects even when frozen.
You can make any object shareable by setting the FL_SHAREABLE flag, so
if you know that a T_DATA object is shareable (== thread-safe), set
this flag, at creation time for example. The `Ractor` object itself is
one example: it is not frozen, but it is shareable.
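A small Ruby-level usage sketch of the new method:
```ruby
# Ractor.make_shareable deep-freezes an object graph so it can be
# referenced from multiple Ractors.
config = { name: "app", tags: ["a", "b"] }

Ractor.shareable?(config)   # => false (mutable Hash/Array/String objects)
Ractor.make_shareable(config)
Ractor.shareable?(config)   # => true
config.frozen?              # => true (nested objects are frozen too)
# Objects that cannot be made shareable raise Ractor::Error, e.g. a
# T_DATA object whose type does not set RUBY_TYPED_FROZEN_SHAREABLE.
```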
iv_index_tbl manages instance variable indexes (ID -> index).
This data structure should be synchronized with other ractors,
so introduce some VM locks.
This patch also introduces an atomic ivar cache (IVC) used by the
getinstancevariable/setinstancevariable instructions. To support
updating the IVC, we changed the iv_index_tbl data structure to manage
(ID -> entry), where an entry holds a serial and an index. The IVC
points to this entry so that cache updates become atomic.
generic_ivtbl is a process-global table that maintains instance
variables for non-T_OBJECT/T_CLASS/... objects, so we need to protect
it for multi-Ractor execution.
Hint: we could make it Ractor-local for unshareable objects, but for
now that would be a premature optimization.
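For orientation, "non-T_OBJECT objects" here means things like Strings or
Arrays that happen to carry instance variables; on Rubies of this era such
ivars go through generic_ivtbl rather than dedicated slots on the object
(a plain-Ruby sketch):
```ruby
s = "hello"
s.instance_variable_set(:@note, 1)   # stored via generic_ivtbl, not in s itself
s.instance_variable_get(:@note)      # => 1
```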
enc_table manages Encoding information, and rb_encoding_list also
manages Encoding objects. Both are accessed/modified by ractors
simultaneously, so they should be synchronized.
For enc_table, this patch introduces GLOBAL_ENC_TABLE_ENTER/LEAVE/EVAL
to access this table with the VM lock. As a shortcut, three new global
variables global_enc_ascii, global_enc_utf_8, and global_enc_us_ascii
are also introduced.
For rb_encoding_list, we split it into rb_default_encoding_list (256
entries) and rb_additional_encoding_list. rb_default_encoding_list is a
fixed-size Array, so we don't need to synchronize it (and most apps only
need it). Encoding objects beyond the first 256 are stored in
rb_additional_encoding_list, and accessing rb_additional_encoding_list
requires the VM lock.
Try to update and extract bundled gems only when baseruby is
available. It should be done only when installing from a
development build and not from the tarball, but it is not
obvious how to differentiate them.
* Add buffer protocol
* Modify for some review comments
* Per-object buffer availability
* Rename to MemoryView from Buffer and make compilable
* Support integral repeat count in memory view format
* Support 'x' for padding bytes
* Add rb_memory_view_parse_item_format
* Check type in rb_memory_view_register
* Update dependencies in common.mk
* Add test of MemoryView
* Add test of rb_memory_view_init_as_byte_array
* Add native size format test
* Add MemoryView test utilities
* Add test of rb_memory_view_fill_contiguous_strides
* Skip spaces in format string
* Support endianness specifiers
* Update documentation
* Support alignment
* Use RUBY_ALIGNOF
* Fix format parser to follow the pack format
* Support the _ modifier
* Parse count specifiers in get_format_size function.
* Use STRUCT_ALIGNOF
* Fix test
* Fix test
* Fix total size for the case with tail padding
* Fix rb_memory_view_get_item_pointer
* Fix rb_memory_view_parse_item_format again
* random.c: separate the abstract rb_random_t and rb_random_mt_t for
the Mersenne Twister implementation.
* include/ruby/random.h: the interface for extensions of the Random
class.
* A DLL-imported symbol reference is not constant on Windows.
* Check if properly initialized.
This commit introduces the Ractor mechanism to run Ruby programs in
parallel. See doc/ractor.md for more details about Ractor.
See ticket [Feature #17100] for the implementation details
and discussions.
[Feature #17100]
This commit does not complete the implementation. You may find
many bugs when using Ractor. Also, the specification will change,
so this feature is experimental. You will see a warning when
you create the first Ractor with `Ractor.new`.
I hope this feature can free programmers from thread-safety issues.
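A minimal usage sketch (the computations are arbitrary):
```ruby
# Run two pieces of work in parallel Ractors and collect the results.
# The first Ractor.new prints an "experimental feature" warning.
r1 = Ractor.new { (1..1_000).sum }
r2 = Ractor.new { (1..1_000).map { |i| i * i }.sum }
p r1.take + r2.take
```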
These days I don't use `make benchmark`. The YAML files should be
executable with the bare `benchmark-driver` CLI, without passing
`RUBYOPT=-Ibenchmark/lib`.
A prerequisite to fix https://bugs.ruby-lang.org/issues/15589 with JIT.
This commit alone doesn't make a significant difference yet, but I thought
this commit should be committed independently.
This method override was discussed in [Misc #16961].
* Do not chdir in the runner process, to access miniruby. Chdir
in worker processes instead.
* GNU make does not export newly added environment variables by
default, so set PARALLEL_TESTS_EXECUTABLE in the runner.
As fork(2) is deprecated, its calls must be guarded by
`COMPILER_WARNING_IGNORED(-Wdeprecated-declarations)`.
All existing usages of fork(2) had already been guarded. A new call
to fork(2) was added in ruby.c with f22c4ff359.
This caused a build failure on Solaris 11.
Guarding large regions of code unnecessarily may hide bugs, so this
change introduces a simple wrapper, "rb_fork", whose definition is
guarded, and replaces all calls to fork(2) with the wrapper function.
This reverts commit 443389effc.
This reverts commit d94960f22e.
Inclusion of header files must be explicit. Every file shall directly
include what is necessary.
https://github.com/include-what-you-use/include-what-you-use says:
> When every file includes what it uses, then it is possible to edit any
> file and remove unused headers, without fear of accidentally breaking
> the upwards dependencies of that file. It also becomes easy to
> automatically track and update dependencies in the source code.
Though we don't use iwyu itself, the principle quoted above is a good
one that we can agree with.
Now that include guards have been added to each and every header in
our project, this changeset does not increase compile time, at least
on my machine.