I'm planning to introduce mjit_compiler.rb, and I want to make this
consistent with it. Consistency with compile.c doesn't seem important
for MJIT anyway.
* Optimize Marshal dump of large fixnum
Marshal's FIXNUM type only supports 31-bit fixnums, so on 64-bit
platforms the 63-bit fixnums need to be represented in Marshal's
BIGNUM.
Previously this was done by converting to a bugnum and serializing the
bignum object.
This commit avoids allocating the intermediate bignum object, instead
outputting the T_FIXNUM directly to a Marshal bignum. This maintains the
same representation as the previous implementation, including not using
LINKs for these large fixnums (an artifact of the previous
implementation always allocating a new BIGNUM).
This commit also avoids unnecessary st_lookups on immediate values,
which we know will not be in that table.
* Fastpath for loading FIXNUM from Marshal bignum
* Run update-deps
Creates simple bin stubs to load the extracted executable files.
After only extracted under `gems` directory, the gems are considered
installed but the executable scripts are not found.
Also the second argument is now the parent of the previous second and
third arguments.
Since #6006, we no longer avoid executing GC on mjit_worker.c and thus
there's no need to carefully change how we write code whether you're in
mjit.c or mjit_worker.c anymore.
The annocheck supports ELF format binaries compiled for any OS and for any
architecture. It can work in not only Linux but also FreeBSD and Solaris.
It is designed to be independent of the host OS and the architecture.
Even in Mac, as the binaries are Mach-O foramt, the annocheck fails correctly
with the message.
e.g. Test binaries compiled for Mac OSX 10.13.6 (target_os: darwin17) in Fedora 35.
```
$ cat /etc/fedora-release
Fedora release 35 (Thirty Five)
$ file ruby
ruby: Mach-O 64-bit x86_64 executable, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|WEAK_DEFINES|BINDS_TO_WEAK|PIE>
$ annocheck ruby
annocheck: Version 10.66.
annocheck: Warning: ruby: is not an ELF format file.
```
See <https://sourceware.org/bugzilla/show_bug.cgi?id=29173> for details.
In December 2021, we opened an [issue] to solicit feedback regarding the
porting of the YJIT codebase from C99 to Rust. There were some
reservations, but this project was given the go ahead by Ruby core
developers and Matz. Since then, we have successfully completed the port
of YJIT to Rust.
The new Rust version of YJIT has reached parity with the C version, in
that it passes all the CRuby tests, is able to run all of the YJIT
benchmarks, and performs similarly to the C version (because it works
the same way and largely generates the same machine code). We've even
incorporated some design improvements, such as a more fine-grained
constant invalidation mechanism which we expect will make a big
difference in Ruby on Rails applications.
Because we want to be careful, YJIT is guarded behind a configure
option:
```shell
./configure --enable-yjit # Build YJIT in release mode
./configure --enable-yjit=dev # Build YJIT in dev/debug mode
```
By default, YJIT does not get compiled and cargo/rustc is not required.
If YJIT is built in dev mode, then `cargo` is used to fetch development
dependencies, but when building in release, `cargo` is not required,
only `rustc`. At the moment YJIT requires Rust 1.60.0 or newer.
The YJIT command-line options remain mostly unchanged, and more details
about the build process are documented in `doc/yjit/yjit.md`.
The CI tests have been updated and do not take any more resources than
before.
The development history of the Rust port is available at the following
commit for interested parties:
1fd9573d8b
Our hope is that Rust YJIT will be compiled and included as a part of
system packages and compiled binaries of the Ruby 3.2 release. We do not
anticipate any major problems as Rust is well supported on every
platform which YJIT supports, but to make sure that this process works
smoothly, we would like to reach out to those who take care of building
systems packages before the 3.2 release is shipped and resolve any
issues that may come up.
[issue]: https://bugs.ruby-lang.org/issues/18481
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Co-authored-by: Noah Gibbs <the.codefolio.guy@gmail.com>
Co-authored-by: Kevin Newton <kddnewton@gmail.com>
This is another attempt to fix out-of-src build with
--with-static-linked-ext. The first attempt was
4f1888bda70981d9f5b1bf55ab692e0ce18e79f4 but reverted because it broke
out-of-src build from pre-generated sources via `make dist`.
This patch fixes the second trans C source gen, mentioned in the
previous commit message, by passing MINIRUBY as well as when invoking
from common.mk
Currently some specs are broken because `rspec-mocks-3.10.3` is used,
which has some breaking changes, apparently.
This change makes ruby-core install the same gems installed upstream for
running bundle specs, so that things never break with 3rd party
releases.
We found that we need to make Ruby objects while locking the environ
to ENV operation atomically, so we decided to use `RB_VM_LOCK_ENTER()`
instead of `env_lock`.
* YJIT: Implement optimized_method_struct_aref
* YJIT: Implement struct_aref without method call
Struct member reads can be compiled directly into a memory read (with
either one or two levels of indirection).
* YJIT: Implement optimized struct aset
* YJIT: Update tests for struct access
* YJIT: Add counters for remaining optimized methods
* Check for INT32_MAX overflow
It only takes a struct with 0x7fffffff/8+1 members. Also add some
cheap compile time checks.
* Add tests for non-embedded struct aref/aset
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
The implementation of a local variable tables was represented as `ID*`,
but it was very hacky: the first element is not an ID but the size of
the table, and, the last element is (sometimes) a link to the next local
table only when the id tables are a linked list.
This change converts the hacky implementation to a normal struct.
* `GC.measure_total_time = true` enables total time measurement (default: true)
* `GC.measure_total_time` returns current flag.
* `GC.total_time` returns measured total time in nano seconds.
* `GC.stat(:time)` (and Hash) returns measured total time in milli seconds.
This provides a significant speedup for symbol, true, false,
nil, and 0-9, class/module, and a small speedup in most other cases.
Speedups (using included benchmarks):
:symbol :: 60%
0-9 :: 50%
Class/Module :: 50%
nil/true/false :: 20%
integer :: 10%
[] :: 10%
"" :: 3%
One reason this approach is faster is it reduces the number of
VM instructions for each interpolated value.
Initial idea, approach, and benchmarks from Eric Wong. I applied
the same approach against the master branch, updating it to handle
the significant internal changes since this was first proposed 4
years ago (such as CALL_INFO/CALL_CACHE -> CALL_DATA). I also
expanded it to optimize true/false/nil/0-9/class/module, and added
handling of missing methods, refined methods, and RUBY_DEBUG.
This renames the tostring insn to anytostring, and adds an
objtostring insn that implements the optimization. This requires
making a few functions non-static, and adding some non-static
functions.
This disables 4 YJIT tests. Those tests should be reenabled after
YJIT optimizes the new objtostring insn.
Implements [Feature #13715]
Co-authored-by: Eric Wong <e@80x24.org>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Yusuke Endoh <mame@ruby-lang.org>
Co-authored-by: Koichi Sasada <ko1@atdot.net>
`duphash` showed up in the top-20 most frequent exit ops for @jhawthorn's benchmark that renders github.com/about
The implementation was almost exactly the same as `duparray`
Co-authored-by: John Hawthorn <john@hawthorn.email>
Co-authored-by: John Hawthorn <john@hawthorn.email>
* process.c: Add Process._fork
This API is supposed for application monitoring libraries to hook fork
event.
[Feature #17795]
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
`RubyVM.keep_script_lines` enables to keep script lines
for each ISeq and AST. This feature is for debugger/REPL
support.
```ruby
RubyVM.keep_script_lines = true
RubyVM::keep_script_lines = true
eval("def foo = nil\ndef bar = nil")
pp RubyVM::InstructionSequence.of(method(:foo)).script_lines
```
The new backend isn't used at the moment and adds to our diff against
upstream so remove it for now. We can reverse the removal later with git
history.
For upstreaming, we want functions we export either prefixed with "rb_"
or made static. Historically we haven't been following this rule, so we
were "leaking" a lot of symbols as `make leak-globals` would tell us.
This change unifies everything YJIT into a single compilation unit,
yjit.o, and makes everything unprefixed static to pass `make leak-globals`.
This manual "unified build" setup is similar to that of vm.o.
Having everything in one compilation unit allows static functions to
be visible across YJIT files and removes the need for declarations in
headers in some cases. Unnecessary declarations were removed.
Other changes of note:
- switched to MJIT_SYMBOL_EXPORT_BEGIN which indicates stuff as being
off limits for native extensions
- the first include of each YJIT file is change to be "internal.h"
- undefined MAP_STACK before explicitly redefining it since it
collide's with a definition in system headers. Consider renaming?
This change fixes some cases where YJIT fails to fire tracing events.
Most of the situations YJIT did not handle correctly involves enabling
tracing while running inside generated code.
A new operation to invalidate all generated code is added, which uses
patching to make generated code exit at the next VM instruction
boundary. A new routine called `jit_prepare_routine_call()` is
introduced to facilitate this and should be used when generating code
that could allocate, or could otherwise use `RB_VM_LOCK_ENTER()`.
The `c_return` event is fired in the middle of an instruction as opposed
to at an instruction boundary, so it requires special handling. C method
call return points are patched to go to a fucntion which does everything
the interpreter does, including firing the `c_return` event. The
generated code for C method calls normally does not fire the event.
Invalided code should not change after patching so the exits are not
clobbered. A new variable is introduced to track the region of code that
should not change.
The FIXME is there so we remember to investigate why insns clears the
temporary array. Is this necessary? If it's not we can remove it from
both.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Lazily compile out a chain of checks for different known classes and
whether `self` embeds its ivars or not.
* Remove trailing whitespaces
* Get proper addresss in Capstone disassembly
* Lowercase address in Capstone disassembly
Capstone uses lowercase for jump targets in generated listings. Let's
match it.
* Use the same successor in getivar guard chains
Cuts down on duplication
* Address reviews
* Fix copypasta error
* Add a comment
* ujit: implement opt_getinlinecache
Aggressively bet that writes to constants don't happen and invalidate
all opt_getinlinecache blocks on any and all constant writes.
Use alignment padding on block_t to track this assumption. No change to
sizeof(block_t).
* Fix compile warnings when not RUBY_DEBUG
* Fix reversed condition
* Switch to st_table to keep track of assumptions
Co-authored-by: Aaron Patterson <aaron.patterson@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
* uJIT: don't compile tailcalls
* Don't compile calls to protected methods
We need to generate extra code to check whether the call goes through if
we want to support these.
* Fix copy pasta
* Update blockids in branches
* Update dependencies
* Tie lifetime of uJIT blocks to iseqs
Blocks weren't being freed when iseqs are collected.
* Add rb_dary. Use it for method dependency table
* Keep track of blocks per iseq
Remove global version_tbl
* Block version bookkeeping fix
* dary -> darray
* free ujit_blocks
* comment about size of ujit_blocks
For the `test-bundled-gems`, make `debug.so` with extconf.rb and
`make` command directly because `rake-compiler` assume ruby is
installed (but `test-bundled-gems` can run without installation).
Formerly, TypeProf was tested with the latest RBS code during
`make test-bundled-gems`. However, when a new version of rbs is
released, and if it is incompatible with TypeProf,
`make test-bundled-gems` starts failing, which was annoying.
By this change, TypeProf is tested with the bundled version of RBS.
Formerly, TypeProf was tested with the latest RBS code during
`make test-bundled-gems`. However, when a new version of rbs is
released, and if it is incompatible with TypeProf,
`make test-bundled-gems` starts failing, which was annoying.
By this change, TypeProf is tested with the bundled version of RBS.
- Change Unicode version to 13.0.0
- Change Emoji version to 13.0
- Adjust to moved locations of emoji-data.txt and emoji-variation-sequences.txt
by splitting these files from $(UNICODE_EMOJI_FILES) and putting them into
a new group $(UNICODE_UCD_EMOJI_FILES)
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
by a race condition by multiple Ractors.
Atmically incrementing body->total_calls may have its own cost, so for
now we intentionally leave the unreliable total_calls. So we allow an
ISeq to be never pushed when you use multiple Ractors. However, if you
enqueue a single ccan node twice, get_from_list loops infinitely. Thus
this patch takes care of such a situation.
Rational literals are those integers suffixed with `r`. They tend to
be a part of more complex expressions like `123/456r`, but in theory
they can live alone. When such "bare" rational literals are passed to
case-when branch, we have to take care of them. Fixes [Bug #17854]
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* Add a benchmark-driver runner for Ractor
* Process.clock_gettime(Process:CLOCK_MONOTONIC) could be slow
in Ruby 3.0 Ractor
* Fetching Time could also be slow
* Fix a comment
* Assert overriding a private method
* Rename `rb_scheduler` to `rb_fiber_scheduler`.
* Use public interface if available.
* Use `rb_check_funcall` where possible.
* Don't use `unblock` unless the fiber was non-blocking.
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Partially reversing a4f3e1762a like 21df4dce53.
We usually run them through make check which has the dependency, and test-all and test-spec without the dependency are useful for running only individual tests.
run, runruby, ... accept RUNOPT and RUNOPT0 configuration to pass
some commandline argument like that:
$(BTESTRUBY) $(RUNOPT0) $(TESTRUN_SCRIPT) $(RUNOPT)
RUNOPT0 is options for ruby interpreter (-w, -v, ...)
RUNOPT is options for the script (ARGV/ARGF)
accessing theap needs complicating synchronization but it reduce
performance on multi-ractor mode. So simply stop using theap
on multi-ractor mode. In future, theap should be replaced with
more cleaver memory strategy.
To manage ractor-local data for C extension, the following APIs
are defined.
* rb_ractor_local_storage_value_newkey
* rb_ractor_local_storage_value
* rb_ractor_local_storage_value_set
* rb_ractor_local_storage_ptr_newkey
* rb_ractor_local_storage_ptr
* rb_ractor_local_storage_ptr_set
At first, you need to create a key of storage by
rb_ractor_local_(value|ptr)_newkey().
For ptr storage, it accepts the type of storage,
how to mark and how to free with ractor's lifetime.
rb_ractor_local_storage_value/set are used to access a VALUE
and rb_ractor_local_storage_ptr/set are used to access a pointer.
random.c uses this API.
* memory_view.c: remove a reference in view->obj at rb_memory_view_release
* memory_view.c: keep references of memory-view-exported objects
* Update common.mk
* memory_view.c: Use st_update
To make some kind of Ractor related extensions, some functions
should be exposed.
* include/ruby/thread_native.h
* rb_native_mutex_*
* rb_native_cond_*
* include/ruby/ractor.h
* RB_OBJ_SHAREABLE_P(obj)
* rb_ractor_shareable_p(obj)
* rb_ractor_std*()
* rb_cRactor
and rm ractor_pub.h
and rename srcdir/ractor.h to srcdir/ractor_core.h
(to avoid conflict with include/ruby/ractor.h)