Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects. Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness"). Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree. Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.
For example:
```ruby
class Foo
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
class Bar
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```
Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.
This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.
This commit also adds some methods for debugging shapes on objects. See
`RubyVM::Shape` for more details.
For more context on Object Shapes, see [Feature: #18776]
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects. Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness"). Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree. Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.
For example:
```ruby
class Foo
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
class Bar
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```
Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.
This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.
This commit also adds some methods for debugging shapes on objects. See
`RubyVM::Shape` for more details.
For more context on Object Shapes, see [Feature: #18776]
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
`lldb_cruby.py` manages lldb custom commands using functions. The file
is a large list of Python functions, and an init handler to map some of
the Python functions into the debugger, to enable execution of custom
logic during a debugging session.
Since LLDB 3.7 (September 2015) there has also been support for using
python classes rather than bare functions, as long as those classes
implement a specific interface.
This PR Introduces some more defined structure to the LLDB helper
functions by switching from the function based implementation to the
class based one, and providing an auto-loading mechanism by which new
functions can be loaded.
The intention behind this change is to make working with the LLDB
helpers easier, by reducing code duplication, providing a consistent
structure and a clearer API for developers.
The current function based approach has some advantages and
disadvantages
Advantages:
- Adding new code is easy.
- All the code is self contained and searchable.
Disadvantages:
- No visible organisation of the file contents. This means
- Hard to tell which functions are utility functions and which are
available to you in a debugging session
- Lots of code duplication within lldb functions
- Large files quickly become intimidating to work with - for example,
`lldb_disasm.py` was implemented as a seperate Python module because
it was easier to start with a clean slate than add significant amounts
of code to `lldb_cruby.py`
This PR attempts, to fix the disadvantages of the current approach and
maintain, or enhance, the benefits. The new structure of a command looks
like this;
```
class TestCommand(RbBaseCommand):
# program is the keyword the user will type in lldb to execute this command
program = "test"
# help_string will be displayed in lldb when the user uses the help functions
help_string = "This is a test command to show how to implement lldb commands"
# call is where our command logic will be implemented
def call(self, debugger, command, exe_ctx, result):
pass
```
If the command fulfils the following criteria it will then be
auto-loaded when an lldb session is started:
- The package file must exist inside the `commands` directory and the
filename must end in `_command.py`
- The package must implement a class whose name ends in `Command`
- The class inherits from `RbBaseCommand` or at minimum a class that
shares the same interface as `RbBaseCommand` (at minimum this means
defining `__init__` and `__call__`, and using `__call__` to call
`call` which is defined in the subclasses).
- The class must have a class variable `package` that is a String. This
is the name of the command you'll call in the `lldb` debugger.
This commit makes `rp` report the correct array length in lldb.
When USING_RVARGC is set we use 7 bits of the flags to store the array
len rather than the usual 2, so they need to be part of the mask when
calculating the length in lldb.
When calculating whether rvargc is enabled I've used the same approach
that's used by `GC.using_rvargc?` which is to detect whether there is
more than one size pool in the current objspace.
rb_backtrace relies on the existend of RUBY_T_MASK. This is set up by
the global loading code in lldb_init()
rb_backtrace does not call lldb_init previously, and therefore would
only work if called after another lldb function that _did_ load the
globals.
Now that classes are using VWA, the RCLASS_PTR uses an offset to get the
rb_classext_t object. Doing this all the time in lldb is boring. So
script lldb to do it for us
In December 2021, we opened an [issue] to solicit feedback regarding the
porting of the YJIT codebase from C99 to Rust. There were some
reservations, but this project was given the go ahead by Ruby core
developers and Matz. Since then, we have successfully completed the port
of YJIT to Rust.
The new Rust version of YJIT has reached parity with the C version, in
that it passes all the CRuby tests, is able to run all of the YJIT
benchmarks, and performs similarly to the C version (because it works
the same way and largely generates the same machine code). We've even
incorporated some design improvements, such as a more fine-grained
constant invalidation mechanism which we expect will make a big
difference in Ruby on Rails applications.
Because we want to be careful, YJIT is guarded behind a configure
option:
```shell
./configure --enable-yjit # Build YJIT in release mode
./configure --enable-yjit=dev # Build YJIT in dev/debug mode
```
By default, YJIT does not get compiled and cargo/rustc is not required.
If YJIT is built in dev mode, then `cargo` is used to fetch development
dependencies, but when building in release, `cargo` is not required,
only `rustc`. At the moment YJIT requires Rust 1.60.0 or newer.
The YJIT command-line options remain mostly unchanged, and more details
about the build process are documented in `doc/yjit/yjit.md`.
The CI tests have been updated and do not take any more resources than
before.
The development history of the Rust port is available at the following
commit for interested parties:
1fd9573d8b
Our hope is that Rust YJIT will be compiled and included as a part of
system packages and compiled binaries of the Ruby 3.2 release. We do not
anticipate any major problems as Rust is well supported on every
platform which YJIT supports, but to make sure that this process works
smoothly, we would like to reach out to those who take care of building
systems packages before the 3.2 release is shipped and resolve any
issues that may come up.
[issue]: https://bugs.ruby-lang.org/issues/18481
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Co-authored-by: Noah Gibbs <the.codefolio.guy@gmail.com>
Co-authored-by: Kevin Newton <kddnewton@gmail.com>
Commit dde164e968 decoupled incremental
marking from page sizes. This commit changes Ruby heap page sizes to
64KiB. Doing so will have several benefits:
1. We can use compaction on systems with 64KiB system page sizes (e.g.
PowerPC).
2. Larger page sizes will allow Variable Width Allocation to increase
slot sizes and embed larger objects.
3. Since commit 002fa28599, macOS has 64
KiB pages. Making page sizes 64 KiB will bring these systems to
parity.
I have attached some bechmark results below.
Discourse:
On Discourse, we saw much better p99 performance (e.g. for "categories"
it went from 214ms on master to 134ms on branch, for "home" it went
from 265ms to 251ms). We don’t see much change in p60, p75, and p90
performance. We also see a slight decrease in memory usage by 1.04x.
Branch RSS: 354.9MB
Master RSS: 368.2MB
railsbench:
On rails bench, we don’t see a big change in RPS or p99
performance. We don’t see a big difference in memory usage.
Branch RPS: 826.27
Master RPS: 824.85
Branch p99: 1.67
Master p99: 1.72
Branch RSS: 88.72MB
Master RSS: 88.48MB
liquid:
We don’t see a significant change in liquid performance.
Branch parse & render: 28.653 I/s
Master parse & render: 28.563 i/s
Previously, YJIT assumed that basic blocks never consume more than
1 KiB of memory. This assumption does not hold for long Ruby methods
such as the one in the following:
```ruby
eval(<<RUBY)
def set_local_a_lot
#{'_=0;'*0x40000}
end
RUBY
set_local_a_lot
```
For low `--yjit-exec-mem-size` values, one basic block could exhaust the
entire buffer.
Introduce a new field `codeblock_t::dropped_bytes` that the assembler
sets whenever it runs out of space. Check this field in
gen_single_block() to respond to out of memory situations and other
error conditions. This design avoids making the control flow graph of
existing code generation functions more complex.
Use POSIX shell in misc/test_yjit_asm.sh since bash is expanding
`0%/*/*` differently.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Some platforms don't want memory to be marked as writeable and
executable at the same time. When we write to the code block, we
calculate the OS page that the buffer position maps to. Then we call
`mprotect` to allow writes on that particular page. As an optimization,
we cache the "last written" aligned page which allows us to amortize the
cost of the `mprotect` call. In other words, sequential writes to the
same page will only call `mprotect` on the page once.
When we're done writing, we call `mprotect` on the entire JIT buffer.
This means we don't need to keep track of which pages were marked as
writeable, we let the OS take care of that.
Co-authored-by: John Hawthorn <john@hawthorn.email>
* YJIT: use shorter encoding for mov(r64,imm) when unambiguous
Previously, for small constants such as `mov(RAX, imm_opnd(Qundef))`,
we emit an instruction with an 8-byte immediate. This form commonly
gets the `movabs` mnemonic.
In 64-bit mode, 32-bit operands get zero extended to 64-bit to fill the
register, so when the immediate is small enough, we can save 4 bytes by
using the `mov` variant that takes a 32-bit immediate and does a zero
extension.
Not implement with this change, there is an imm32 variant of `mov` that
does sign extension we could use. When the constant is negative, we
fallback to the `movabs` form.
In railsbench, this change yields roughly a 12% code size reduction for
the outlined block.
Co-authored-by: Jemma Issroff <jemmaissroff@gmail.com>
* [ci skip] comment edit. Please squash.
Co-authored-by: Jemma Issroff <jemmaissroff@gmail.com>
* New code page allocation logic
* Fix leaked globals
* Fix leaked symbols, yjit asm tests
* Make COUNTED_EXIT take a jit argument, so we can eliminate global ocb
* Remove extra whitespace
* Change block start_pos/end_pos to be pointers instead of uint32_t
* Change branch end_pos and start_pos to end_addr, start_addr
On non RUBY_DEBUG builds, assert() compiles to nothing and the compiler
warns about uninitialized variables in those code paths. Replace
those asserts with rb_bug() to fix the warnings and do the assert in
all builds. Since yjit_asm_tests.c compiles outside of Ruby, it needed
a distinct version of rb_bug().
Also put YJIT_STATS check for function delcaration that is only defined
in YJIT_STATS builds.
This script is an lldb helper that just loops through all the comments
stored and prints out the comment along with the address corresponding
to the comment.
For example, I'm crashing in JIT code at address 0x0000000110000168.
Using the `lc` helper I can see that it's probably crashing inside the
exit back to the interpreter
```
(lldb) bt 5
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x22220021)
frame #0: 0x0000000110000168
* frame #1: 0x00000001002b5ff5 miniruby`invoke_block_from_c_bh [inlined] invoke_block(ec=0x0000000100e05350, iseq=0x0000000100c1ff10, self=0x0000000100c76cc0, captured=<unavailable>, cref=0x0000000000000000, type=<unavailable>, opt_pc=<unavailable>) at vm.c:1268:12
frame #2: 0x00000001002b5f7d miniruby`invoke_block_from_c_bh [inlined] invoke_iseq_block_from_c(ec=<unavailable>, captured=<unavailable>, self=0x0000000100c76cc0, argc=2, argv=<unavailable>, kw_splat=0, passed_block_handler=0x0000000000000000, cref=0x0000000000000000, is_lambda=<unavailable>, me=0x0000000000000000) at vm.c:1340
frame #3: 0x00000001002b5e14 miniruby`invoke_block_from_c_bh(ec=<unavailable>, block_handler=<unavailable>, argc=<unavailable>, argv=<unavailable>, kw_splat=0, passed_block_handler=0x0000000000000000, cref=0x0000000000000000, is_lambda=<unavailable>, force_blockarg=0) at vm.c:1358
frame #4: 0x000000010029860b miniruby`rb_yield_values(n=<unavailable>) at vm_eval.c:0
(lldb) lc
0x11000006d "putobject_INT2FIX_1_"
0x110000083 "leave"
0x110000087 "check for interrupts"
0x110000087 "RUBY_VM_CHECK_INTS(ec)"
0x110000098 "check for finish frame"
0x1100000ed "getlocal_WC_0"
0x110000107 "getlocal_WC_1"
0x11000012a "opt_send_without_block"
0x110000139 "opt_send_without_block"
0x11000013c "exit to interpreter"
```
This commits implements size classes in the GC for the Variable Width
Allocation feature. Unless `USE_RVARGC` compile flag is set, only a
single size class is created, maintaining current behaviour. See the
redmine ticket for more details.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
This commit removes T_PAYLOAD since the new VWA implementation no longer
requires T_PAYLOAD types.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
This commits implements size classes in the GC for the Variable Width
Allocation feature. Unless `USE_RVARGC` compile flag is set, only a
single size class is created, maintaining current behaviour. See the
redmine ticket for more details.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
This commit removes T_PAYLOAD since the new VWA implementation no longer
requires T_PAYLOAD types.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
rather than having to do this in a two step process:
1. heap_page obj
2. dump_page $2 (or whatever lldb variable heap_page set)
we can now just
dump_page_rvalue obj
Update the lldb script so it can mostly recover a Ruby stack trace from
a core file. It's still missing line numbers and dealing with CFUNCs,
but you use it like this:
```
(lldb) rbbt ec
rb_control_frame_t TYPE
0x7f6fd6555fa0 EVAL ./bootstraptest/runner.rb error!!
0x7f6fd6555f68 METHOD ./bootstraptest/runner.rb main
0x7f6fd6555f30 METHOD ./bootstraptest/runner.rb in_temporary_working_directory
0x7f6fd6555ef8 METHOD /home/aaron/git/ruby/lib/tmpdir.rb mktmpdir
0x7f6fd6555ec0 BLOCK ./bootstraptest/runner.rb block in in_temporary_working_directory
0x7f6fd6555e88 CFUNC
0x7f6fd6555e50 BLOCK ./bootstraptest/runner.rb block (2 levels) in in_temporary_working_directory
0x7f6fd6555e18 BLOCK ./bootstraptest/runner.rb block in main
0x7f6fd6555de0 METHOD ./bootstraptest/runner.rb exec_test
0x7f6fd6555da8 CFUNC
0x7f6fd6555d70 BLOCK ./bootstraptest/runner.rb block in exec_test
0x7f6fd6555d38 CFUNC
0x7f6fd6555d00 TOP /home/aaron/git/ruby/bootstraptest/test_insns.rb error!!
0x7f6fd6555cc8 CFUNC
0x7f6fd6555c90 BLOCK /home/aaron/git/ruby/bootstraptest/test_insns.rb block in <top (required)>
0x7f6fd6555c58 METHOD ./bootstraptest/runner.rb assert_equal
0x7f6fd6555c20 METHOD ./bootstraptest/runner.rb assert_check
0x7f6fd6555be8 METHOD ./bootstraptest/runner.rb show_progress
0x7f6fd6555bb0 METHOD ./bootstraptest/runner.rb with_stderr
0x7f6fd6555b78 BLOCK ./bootstraptest/runner.rb block in show_progress
0x7f6fd6555b40 BLOCK ./bootstraptest/runner.rb block in assert_check
0x7f6fd6555b08 METHOD ./bootstraptest/runner.rb get_result_string
0x7f6fd6555ad0 METHOD ./bootstraptest/runner.rb make_srcfile
0x7f6fd6555a98 CFUNC
0x7f6fd6555a60 BLOCK ./bootstraptest/runner.rb block in make_srcfile
```
Getting the main execution context is difficult (it is stored in a
thread local) so for now you must supply an ec and this will make a
backtrace
This fixes the lldb disassembler script so that it doesn't need a live
process when disassembling rb_iseq_t. I also added the PC to the output
so you can tell what the VM is executing when it crashed.
For example:
```
(lldb) rbdisasm ec->cfp->iseq
PC IDX insn_name(operands)
0x56039f0a1720 0000 nop
0x56039f0a1728 0001 getlocal_WC_1( 5 )
0x56039f0a1738 0003 branchunless( 7 )
0x56039f0a1748 0005 getlocal_WC_0( 3 )
0x56039f0a1758 0007 putstring( (VALUE)0x56039f0c7eb8 )
0x56039f0a1768 0009 opt_send_without_block( (struct rb_call_data *)0x56039f09f140 )
0x56039f0a1778 0011 pop
0x56039f0a1780 0012 getglobal( ID: 0x7fd7 )
0x56039f0a1790 0014 branchunless( 7 )
0x56039f0a17a0 0016 getlocal_WC_0( 3 )
0x56039f0a17b0 0018 putstring( (VALUE)0x56039f0c7e90 )
0x56039f0a17c0 0020 opt_send_without_block( (struct rb_call_data *)0x56039f09f150 )
0x56039f0a17d0 0022 pop
0x56039f0a17d8 0023 getlocal_WC_0( 3 )
0x56039f0a17e8 0025 putobject( (VALUE)0x56039f0c7e68 )
0x56039f0a17f8 0027 getlocal_WC_1( 6 )
0x56039f0a1808 0029 dup
0x56039f0a1810 0030 checktype( 5 )
0x56039f0a1820 0032 branchif( 4 )
0x56039f0a1830 0034 dup
0x56039f0a1838 0035 opt_send_without_block( (struct rb_call_data *)0x56039f09f160 )
0x56039f0a1848 0037 tostring
0x56039f0a1850 0038 putobject( (VALUE)0x56039f0c7e40 )
0x56039f0a1860 0040 concatstrings( 3 )
0x56039f0a1870 0042 opt_send_without_block( (struct rb_call_data *)0x56039f09f170 )
0x56039f0a1880 0044 nop
0x56039f0a1888 0045 leave
(lldb) p ec->cfp->pc
(const VALUE *) $146 = 0x000056039f0a1848
```
Here we can see the VM is currently executing `opt_send_without_block`
(because the PC is one ahead of the current instruction)
I need to disassemble instruction sequences while debugging, so I wrote
this.
Usage is like this:
```
(lldb) p iseq
(rb_iseq_t *) $147 = 0x0000000101068400
(lldb) rbdisasm iseq
0000 putspecialobject( 3 )
0002 putnil
0003 defineclass( ID: 0x560b, (rb_iseq_t *)0x1010681d0, 2 )
0007 pop
0008 putspecialobject( 3 )
0010 putnil
0011 defineclass( ID: 0x56eb, (rb_iseq_t *)0x101063b58, 2 )
0015 leave
```
Also thanks a ton to @kivikakk helping me figure out how to navigate LLDB's Python 😆
Show the size of String.
To see the whole contents even after NUL char:
```
(lldb) rp str
(const char [5]) $1 = "x"
(lldb) memory read -s1 --format x --count `sizeof($1)` -- &$1 0x1010457a8: 0x78 0x00 0x61 0x61 0x61
```
This fixes the following in my environment:
misc/expand_tabs.rb:29:in `=~': invalid byte sequence in US-ASCII (ArgumentError)
This switches from =~ to start_with? as a regular expression is
not actually needed here.
This is implemented to close [Misc #16112] because all other options got
at least one objection, and nobody has objected to this solution.
This code is a little complicated for the purpose, but that's just
because it includes some historical code for auto-style.rb:
918a7c31b6/bin/auto-style.rb
Please feel free to improve this file as you like.
[Misc #16112]