With verbose mode (-w), the interpreter shows a warning if
a block is passed to a method which does not use the given block.
The warning is shown when:
* the invoked method is written in C
* the invoked method is not `initialize`
* the method is not invoked with `super`
* it is the first time on the call-site with the invoked method
(`obj.foo{}` will be warned about only once if `foo` is the same method)
[Feature #15554]
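For example, a minimal sketch of a call-site that should trigger the
warning, assuming `Object#freeze` (a C method that takes no block)
does not declare `:use_block`; the exact warning text is illustrative:
```ruby
# Run as: ruby -w unused_block.rb
obj = Object.new
2.times do
  # Object#freeze is written in C and ignores blocks, so -w should
  # warn here, and only once for this call-site.
  obj.freeze { :never_used }
end
```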
`Primitive.attr! :use_block` is introduced to declare that primitive
functions (written in C) will use the passed block.
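As an illustrative sketch of the declaration style, a core method
written in Ruby-with-Primitive style could declare block usage like
this (the method name and the `Primitive.each_thing_internal` C
binding here are hypothetical):
```ruby
# Hypothetical core-library source (kernel.rb style):
def each_thing
  # Declare that the underlying primitive (C) function consumes the
  # passed block, so -w does not flag call-sites that pass one.
  Primitive.attr! :use_block
  Primitive.each_thing_internal
end
```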
For minitest, the tests need some tweaks, so use
ea9caafc07
for `test-bundled-gems`.
Using `rb_gc_mark_movable` and a reference update function, we can
make frame infos movable in memory and avoid pinning frame info
backtraces. The following script counts the objects that remain
pinned after compaction:
```
require "objspace"
exceptions = []
GC.disable
50_000.times do
begin
raise "some exception"
rescue => exception
exception.backtrace_locations
exceptions << exception
end
end
GC.enable
GC.compact
p ObjectSpace.dump_all(output: :string).lines.grep(/"pinned":true/).count
```
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
[Feature #13557]
Setting the backtrace with an array of strings is lossy. The
resulting exception will return nil on `#backtrace_locations`.
By accepting an array of `Backtrace::Location` instances, we can
rebuild a `Backtrace` instance and have a fully functioning
Exception.
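For instance, a minimal sketch of the new behavior, copying
locations from one exception to another via `set_backtrace`:
```ruby
begin
  raise "original"
rescue => original
  copy = RuntimeError.new("copy")
  # Passing Location objects instead of strings keeps
  # #backtrace_locations functional on the new exception.
  copy.set_backtrace(original.backtrace_locations)
  p copy.backtrace_locations.first.lineno
end
```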
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
This `st_table` is used to both mark and pin classes
defined from the C API. But `vm->mark_object_ary` already
does both much more efficiently.
Currently a Ruby process starts with 252 rooted classes,
which use `7224B` in an `st_table` or `2016B` (252 × 8B) in an `RArray`.
So that's a baseline of about 5kB saved, and since `mark_object_ary`
is preallocated with `1024` slots but only uses `405` of them,
it's a net `7kB` saving.
`vm->mark_object_ary` is also being refactored.
Prior to this change, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.
This had the detrimental effect of marking these references on every
minor GC even though it's a mostly append-only list.
By using a custom TypedData we can avoid having to mark
all the references on minor GC runs.
Additionally, immediate values are now ignored and not appended
to `vm->mark_object_ary`, as storing them is just wasted space.
This frees `FL_USER0` on both `T_MODULE` and `T_CLASS`.
Note: prior to this, `FL_SINGLETON` was never set on `T_MODULE`,
so checking for `FL_SINGLETON` without first checking that
`FL_TYPE` was `T_CLASS` was valid. That's no longer the case.
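Backtrace entries now include the owner of each method, for example: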
```
test.rb:1:in 'Object#toplevel_meth': unhandled exception
from test.rb:4:in 'Foo.class_meth'
from test.rb:6:in 'Foo#instance_meth'
from test.rb:11:in 'singleton_meth'
from test.rb:13:in '<main>'
```
[Feature #19117]
Instead of having the iseq and cfunc separately, this change lets
Thread::Backtrace::Location have them together as an
rb_callable_method_entry_t.
This is a refactoring, but also a preparation for implementing
[Feature #19117].
When using M:N threads, the EC is set to NULL in the shared native
thread when nothing is scheduled. This previously caused a segfault
when we tried to examine the EC.
Returning 0 instead means we may miss profiling information, but a
profiler relying on this isn't thread-aware anyway, and observing
that "nothing" is running is probably correct.
Fixes [Bug #20017]
Co-authored-by: Dustin Brown <dbrown9@gmail.com>
`rb_backtrace_t` is 32B, so it fits well in an 80B slot.
There is some unused space, but given that Backtrace objects are
rarely held onto, it should be inconsequential and avoids the
malloc churn.
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
The struct is 16B, so they will use the 80B size pool, so on paper
it wastes 80 - 32 - 16 = 32B; however, most malloc implementations
will either pad sizes or use an extra 16B for each segment, so in
practice the waste isn't that big. Also, `Backtrace::Location`
objects are rarely held on to for long, so avoiding the malloc
churn helps performance.
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
Add a new API, `rb_profile_thread_frames()`, which is essentially a
per-thread version of `rb_profile_frames()`.
While the original rb_profile_frames() always returns results about the
current active thread obtained by GET_EC(), this new API takes a Thread
to be profiled as an argument.
This should come in handy when profiling I/O-bound programs such as
webapps, since this new API allows us to learn about Threads performing
I/O (which do not have the GVL).
Profiling worker threads (such as Sidekiq workers) may be another
application.
Implements [Feature #10602]
Co-authored-by: Mike Perham <mike@perham.net>
Frames pushed by YJIT have an unreliable PC. The PC could be garbage,
and if we try to read the line number with a garbage PC, then the
program can crash.
This commit returns line 0 for frames that have a `jit_return`
pointer set. If `jit_return` has been set, then the frame was pushed
by the JIT and we cannot trust the PC.
Here is a debugger session for a program that crashed due to a broken
PC:
```
(lldb) p ruby_current_vm_ptr->ractor.main_thread->ec->cfp->iseq->body->iseq_encoded
(VALUE *) $0 = 0x0000000118a30e00
(lldb) p/x ruby_current_vm_ptr->ractor.main_thread->ec->cfp->pc
(const VALUE *) $1 = 0x0000600000b02d00
(lldb) p/x ruby_current_vm_ptr->ractor.main_thread->ec->cfp->jit_return
(void *) $2 = 0x000000010622942c
```
You can see the PC is completely out of range, but there is a
`jit_return` pointer so we can avoid this crash.
According to the C99 specification section 7.20.3.2 paragraph 2:
> If ptr is a null pointer, no action occurs.
So we do not need to check whether the pointer is NULL before
calling `free`.
Right now the attached object is stored as an instance variable
and all the call sites that either get or set it have to know how it's
stored.
It's preferable to hide this implementation detail behind accessors
so that it is easier to change how it's stored.
Backtrace objects hold references to:
* iseqs - via the captured locations
* strary - a lazily allocated array of strings
* locary - a lazily allocated array of backtrace locations
Co-authored-by: Adam Hess <HParker@github.com>
StackProf uses a signal handler to call `rb_profile_frames`. Signals
are delivered to threads randomly, and can be delivered after the thread
has been created but before the CFP has been established on the EC.
This commit returns early if there is no CFP to use.
Here is some info from the core files we are seeing; you can see
the CFP on the current EC is 0x0:
```
(gdb) p ruby_current_ec
$20 = (struct rb_execution_context_struct *) 0x7f3481301b50
(gdb) p ruby_current_ec->cfp
$21 = (rb_control_frame_t *) 0x0
```
Here is where VM_FRAME_CFRAME_P gets a 0x0 CFP:
```
6 VM_FRAME_CFRAME_P (cfp=0x0) at vm_core.h:1350
7 VM_FRAME_RUBYFRAME_P (cfp=<optimized out>) at vm_core.h:1350
8 rb_profile_frames (start=0, limit=2048, buff=0x7f3493809590, lines=0x7f349380d590) at vm_backtrace.c:1587
```
Down the stack we can see this is happening after thread creation:
```
19 0x00007f3495bf9420 in <signal handler called> () at /lib/x86_64-linux-gnu/libpthread.so.0
20 0x000055d531574e55 in thread_start_func_2 (th=<optimized out>, stack_start=<optimized out>) at thread.c:676
21 0x000055d531575b31 in thread_start_func_1 (th_ptr=<optimized out>) at thread_pthread.c:1170
22 0x00007f3495bed609 in start_thread (arg=<optimized out>) at pthread_create.c:477
23 0x00007f3495b12133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
The following new debug context APIs are for implementing a
debugger's `next` (step over) and similar functionality.
* `rb_debug_inspector_frame_depth(dc, index)` returns the `index`-th
frame's depth.
* `rb_debug_inspector_current_depth()` returns the current frame depth.
The frame depth is not related to the frame index because the debug
context API skips some special frames, but the proposed `_depth()`
APIs return the count of all frames (the raw depth).
This patch pushes dummy frames when loading code, for profiling
purposes.
The following methods push a dummy frame:
* `Kernel#require`
* `Kernel#load`
* `RubyVM::InstructionSequence.compile_file`
* `RubyVM::InstructionSequence.load_from_binary`
https://bugs.ruby-lang.org/issues/18559
The `rb_profile_frames` API did not skip the two dummy frames that
each thread has at its beginning. This was unlike `backtrace_each` and
`rb_ec_partial_backtrace_object`, which do skip them.
This does not seem to be a problem for non-main thread frames,
because both `VM_FRAME_RUBYFRAME_P(cfp)` and
`rb_vm_frame_method_entry(cfp)` are NULL for them.
BUT, on the main thread `VM_FRAME_RUBYFRAME_P(cfp)` was true,
and thus the dummy frame was still included in the output of
`rb_profile_frames`.
I've now made `rb_profile_frames` skip this extra frame (like
`backtrace_each` and friends), and added a test that asserts
the size and contents of `rb_profile_frames`.
Fixes [Bug #18907] (<https://bugs.ruby-lang.org/issues/18907>)
Use the ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.
All frames should be either iseq frames or cfunc frames. Use a
VM assert instead of a conditional to check for a cfunc frame if
the current frame is not an iseq frame.
Fixes this compiler warning:
```
warning: 'loc' may be used uninitialized in this function [-Wmaybe-uninitialized]
bt_yield_loc(loc - cfunc_counter, cfunc_counter, btobj);
```
This method takes a block and yields Thread::Backtrace::Location
objects to the block. It does not take arguments, and always
starts at the default frame that caller_locations would start at.
Implements [Feature #16663]
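For example, assuming the method is exposed as
`Thread.each_caller_location` (per [Feature #16663]):
```ruby
def who_called_me
  # Yields Thread::Backtrace::Location objects one at a time,
  # starting from the same frame caller_locations would start at.
  Thread.each_caller_location do |loc|
    puts "#{loc.path}:#{loc.lineno} in #{loc.label}"
  end
end

who_called_me
```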
This fixes multiple bugs found in the partial backtrace
optimization added in 3b24b7914c.
These bugs occur when passing a start argument to caller where
the start argument lands on an iseq frame without a pc.
Before this commit, the following code results in the same
line being printed twice, both for the `#each` method.
```ruby
def a; [1].group_by { b } end
def b; puts(caller(2, 1).first, caller(3, 1).first) end
a
```
After this commit and in Ruby 2.7, the lines are different,
with the first line being for `each` and the second for `group_by`.
Before this commit, the following code can either segfault or
result in an infinite loop:
```ruby
def foo
  caller_locations(2, 1).inspect # segfault
  caller_locations(2, 1)[0].path # infinite loop
end

1.times.map { 1.times.map { foo } }
```
After this commit, this code works correctly.
This commit completely refactors the backtrace handling.
Instead of processing the backtrace from the outermost
frame working in, process it from the innermost frame
working out. This is much faster for partial backtraces,
since you only access the control frames you need to in
order to construct the backtrace.
To handle cfunc frames in the new design, they start
out with no location information, and we increment a counter
for each cfunc frame added. When an iseq frame with a pc
is accessed, after adding the iseq backtrace location,
we use its location for all of the directly preceding
cfunc backtrace locations.
If the last backtrace line is a cfunc frame, we continue
scanning for iseq frames until the end control frame, and
use the location information from the first one for the
trailing cfunc frames in the backtrace.
As only rb_ec_partial_backtrace_object uses the new
backtrace implementation, remove all of the function
pointers and inline the functions. This makes the
process easier to understand.
Restore the Ruby 2.7 implementation of backtrace_each and
use it for all the other functions that called
backtrace_each other than rb_ec_partial_backtrace_object.
All other cases requested the entire backtrace, so there
is no advantage to using the new algorithm for those.
Additionally, there are implicit assumptions in the other
code that the backtrace processing works inward instead
of outward.
Remove the cfunc/iseq union in rb_backtrace_location_t,
and remove the prev_loc member for cfunc. Both cfunc and
iseq types can now have iseq and pc entries, so the
location information can be accessed the same way for each.
This avoids the need for an extra backtrace location entry
to store an iseq backtrace location if the final entry in
the backtrace is a cfunc. This is also what fixes the
segfault and infinite loop issues in the above bugs.
Here's Ruby pseudocode for the new algorithm, where start
and length are the arguments to caller or caller_locations:
```ruby
end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt << cf.iseq_backtrace_entry
        # Backfill the directly preceding cfunc entries with this
        # iseq frame's location information.
        cfunc_counter.times do |i|
          bt[-2 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt << cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end
  cf = cf.prev
end

# If the backtrace ends with cfunc frames, keep scanning until an
# iseq frame with a pc is found, and use its location for the
# trailing cfunc entries.
if cfunc_counter > 0
  while cf != end_cf
    if cf.iseq? && cf.instruction_pointer?
      cfunc_counter.times do |i|
        bt[-1 - i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end
```
With the following benchmark, which uses a call depth of
around 100 (common in many Ruby applications):
```ruby
require "benchmark/ips"

class T
  def test(depth, &block)
    if depth == 0
      yield self
    else
      test(depth - 1, &block)
    end
  end

  def array
    Array.new
  end

  def first
    caller_locations(1, 1)
  end

  def full
    caller_locations
  end
end

t = T.new
t.test((ARGV.first || 100).to_i) do
  Benchmark.ips do |x|
    x.report("caller_loc(1, 1)") { t.first }
    x.report("caller_loc") { t.full }
    x.report("Array.new") { t.array }
    x.compare!
  end
end
```
Results before commit:
```
Calculating -------------------------------------
    caller_loc(1, 1)    281.159k (± 0.7%) i/s -      1.426M in   5.073055s
          caller_loc     15.836k (± 2.1%) i/s -     79.450k in   5.019426s
           Array.new      1.852M (± 2.5%) i/s -      9.296M in   5.022511s

Comparison:
           Array.new:  1852297.5 i/s
    caller_loc(1, 1):   281159.1 i/s - 6.59x  (± 0.00) slower
          caller_loc:    15835.9 i/s - 116.97x  (± 0.00) slower
```
Results after commit:
```
Calculating -------------------------------------
    caller_loc(1, 1)    562.286k (± 0.8%) i/s -      2.858M in   5.083249s
          caller_loc     16.402k (± 1.0%) i/s -     83.200k in   5.072963s
           Array.new      1.853M (± 0.1%) i/s -      9.278M in   5.007523s

Comparison:
           Array.new:  1852776.5 i/s
    caller_loc(1, 1):   562285.6 i/s - 3.30x  (± 0.00) slower
          caller_loc:    16402.3 i/s - 112.96x  (± 0.00) slower
```
This shows that the speed of caller_locations(1, 1) has roughly
doubled, and the speed of caller_locations with no arguments
has improved slightly. So this new algorithm is significantly
faster, much simpler, and fixes bugs in the previous algorithm.
Fixes [Bug #18053]