The `rb_profile_frames` API did not skip the two dummy frames that
each thread has at its beginning. This was unlike `backtrace_each` and
`rb_ec_partial_backtrace_object`, which do skip them.
This does not seem to be a problem for non-main thread frames,
because both `VM_FRAME_RUBYFRAME_P(cfp)` and
`rb_vm_frame_method_entry(cfp)` are NULL for them.
But on the main thread, `VM_FRAME_RUBYFRAME_P(cfp)` was true,
and thus the dummy frame was still included in the output of
`rb_profile_frames`.
I've now made `rb_profile_frames` skip this extra frame (like
`backtrace_each` and friends) and added a test that asserts
the size and contents of `rb_profile_frames`.
Fixes [Bug #18907] (<https://bugs.ruby-lang.org/issues/18907>)
Use ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.
All frames should be either iseq frames or cfunc frames. Use a
VM assert instead of a conditional to check for a cfunc frame if
the current frame is not an iseq frame.
Fixes this compiler warning:
warning: 'loc' may be used uninitialized in this function [-Wmaybe-uninitialized]
bt_yield_loc(loc - cfunc_counter, cfunc_counter, btobj);
This method takes a block and yields Thread::Backtrace::Location
objects to the block. It does not take arguments, and always
starts at the default frame that caller_locations would start at.
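As a rough illustration of the method described above: the text does not name it, so this sketch assumes it is `Thread.each_caller_location` (the method proposed in Feature #16663) and guards the call in case the running Ruby does not provide it. The helper names `first_callers` and `outer_caller` are made up for the example.

```ruby
# Collect at most `limit` caller locations, stopping early with `break`
# so that no further control frames need to be visited.
def first_callers(limit)
  found = []
  # Assumption: the method is Thread.each_caller_location; skip quietly
  # on Rubies that do not have it.
  return found unless Thread.respond_to?(:each_caller_location)

  Thread.each_caller_location do |loc|
    found << loc
    break if found.size >= limit
  end
  found
end

def outer_caller
  first_callers(2)
end

outer_caller.each { |loc| puts "#{loc.path}:#{loc.lineno}" }
```

Because the block can `break`, the caller decides how far up the stack to walk without passing a start or length argument.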
Implements [Feature #16663]
This fixes multiple bugs found in the partial backtrace
optimization added in 3b24b7914c.
These bugs occur when passing a start argument to caller where
the start argument lands on an iseq frame without a pc.
Before this commit, the following code results in the same
line being printed twice, both for the #each method.
```ruby
def a; [1].group_by { b } end
def b; puts(caller(2, 1).first, caller(3, 1).first) end
a
```
After this commit and in Ruby 2.7, the lines are different,
with the first line being for each and the second for group_by.
Before this commit, the following code can either segfault or
result in an infinite loop:
```ruby
def foo
  caller_locations(2, 1).inspect # segfault
  caller_locations(2, 1)[0].path # infinite loop
end
1.times.map { 1.times.map { foo } }
```
After this commit, this code works correctly.
This commit completely refactors the backtrace handling.
Instead of processing the backtrace from the outermost
frame working in, process it from the innermost frame
working out. This is much faster for partial backtraces,
since you only access the control frames you need to in
order to construct the backtrace.
To handle cfunc frames in the new design, they start
out with no location information. We increment a counter
for each cfunc frame added. When an iseq frame with pc
is accessed, after adding the iseq backtrace location,
we reuse that iseq's location for all of the directly
preceding cfunc backtrace locations.
If the last backtrace line is a cfunc frame, we continue
scanning for iseq frames until the end control frame, and
use the location information from the first one for the
trailing cfunc frames in the backtrace.
As only rb_ec_partial_backtrace_object uses the new
backtrace implementation, remove all of the function
pointers and inline the functions. This makes the
process easier to understand.
Restore the Ruby 2.7 implementation of backtrace_each and
use it for all the other functions that called
backtrace_each other than rb_ec_partial_backtrace_object.
All other cases requested the entire backtrace, so there
is no advantage to using the new algorithm for those.
Additionally, there are implicit assumptions in the other
code that the backtrace processing works inward instead
of outward.
Remove the cfunc/iseq union in rb_backtrace_location_t,
and remove the prev_loc member for cfunc. Both cfunc and
iseq types can now have iseq and pc entries, so the
location information can be accessed the same way for each.
This avoids the need for an extra backtrace location entry
to store an iseq backtrace location if the final entry in
the backtrace is a cfunc. This is also what fixes the
segfault and infinite loop issues in the above bugs.
Here's Ruby pseudocode for the new algorithm, where start
and length are the arguments to caller or caller_locations:
```ruby
end_cf = VM.end_control_frame.next
cf = VM.start_control_frame
size = VM.num_control_frames - 2
bt = []
cfunc_counter = 0

if length.nil? || length > size
  length = size
end

while cf != end_cf && bt.size != length
  if cf.iseq?
    if cf.instruction_pointer?
      if start > 0
        start -= 1
      else
        bt << cf.iseq_backtrace_entry
        # bt[-1] is the iseq entry just added; the pending cfunc
        # entries sit directly before it.
        cfunc_counter.times do |i|
          bt[-2 - i].loc = cf.loc
        end
        cfunc_counter = 0
      end
    end
  elsif cf.cfunc?
    if start > 0
      start -= 1
    else
      bt << cf.cfunc_backtrace_entry
      cfunc_counter += 1
    end
  end

  cf = cf.prev
end

# Trailing cfunc entries take the location of the first iseq frame
# with a pc found further out.
if cfunc_counter > 0
  while cf != end_cf
    if cf.iseq? && cf.instruction_pointer?
      cfunc_counter.times do |i|
        bt[-1 - i].loc = cf.loc
      end
      break
    end
    cf = cf.prev
  end
end
```
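The pseudocode above can also be tried out on a toy model. The following is only a sketch: `ToyFrame`, `ToyEntry` and `toy_partial_backtrace` are hypothetical names, frames are listed innermost first in a plain array, a `nil` location stands for "cfunc frame" or "iseq frame without pc", and the `- 2` adjustment for dummy frames is left out.

```ruby
# kind is :iseq or :cfunc; loc is nil when the frame has no usable pc.
ToyFrame = Struct.new(:name, :kind, :loc)
ToyEntry = Struct.new(:name, :loc)

def toy_partial_backtrace(frames, start, length)
  bt = []
  pending_cfuncs = 0
  idx = 0
  length = frames.size if length.nil? || length > frames.size

  while idx < frames.size && bt.size != length
    frame = frames[idx]
    if frame.kind == :iseq
      if frame.loc # iseq frames without a pc are ignored
        if start > 0
          start -= 1
        else
          bt << ToyEntry.new(frame.name, frame.loc)
          # Backfill the cfunc entries added just before this one.
          pending_cfuncs.times { |i| bt[-2 - i].loc = frame.loc }
          pending_cfuncs = 0
        end
      end
    elsif start > 0
      start -= 1
    else
      bt << ToyEntry.new(frame.name, nil)
      pending_cfuncs += 1
    end
    idx += 1
  end

  # Trailing cfuncs borrow the location of the next outer iseq frame.
  if pending_cfuncs > 0
    donor = frames.drop(idx).find { |f| f.kind == :iseq && f.loc }
    pending_cfuncs.times { |i| bt[-1 - i].loc = donor.loc } if donor
  end
  bt
end

def toy_stack
  [
    ToyFrame.new("foo", :iseq, "t.rb:3"),
    ToyFrame.new("each", :cfunc, nil),
    ToyFrame.new("map", :cfunc, nil),
    ToyFrame.new("<main>", :iseq, "t.rb:9"),
  ]
end

toy_partial_backtrace(toy_stack, 1, 2).each { |e| puts "#{e.loc} in #{e.name}" }
```

With `start = 1` and `length = 2` the loop stops after the two cfunc frames, and the trailing scan is what supplies their location.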
With the following benchmark, which uses a call depth of
around 100 (common in many Ruby applications):
```ruby
require 'benchmark/ips'

class T
  def test(depth, &block)
    if depth == 0
      yield self
    else
      test(depth - 1, &block)
    end
  end

  def array
    Array.new
  end

  def first
    caller_locations(1, 1)
  end

  def full
    caller_locations
  end
end

t = T.new
t.test((ARGV.first || 100).to_i) do
  Benchmark.ips do |x|
    x.report('caller_loc(1, 1)') { t.first }
    x.report('caller_loc') { t.full }
    x.report('Array.new') { t.array }
    x.compare!
  end
end
```
Results before commit:
```
Calculating -------------------------------------
    caller_loc(1, 1)    281.159k (± 0.7%) i/s -      1.426M in   5.073055s
          caller_loc     15.836k (± 2.1%) i/s -     79.450k in   5.019426s
           Array.new      1.852M (± 2.5%) i/s -      9.296M in   5.022511s

Comparison:
           Array.new:  1852297.5 i/s
    caller_loc(1, 1):   281159.1 i/s - 6.59x  (± 0.00) slower
          caller_loc:    15835.9 i/s - 116.97x  (± 0.00) slower
```
Results after commit:
```
Calculating -------------------------------------
    caller_loc(1, 1)    562.286k (± 0.8%) i/s -      2.858M in   5.083249s
          caller_loc     16.402k (± 1.0%) i/s -     83.200k in   5.072963s
           Array.new      1.853M (± 0.1%) i/s -      9.278M in   5.007523s

Comparison:
           Array.new:  1852776.5 i/s
    caller_loc(1, 1):   562285.6 i/s - 3.30x  (± 0.00) slower
          caller_loc:    16402.3 i/s - 112.96x  (± 0.00) slower
```
This shows that the speed of caller_locations(1, 1) has roughly
doubled, and the speed of caller_locations with no arguments
has improved slightly. So this new algorithm is significantly faster,
much simpler, and fixes bugs in the previous algorithm.
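The "roughly doubled" and "improved slightly" figures follow directly from the iterations-per-second numbers in the two comparison blocks. A small sketch of that arithmetic, with hypothetical helper names and the values copied from the results above:

```ruby
# Iterations per second, taken from the "Comparison" sections.
def ips_before
  { "caller_loc(1, 1)" => 281_159.1, "caller_loc" => 15_835.9 }
end

def ips_after
  { "caller_loc(1, 1)" => 562_285.6, "caller_loc" => 16_402.3 }
end

# Ratio of after to before for each benchmark, rounded to two places.
def speedups(before, after)
  before.map { |name, ips| [name, (after.fetch(name) / ips).round(2)] }.to_h
end

speedups(ips_before, ips_after).each do |name, ratio|
  puts "#{name}: #{ratio}x"
end
```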
Fixes [Bug #18053]
RubyVM::AST.of(Thread::Backtrace::Location) returns a node that
corresponds to the location. Typically, the node is a method call, but
not always.
This change also includes iseq's dump/load support of node_ids for each
instruction.
Previously Backtrace::Location had two possible states:
LOCATION_TYPE_ISEQ and LOCATION_TYPE_ISEQ_CALCED. The former had the
location information as PC, and the latter had it as lineno.
Once lineno was calculated, the state was changed to
LOCATION_TYPE_ISEQ_CALCED and the calculated result was kept.
This change removes LOCATION_TYPE_ISEQ_CALCED, so lineno is calculated
whenever it is needed. This will be slightly slower, but lineno is typically
needed only when its backtrace is shown, so I believe that it does not
matter.
This is a preparation to add column information to Backtrace::Location
because PC is needed to calculate node_id for AST::Node even after
lineno is calculated. This change is approved by ko1.
Previously, if there were ignored frames (iseq without pc), we could
go beyond the requested start frame. This has two changes:
1) Ensure that we don't look beyond the start frame by using
last_cfp = RUBY_VM_PREVIOUS_CONTROL_FRAME(last_cfp) until the
desired start frame is reached.
2) To fix the failures caused by change 1), which occur when a
limited number of frames is requested, scan the VM stack before
allocating backtrace frames, looking for ignored frames. This
is complicated if there are ignored frames before and after
the start, in which case we need to scan until the start frame,
and then scan backwards, decrementing the start value until we
get to the point where start will result in the number of
requested frames.
This fixes a Rails test failure. Jean Boussier was able to
produce a failing test case outside of Rails.
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Previously, frames with iseq but no pc were skipped (even before
the refactoring in 3b24b7914c).
Because the entire backtrace was processed before the refactoring,
this was handled by using later frames instead. However, after
the refactoring, we need to handle those frames or they get
lost.
Keep two iteration counters when iterating, one for the desired
backtrace size (so we generate the desired number of frames), and
one for the actual backtrace size (so we don't process off the end
of the stack). When skipping over an iseq frame with no pc,
decrement the counter for the desired backtrace, so it will
continue to process the expected number of backtrace frames.
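The two-counter idea can be sketched on a toy stack. This is not the C implementation: `SkipFrame` and `collect_frames` are hypothetical names, and a `nil` pc stands in for an iseq frame without a pc.

```ruby
SkipFrame = Struct.new(:name, :pc)

def collect_frames(frames, wanted)
  result = []
  desired = 0 # progress toward the number of frames requested
  actual = 0  # control frames visited; bounded by the real stack size

  while desired < wanted && actual < frames.size
    frame = frames[actual]
    if frame.pc.nil?
      # A skipped frame must not use up one of the requested slots.
      desired -= 1
    else
      result << frame.name
    end
    desired += 1
    actual += 1
  end
  result
end

def skip_demo_stack
  [
    SkipFrame.new("a", 10),
    SkipFrame.new("b", nil), # iseq frame without pc
    SkipFrame.new("c", 30),
    SkipFrame.new("d", 40),
  ]
end

p collect_frames(skip_demo_stack, 2)
```

Without the decrement, asking for two frames would stop after visiting `a` and `b` and return only one entry.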
Fixes [Bug #17581]
Previously, backtrace_each fully populated the rb_backtrace_t with all
backtrace frames, even if caller only requested a partial backtrace
(e.g. Kernel#caller_locations(1, 1)). This changes backtrace_each to
only add the requested frames to the rb_backtrace_t.
To do this, backtrace_each needs to be passed the starting frame and
number of frames values passed to Kernel#caller or #caller_locations.
backtrace_each works from the top of the stack to the bottom, where the
bottom is the current frame. Due to how the location for cfuncs is
tracked using the location of the previous iseq, we need to store an
extra frame for the previous iseq if we are limiting the backtrace and
the final backtrace frame (the first one stored) would be a cfunc and not
an iseq.
To limit the amount of work in this case, while scanning until the start
of the requested backtrace, for each iseq, store the cfp. If the first
backtrace frame we care about is a cfunc, use the stored cfp to find the
related iseq. Use a function pointer to handle the storage of the cfp
in the iteration arg, and also store the location of the extra frame
in the iteration arg.
backtrace_each needs to return int instead of void in order to signal
when a starting frame larger than backtrace size is given, as caller
and caller_locations need to return nil and not the empty array in
these cases.
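The nil versus empty-array distinction is visible from Ruby. A minimal sketch, with hypothetical helper names and a start value chosen to be far larger than any realistic stack depth:

```ruby
# A start index well beyond the end of the stack.
def callers_past_the_stack
  [caller(1_000_000), caller_locations(1_000_000)]
end

# A valid start with a requested length of zero.
def callers_with_zero_length
  [caller(0, 0), caller_locations(0, 0)]
end

p callers_past_the_stack
p callers_with_zero_length
```

The first call reports "no such frame" with nil, while the second reports "nothing requested" with an empty array.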
To handle cases where a range is provided with a negative end, and the
backtrace size is needed to calculate the result to pass to
rb_range_beg_len, add a backtrace_size static function to calculate
the size, which copies the logic from backtrace_each.
As backtrace_each only adds the backtrace lines requested,
backtrace_to_*_ary can be simplified to always operate on the entire
backtrace.
Previously, caller_locations(1,1) was about 6.2 times slower for an
800 deep callstack compared to an empty callstack. With this new
approach, it is only 1.3 times slower. It will always be somewhat
slower as it still needs to scan the cfps from the top of the stack
until it finds the first requested backtrace frame.
This initializes the backtrace memory to zero. I do not think this is
necessary, as from my analysis, nothing during the setting of the
backtrace entries can cause a garbage collection, but it seems the
safest approach, and it's unlikely the performance decrease is
significant.
This removes the rb_backtrace_t backtrace_base member. backtrace
and backtrace_base were initialized to the same value, and neither
is modified, so it doesn't make sense to have two pointers.
This also removes LOCATION_TYPE_IFUNC from vm_backtrace.c, as
the value is never set.
Fixes [Bug #17031]
Right now `SomeClass.method` is properly named, but `SomeModule.method`
is displayed as `#<Module:0x000055eb5d95adc8>.method` which makes
profiling annoying.
Save committers' daily life by no longer #include-ing everything from
internal.h and making each file include what it needs instead. This
should significantly speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are those win32-related headers, which tend not to be
self-contained (headers have inclusion order dependencies).
This changeset basically replaces `ruby_xmalloc(x * y)` with
`ruby_xmalloc2(x, y)`. Some convenience functions are also
provided, for instance `rb_xmalloc_mul_add(x, y, z)`, which allocates
x * y + z bytes.
This reverts commits: 10d6a3aca7 8ba48c1b85 fba8627dc1 dd883de5ba 6c6a25feca 167e6b48f1 7cb96d41a5 3207979278 595b3c4fdd 1521f7cf89 c11c5e69ac cf33608203 3632a812c0 f56506be0d 86427a3219.
The reason for the revert is that we observe an ABA problem around
the inline method cache. When a cache misses, we search for a
method entry. And if the entry is identical to what was cached
before, we reuse the cache. But the commits we are reverting here
introduced situations where a method entry is freed, then the
identical memory region is used for another method entry. An
inline method cache cannot detect that ABA.
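The ABA pattern can be modelled in a few lines of Ruby. This is a toy model only: `ToyHeap` and `ToyInlineCache` are hypothetical classes, slot indexes stand in for memory addresses, and the cache validates by address alone, which is the weakness described above.

```ruby
# Slots are reused after free, so one address can hold different entries
# at different times.
class ToyHeap
  def initialize
    @slots = []
    @free = []
  end

  def alloc(value)
    addr = @free.pop || @slots.size
    @slots[addr] = value
    addr
  end

  def free(addr)
    @slots[addr] = nil
    @free.push(addr)
  end

  def [](addr)
    @slots[addr]
  end
end

# Reuses its cached target whenever the looked-up address is unchanged.
class ToyInlineCache
  def initialize
    @addr = nil
    @target = nil
  end

  def resolve(heap, current_addr)
    if @addr != current_addr
      @addr = current_addr
      @target = heap[current_addr]
    end
    @target
  end
end

def aba_demo
  heap = ToyHeap.new
  cache = ToyInlineCache.new
  a = heap.alloc("sqrt that raises")
  first = cache.resolve(heap, a)
  heap.free(a)                    # entry A is freed...
  b = heap.alloc("original sqrt") # ...and its slot is reused for B
  second = cache.resolve(heap, b) # same address, so the stale target wins
  { same_address: a == b, first: first, second: second, actual: heap[b] }
end

p aba_demo
```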
Here is code that reproduces such a situation:
```ruby
require 'prime'

class << Integer
  alias org_sqrt sqrt
  def sqrt(n)
    raise
  end

  GC.stress = true
  Prime.each(7*37){} rescue nil # <- Here we populate CC
  class << Object.new; end

  # This adjacent remove-then-alias maneuver
  # frees a method entry, then immediately
  # reuses it for another.
  remove_method :sqrt
  alias sqrt org_sqrt
end

Prime.each(7*37).to_a # <- SEGV
```
Now that we have eliminated most destructive operations over the
rb_method_entry_t / rb_callable_method_entry_t, let's make them
mostly immutable and mark them const.
One exception is rb_export_method(), which destructively modifies
visibilities of method entries. I have left that operation as is
because I suspect that destructiveness is the nature of that
function.
We can check the function pointer passed to rb_define_global_function
as we do in rb_define_method. It turns out that almost everybody
misunderstands the API.
lineno is an int, and INT2FIX(0) was assigned.
[Bug #15719] [ruby-core:91911]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67326 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* vm_backtrace.c (rb_debug_inspector_open): escape all env using
`rb_vm_stack_to_heap()` before making bindings.
[Bug #15105]
There is a complicated story of this issue:
Without this patch, an IFUNC frame is not escaped. An IFUNC frame
points to the CFUNC ep as its previous ep. However, the CFUNC ep can be
escaped as a result of making bindings of Ruby-level frames.
The IFUNC's ep can then point to an invalidated ep, and `rb_iter_break()`
will fail. This is why `any?` fails.
* test/-ext-/debug/test_debug.rb: add a test.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64800 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This reverts commit r63265.
ko1 said I should not have committed this! I'm sorry!
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63267 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
rb_profile_frames was always behaving as if the value given for the
start parameter was 0.
The reason for this was that it would check if (start > 0) { then
continue without updating the control frame pointer or anything other
than decrementing start.
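A simplified model of that loop shows why `start` had no effect. This is a sketch, not the C code: the function names are hypothetical, frames are a plain array, and every frame is treated as a Ruby frame.

```ruby
# Models the old behaviour: `next` re-enters the loop without advancing
# idx, so start just counts down and nothing is skipped.
def profile_frames_buggy(frames, start, limit)
  out = []
  idx = 0
  while out.size < limit && idx < frames.size
    if start > 0
      start -= 1
      next
    end
    out << frames[idx]
    idx += 1
  end
  out
end

# Models the fix: the frame index advances while frames are skipped.
def profile_frames_fixed(frames, start, limit)
  out = []
  idx = 0
  while out.size < limit && idx < frames.size
    if start > 0
      start -= 1
    else
      out << frames[idx]
    end
    idx += 1
  end
  out
end

stack = %w[a b c d]
p profile_frames_buggy(stack, 2, 2)
p profile_frames_fixed(stack, 2, 2)
```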
[ruby-core:86147] [Bug #14607]
Co-authored-by: Dylan Thacker-Smith <Dylan.Smith@shopify.com>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63265 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
which has been developed by Takashi Kokubun <takashikkbn@gmail> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path for introducing
a JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in the
original MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
noticeably faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, and
`make test-spec` both without JIT and with JIT. It is perfectly safe
with JIT disabled because, unlike MJIT, it does not replace VM
instructions; with JIT enabled, it also stably runs Ruby applications,
including Rails applications.
I consider this version to be just an "initial" JIT compiler. I have many
optimization ideas which were skipped for the initial merge, and you can
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. The `mjit_compile` interface is designed for that purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
the compiler. This avoids including some functions which take a long time
to compile, e.g. vm_exec_core. Part of this is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`), but the rest is
resolved manually for now. Load mjit_helper.h for the MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alias to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve the maintainability of function exports, but I
believe this way is acceptable for the initial merge if we clarify that
the new exports are for MJIT (so that we can use them as a TODO list to
fix) and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than code generated by clang. The
benchmark results may be worse on macOS. The following benchmark results
were obtained with gcc on Linux.
* Performance decreases when Google Chrome is running.
* JIT works on MinGW, but it doesn't improve performance, at least
in short-running benchmarks.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark results
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance is somewhat degraded with JIT for now.
We should fix this.
(At least I know opt_aref performs badly with JIT, and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
  50: 17
  75: 18
  90: 22
  99: 29
home_admin:
  50: 21
  75: 21
  90: 27
  99: 40
topic_admin:
  50: 17
  75: 18
  90: 22
  99: 32
categories:
  50: 35
  75: 41
  90: 43
  99: 77
home:
  50: 39
  75: 46
  90: 49
  99: 95
topic:
  50: 46
  75: 52
  90: 56
  99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
  50: 19
  75: 21
  90: 25
  99: 33
home_admin:
  50: 24
  75: 26
  90: 30
  99: 35
topic_admin:
  50: 19
  75: 20
  90: 25
  99: 30
categories:
  50: 40
  75: 44
  90: 48
  99: 76
home:
  50: 42
  75: 48
  90: 51
  99: 89
topic:
  50: 49
  75: 55
  90: 58
  99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e