github/ruby - ruby

Commit graph

Author	SHA1	Message	Date
Jeremy Evans	f6254f77f7	Speed up calling iseq bmethods Currently, bmethod arguments are copied from the VM stack to the C stack in vm_call_bmethod, then copied from the C stack to the VM stack later in invoke_iseq_block_from_c. This is inefficient. This adds vm_call_iseq_bmethod and vm_call_noniseq_bmethod. vm_call_iseq_bmethod is an optimized method that skips stack copies (though there is one copy to remove the receiver from the stack), and avoids calling vm_call_bmethod_body, rb_vm_invoke_bmethod, invoke_block_from_c_proc, invoke_iseq_block_from_c, and vm_yield_setup_args. The vm_call_iseq_bmethod argument handling is similar to the way normal iseq methods are called, and allows for similar performance optimizations when using splats or keywords. However, even in the no argument case it's still significantly faster. A benchmark is added for bmethod calling. In my environment, it improves bmethod calling performance by 38-59% for simple bmethod calls, and up to 180% for bmethod calls passing literal keywords on both sides.
``` ./miniruby-iseq-bmethod: 18159792.6 i/s ./miniruby-m: 13174419.1 i/s - 1.38x slower bmethod_simple_1 ./miniruby-iseq-bmethod: 15890745.4 i/s ./miniruby-m: 10008972.7 i/s - 1.59x slower bmethod_simple_0_splat ./miniruby-iseq-bmethod: 13142804.3 i/s ./miniruby-m: 11168595.2 i/s - 1.18x slower bmethod_simple_1_splat ./miniruby-iseq-bmethod: 12375791.0 i/s ./miniruby-m: 8491140.1 i/s - 1.46x slower bmethod_no_splat ./miniruby-iseq-bmethod: 10151258.8 i/s ./miniruby-m: 8716664.1 i/s - 1.16x slower bmethod_0_splat ./miniruby-iseq-bmethod: 8138802.5 i/s ./miniruby-m: 7515600.2 i/s - 1.08x slower bmethod_1_splat ./miniruby-iseq-bmethod: 8028372.7 i/s ./miniruby-m: 5947658.6 i/s - 1.35x slower bmethod_10_splat ./miniruby-iseq-bmethod: 6953514.1 i/s ./miniruby-m: 4840132.9 i/s - 1.44x slower bmethod_100_splat ./miniruby-iseq-bmethod: 5287288.4 i/s ./miniruby-m: 2243218.4 i/s - 2.36x slower bmethod_kw ./miniruby-iseq-bmethod: 8931358.2 i/s ./miniruby-m: 3185818.6 i/s - 2.80x slower bmethod_no_kw ./miniruby-iseq-bmethod: 12281287.4 i/s ./miniruby-m: 10041727.9 i/s - 1.22x slower bmethod_kw_splat ./miniruby-iseq-bmethod: 5618956.8 i/s ./miniruby-m: 3657549.5 i/s - 1.54x slower ```	2023-04-25 08:06:16 -07:00
Takashi Kokubun	66c4dc1592	Remove MJIT-specific benchmarks	2023-03-06 22:36:57 -08:00
John Bampton	c43fbe4ebd	Fix spelling (#7405)	2023-02-28 10:05:30 -08:00
Matt Valentine-House	2605615fe6	Benchmark String interpolation across size pools	2023-01-13 10:31:35 -05:00
Takashi Kokubun	509da028c2	Rewrite Kernel#loop in Ruby (#6983) * Rewrite Kernel#loop in Ruby * Use enum_for(:loop) { Float::INFINITY } Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com> * Limit the scope to rescue StopIteration Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com>	2022-12-25 21:46:29 -08:00
Nobuyoshi Nakada	8c272f4481	[Feature #18033] Make Time.new parse time strings `Time.new` now parses strings such as the result of `Time#inspect` and restricted ISO-8601 formats.	2022-12-16 22:52:59 +09:00
Daniel Colson	e69b91fae4	Introduce BOP_CMP for optimized comparison Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup to determine whether `<=>` was overridden. The result of the lookup was cached, but only for the duration of the specific method that initialized the cmp_opt_data cache structure. With this method lookup, `[x,y].max` is slower than doing `x > y ? x : y` even though there's an optimized instruction for "new array max". (John noticed somebody proposed a micro-optimization based on this fact in https://github.com/mastodon/mastodon/pull/19903.) ```rb a, b = 1, 2 Benchmark.ips do \|bm\| bm.report('conditional') { a > b ? a : b } bm.report('method') { [a, b].max } bm.compare! end ``` Before: ``` Comparison: conditional: 22603733.2 i/s method: 19820412.7 i/s - 1.14x (± 0.00) slower ``` This commit replaces the method lookup with a new CMP basic op, which gives the examples above equivalent performance. After: ``` Comparison: method: 24022466.5 i/s conditional: 23851094.2 i/s - same-ish: difference falls within error ``` Relevant benchmarks show an improvement to Array#max and Array#min when not using the optimized newarray_max instruction as well. They are noticeably faster for small arrays with the relevant types, and the same or maybe a touch faster on larger arrays. ``` $ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_min $ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_max ``` The benchmarks added in this commit also look generally improved. Co-authored-by: John Hawthorn <jhawthorn@github.com>	2022-12-06 12:37:23 -08:00
Takashi Kokubun	d15d1c01c2	Rename --mjit-min-calls to --mjit-call-threshold (#6731) for consistency with YJIT	2022-11-14 23:38:52 -08:00
Takashi Kokubun	f276d5a7fe	Improve HTML escape benchmarks	2022-11-04 23:54:25 -07:00
S.H	c6f439a6a8	Improve performance of some `Integer` and `Float` methods [Feature #19085] (#6638) * Improve some Integer and Float methods * Use alias and remove unnecessary code * Remove commented-out code	2022-10-27 09:13:16 -07:00
Samuel Williams	025b8701c0	Add several new methods for getting and setting buffer contents. (#6434)	2022-09-26 18:06:12 +13:00
Jemma Issroff	b5c459d57a	Adds a benchmark to measure freezing objects	2022-09-22 10:29:43 -07:00
HParker	fbaac837cf	avoid extra dup and pop in compile_op_asgn2 Co-authored-by: John Hawthorn <jhawthorn@github.com>	2022-09-22 09:47:13 -07:00
Jemma Issroff	aecb57ceb0	Fix style on vm_ivar benchmarks (#6379)	2022-09-15 09:39:39 +09:00
Jemma Issroff	513a11b477	Add vm_ivar get, get_unitialized, and lazy_set benchmarks	2022-09-14 13:50:47 -07:00
Jean Boussier	cd1724bdde	rb_str_concat_literals: use rb_str_buf_append That's about 1.30x faster.	2022-09-08 15:02:21 +02:00
John Hawthorn	679ef34586	New constant caching insn: opt_getconstant_path Previously YARV bytecode implemented constant caching by having a pair of instructions, opt_getinlinecache and opt_setinlinecache, wrapping a series of getconstant calls (with putobject providing supporting arguments). This commit replaces that pattern with a new instruction, opt_getconstant_path, handling both getting/setting the inline cache and fetching the constant on a cache miss. This is implemented by storing the full constant path as a null-terminated array of IDs inside of the IC structure. idNULL is used to signal an absolute constant reference. $ ./miniruby --dump=insns -e '::Foo::Bar::Baz' == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE) 0000 opt_getconstant_path <ic:0 ::Foo::Bar::Baz> ( 1)[Li] 0002 leave The motivation for this is that we had increasingly found the need to disassemble the instructions between the opt_getinlinecache and opt_setinlinecache in order to determine the constant we are fetching, or otherwise store metadata. This disassembly was done: * In opt_setinlinecache, to register the IC against the constant names it is using for granular invalidation. * In rb_iseq_free, to unregister the IC from the invalidation table. * In YJIT to find the position of an opt_getinlinecache instruction to invalidate it when the cache is populated * In YJIT to register the constant names being used for invalidation. With this change we no longer need disassembly for these (in fact rb_iseq_each is now unused), as the list of constant names being referenced is held in the IC. This should also make it possible to make more optimizations in the future. This may also reduce the size of iseqs, as previously each segment required 32 bytes (on 64-bit platforms) for each constant segment. This implementation only stores one ID per-segment. There should be no significant performance change between this and the previous implementation.
Previously opt_getinlinecache was a "leaf" instruction, but it included a jump (almost always to a separate cache line). Now opt_getconstant_path is a non-leaf (it may raise/autoload/call const_missing) but it does not jump. These seem to even out.	2022-09-01 15:20:49 -07:00
Takashi Kokubun	9f3140a42e	Remove mjit_exec benchmarks Now that mjit_exec doesn't exist, those files feel old. I'll probably change how I benchmark it when I add benchmarks for it again.	2022-08-21 11:35:40 -07:00
Takashi Kokubun	a60507f616	Rename mjit_compile.c to mjit_compiler.c I'm planning to introduce mjit_compiler.rb, and I want to make this consistent with it. Consistency with compile.c doesn't seem important for MJIT anyway.	2022-08-21 11:33:06 -07:00
Takashi Kokubun	485019c2bd	Rename mjit_exec to jit_exec (#6262) * Rename mjit_exec to jit_exec * Rename mjit_exec_slowpath to mjit_check_iseq * Remove mjit_exec references from comments	2022-08-19 23:57:17 -07:00
Takashi Kokubun	fc4acf8cae	Make benchmark indentation consistent Related to https://github.com/Shopify/yjit-bench/pull/109	2022-08-19 14:44:08 -07:00
Jemma Issroff	b4539dba7a	Added vm setivar benchmark from yjit-bench	2022-08-17 10:26:28 -07:00
John Hawthorn	0608a9a086	Optimize Marshal dump/load for large (> 31-bit) FIXNUM (#6229) * Optimize Marshal dump of large fixnum Marshal's FIXNUM type only supports 31-bit fixnums, so on 64-bit platforms the 63-bit fixnums need to be represented in Marshal's BIGNUM. Previously this was done by converting to a bignum and serializing the bignum object. This commit avoids allocating the intermediate bignum object, instead outputting the T_FIXNUM directly to a Marshal bignum. This maintains the same representation as the previous implementation, including not using LINKs for these large fixnums (an artifact of the previous implementation always allocating a new BIGNUM). This commit also avoids unnecessary st_lookups on immediate values, which we know will not be in that table. * Fastpath for loading FIXNUM from Marshal bignum * Run update-deps	2022-08-15 16:14:12 -07:00
Jeremy Evans	7922fd65e3	Update multiple assignment benchmarks to include non-literal array cases This allows them to show the effect of the previous newarray/expandarray to swap/opt_reverse optimization. This shows a 35-83% performance improvement in the four multiple assignment benchmarks that use this optimization.	2022-08-09 22:19:46 -07:00
Jean Boussier	1cb77f2304	Update IO::Buffer#get_value benchmark - The method was renamed from `get` to `get_value` - Comparing to `String#unpack` isn't quite equivalent, `unpack1` is closer. - Use frozen_string_literal to avoid allocating a format string every time. - Use `N` format which is equivalent to `:U32` (`uint_32_t` big-endian). - Disable experimental warnings to not mess up the output.	2022-08-08 15:15:33 +02:00
Jean Boussier	31a5586d1e	rb_str_buf_append: add a fast path for ENC_CODERANGE_VALID If the RHS has valid encoding, and both strings have the same encoding, we can use the fast path. However we need to update the LHS coderange. ``` compare-ruby: ruby 3.2.0dev (2022-07-21T14:46:32Z master `cdbb9b8555`) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-25T07:25:41Z string-concat-vali.. 11a2772bdd) [arm64-darwin21] warming up... \| \|compare-ruby\|built-ruby\| \|:-------------------\|-----------:\|---------:\| \|binary_concat_7bit \| 554.816k\| 556.460k\| \| \| -\| 1.00x\| \|utf8_concat_7bit \| 556.367k\| 555.101k\| \| \| 1.00x\| -\| \|utf8_concat_UTF8 \| 412.555k\| 556.824k\| \| \| -\| 1.35x\| ```	2022-07-25 14:18:52 +02:00
Jean Boussier	f954c5dae4	string.c: use str_enc_fastpath in TERM_LEN Not having to fetch the rb_encoding saves a significant amount of time. Additionally, even when we have to fetch it, we can do it faster using `ENCODING_GET` rather than `rb_enc_get`. ``` compare-ruby: ruby 3.2.0dev (2022-07-19T08:41:40Z master `cb9fd920a3`) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-21T11:16:16Z faster-buffer-conc.. 4f001f0748) [arm64-darwin21] warming up... \| \|compare-ruby\|built-ruby\| \|:---------------------\|-----------:\|---------:\| \|binary_concat_utf8 \| 510.580k\| 565.600k\| \| \| -\| 1.11x\| \|binary_concat_binary \| 512.653k\| 571.483k\| \| \| -\| 1.11x\| \|utf8_concat_utf8 \| 511.396k\| 566.879k\| \| \| -\| 1.11x\| ```	2022-07-21 15:06:50 +02:00
Jean Boussier	0ae8dbbee0	rb_str_buf_append: fastpath to str_buf_cat If the LHS is ASCII compatible and the RHS is 7BIT we can directly concat without being concerned about anything else. Benchmark: ``` compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master `71aec68566`) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-13T10:13:53Z faster-buffer-conc.. a04c10476d) [arm64-darwin21] warming up... \| \|compare-ruby\|built-ruby\| \|:---------------------\|-----------:\|---------:\| \|binary_append_utf8 \| 385.315k\| 573.663k\| \| \| -\| 1.49x\| \|binary_append_binary \| 446.579k\| 574.898k\| \| \| -\| 1.29x\| \|utf8_append_utf8 \| 430.936k\| 573.394k\| \| \| -\| 1.33x\| ``` Note that in the benchmark, the RHS always has a precomputed coderange. So the benchmark never enters the slowpath of having to scan the RHS. However it's extremely likely that we'll end up scanning it anyway in rb_enc_cr_str_buf_cat	2022-07-19 10:41:40 +02:00
Jemma Issroff	f375280d5a	Add benchmarks for setting / getting ivars on generics	2022-07-15 13:39:02 -07:00
Jemma Issroff	c53439294e	Fixes ivar benchmarks to not depend on object allocation Prior to this change, we were measuring object allocation as well as setting instance variables within ivar benchmarks. With this change, we now only measure setting instance variables within ivar benchmarks.	2022-07-15 10:29:42 -04:00
Jean Boussier	906f7cb3e7	vm_opt_ltlt: call rb_str_buf_append directly if RHS is a String `rb_str_concat` does a lot of type checking we can easily bypass. ``` \| \|compare-ruby\|built-ruby\| \|:--------------\|-----------:\|---------:\| \|string_concat \| 362.007k\| 398.965k\| \| \| -\| 1.10x\| ```	2022-07-06 17:25:58 +02:00
Jemma Issroff	af425b6d66	Added vm_ivar benchmark for initializing an embedded obj	2022-06-16 08:47:19 -07:00
Takashi Kokubun	790825db44	Update the help message on /benchmark I wanted to point out there's --output=all.	2022-06-07 21:30:28 -07:00
Samuel Williams	216593f59b	Add IO write throughput/locking overhead benchmark.	2022-05-28 15:44:18 +12:00
Kevin Newton	6068da8937	Finer-grained constant cache invalidation (take 2) This commit reintroduces finer-grained constant cache invalidation. After `8008fb7` got merged, it was causing issues on token-threaded builds (such as on Windows). The issue was that when you're iterating through instruction sequences and using the translator functions to get back the instruction structs, you're either using `rb_vm_insn_null_translator` or `rb_vm_insn_addr2insn2` depending if it's a direct-threading build. `rb_vm_insn_addr2insn2` does some normalization to always return to you the non-trace version of whatever instruction you're looking at. `rb_vm_insn_null_translator` does not do that normalization. This means that when you're looping through the instructions if you're trying to do an opcode comparison, it can change depending on the type of threading that you're using. This can be very confusing. So, this commit creates a new translator function `rb_vm_insn_normalizing_translator` to always return the non-trace version so that opcode comparisons don't have to worry about different configurations. [Feature #18589]	2022-04-01 14:48:22 -04:00
Nobuyoshi Nakada	69967ee64e	Revert "Finer-grained inline constant cache invalidation" This reverts commits for [Feature #18589]: * `8008fb7352` "Update formatting per feedback" * `8f6eaca2e1` "Delete ID from constant cache table if it becomes empty on ISEQ free" * `629908586b` "Finer-grained inline constant cache invalidation" MSWin builds on AppVeyor have been crashing since the merger.	2022-03-25 20:29:09 +09:00
Kevin Newton	629908586b	Finer-grained inline constant cache invalidation Current behavior - caches depend on a global counter. All constant mutations cause caches to be invalidated. ```ruby class A B = 1 end def foo A::B # inline cache depends on global counter end foo # populate inline cache foo # hit inline cache C = 1 # global counter increments, all caches are invalidated foo # misses inline cache due to `C = 1` ``` Proposed behavior - caches depend on name components. Only constant mutations with corresponding names will invalidate the cache. ```ruby class A B = 1 end def foo A::B # inline cache depends on constants named "A" and "B" end foo # populate inline cache foo # hit inline cache C = 1 # caches that depend on the name "C" are invalidated foo # hits inline cache because IC only depends on "A" and "B" ``` Examples of breaking the new cache: ```ruby module C # Breaks `foo` cache because "A" constant is set and the cache in foo depends # on "A" and "B" class A; end end B = 1 ``` We expect the new cache scheme to be invalidated less often because names aren't frequently reused. With the cache being invalidated less, we can rely on its stability more to keep our constant references fast and reduce the need to throw away generated code in YJIT.	2022-03-24 09:14:38 -07:00
John Hawthorn	b13a7c8e36	Constant time class to class ancestor lookup Previously when checking ancestors, we would walk all the way up the ancestry chain checking each parent for a matching class or module. I believe this was especially unfriendly to CPU cache since for each step we need to check two cache lines (the class and class ext). This check is used quite often in: * case statements * rescue statements * Calling protected methods * Class#is_a? * Module#=== * Module#<=> I believe it's most common to check a class against a parent class, so this commit aims to improve that (unfortunately does not help checking for an included Module). This is done by storing on each class the number and an array of all parent classes, in order (BasicObject is at index 0). Using this we can check whether a class is a subclass of another in constant time since we know the location to expect it in the hierarchy.	2022-02-23 19:57:42 -08:00
John Hawthorn	2f71f6bb82	Speed up and avoid kwarg hash alloc in Time.now Previously Time.now was switched to use Time.new as it added support for the in: argument. Unfortunately because Class#new is a cfunc this requires always allocating a Hash. This commit switches Time.now back to using a builtin time_s_now. This avoids the extra Hash allocation and is about 3x faster. $ benchmark-driver -e './ruby;3.1::~/.rubies/ruby-3.1.0/bin/ruby;3.0::~/.rubies/ruby-3.0.2/bin/ruby' benchmark/time_now.yml Warming up -------------------------------------- Time.now 6.704M i/s - 6.710M times in 1.000814s (149.16ns/i, 328clocks/i) Time.now(in: "+09:00") 2.003M i/s - 2.112M times in 1.054330s (499.31ns/i) Calculating ------------------------------------- ./ruby 3.1 3.0 Time.now 7.693M 2.763M 6.394M i/s - 20.113M times in 2.614428s 7.278710s 3.145572s Time.now(in: "+09:00") 2.030M 1.260M 1.617M i/s - 6.008M times in 2.960132s 4.769378s 3.716537s Comparison: Time.now ./ruby: 7693129.7 i/s 3.0: 6394109.2 i/s - 1.20x slower 3.1: 2763282.5 i/s - 2.78x slower Time.now(in: "+09:00") ./ruby: 2029757.4 i/s 3.0: 1616652.3 i/s - 1.26x slower 3.1: 1259776.2 i/s - 1.61x slower	2022-01-12 12:55:14 -08:00
Takashi Kokubun	1a63468831	Prepare for removing RubyVM::JIT (#5262 )	2021-12-13 23:07:46 -08:00
Jeremy Evans	b08dacfea3	Optimize dynamic string interpolation for symbol/true/false/nil/0-9 This provides a significant speedup for symbol, true, false, nil, and 0-9, class/module, and a small speedup in most other cases. Speedups (using included benchmarks): :symbol :: 60% 0-9 :: 50% Class/Module :: 50% nil/true/false :: 20% integer :: 10% [] :: 10% "" :: 3% One reason this approach is faster is it reduces the number of VM instructions for each interpolated value. Initial idea, approach, and benchmarks from Eric Wong. I applied the same approach against the master branch, updating it to handle the significant internal changes since this was first proposed 4 years ago (such as CALL_INFO/CALL_CACHE -> CALL_DATA). I also expanded it to optimize true/false/nil/0-9/class/module, and added handling of missing methods, refined methods, and RUBY_DEBUG. This renames the tostring insn to anytostring, and adds an objtostring insn that implements the optimization. This requires making a few functions non-static, and adding some non-static functions. This disables 4 YJIT tests. Those tests should be reenabled after YJIT optimizes the new objtostring insn. Implements [Feature #13715] Co-authored-by: Eric Wong <e@80x24.org> Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> Co-authored-by: Yusuke Endoh <mame@ruby-lang.org> Co-authored-by: Koichi Sasada <ko1@atdot.net>	2021-11-18 15:10:20 -08:00
Takashi Kokubun	021255f1e7	Skip string allocation in benchmark/time_at.yml and also drop a weird newline from benchmark/array_sample.yml.	2021-11-14 23:25:25 -08:00
Koichi Sasada	f943264565	add benchmark/time_at.yml ``` ruby_2_6 ruby_2_7 ruby_3_0 master modified Time.at(0) 12.362M 11.015M 9.499M 6.615M 9.000M i/s - 32.115M times in 2.597946s 2.915517s 3.380725s 4.854651s 3.568234s Time.at(0, 500) 7.542M 7.136M 8.252M 5.707M 5.646M i/s - 20.713M times in 2.746279s 2.902556s 2.510166s 3.629644s 3.668854s Time.at(0, in: "+09:00") 1.426M 1.346M 1.565M 1.674M 1.667M i/s - 4.240M times in 2.974049s 3.149753s 2.709416s 2.533043s 2.542853s ``` ``` ruby_2_6: ruby 2.6.7p150 (2020-12-09 revision 67888) [x86_64-linux] ruby_2_7: ruby 2.7.3p140 (2020-12-09 revision `9b884df6dd`) [x86_64-linux] ruby_3_0: ruby 3.0.3p150 (2021-11-06 revision `6d540c1b98`) [x86_64-linux] master: ruby 3.1.0dev (2021-11-13T20:48:57Z master `fc456adc6a`) [x86_64-linux] modified: ruby 3.1.0dev (2021-11-15T01:12:51Z mandatory_only_bui.. b0228446db) [x86_64-linux] ```	2021-11-15 15:58:56 +09:00
Koichi Sasada	dde010c974	add benchmark/array_sample.yml ``` ruby_2_6 ruby_2_7 ruby_3_0 master modified ary.sample 32.113M 30.146M 11.162M 10.539M 26.620M i/s - 64.882M times in 2.020428s 2.152296s 5.812981s 6.156398s 2.437325s ary.sample(2) 9.420M 8.987M 7.500M 6.973M 7.191M i/s - 25.170M times in 2.672085s 2.800616s 3.355896s 3.609534s 3.500108s ``` ``` ruby_2_6: ruby 2.6.7p150 (2020-12-09 revision 67888) [x86_64-linux] ruby_2_7: ruby 2.7.3p140 (2020-12-09 revision `9b884df6dd`) [x86_64-linux] ruby_3_0: ruby 3.0.3p150 (2021-11-06 revision `6d540c1b98`) [x86_64-linux] master: ruby 3.1.0dev (2021-11-13T20:48:57Z master `fc456adc6a`) [x86_64-linux] modified: ruby 3.1.0dev (2021-11-15T01:12:51Z mandatory_only_bui.. b0228446db) [x86_64-linux] ```	2021-11-15 15:58:56 +09:00
Samuel Williams	4b89034218	IO::Buffer for scheduler interface.	2021-11-10 19:21:05 +13:00
Koichi Sasada	a7776077be	add vm_ivar_of_class_set benchmark for a class's ivar setter	2021-10-23 01:32:55 +09:00
Koichi Sasada	acb23454e5	allow to access ivars of classes/modules if an ivar of a class/module refer to a shareable object, this ivar can be read from non-main Ractors.	2021-10-23 01:32:55 +09:00
John Hawthorn	bb488a1a7f	Use faster any_hash logic in rb_hash From the documentation of rb_obj_hash: > Certain core classes such as Integer use built-in hash calculations and > do not call the #hash method when used as a hash key. So if you override, say, Integer#hash it won't be used from rb_hash_aref and similar. This avoids method lookups in many common cases. This commit uses the same optimization in rb_hash, a method used internally and in the C API to get the hash value of an object. Usually this is used to build the hash of an object based on its elements. Previously it would always do a method lookup for 'hash'. This is primarily intended to speed up hashing of Arrays and Hashes, which call rb_hash for each element. compare-ruby: ruby 3.0.1p64 (2021-04-05 revision `0fb782ee38`) [x86_64-linux] built-ruby: ruby 3.1.0dev (2021-09-29T02:13:24Z fast_hash d670bf88b2) [x86_64-linux] # Iteration per second (i/s) \| \|compare-ruby\|built-ruby\| \|:----------------\|-----------:\|---------:\| \|hash_aref_array \| 1.008\| 1.769\| \| \| -\| 1.76x\|	2021-09-30 13:06:53 -07:00
Nobuyoshi Nakada	11fd3fec53	Add benchmarks to create Time instances	2021-09-12 18:44:53 +09:00
Jeremy Evans	2d98593bf5	Support tracing of attr_reader and attr_writer In vm_call_method_each_type, check for c_call and c_return events before dispatching to vm_call_ivar and vm_call_attrset. With this approach, the call cache will still dispatch directly to those functions, so this change will only decrease performance for the first (uncached) call, and even then, the performance decrease is very minimal. This approach requires that we clear the call caches when tracing is enabled or disabled. The approach currently switches all vm_call_ivar and vm_call_attrset call caches to vm_call_general any time tracing is enabled or disabled. So it could theoretically result in a slowdown for code that constantly enables or disables tracing. This approach does not handle targeted tracepoints, but from my testing, c_call and c_return events are not supported for targeted tracepoints, so that shouldn't matter. This includes a benchmark showing the performance decrease is minimal if detectable at all. Fixes [Bug #16383] Fixes [Bug #10470] Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>	2021-08-29 07:23:39 -07:00
Nobuyoshi Nakada	9eae8cdefb	Prefer qualified names under Thread	2021-06-29 11:41:10 +09:00
eileencodes	b91b3bc771	Add a cache for class variables Redo of 34a2acdac788602c14bf05fb616215187badd504 and 931138b00696419945dc03e10f033b1f53cd50f3 which were reverted. GitHub PR #4340. This change implements a cache for class variables. Previously there was no cache for cvars. Cvar access is slow due to needing to travel all the way up the ancestor tree before returning the cvar value. The deeper the ancestor tree the slower cvar access will be. The benefits of the cache are more visible with a higher number of included modules due to the way Ruby looks up class variables. The benchmark here includes 26 modules and shows with the cache, this branch is 6.5x faster when accessing class variables. ``` compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master `9e5105c`) [x86_64-darwin19] built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19] \| \|compare-ruby\|built-ruby\| \|:--------\|-----------:\|---------:\| \|vm_cvar \| 5.681M\| 36.980M\| \| \| -\| 6.51x\| ``` Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails application. ActiveRecord::Base.logger has 71 ancestors. The more ancestors a tree has, the more clear the speed increase. IE if Base had only one ancestor we'd see no improvement. This benchmark is run on a vanilla Rails application. Benchmark code: ```ruby require "benchmark/ips" require_relative "config/environment" Benchmark.ips do \|x\| x.report "logger" do ActiveRecord::Base.logger end end ``` Ruby 3.0 master / Rails 6.1: ``` Warming up -------------------------------------- logger 155.251k i/100ms Calculating ------------------------------------- ``` Ruby 3.0 with cvar cache / Rails 6.1: ``` Warming up -------------------------------------- logger 1.546M i/100ms Calculating ------------------------------------- logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s ``` Lastly we ran a benchmark to demonstrate the difference between master and our cache when the number of modules increases.
This benchmark measures 1 ancestor, 30 ancestors, and 100 ancestors. Ruby 3.0 master: ``` Warming up -------------------------------------- 1 module 1.231M i/100ms 30 modules 432.020k i/100ms 100 modules 145.399k i/100ms Calculating ------------------------------------- 1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s 30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s 100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s Comparison: 1 module: 12209958.3 i/s 30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower 100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower ``` Ruby 3.0 with cvar cache: ``` Warming up -------------------------------------- 1 module 1.641M i/100ms 30 modules 1.655M i/100ms 100 modules 1.620M i/100ms Calculating ------------------------------------- 1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s 30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s 100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s Comparison: 1 module: 16279458.0 i/s 100 modules: 16087484.6 i/s - same-ish: difference falls within error 30 modules: 15891406.2 i/s - same-ish: difference falls within error ``` Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>	2021-06-18 10:02:44 -07:00
S.H	3208a5df2d	Improve performance for Integer#size method [Feature #17135] (#3476) * Improve performance for Integer#size method [Feature #17135] * re-run ci * Let MJIT frame skip work for Integer#size Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>	2021-06-04 21:57:21 -07:00
S.H	28b481938b	Implementing some NilClass methods in Ruby code is faster [Feature #17054] (#3366)	2021-06-02 20:04:56 -07:00
Alan Wu	5ada23ac12	compile.c: Emit send for === calls in when statements The checkmatch instruction with VM_CHECKMATCH_TYPE_CASE calls === without a call cache. Emit a send instruction to make the call instead. It includes a call cache. The call cache improves throughput of using when statements to check the class of a given object. This is useful for say, JSON serialization. Use of a regular send instead of checkmatch also avoids taking the VM lock every time, which is good for multi-ractor workloads. Calculating ------------------------------------- master post vm_case_classes 11.013M 16.172M i/s - 6.000M times in 0.544795s 0.371009s vm_case_lit 2.296 2.263 i/s - 1.000 times in 0.435606s 0.441826s vm_case 74.098M 64.338M i/s - 6.000M times in 0.080974s 0.093257s Comparison: vm_case_classes post: 16172114.4 i/s master: 11013316.9 i/s - 1.47x slower vm_case_lit master: 2.3 i/s post: 2.3 i/s - 1.01x slower vm_case master: 74097858.6 i/s post: 64338333.9 i/s - 1.15x slower The vm_case benchmark is a bit slower post patch, possibly due to the larger instruction sequence. The benchmark dispatches using opt_case_dispatch so was not running checkmatch and does not make the === call post patch.	2021-05-28 12:34:03 -04:00
Aaron Patterson	07f055bb13	Revert "Filling cache values on cvar write" This reverts commit `08de37f9fa`. This reverts commit `e8ae922b62`.	2021-05-11 13:31:00 -07:00
eileencodes	e8ae922b62	Add a cache for class variables This change implements a cache for class variables. Previously there was no cache for cvars. Cvar access is slow due to needing to travel all the way up the ancestor tree before returning the cvar value. The deeper the ancestor tree the slower cvar access will be. The benefits of the cache are more visible with a higher number of included modules due to the way Ruby looks up class variables. The benchmark here includes 26 modules and shows with the cache, this branch is 6.5x faster when accessing class variables. ``` compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master `9e5105ca45`) [x86_64-darwin19] built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19] \| \|compare-ruby\|built-ruby\| \|:--------\|-----------:\|---------:\| \|vm_cvar \| 5.681M\| 36.980M\| \| \| -\| 6.51x\| ``` Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails application. ActiveRecord::Base.logger has 71 ancestors. The more ancestors a tree has, the more clear the speed increase. IE if Base had only one ancestor we'd see no improvement. This benchmark is run on a vanilla Rails application. Benchmark code: ```ruby require "benchmark/ips" require_relative "config/environment" Benchmark.ips do \|x\| x.report "logger" do ActiveRecord::Base.logger end end ``` Ruby 3.0 master / Rails 6.1: ``` Warming up -------------------------------------- logger 155.251k i/100ms Calculating ------------------------------------- ``` Ruby 3.0 with cvar cache / Rails 6.1: ``` Warming up -------------------------------------- logger 1.546M i/100ms Calculating ------------------------------------- logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s ``` Lastly we ran a benchmark to demonstrate the difference between master and our cache when the number of modules increases. This benchmark measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master: ``` Warming up -------------------------------------- 1 module 1.231M i/100ms 30 modules 432.020k i/100ms 100 modules 145.399k i/100ms Calculating ------------------------------------- 1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s 30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s 100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s Comparison: 1 module: 12209958.3 i/s 30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower 100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower ``` Ruby 3.0 with cvar cache: ``` Warming up -------------------------------------- 1 module 1.641M i/100ms 30 modules 1.655M i/100ms 100 modules 1.620M i/100ms Calculating ------------------------------------- 1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s 30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s 100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s Comparison: 1 module: 16279458.0 i/s 100 modules: 16087484.6 i/s - same-ish: difference falls within error 30 modules: 15891406.2 i/s - same-ish: difference falls within error ``` Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>	2021-05-11 12:04:27 -07:00
Aaron Patterson	9a6226c61e	Eagerly allocate instance variable tables along with object This allows us to allocate the right size for the object in advance, meaning that we don't have to pay the cost of ivar table extension later. The idea is that if an object type ever became "extended" at some point, then it is very likely it will become extended again. So we may as well allocate the ivar table up front.	2021-05-03 14:11:48 -07:00
wonda-tea-coffee	cc5bab80e4	[Doc] Fix a typo s/visilibity/visibility/	2021-04-25 19:46:37 +12:00
Jeremy Evans	50c54d40a8	Evaluate multiple assignment left hand side before right hand side In regular assignment, Ruby evaluates the left hand side before the right hand side. For example: ```ruby foo[0] = bar ``` Calls `foo`, then `bar`, then `[]=` on the result of `foo`. Previously, multiple assignment didn't work this way. If you did: ```ruby abc.def, foo[0] = bar, baz ``` Ruby would previously call `bar`, then `baz`, then `abc`, then `def=` on the result of `abc`, then `foo`, then `[]=` on the result of `foo`. This change makes multiple assignment similar to single assignment, changing the evaluation order of the above multiple assignment code to calling `abc`, then `foo`, then `bar`, then `baz`, then `def=` on the result of `abc`, then `[]=` on the result of `foo`. Implementing this is challenging with the stack-based virtual machine. We need to keep track of all of the left hand side attribute setter receivers and setter arguments, and then keep track of the stack level while handling the assignment processing, so we can issue the appropriate topn instructions to get the receiver. 
Here's an example of how the multiple assignment is executed, showing the stack and instructions: ``` self # putself abc # send abc, self # putself abc, foo # send abc, foo, 0 # putobject 0 abc, foo, 0, [bar, baz] # evaluate RHS abc, foo, 0, [bar, baz], baz, bar # expandarray abc, foo, 0, [bar, baz], baz, bar, abc # topn 5 abc, foo, 0, [bar, baz], baz, abc, bar # swap abc, foo, 0, [bar, baz], baz, def= # send abc, foo, 0, [bar, baz], baz # pop abc, foo, 0, [bar, baz], baz, foo # topn 3 abc, foo, 0, [bar, baz], baz, foo, 0 # topn 3 abc, foo, 0, [bar, baz], baz, foo, 0, baz # topn 2 abc, foo, 0, [bar, baz], baz, []= # send abc, foo, 0, [bar, baz], baz # pop abc, foo, 0, [bar, baz] # pop [bar, baz], foo, 0, [bar, baz] # setn 3 [bar, baz], foo, 0 # pop [bar, baz], foo # pop [bar, baz] # pop ``` As multiple assignment must deal with splats, post args, and any level of nesting, it gets quite a bit more complex than this in non-trivial cases. To handle this, struct masgn_state is added to keep track of the overall state of the mass assignment, which stores a linked list of struct masgn_attrasgn, one for each assigned attribute. This adds a new optimization that replaces a topn 1/pop instruction combination with a single swap instruction for multiple assignment to non-aref attributes. This new approach isn't compatible with one of the optimizations previously used, in the case where the multiple assignment return value was not needed, there was no lhs splat, and one of the left hand side used an attribute setter. This removes that optimization. Removing the optimization allowed for removing the POP_ELEMENT and adjust_stack functions. This adds a benchmark to measure how much slower multiple assignment is with the correct evaluation order. 
This benchmark shows: * 4-9% decrease for attribute sets * 14-23% decrease for array member sets * Basically same speed for local variable sets Importantly, it shows no significant difference between the popped (where return value of the multiple assignment is not needed) and !popped (where return value of the multiple assignment is needed) cases for attribute and array member sets. This indicates the previous optimization, which was dropped in the evaluation order fix and only affected the popped case, is not important to performance. Fixes [Bug #4443]	2021-04-21 10:49:19 -07:00
tompng (tomoya ishida)	9f9045123e	st.c: skip all deleted entries [Bug #17779] Update the start entry skipping all already deleted entries. Fixes performance issue of `Hash#first` in a certain case.	2021-04-11 19:05:26 +09:00
Nobuyoshi Nakada	382d3a4516	Improve Enumerable#tally performance Iteration per second (i/s) \| \|compare-ruby\|built-ruby\| \|:------\|-----------:\|---------:\| \|tally \| 52.814\| 114.936\| \| \| -\| 2.18x\|	2021-03-16 23:06:41 +09:00
Jean Boussier	1041bff3b2	Add a benchmark for RubyVM::InstructionSequence.load_from_binary	2021-03-10 13:44:07 -08:00
Jean Boussier	a03653d386	proc.c: make bind_call use existing callable method entry when possible The most common use case for `bind_call` is to protect from core methods being redefined, for instance a typical use: ```ruby UNBOUND_METHOD_MODULE_NAME = Module.instance_method(:name) def real_mod_name(mod) UNBOUND_METHOD_MODULE_NAME.bind_call(mod) end ``` But it's extremely common that the method wasn't actually redefined. In such a case we can avoid creating a new callable method entry, and simply delegate to the receiver. This results in a 1.5-2X speed-up for the fast path, and little to no impact on the slow path: ``` compare-ruby: ruby 3.1.0dev (2021-02-05T06:33:00Z master `b2674c1fd7`) [x86_64-darwin19] built-ruby: ruby 3.1.0dev (2021-02-15T10:35:17Z bind-call-fastpath d687e06615) [x86_64-darwin19] \| \|compare-ruby\|built-ruby\| \|:---------\|-----------:\|---------:\| \|fastpath \| 11.325M\| 16.393M\| \| \| -\| 1.45x\| \|slowpath \| 10.488M\| 10.242M\| \| \| 1.02x\| -\| ```	2021-03-10 13:43:22 -08:00
S.H	efd19badf4	Improve performance of some Numeric methods [Feature #17632] (#4190)	2021-02-19 11:11:19 -08:00
Takashi Kokubun	a0216b1acf	Do not run File.write while Ractors are running also make sure all local variables have the __bmdv_ prefix.	2021-02-11 00:25:46 -08:00
Takashi Kokubun	27382eb9fc	Add a benchmark-driver runner for Ractor (#4172) * Add a benchmark-driver runner for Ractor * Process.clock_gettime(Process::CLOCK_MONOTONIC) could be slow in Ruby 3.0 Ractor * Fetching Time could also be slow * Fix a comment * Assert overriding a private method	2021-02-10 21:24:25 -08:00
Nobuyoshi Nakada	ad2c7f8a1e	Simple benchmark of Float#to_s	2021-02-10 19:42:00 +09:00
S.H	fad7908a5d	Improve performance of Float#positive? and Float#negative? [Feature #17614] (#4160)	2021-02-08 20:29:42 -08:00
Takashi Kokubun	e1fee7f949	Rename RubyVM::MJIT to RubyVM::JIT because the name "MJIT" is an internal code name, it's inconsistent with --jit while they are related to each other, and I want to discourage future JIT implementation-specific (e.g. MJIT-specific) APIs by this rename. [Feature #17490]	2021-01-13 22:46:51 -08:00
S.H	daec5f9edc	Improve performance of some Float methods [Feature #17498] (#4018)	2021-01-01 18:39:07 -08:00
Takashi Kokubun	dbb4f19969	Allow inlining Integer#-@ and #~ ``` $ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_integer.yml --filter '(comp\|uminus)' before --jit: ruby 3.0.0dev (2020-12-23T05:41:44Z master `0dd4896175`) +JIT [x86_64-linux] after --jit: ruby 3.0.0dev (2020-12-23T06:25:41Z master 8887d78992) +JIT [x86_64-linux] last_commit=Allow inlining Integer#-@ and #~ Calculating ------------------------------------- before --jit after --jit mjit_comp(1) 44.006M 70.417M i/s - 40.000M times in 0.908967s 0.568042s mjit_uminus(1) 44.333M 68.422M i/s - 40.000M times in 0.902255s 0.584603s Comparison: mjit_comp(1) after --jit: 70417331.4 i/s before --jit: 44005980.4 i/s - 1.60x slower mjit_uminus(1) after --jit: 68422468.8 i/s before --jit: 44333371.0 i/s - 1.54x slower ```	2020-12-22 22:32:19 -08:00
Koichi Sasada	dff67c809f	fix duplicated name	2020-12-16 10:29:48 +09:00
Benoit Daloze	b4ec4a41c2	Guard all accesses to RubyVM::MJIT with defined?(RubyVM::MJIT) && * Otherwise those tests, etc cannot run on alternative Ruby implementations.	2020-12-04 16:45:54 +01:00
Alan Wu	68ffc8db08	Set allocator on class creation Allocating an instance of a class uses the allocator for the class. When the class has no allocator set, Ruby looks for it in the super class (see rb_get_alloc_func()). It's uncommon for classes created from Ruby code to ever have an allocator set, so it's common during the allocation process to search all the way to BasicObject from the class with which the allocation is being performed. This makes creating instances of classes that have long ancestry chains more expensive than creating instances of classes that have shorter ancestry chains. Setting the allocator at class creation time removes the need to perform a search for the allocator during allocation. This is a breaking change for C-extensions that assume that classes created from Ruby code have no allocator set. Libraries that set up a class hierarchy in Ruby code and then set the allocator on some parent class, for example, can experience breakage. This seems like an unusual use case and hopefully it is rare or non-existent in practice. Rails has many classes that have upwards of 60 elements in the ancestry chain and benchmark shows a significant improvement for allocating with a class that includes 64 modules. ``` pre: ruby 3.0.0dev (2020-11-12T14:39:27Z master `6325866421`) post: ruby 3.0.0dev (2020-11-12T20:15:30Z cut-allocator-lookup) Comparison: allocate_8_deep post: 10336985.6 i/s pre: 8691873.1 i/s - 1.19x slower allocate_32_deep post: 10423181.2 i/s pre: 6264879.1 i/s - 1.66x slower allocate_64_deep post: 10541851.2 i/s pre: 4936321.5 i/s - 2.14x slower allocate_128_deep post: 10451505.0 i/s pre: 3031313.5 i/s - 3.45x slower ```	2020-11-16 17:41:40 -05:00
Aaron Patterson	d7581370fd	Add a benchmark for polymorphic ivar setting This benchmark demonstrates the performance of setting an instance variable when the type of object is constantly changing. This benchmark should give us an idea of the performance of ivar setting in a polymorphic environment	2020-11-09 14:05:41 -08:00
Aaron Patterson	eb229994e5	eagerly initialize ivar table when index is small enough When the inline cache is written, the iv table will contain an entry for the instance variable. If we get an inline cache hit, then we know the iv table must contain a value for the index written to the inline cache. If the index in the inline cache is larger than the list on the object, but smaller than the iv index table on the class, then we can just eagerly allocate the iv list to be the same size as the iv index table. This avoids duplicate work of checking frozen as well as looking up the index for the particular instance variable name.	2020-11-09 09:44:16 -08:00
Nobuyoshi Nakada	8f9c113f35	Added benchmark of vm_send by variable [ci skip]	2020-10-28 09:47:46 +09:00
eileencodes	637d1cc0c0	Improve the performance of super This PR improves the performance of `super` calls. While working on some Rails optimizations jhawthorn discovered that `super` calls were slower than expected. The changes here do the following: 1) Adds a check for whether the call frame is not equal to the method entry iseq. This avoids the `rb_obj_is_kind_of` check on the next line which is quite slow. If the current call frame is equal to the method entry we know we can't have an instance eval, etc. 2) Changes `FL_TEST` to `FL_TEST_RAW`. This is safe because we've already done the check for `T_ICLASS` above. 3) Adds a benchmark for `T_ICLASS` super calls. 4) Note: makes a change for `method_entry_cref` to use `const`. On master the benchmarks showed that `super` is 1.76x slower. Our changes improved the performance so that it is now only 1.36x slower. Benchmark IPS: ``` Warming up -------------------------------------- super 244.918k i/100ms method call 383.007k i/100ms Calculating ------------------------------------- super 2.280M (± 6.7%) i/s - 11.511M in 5.071758s method call 3.834M (± 4.9%) i/s - 19.150M in 5.008444s Comparison: method call: 3833648.3 i/s super: 2279837.9 i/s - 1.68x (± 0.00) slower ``` With changes: ``` Warming up -------------------------------------- super 308.777k i/100ms method call 375.051k i/100ms Calculating ------------------------------------- super 2.951M (± 5.4%) i/s - 14.821M in 5.039592s method call 3.551M (± 4.9%) i/s - 18.002M in 5.081695s Comparison: method call: 3551372.7 i/s super: 2950557.9 i/s - 1.20x (± 0.00) slower ``` Ruby VM benchmarks also showed an improvement: Existing `vm_super` benchmark: 
``` $ make benchmark ITEM=vm_super \| \|compare-ruby\|built-ruby\| \|:---------\|-----------:\|---------:\| \|vm_super \| 21.555M\| 37.819M\| \| \| -\| 1.75x\| ``` New `vm_iclass_super` benchmark: ``` $ make benchmark ITEM=vm_iclass_super \| \|compare-ruby\|built-ruby\| \|:----------------\|-----------:\|---------:\| \|vm_iclass_super \| 1.669M\| 3.683M\| \| \| -\| 2.21x\| ``` This is the benchmark script used for the benchmark-ips benchmarks: ```ruby require "benchmark/ips" class Foo def zuper; end def top; end last_method = "top" ("A".."M").each do \|module_name\| eval <<-EOM module #{module_name} def zuper; super; end def #{module_name.downcase} #{last_method} end end prepend #{module_name} EOM last_method = module_name.downcase end end foo = Foo.new Benchmark.ips do \|x\| x.report "super" do foo.zuper end x.report "method call" do foo.m end x.compare! end ``` Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org> Co-authored-by: John Hawthorn <john@hawthorn.email>	2020-09-23 11:52:36 -07:00
Jean Boussier	5001cc4716	Optimize ObjectSpace.dump_all The two main optimizations are: - buffer writes for improved performance - avoid formatting functions when possible ``` \| \|compare-ruby\|built-ruby\| \|:------------------\|-----------:\|---------:\| \|dump_all_string \| 1.038\| 195.925\| \| \| -\| 188.77x\| \|dump_all_file \| 33.453\| 139.645\| \| \| -\| 4.17x\| \|dump_all_dev_null \| 44.030\| 278.552\| \| \| -\| 6.33x\| ```	2020-09-09 11:11:36 -07:00
Nobuyoshi Nakada	54acb3dd52	Improved Enumerable::Lazy#zip \| \|compare-ruby\|built-ruby\| \|:-------------------\|-----------:\|---------:\| \|first_ary \| 290.514k\| 296.331k\| \| \| -\| 1.02x\| \|first_nonary \| 166.954k\| 169.178k\| \| \| -\| 1.01x\| \|first_noarg \| 299.547k\| 305.358k\| \| \| -\| 1.02x\| \|take3_ary \| 129.388k\| 188.360k\| \| \| -\| 1.46x\| \|take3_nonary \| 90.684k\| 112.688k\| \| \| -\| 1.24x\| \|take3_noarg \| 131.940k\| 189.471k\| \| \| -\| 1.44x\| \|chain-first_ary \| 195.913k\| 286.194k\| \| \| -\| 1.46x\| \|chain-first_nonary \| 127.483k\| 168.716k\| \| \| -\| 1.32x\| \|chain-first_noarg \| 201.252k\| 298.562k\| \| \| -\| 1.48x\| \|chain-take3_ary \| 101.189k\| 183.188k\| \| \| -\| 1.81x\| \|chain-take3_nonary \| 75.381k\| 112.301k\| \| \| -\| 1.49x\| \|chain-take3_noarg \| 101.483k\| 192.148k\| \| \| -\| 1.89x\| \|block \| 296.696k\| 292.877k\| \| \| 1.01x\| -\|	2020-07-23 16:57:26 +09:00
Nobuyoshi Nakada	6b3cff12f6	Improved Enumerable::Lazy#flat_map \| \|compare-ruby\|built-ruby\| \|:-------\|-----------:\|---------:\| \|num3 \| 96.333k\| 160.732k\| \| \| -\| 1.67x\| \|num10 \| 96.615k\| 159.150k\| \| \| -\| 1.65x\| \|ary2 \| 103.836k\| 172.787k\| \| \| -\| 1.66x\| \|ary10 \| 109.249k\| 177.252k\| \| \| -\| 1.62x\| \|ary20 \| 106.628k\| 177.371k\| \| \| -\| 1.66x\| \|ary50 \| 107.135k\| 162.282k\| \| \| -\| 1.51x\| \|ary100 \| 106.513k\| 177.626k\| \| \| -\| 1.67x\|	2020-07-23 16:57:26 +09:00
Kenta Murata	b4e784434c	Optimize Array#min (#3324 ) The benchmark result is below: \| \|compare-ruby\|built-ruby\| \|:---------------\|-----------:\|---------:\| \|ary2.min \| 39.105M\| 39.442M\| \| \| -\| 1.01x\| \|ary10.min \| 23.995M\| 30.762M\| \| \| -\| 1.28x\| \|ary100.min \| 6.249M\| 10.783M\| \| \| -\| 1.73x\| \|ary500.min \| 1.408M\| 2.714M\| \| \| -\| 1.93x\| \|ary1000.min \| 828.397k\| 1.465M\| \| \| -\| 1.77x\| \|ary2000.min \| 332.256k\| 570.504k\| \| \| -\| 1.72x\| \|ary3000.min \| 338.079k\| 573.868k\| \| \| -\| 1.70x\| \|ary5000.min \| 168.217k\| 286.114k\| \| \| -\| 1.70x\| \|ary10000.min \| 85.512k\| 143.551k\| \| \| -\| 1.68x\| \|ary20000.min \| 43.264k\| 71.935k\| \| \| -\| 1.66x\| \|ary50000.min \| 17.317k\| 29.107k\| \| \| -\| 1.68x\| \|ary100000.min \| 9.072k\| 14.540k\| \| \| -\| 1.60x\| \|ary1000000.min \| 872.930\| 1.436k\| \| \| -\| 1.64x\| compare-ruby is `9f4b7fc82e`.	2020-07-18 23:45:25 +09:00
Kenta Murata	a63f520971	Optimize Array#max (#3325 ) The benchmark result is below: \| \|compare-ruby\|built-ruby\| \|:---------------\|-----------:\|---------:\| \|ary2.max \| 38.837M\| 40.830M\| \| \| -\| 1.05x\| \|ary10.max \| 23.035M\| 32.626M\| \| \| -\| 1.42x\| \|ary100.max \| 5.490M\| 11.020M\| \| \| -\| 2.01x\| \|ary500.max \| 1.324M\| 2.679M\| \| \| -\| 2.02x\| \|ary1000.max \| 699.167k\| 1.403M\| \| \| -\| 2.01x\| \|ary2000.max \| 284.321k\| 570.446k\| \| \| -\| 2.01x\| \|ary3000.max \| 282.613k\| 571.683k\| \| \| -\| 2.02x\| \|ary5000.max \| 145.120k\| 285.546k\| \| \| -\| 1.97x\| \|ary10000.max \| 72.102k\| 142.831k\| \| \| -\| 1.98x\| \|ary20000.max \| 36.065k\| 72.077k\| \| \| -\| 2.00x\| \|ary50000.max \| 14.343k\| 29.139k\| \| \| -\| 2.03x\| \|ary100000.max \| 7.586k\| 14.472k\| \| \| -\| 1.91x\| \|ary1000000.max \| 726.915\| 1.495k\| \| \| -\| 2.06x\|	2020-07-18 23:45:00 +09:00
Takashi Kokubun	167d139487	Inline builtin struct aref We don't do this for aset because it might raise a FrozenError. ``` $ benchmark-driver -v --rbenv 'before;after;before --jit;after --jit' benchmark/mjit_struct_aref.yml --repeat-count=4 before: ruby 2.8.0dev (2020-07-06T01:47:11Z master `d94ef7c6b6`) [x86_64-linux] after: ruby 2.8.0dev (2020-07-06T07:11:51Z master 85425168f4) [x86_64-linux] last_commit=Inline builtin struct aref before --jit: ruby 2.8.0dev (2020-07-06T01:47:11Z master `d94ef7c6b6`) +JIT [x86_64-linux] after --jit: ruby 2.8.0dev (2020-07-06T07:11:51Z master 85425168f4) +JIT [x86_64-linux] last_commit=Inline builtin struct aref Calculating ------------------------------------- before after before --jit after --jit mjit_struct_aref(struct) 34.783M 34.810M 48.321M 58.378M i/s - 40.000M times in 1.149996s 1.149097s 0.827794s 0.685192s Comparison: mjit_struct_aref(struct) after --jit: 58377836.7 i/s before --jit: 48321205.7 i/s - 1.21x slower after: 34809935.5 i/s - 1.68x slower before: 34782736.5 i/s - 1.68x slower ```	2020-07-06 00:14:00 -07:00
Takashi Kokubun	24fa37d87a	Make Kernel#then, #yield_self, #frozen? builtin (#3283) * Make Kernel#then, #yield_self, #frozen? builtin * Fix test_jit	2020-07-03 18:02:43 -07:00
Takashi Kokubun	f3a0d7a203	Rewrite Kernel#tap with Ruby (#3281) * Rewrite Kernel#tap with Ruby This was good for VM too, but of course my intention is to unblock JIT's inlining of a block over yield (inlining invokeyield has not been committed though). * Fix test_settracefunc About the :tap deletions, the :tap events are actually traced (we already have a TracePoint test for builtin methods), but it's filtered out by tp.path == "xyzzy" (it became "<internal:kernel>"). We could trace tp.path == "<internal:kernel>" cases too, but the lineno is impacted by kernel.rb changes and I didn't want to make it fragile for kernel.rb lineno changes.	2020-07-03 09:52:35 -07:00
Takashi Kokubun	0703e01471	Mark some Integer methods as inline (#3264)	2020-06-27 10:07:47 -07:00
Vladimir Dementyev	6770d8f1b0	Add pattern matching with arrays benchmark	2020-06-27 13:51:03 +09:00
Takashi Kokubun	7982dc1dfd	Decide JIT-ed insn based on cached cfunc for opt_* insns. opt_eq handles rb_obj_equal inside opt_eq, and all other cfunc is handled by opt_send_without_block. Therefore we can't decide which insn should be generated by checking whether it's cfunc cc or not. ``` $ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4 before --jit: ruby 2.8.0dev (2020-06-26T05:21:43Z master `9dbc2294a6`) +JIT [x86_64-linux] after --jit: ruby 2.8.0dev (2020-06-26T06:30:18Z master 75cece1b0b) +JIT [x86_64-linux] last_commit=Decide JIT-ed insn based on cached cfunc Calculating ------------------------------------- before --jit after --jit mjit_nil?(1) 73.878M 74.021M i/s - 40.000M times in 0.541432s 0.540391s mjit_not(1) 72.635M 74.601M i/s - 40.000M times in 0.550702s 0.536187s mjit_eq(1, nil) 7.331M 7.445M i/s - 8.000M times in 1.091211s 1.074596s mjit_eq(nil, 1) 49.450M 64.711M i/s - 8.000M times in 0.161781s 0.123627s Comparison: mjit_nil?(1) after --jit: 74020528.4 i/s before --jit: 73878185.9 i/s - 1.00x slower mjit_not(1) after --jit: 74600882.0 i/s before --jit: 72634507.6 i/s - 1.03x slower mjit_eq(1, nil) after --jit: 7444657.4 i/s before --jit: 7331304.3 i/s - 1.02x slower mjit_eq(nil, 1) after --jit: 64710790.6 i/s before --jit: 49449507.4 i/s - 1.31x slower ```	2020-06-25 23:33:08 -07:00
Takashi Kokubun	946e5cc668	Annotate Kernel#class as inline (#3250 ) ``` $ benchmark-driver -v --rbenv 'before;after;before --jit;after --jit' benchmark/mjit_class.yml --repeat-count=4 before: ruby 2.8.0dev (2020-06-23T07:09:54Z master `37a2e48d76`) [x86_64-linux] after: ruby 2.8.0dev (2020-06-23T17:29:56Z inline-class 0ff147c007) [x86_64-linux] before --jit: ruby 2.8.0dev (2020-06-23T07:09:54Z master `37a2e48d76`) +JIT [x86_64-linux] after --jit: ruby 2.8.0dev (2020-06-23T17:29:56Z inline-class 0ff147c007) +JIT [x86_64-linux] Calculating ------------------------------------- before after before --jit after --jit mjit_class(self) 39.219M 40.060M 53.502M 69.202M i/s - 40.000M times in 1.019915s 0.998495s 0.747631s 0.578021s mjit_class(1) 39.567M 41.242M 52.100M 68.895M i/s - 40.000M times in 1.010935s 0.969885s 0.767749s 0.580591s Comparison: mjit_class(self) after --jit: 69201690.7 i/s before --jit: 53502336.4 i/s - 1.29x slower after: 40060289.1 i/s - 1.73x slower before: 39218939.2 i/s - 1.76x slower mjit_class(1) after --jit: 68895358.6 i/s before --jit: 52100353.0 i/s - 1.32x slower after: 41241993.6 i/s - 1.67x slower before: 39567314.0 i/s - 1.74x slower ```	2020-06-23 23:49:03 -07:00
Takashi Kokubun	78352fb52e	Compile opt_send for opt_* only when cc has ISeq because opt_nil/opt_not/opt_eq populates cc even when it doesn't fallback to opt_send_without_block because of vm_method_cfunc_is. ``` $ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4 before --jit: ruby 2.8.0dev (2020-06-22T08:11:24Z master `d231b8f95b`) +JIT [x86_64-linux] after --jit: ruby 2.8.0dev (2020-06-22T08:53:27Z master e1125879ed) +JIT [x86_64-linux] last_commit=Compile opt_send for opt_* only when cc has ISeq Calculating ------------------------------------- before --jit after --jit mjit_nil?(1) 54.106M 73.693M i/s - 40.000M times in 0.739288s 0.542795s mjit_not(1) 53.398M 74.477M i/s - 40.000M times in 0.749090s 0.537075s mjit_eq(1, nil) 7.427M 6.497M i/s - 8.000M times in 1.077136s 1.231326s Comparison: mjit_nil?(1) after --jit: 73692594.3 i/s before --jit: 54106108.4 i/s - 1.36x slower mjit_not(1) after --jit: 74477487.9 i/s before --jit: 53398125.0 i/s - 1.39x slower mjit_eq(1, nil) before --jit: 7427105.9 i/s after --jit: 6497063.0 i/s - 1.14x slower ``` Actually opt_eq becomes slower by this. Maybe it's indeed using opt_send_without_block, but I'll approach that one in another commit.	2020-06-22 02:08:21 -07:00
Takashi Kokubun	4c5780e51e	Share warmup logic across MJIT benchmarks	2020-06-22 00:54:27 -07:00
Takashi Kokubun	faf93e4545	The RUBYOPT= comment is no longer needed	2020-06-22 00:20:30 -07:00
Takashi Kokubun	8838600c1e	Stop relying on `make benchmark`'s `-I$(srcdir)/benchmark/lib` These days I don't use `make benchmark`. The YAML files should be executable with bare `benchmark-driver` CLI without passing `RUBYOPT=-Ibenchmark/lib`.	2020-06-22 00:17:10 -07:00
Takashi Kokubun	7561db8c00	Introduce Primitive.attr! to annotate 'inline' (#3242) [Feature #15589]	2020-06-20 17:13:03 -07:00
Takashi Kokubun	95b0fed371	Make Integer#zero? a separate method and builtin (#3226) A prerequisite to fix https://bugs.ruby-lang.org/issues/15589 with JIT. This commit alone doesn't make a significant difference yet, but I thought this commit should be committed independently. This method override was discussed in [Misc #16961].	2020-06-20 14:55:09 -07:00
Ryuta Kamizono	9d24ddbb53	Fix `make benchmark` example `make benchmark ARGS=../benchmark/erb_render.yml` does not work. ``` % make benchmark ARGS=../benchmark/erb_render.yml /Users/kamipo/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \ --executables="compare-ruby::/Users/kamipo/.rbenv/shims/ruby --disable=gems -I.ext/common --disable-gem" \ --executables="built-ruby::./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \ ../benchmark/erb_render.yml Traceback (most recent call last): 6: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `<main>' 5: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `flat_map' 4: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `each' 3: from ./benchmark/benchmark-driver/exe/benchmark-driver:122:in `block in <main>' 2: from /Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `load_file' 1: from /Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `open' /Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `initialize': No such file or directory @ rb_sysopen - ../benchmark/erb_render.yml (Errno::ENOENT) make: *** [benchmark] Error 1 % make benchmark ARGS=benchmark/erb_render.yml /Users/kamipo/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \ --executables="compare-ruby::/Users/kamipo/.rbenv/shims/ruby --disable=gems -I.ext/common --disable-gem" \ --executables="built-ruby::./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \ benchmark/erb_render.yml Calculating ------------------------------------- compare-ruby built-ruby erb_render 825.454k 783.664k i/s - 1.500M times in 1.817181s 1.914086s Comparison: erb_render compare-ruby: 825454.4 i/s built-ruby: 783663.8 i/s - 1.05x slower ```	2020-06-07 10:33:14 +09:00
卜部昌平	d4015cfee3	add benchmark for different block handlers	2020-06-03 16:13:47 +09:00
Nobuyoshi Nakada	02cb643ddb	Added String#split benchmark for empty regexp \| \|compare-ruby\|built-ruby\| \|:--------------\|-----------:\|---------:\| \|re_chars-1 \| 169.230k\| 973.855k\| \| \| -\| 5.75x\| \|re_chars-10 \| 25.536k\| 107.598k\| \| \| -\| 4.21x\| \|re_chars-100 \| 2.621k\| 11.207k\| \| \| -\| 4.28x\| \|re_chars-1000 \| 259.098\| 1.133k\| \| \| -\| 4.37x\|	2020-05-12 22:59:58 +09:00
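
The evaluation-order change described in commit 50c54d40a8 can be observed from plain Ruby. The sketch below is illustrative only: the `OrderProbe` class and its logging are hypothetical, and the setter is named `value=` where the commit message uses `def=`. It records the order in which the receivers, the right hand side values, and the setters of `abc.value, foo[0] = bar, baz` are evaluated. Going by the commit message, a Ruby that includes this change (3.1 or later) should log the left hand side receivers first, while older versions evaluate the right hand side first.

```ruby
# Hypothetical probe: every method appends its own name to a log so the
# evaluation order of a multiple assignment can be inspected afterwards.
class OrderProbe
  attr_reader :log

  def initialize
    @log = []
  end

  # Receiver of the attribute assignment (`abc.value = ...`).
  def abc
    @log << "abc"
    self
  end

  # Attribute setter; stands in for `def=` from the commit message.
  def value=(_new_value)
    @log << "value="
  end

  # Receiver of the element assignment (`foo[0] = ...`).
  def foo
    @log << "foo"
    self
  end

  def []=(_index, _new_value)
    @log << "[]="
  end

  # Right hand side values.
  def bar
    @log << "bar"
    1
  end

  def baz
    @log << "baz"
    2
  end

  def run
    abc.value, foo[0] = bar, baz
    @log
  end
end

order = OrderProbe.new.run
puts order.inspect
```

Running the same script under an older and a newer interpreter is a quick way to check which evaluation order a given build uses.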

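
The `bind_call` use case from commit a03653d386 (protecting against redefined core methods) can be tried with a small script. This is a sketch under assumptions: the `Sneaky` class is hypothetical and exists only to override `name`, and the constant and helper are renamed slightly from the commit message. It shows the behaviour of the pattern, not the fast path added by the commit.

```ruby
# Capture the original Module#name once, before anything can redefine it.
UNBOUND_MODULE_NAME = Module.instance_method(:name)

# Hypothetical class that overrides its own `name`.
class Sneaky
  def self.name
    "not the real name"
  end
end

# Calls the captured Module#name on `mod`, bypassing any override.
def real_mod_name(mod)
  UNBOUND_MODULE_NAME.bind_call(mod)
end

overridden = Sneaky.name
original = real_mod_name(Sneaky)
puts overridden # prints "not the real name"
puts original   # prints "Sneaky"
```

`UnboundMethod#bind_call` is available from Ruby 2.7 onward; on earlier versions the equivalent spelling is `bind(mod).call`.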

456 commits