github/ruby - ruby

Commit graph

Author	SHA1	Message	Date
Jean Boussier	9594db0cf2	Implement Hash.new(capacity:) [Feature #19236] When building a large hash, pre-allocating it with enough capacity can save many re-hashes and significantly improve performance.
```
/opt/rubies/3.3.0/bin/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::../miniruby-master -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby --disable-gem" \
            --output=markdown --output-compare -v $(find ./benchmark -maxdepth 1 -name 'hash_new' -o -name 'hash_new.yml' -o -name 'hash_new.rb' | sort)
compare-ruby: ruby 3.4.0dev (2024-03-25T11:48:11Z master `f53209f023`) +YJIT dev [arm64-darwin23]
last_commit=[ruby/irb] Cache RDoc::RI::Driver.new (https://github.com/ruby/irb/pull/911)
built-ruby: ruby 3.4.0dev (2024-03-25T15:29:40Z hash-new-rb 77652b08a2) +YJIT dev [arm64-darwin23]
warming up...
|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|new                 |      7.614M|    5.976M|
|                    |       1.27x|         -|
|new_with_capa_1k    |     13.931k|   15.698k|
|                    |           -|     1.13x|
|new_with_capa_100k  |     124.746|   148.283|
|                    |           -|     1.19x|
```	2024-07-08 12:24:33 +02:00
Jean Boussier	9e9f1d9301	Precompute embedded string literals hash code With embedded strings we often have some space left in the slot, which we can use to store the string Hash code. It's probably only worth it for string literals, as they are the ones likely to be used as hash keys. We chose to store the Hash code right after the string terminator so as to make it easy/fast to compute, and not require one more union in RString.
```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main `f77618c1fa`) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code
|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|
```
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>	2024-05-28 07:32:41 +02:00
Aaron Patterson	f86fb1eda2	add allocation benchmark	2024-04-15 11:29:48 -07:00
Takashi Kokubun	70de3b170b	Optimize Hash methods with Kernel#hash (#10160)	2024-03-01 11:16:31 -08:00
Jeremy Evans	f446d68ba6	Add benchmarks for super and zsuper calls of different types These show gains from the recent optimization commits: ``` arg_splat miniruby: 7346039.9 i/s miniruby-before: 4692240.8 i/s - 1.57x slower arg_splat_block miniruby: 6539749.6 i/s miniruby-before: 4358063.6 i/s - 1.50x slower splat_kw_splat miniruby: 5433641.5 i/s miniruby-before: 3851048.6 i/s - 1.41x slower splat_kw_splat_block miniruby: 4916137.1 i/s miniruby-before: 3477090.1 i/s - 1.41x slower splat_kw_block miniruby: 2912829.5 i/s miniruby-before: 2465611.7 i/s - 1.18x slower arg_splat_post miniruby: 2195208.2 i/s miniruby-before: 1860204.3 i/s - 1.18x slower ``` zsuper only speeds up in the post argument case, because it was already set to use splatarray false in cases where there were no post arguments.	2024-03-01 07:10:25 -08:00
Alan Wu	e4272fd292	Avoid allocation when passing no keywords to anonymous kwrest methods Thanks to the new semantics from [ruby-core:115808], `**nil` is now equivalent to `**{}`. Since the only thing one could do with an anonymous keyword rest parameter is to delegate it with `**`, nil is just as good as an empty hash. Using nil avoids allocating an empty hash. This is particularly important for `...` methods since they now use `kwrest` under the hood after `4f77d8d328`. Most calls don't pass keywords. Comparison: fw_no_kw post: 9816800.9 i/s pre: 8570297.0 i/s - 1.15x slower	2024-02-13 11:05:26 -05:00
Jeremy Evans	c20e819e8b	Fix crash when passing large keyword splat to method accepting keywords and keyword splat The following code previously caused a crash:
```ruby
h = {}
1000000.times{|i| h[i.to_s.to_sym] = i}
def f(kw: 1, **kws) end
f(**h)
```
Inside a thread or fiber, the size of the keyword splat could be much smaller and still cause a crash. I found this issue while optimizing method calling by reducing implicit allocations. Given the following code:
```ruby
def f(kw: , **kws) end
kw = {kw: 1}
f(**kw)
```
The `f(**kw)` call previously allocated two hashes callee side instead of a single hash. This is because `setup_parameters_complex` would extract the keywords from the keyword splat hash to the C stack, to attempt to mirror the case when literal keywords are passed without a keyword splat. Then, `make_rest_kw_hash` would build a new hash based on the extracted keywords that weren't used for literal keywords. Switch the implementation so that if a keyword splat is passed, literal keywords are deleted from the keyword splat hash (or a copy of the hash if the hash is not mutable). In addition to avoiding the crash, this new approach is much more efficient in all cases. With the included benchmark: ``` 1 miniruby: 5247879.9 i/s miniruby-before: 2474050.2 i/s - 2.12x slower 1_mutable miniruby: 1797036.5 i/s miniruby-before: 1239543.3 i/s - 1.45x slower 10 miniruby: 1094750.1 i/s miniruby-before: 365529.6 i/s - 2.99x slower 10_mutable miniruby: 407781.7 i/s miniruby-before: 225364.0 i/s - 1.81x slower 100 miniruby: 100992.3 i/s miniruby-before: 32703.6 i/s - 3.09x slower 100_mutable miniruby: 40092.3 i/s miniruby-before: 21266.9 i/s - 1.89x slower 1000 miniruby: 21694.2 i/s miniruby-before: 4949.8 i/s - 4.38x slower 1000_mutable miniruby: 5819.5 i/s miniruby-before: 2995.0 i/s - 1.94x slower ```	2024-02-11 22:48:38 -08:00
Takashi Kokubun	76f0eec20f	Fix a benchmark to avoid leaving a garbage file	2024-02-08 17:08:23 -08:00
Jeremy Evans	2217e08340	Optimize compilation of large literal arrays To avoid stack overflow, Ruby splits compilation of large arrays into smaller arrays, and concatenates the small arrays together. It previously used newarray/concatarray for this, which is inefficient. This switches the compilation to use pushtoarray, which is much faster. This makes almost all literal arrays only allocate a single array. For cases where there are a large number of static values in the array, Ruby will statically compile subarrays, and previously added them using concatarray. This switches to concattoarray, avoiding an array allocation for the append. Keyword splats are also supported in arrays, and ignored if the keyword splat is empty. Previously, this used newarraykwsplat and concatarray. This still uses newarraykwsplat, but switches to concattoarray to save an allocation. So large arrays with keyword splats can allocate 2 arrays instead of 1. Previously, for the following array sizes (assuming local variable access for each element), Ruby allocated the following number of arrays: 1000 elements: 7 arrays; 10000 elements: 79 arrays; 100000 elements: 781 arrays. With these changes, only a single array is allocated (or 2 for a large array with a keyword splat). Results using the included benchmark: ``` array_1000 miniruby: 34770.0 i/s ./miniruby-before: 10511.7 i/s - 3.31x slower array_10000 miniruby: 4938.8 i/s ./miniruby-before: 483.8 i/s - 10.21x slower array_100000 miniruby: 727.2 i/s ./miniruby-before: 4.1 i/s - 176.98x slower ``` Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>	2024-01-27 10:16:52 -08:00
Jeremy Evans	42d891be2c	Add benchmark for implicit array/hash allocation reduction changes Benchmark results: ``` named_multi_arg_splat after: 5344097.6 i/s before: 3088134.0 i/s - 1.73x slower named_post_splat after: 5401882.3 i/s before: 2629321.8 i/s - 2.05x slower anon_arg_splat after: 12242780.9 i/s before: 6845413.2 i/s - 1.79x slower anon_arg_kw_splat after: 11277398.7 i/s before: 4329509.4 i/s - 2.60x slower anon_multi_arg_splat after: 5132699.5 i/s before: 3018103.7 i/s - 1.70x slower anon_post_splat after: 5602915.1 i/s before: 2645185.5 i/s - 2.12x slower anon_kw_splat after: 15403727.3 i/s before: 6249504.6 i/s - 2.46x slower anon_fw_to_named_splat after: 2985715.3 i/s before: 2049159.9 i/s - 1.46x slower anon_fw_to_named_no_splat after: 2941030.4 i/s before: 2100380.0 i/s - 1.40x slower fw_to_named_splat after: 2801008.7 i/s before: 2012416.4 i/s - 1.39x slower fw_to_named_no_splat after: 2742670.4 i/s before: 1957707.2 i/s - 1.40x slower fw_to_anon_to_named_splat after: 2309246.6 i/s before: 1375924.6 i/s - 1.68x slower fw_to_anon_to_named_no_splat after: 2193227.6 i/s before: 1351184.1 i/s - 1.62x slower ```	2024-01-24 18:25:55 -08:00
Takashi Kokubun	c84237f953	Rewrite Array#each in Ruby using Primitive (#9533)	2024-01-23 20:09:57 +00:00
Takashi Kokubun	27c1dd8634	YJIT: Allow inlining ISEQ calls with a block (#9622) * YJIT: Allow inlining ISEQ calls with a block * Leave a TODO comment about u16 inline_block	2024-01-23 19:36:23 +00:00
Jeremy Evans	f5a01b0916	Add benchmark for recent optimization to avoid implicit allocations	2023-12-07 11:27:55 -08:00
Jeremy Evans	3081c83169	Support tracing of struct member accessor methods This follows the same approach used for attr_reader/attr_writer in `2d98593bf5`, skipping the checking for tracing after the first call using the call cache, and clearing the call cache when tracing is turned on/off. Fixes [Bug #18886]	2023-12-07 10:29:33 -08:00
Jean Boussier	83c385719d	Specialize String#dup `String#+@` is 2-3 times faster than `String#dup` because it can directly go through `rb_str_dup` instead of using the generic, much slower `rb_obj_dup`. This fact led to the existence of the ugly `Performance/UnfreezeString` rubocop performance rule that encourages users to rewrite the much more readable and convenient `"foo".dup` into the ugly `(+"foo")`. Let's make that rubocop rule useless.
```
compare-ruby: ruby 3.3.0dev (2023-11-20T02:02:55Z master `701b0650de`) [arm64-darwin22]
last_commit=[ruby/prism] feat: add encoding for IBM865 (https://github.com/ruby/prism/pull/1884)
built-ruby: ruby 3.3.0dev (2023-11-20T12:51:45Z faster-str-lit-dup 6b745bbc5d) [arm64-darwin22]
warming up..
|       |compare-ruby|built-ruby|
|:------|-----------:|---------:|
|uplus  |     16.312M|   16.332M|
|       |           -|     1.00x|
|dup    |      5.912M|   16.329M|
|       |           -|     2.76x|
```	2023-11-20 14:33:20 +01:00
Jean Boussier	b92b9e1e9e	vm_getivar: assume the cached shape_id likely has a common ancestor When an inline cache misses, it is very likely that the stale shape_id and the current instance shape_id have a close common ancestor. For example, if the instance variable is sometimes frozen and sometimes not, one of the two shapes will be the direct parent of the other. Another pattern that commonly causes IC misses is "memoization"; in such cases the object will have a "base common shape" and then a number of close descendants. In addition, when we find a common ancestor, we store it in the inline cache instead of the current shape. This helps prevent the cache from flip-flopping, ensuring the next lookup will be marginally faster and, more generally, avoiding too many memory writes. However, now that shapes have an ancestors index, we only check for a few ancestors before falling back to use the index. So overall this change speeds up what is assumed to be the more common case, but makes what is assumed to be the less common case a bit slower.
```
compare-ruby: ruby 3.3.0dev (2023-10-26T05:30:17Z master `701ca070b4`) [arm64-darwin22]
built-ruby: ruby 3.3.0dev (2023-10-26T09:25:09Z shapes_double_sear.. a723a85235) [arm64-darwin22]
warming up......
|                                     |compare-ruby|built-ruby|
|:------------------------------------|-----------:|---------:|
|vm_ivar_stable_shape                 |     11.672M|   11.679M|
|                                     |           -|     1.00x|
|vm_ivar_memoize_unstable_shape       |      7.551M|   10.506M|
|                                     |           -|     1.39x|
|vm_ivar_memoize_unstable_shape_miss  |     11.591M|   11.624M|
|                                     |           -|     1.00x|
|vm_ivar_unstable_undef               |      9.037M|    7.981M|
|                                     |       1.13x|         -|
|vm_ivar_divergent_shape              |      8.034M|    6.657M|
|                                     |       1.21x|         -|
|vm_ivar_divergent_shape_imbalanced   |     10.471M|    9.231M|
|                                     |       1.13x|         -|
```
Co-Authored-By: John Hawthorn <john@hawthorn.email>	2023-11-03 12:47:43 +01:00
Aaron Patterson	884c3195d9	Update benchmark/vm_ivar_ic_miss.yml Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>	2023-10-24 10:52:06 -07:00
Aaron Patterson	84e4453436	Use a functional red-black tree for indexing the shapes This is an experimental commit that uses a functional red-black tree to create an index of the ancestor shapes. It uses an Okasaki-style functional red-black tree: https://www.cs.tufts.edu/comp/150FP/archive/chris-okasaki/redblack99.pdf This tree is advantageous because: * It offers O(log n) insertions and O(log n) lookups. * It shares memory with previous "versions" of the tree. When we insert a node in the tree, only the parts of the tree that need to be rebalanced are newly allocated. Parts of the tree that don't need to be rebalanced are not reallocated, so "new trees" are able to share memory with old trees. This is in contrast to a sorted set, where we would have to duplicate the set and also re-sort it on each insertion. I've added a new stat to RubyVM.stat so we can understand how the red-black tree grows.	2023-10-24 10:52:06 -07:00
Nobuyoshi Nakada	ccd18d0557	Clean up temporary file, wc.input [ci skip]	2023-10-24 12:30:10 +09:00
Kouhei Yanagita	769f53eb7e	Add benchmarks for Range#reverse_each	2023-10-12 17:34:49 +09:00
Kouhei Yanagita	6ae2996e29	Optimize `Range#count` by using `range_size` if possible	2023-10-05 00:19:55 +09:00
Kouhei Yanagita	91042ec0ae	Add benchmarks for Range#bsearch	2023-09-26 17:31:10 +09:00
Kouhei Yanagita	7e350f5310	Optimize Range#bsearch for beginless/endless ranges within Fixnum	2023-09-21 10:30:58 +09:00
Nobuyoshi Nakada	b4213a73b8	[Feature #19839] Fix `Range#overlap?` for empty ranges Empty ranges do not overlap with any range. Regarding benchmarks, PR#8242 is significantly faster in some cases, but one of these two cases is a wrong result.
|                           |ActiveSupport| PR#8242|built-ruby|
|:--------------------------|------------:|-------:|---------:|
|(2..3).overlap?(1..1)      |       7.761M| 15.053M|   32.368M|
|                           |            -|   1.94x|     4.17x|
|(2..3).overlap?(2..4)      |      25.720M| 55.070M|   21.981M|
|                           |        1.17x|   2.51x|         -|
|(2..3).overlap?(4..5)      |       7.616M| 15.048M|   21.730M|
|                           |            -|   1.98x|     2.85x|
|(2..3).overlap?(2..1)      |      25.585M| 56.545M|   32.786M|
|                           |            -|   2.21x|     1.28x|
|(2..3).overlap?(0..1)      |       7.554M| 14.755M|   32.545M|
|                           |            -|   1.95x|     4.31x|
|(2..3).overlap?(...1)      |       6.681M|  5.843M|   32.255M|
|                           |        1.14x|       -|     5.52x|
|(2...3).overlap?(..2)      |       6.676M|  5.817M|   21.572M|
|                           |        1.15x|       -|     3.71x|
|(2...3).overlap?(3...)     |       7.392M| 14.755M|   31.805M|
|                           |            -|   2.00x|     4.30x|
|(2..3).overlap?('a'..'d')  |       3.675M|  3.482M|   17.009M|
|                           |        1.06x|       -|     4.89x|	2023-09-16 17:24:21 +09:00
Kouhei Yanagita	7d08dbd015	Optimize Range#bsearch for beginless/endless ranges On Range#bsearch for endless ranges, we try positions at `begin + 2**i` (i = 0, 1, 2, ...) to find a point that satisfies a given condition. Subsequently, we perform binary searching with the interval `[begin, begin + 2**n]`. However, the interval `[begin + 2**(n-1), begin + 2**n]` is sufficient for binary search because `begin + 2**(n-1)` does not satisfy the condition. The same applies to beginless ranges.	2023-09-16 12:10:09 +09:00
Nobuyoshi Nakada	5e79d5a560	Make `rb_str_rindex` return byte index Leave callers to convert byte index to char index, as well as `rb_str_index`, so that `rb_str_rpartition` does not need to re-convert char index to byte index.	2023-07-09 16:39:28 +09:00
Nobuyoshi Nakada	ab6eb3786c	Optimize `Regexp#dup` and `Regexp.new(/RE/)` When copying from another regexp, copy already built `regex_t` instead of re-compiling its source.	2023-06-09 20:22:30 +09:00
nekoyama32767	87217f26f1	[Feature #19643] Direct primitive compare sort for `Array#sort_by` In most cases, `sort_by` works on primitive types. Using `qsort_r` with a function pointer is much slower than comparing the data directly. I implemented an introsort that compares primitive data directly for `sort_by`. We can even afford an O(n) type check before the primitive data sort, and it is still faster.	2023-05-20 19:40:27 +09:00
Jeremy Evans	a82a24ed57	Optimize method_missing calls CALLER_ARG_SPLAT is not necessary for method_missing. We just need to unshift the method name into the arguments. This optimizes all method_missing calls: * mm(recv) ~9% * mm(recv, *args) ~215% for args.length == 200 * mm(recv, *args, **kw) ~55% for args.length == 200 * mm(recv, **kw) ~22% * mm(recv, kw: 1) ~100% Note that empty argument splats do get slower with this approach, by about 30-40%. Other than empty argument splats, other argument splats are faster, with the speedup depending on the number of arguments.	2023-04-25 08:06:16 -07:00
Jeremy Evans	583e9d24d4	Optimize symproc calls Similar to the bmethod/send optimization, this avoids using CALLER_ARG_SPLAT if not necessary. As long as the receiver argument can be shifted off, other arguments are passed through as-is. This optimizes the following types of calls: * symproc.(recv) ~5% * symproc.(recv, *args) ~65% for args.length == 200 * symproc.(recv, *args, **kw) ~45% for args.length == 200 * symproc.(recv, **kw) ~30% * symproc.(recv, kw: 1) ~100% Note that empty argument splats do get slower with this approach, by about 2-3%. This is probably because iseq argument setup is slower for empty argument splats than CALLER_SETUP_ARG is. Other than empty argument splats, other argument splats are faster, with the speedup depending on the number of arguments. The following types of calls are not optimized: * symproc.(*args) * symproc.(*args, **kw) This is because you cannot shift the receiver argument off without first splatting the arg.	2023-04-25 08:06:16 -07:00
Jeremy Evans	9b4bf02aa8	Optimize send calls Similar to the bmethod optimization, this avoids using CALLER_ARG_SPLAT if not necessary. As long as the method argument can be shifted off, other arguments are passed through as-is. This optimizes the following types of calls: * send(meth, arg) ~5% * send(meth, *args) ~75% for args.length == 200 * send(meth, *args, **kw) ~50% for args.length == 200 * send(meth, **kw) ~25% * send(meth, kw: 1) ~115% Note that empty argument splats do get slower with this approach, by about 20%. This is probably because iseq argument setup is slower for empty argument splats than CALLER_SETUP_ARG is. Other than empty argument splats, other argument splats are faster, with the speedup depending on the number of arguments. The following types of calls are not optimized: * send(*args) * send(*args, **kw) This is because you cannot shift the method argument off without first splatting the arg.	2023-04-25 08:06:16 -07:00
Jeremy Evans	af2da6419a	Optimize cfunc calls for f(*a) and f(*a, **kw) if kw is empty This optimizes the following calls: * ~10-15% for f(*a) when a does not end with a flagged keywords hash * ~10-15% for f(*a) when a ends with an empty flagged keywords hash * ~35-40% for f(*a, **kw) if kw is empty This still copies the array contents to the VM stack, but avoids some overhead. It would be faster to use the array pointer directly, but that could cause problems if the array was modified during the call to the function. You could do that optimization for frozen arrays, but as splatting frozen arrays is uncommon, and the speedup is minimal (<5%), it doesn't seem worth it. The vm_send_cfunc benchmark has been updated to test additional cfunc call types, and the numbers above were taken from the benchmark results.	2023-04-25 08:06:16 -07:00
Jeremy Evans	f6254f77f7	Speed up calling iseq bmethods Currently, bmethod arguments are copied from the VM stack to the C stack in vm_call_bmethod, then copied from the C stack to the VM stack later in invoke_iseq_block_from_c. This is inefficient. This adds vm_call_iseq_bmethod and vm_call_noniseq_bmethod. vm_call_iseq_bmethod is an optimized method that skips stack copies (though there is one copy to remove the receiver from the stack), and avoids calling vm_call_bmethod_body, rb_vm_invoke_bmethod, invoke_block_from_c_proc, invoke_iseq_block_from_c, and vm_yield_setup_args. The vm_call_iseq_bmethod argument handling is similar to the way normal iseq methods are called, and allows for similar performance optimizations when using splats or keywords. However, even in the no argument case it's still significantly faster. A benchmark is added for bmethod calling. In my environment, it improves bmethod calling performance by 38-59% for simple bmethod calls, and up to 180% for bmethod calls passing literal keywords on both sides.
``` ./miniruby-iseq-bmethod: 18159792.6 i/s ./miniruby-m: 13174419.1 i/s - 1.38x slower bmethod_simple_1 ./miniruby-iseq-bmethod: 15890745.4 i/s ./miniruby-m: 10008972.7 i/s - 1.59x slower bmethod_simple_0_splat ./miniruby-iseq-bmethod: 13142804.3 i/s ./miniruby-m: 11168595.2 i/s - 1.18x slower bmethod_simple_1_splat ./miniruby-iseq-bmethod: 12375791.0 i/s ./miniruby-m: 8491140.1 i/s - 1.46x slower bmethod_no_splat ./miniruby-iseq-bmethod: 10151258.8 i/s ./miniruby-m: 8716664.1 i/s - 1.16x slower bmethod_0_splat ./miniruby-iseq-bmethod: 8138802.5 i/s ./miniruby-m: 7515600.2 i/s - 1.08x slower bmethod_1_splat ./miniruby-iseq-bmethod: 8028372.7 i/s ./miniruby-m: 5947658.6 i/s - 1.35x slower bmethod_10_splat ./miniruby-iseq-bmethod: 6953514.1 i/s ./miniruby-m: 4840132.9 i/s - 1.44x slower bmethod_100_splat ./miniruby-iseq-bmethod: 5287288.4 i/s ./miniruby-m: 2243218.4 i/s - 2.36x slower bmethod_kw ./miniruby-iseq-bmethod: 8931358.2 i/s ./miniruby-m: 3185818.6 i/s - 2.80x slower bmethod_no_kw ./miniruby-iseq-bmethod: 12281287.4 i/s ./miniruby-m: 10041727.9 i/s - 1.22x slower bmethod_kw_splat ./miniruby-iseq-bmethod: 5618956.8 i/s ./miniruby-m: 3657549.5 i/s - 1.54x slower ```	2023-04-25 08:06:16 -07:00
Takashi Kokubun	66c4dc1592	Remove MJIT-specific benchmarks	2023-03-06 22:36:57 -08:00
John Bampton	c43fbe4ebd	Fix spelling (#7405)	2023-02-28 10:05:30 -08:00
Matt Valentine-House	2605615fe6	Benchmark String interpolation across size pools	2023-01-13 10:31:35 -05:00
Takashi Kokubun	509da028c2	Rewrite Kernel#loop in Ruby (#6983) * Rewrite Kernel#loop in Ruby * Use enum_for(:loop) { Float::INFINITY } Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com> * Limit the scope to rescue StopIteration Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com>	2022-12-25 21:46:29 -08:00
Nobuyoshi Nakada	8c272f4481	[Feature #18033] Make Time.new parse time strings `Time.new` now parses strings such as the result of `Time#inspect` and restricted ISO-8601 formats.	2022-12-16 22:52:59 +09:00
Daniel Colson	e69b91fae4	Introduce BOP_CMP for optimized comparison Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup to determine whether `<=>` was overridden. The result of the lookup was cached, but only for the duration of the specific method that initialized the cmp_opt_data cache structure. With this method lookup, `[x,y].max` is slower than doing `x > y ? x : y` even though there's an optimized instruction for "new array max". (John noticed that somebody proposed a micro-optimization based on this fact in https://github.com/mastodon/mastodon/pull/19903.)
```rb
require 'benchmark/ips'

a, b = 1, 2
Benchmark.ips do |bm|
  bm.report('conditional') { a > b ? a : b }
  bm.report('method') { [a, b].max }
  bm.compare!
end
```
Before: ``` Comparison: conditional: 22603733.2 i/s method: 19820412.7 i/s - 1.14x (± 0.00) slower ``` This commit replaces the method lookup with a new CMP basic op, which gives the examples above equivalent performance. After: ``` Comparison: method: 24022466.5 i/s conditional: 23851094.2 i/s - same-ish: difference falls within error ``` Relevant benchmarks show an improvement to Array#max and Array#min when not using the optimized newarray_max instruction as well. They are noticeably faster for small arrays with the relevant types, and the same or maybe a touch faster on larger arrays. ``` $ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_min $ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_max ``` The benchmarks added in this commit also look generally improved. Co-authored-by: John Hawthorn <jhawthorn@github.com>	2022-12-06 12:37:23 -08:00
Takashi Kokubun	d15d1c01c2	Rename --mjit-min-calls to --mjit-call-threshold (#6731) for consistency with YJIT	2022-11-14 23:38:52 -08:00
Takashi Kokubun	f276d5a7fe	Improve HTML escape benchmarks	2022-11-04 23:54:25 -07:00
S.H	c6f439a6a8	Improve performance of some `Integer` and `Float` methods [Feature #19085] (#6638) * Improve some Integer and Float methods * Use alias and remove unnecessary code * Remove commented-out code	2022-10-27 09:13:16 -07:00
Samuel Williams	025b8701c0	Add several new methods for getting and setting buffer contents. (#6434)	2022-09-26 18:06:12 +13:00
Jemma Issroff	b5c459d57a	Adds a benchmark to measure freezing objects	2022-09-22 10:29:43 -07:00
HParker	fbaac837cf	avoid extra dup and pop in compile_op_asgn2 Co-authored-by: John Hawthorn <jhawthorn@github.com>	2022-09-22 09:47:13 -07:00
Jemma Issroff	aecb57ceb0	Fix style on vm_ivar benchmarks (#6379)	2022-09-15 09:39:39 +09:00
Jemma Issroff	513a11b477	Add vm_ivar get, get_unitialized, and lazy_set benchmarks	2022-09-14 13:50:47 -07:00
Jean Boussier	cd1724bdde	rb_str_concat_literals: use rb_str_buf_append That's about 1.30x faster.	2022-09-08 15:02:21 +02:00
John Hawthorn	679ef34586	New constant caching insn: opt_getconstant_path Previously YARV bytecode implemented constant caching by having a pair of instructions, opt_getinlinecache and opt_setinlinecache, wrapping a series of getconstant calls (with putobject providing supporting arguments). This commit replaces that pattern with a new instruction, opt_getconstant_path, handling both getting/setting the inline cache and fetching the constant on a cache miss. This is implemented by storing the full constant path as a null-terminated array of IDs inside of the IC structure. idNULL is used to signal an absolute constant reference. $ ./miniruby --dump=insns -e '::Foo::Bar::Baz' == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE) 0000 opt_getconstant_path <ic:0 ::Foo::Bar::Baz> ( 1)[Li] 0002 leave The motivation for this is that we had increasingly found the need to disassemble the instructions between the opt_getinlinecache and opt_setinlinecache in order to determine the constant we are fetching, or otherwise store metadata. This disassembly was done: * In opt_setinlinecache, to register the IC against the constant names it is using for granular invalidation. * In rb_iseq_free, to unregister the IC from the invalidation table. * In YJIT to find the position of an opt_getinlinecache instruction to invalidate it when the cache is populated * In YJIT to register the constant names being used for invalidation. With this change we no longer need disassembly for these (in fact rb_iseq_each is now unused), as the list of constant names being referenced is held in the IC. This should also make it possible to make more optimizations in the future. This may also reduce the size of iseqs, as previously each constant segment required 32 bytes (on 64-bit platforms). This implementation only stores one ID per segment. There should be no significant performance change between this and the previous implementation.
Previously opt_getinlinecache was a "leaf" instruction, but it included a jump (almost always to a separate cache line). Now opt_getconstant_path is a non-leaf (it may raise/autoload/call const_missing) but it does not jump. These seem to even out.	2022-09-01 15:20:49 -07:00
Takashi Kokubun	9f3140a42e	Remove mjit_exec benchmarks Now that mjit_exec doesn't exist, those files feel old. I'll probably change how I benchmark it when I add benchmarks for it again.	2022-08-21 11:35:40 -07:00
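
The `Specialize String#dup` commit (83c385719d) argues that `"foo".dup` should be as fast as `(+"foo")`. The two calls only coincide when the receiver is frozen, as string literals are under `# frozen_string_literal: true`: `String#+@` returns the receiver itself when it is not frozen, whereas `String#dup` always returns a new object. The snippet below only demonstrates that documented behavior (it does not reproduce the benchmark), and its variable names are illustrative.

```ruby
# Illustrates documented String#+@ / String#dup semantics; no timing involved.
frozen = "foo".freeze
unfrozen = String.new("bar")

# Frozen receiver: both operators hand back a new, unfrozen copy.
plus_copy = +frozen
dup_copy = frozen.dup
p plus_copy.frozen?        # => false
p dup_copy.frozen?         # => false

# Unfrozen receiver: +@ returns the receiver itself, dup always allocates.
same = +unfrozen
copy = unfrozen.dup
p same.equal?(unfrozen)    # => true
p copy.equal?(unfrozen)    # => false
```

So rewriting `(+str)` as `str.dup` on an unfrozen string still adds an allocation, even once `dup` itself is cheap.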

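Commit 84e4453436 indexes shape ancestors with an Okasaki-style functional red-black tree whose new versions share untouched subtrees with old ones. What follows is a minimal pure-Ruby sketch of Okasaki's insertion, not the C code inside the VM; the module and method names (`PersistentRBTree`, `insert`, `member?`, `in_order`) are hypothetical, and `case`/`in` pattern matching assumes Ruby 3.0 or later. Each insertion allocates new nodes only along the path from the root to the new key, so insertion and lookup are O(log n) and every subtree off that path is reused as-is.

```ruby
# Hypothetical sketch: persistent red-black tree after Okasaki (1999).
module PersistentRBTree
  # Nodes are never mutated; nil stands for an empty (black) leaf.
  Node = Struct.new(:color, :left, :key, :right)

  # Shared result of all four red-red rotations.
  def self.rebuild(a, x, b, y, c, z, d)
    Node.new(:red, Node.new(:black, a, x, b), y, Node.new(:black, c, z, d))
  end

  # Okasaki's balance: fix a black node with a red child that has a red child.
  def self.balance(color, left, key, right)
    case [color, left, key, right]
    in [:black, Node[:red, Node[:red, a, x, b], y, c], z, d]
      rebuild(a, x, b, y, c, z, d)
    in [:black, Node[:red, a, x, Node[:red, b, y, c]], z, d]
      rebuild(a, x, b, y, c, z, d)
    in [:black, a, x, Node[:red, Node[:red, b, y, c], z, d]]
      rebuild(a, x, b, y, c, z, d)
    in [:black, a, x, Node[:red, b, y, Node[:red, c, z, d]]]
      rebuild(a, x, b, y, c, z, d)
    else
      Node.new(color, left, key, right)
    end
  end

  # Copy only the nodes on the search path; siblings are shared.
  def self.ins(node, key)
    return Node.new(:red, nil, key, nil) if node.nil?

    if key < node.key
      balance(node.color, ins(node.left, key), node.key, node.right)
    elsif key > node.key
      balance(node.color, node.left, node.key, ins(node.right, key))
    else
      node # key already present: reuse the existing subtree
    end
  end

  # Returns a new root; the tree passed in remains valid and unchanged.
  def self.insert(tree, key)
    root = ins(tree, key)
    Node.new(:black, root.left, root.key, root.right)
  end

  def self.member?(node, key)
    until node.nil?
      return true if key == node.key
      node = key < node.key ? node.left : node.right
    end
    false
  end

  # In-order traversal, handy for checking that keys come out sorted.
  def self.in_order(node)
    node.nil? ? [] : in_order(node.left) + [node.key] + in_order(node.right)
  end
end
```

For example, building `t1` from the keys 2, 1, 3 and then inserting 4 to get `t2` leaves `t1` listing `[1, 2, 3]`, while `t2.left.left.equal?(t1.left)` is true: the red node holding 1 is shared by both versions.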

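Commit 7d08dbd015 narrows the binary-search interval for endless ranges: probe `begin + 2**i` for i = 0, 1, 2, ... until the condition holds at `begin + 2**n`, then bisect only `[begin + 2**(n-1), begin + 2**n]`, since the previous probe is known to fail. Below is a minimal pure-Ruby sketch of that idea for an Integer start and a monotone block in find-minimum mode; `exp_bsearch` is a hypothetical name, not Ruby's C implementation, and a beginless range would mirror it with steps toward negative infinity.

```ruby
# Hypothetical sketch of exponential search over [start, +infinity).
# The block must be monotone: false ... false, true ... true.
def exp_bsearch(start)
  return start if yield(start)

  lo = start        # invariant: the block is false at lo
  step = 1          # step == 2**i
  hi = start + step
  until yield(hi)
    lo = hi         # this probe failed, so the answer lies above it
    step *= 2
    hi = start + step
  end

  # The block is false at lo and true at hi: bisect (lo, hi] only.
  while hi - lo > 1
    mid = lo + (hi - lo) / 2
    if yield(mid)
      hi = mid
    else
      lo = mid
    end
  end
  hi
end

p exp_bsearch(3) { |x| x * x >= 1000 }  # => 32
```

Like an endless `Range#bsearch`, this loops forever if the block never becomes true.
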
438 commits
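
Commit 509da028c2 moves `Kernel#loop` from C to Ruby, returning `enum_for(:loop) { Float::INFINITY }` when no block is given and limiting the `rescue` to `StopIteration`. A plain-Ruby approximation of that shape is sketched below under the hypothetical name `my_loop`; the actual definition in the interpreter may differ, for example in how it is annotated for the JIT. As documented for `Kernel#loop`, a `StopIteration` raised inside the block ends the loop and its `result` becomes the return value.

```ruby
# Hypothetical plain-Ruby approximation of Kernel#loop.
def my_loop
  # Without a block, return an Enumerator whose lazily computed size is infinite.
  return enum_for(:my_loop) { Float::INFINITY } unless block_given?

  begin
    while true
      yield
    end
  rescue StopIteration => e
    # Only StopIteration is rescued; any other exception propagates.
    e.result
  end
end

letters = %w[a b c].each
seen = []
my_loop { seen << letters.next }
p seen           # => ["a", "b", "c"]
p my_loop.size   # => Infinity
```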