github/ruby - ruby

Граф коммитов

Автор	SHA1	Сообщение	Дата
Ivo Anjo	425e468d25	[Backport #20633 ] Fix the condition for `atomic_signal_fence` (#11166 ) [Bug #20633] Fix the condition for `atomic_signal_fence` `AC_CHECK_DECLS` defines `HAVE_DECL_SYMBOL` to 1 if declared, 0 otherwise, not undefined. Co-authored-by: kimuraw (Wataru Kimura) <kimuraw@i.nifty.jp>	2024-07-15 09:08:16 -07:00
Ivo Anjo	b2be36ef33	[Backport #11036 ] Add explicit compiler fence when pushing frames to ensure safe profiling (#11090 ) What does this PR do? This PR tweaks the `vm_push_frame` function to add an explicit compiler fence (`atomic_signal_fence`) to ensure profilers that use signals to interrupt applications (stackprof, vernier, pf2, Datadog profiler) can safely sample from the signal handler. This is a backport of #11036 to Ruby 3.3 . Motivation: The `vm_push_frame` was specifically tweaked in https://github.com/ruby/ruby/pull/3296 to initialize the a frame before updating the `cfp` pointer. But since there's nothing stopping the compiler from reordering the initialization of a frame (`cfp =`) with the update of the cfp pointer (`ec->cfp = cfp`) we've been hesitant to rely on this on the Datadog profiler. In practice, after some experimentation + talking to folks, this reordering does not seem to happen. But since modern compilers have a way for us to exactly tell them not to do the reordering (`atomic_signal_fence`), this seems even better. I've actually extracted `vm_push_frame` into the "Compiler Explorer" website, which you can use to see the assembly output of this function across many compilers and architectures: https://godbolt.org/z/3oxd1446K On that link you can observe two things across many compilers: 1. The compilers are not reordering the writes 2. The barrier does not change the generated assembly output (== has no cost in practice) Additional Notes:* The checks added in `configure.ac` define two new macros: * `HAVE_STDATOMIC_H` * `HAVE_DECL_ATOMIC_SIGNAL_FENCE` Since Ruby generates an arch-specific `config.h` header with these macros upon installation, this can be used by profilers and other libraries to test if Ruby was compiled with the fence enabled. How to test the change? As I mentioned above, you can check https://godbolt.org/z/3oxd1446K to confirm the compiled output of `vm_push_frame` does not change in most compilers (at least all that I've checked on that site).	2024-07-03 09:56:00 -07:00
Jeremy Evans	3081c83169	Support tracing of struct member accessor methods This follows the same approach used for attr_reader/attr_writer in `2d98593bf5`, skipping the checking for tracing after the first call using the call cache, and clearing the call cache when tracing is turned on/off. Fixes [Bug #18886]	2023-12-07 10:29:33 -08:00
Jeremy Evans	3a88de3ca7	Support eval "return" at toplevel Since Ruby 2.4, `return` is supported at toplevel. This makes `eval "return"` also supported at toplevel. This mostly uses the same tests as direct `return` at toplevel, with a couple differences: `END {return if false}` is a SyntaxError, but `END {eval "return" if false}` is not an error since the eval is never executed. `END {return}` is a SyntaxError, but `END {eval "return"}` is a LocalJumpError. The following is a SyntaxError: ```ruby class X nil&defined?0--begin e=no_method_error(); return; 0;end end ``` However, the following is not, because the eval is never executed: ```ruby class X nil&defined?0--begin e=no_method_error(); eval "return"; 0;end end ``` Fixes [Bug #19779]	2023-12-07 09:30:36 -08:00
Peter Zhu	0aed37b973	Make expandarray compaction safe The expandarray instruction can allocate an array, which can trigger a GC compaction. However, since it does not increment the sp until the end of the instruction, the objects it places on the stack are not marked or reference updated by the GC, which can cause the objects to move which leaves broken or incorrect objects on the stack. This commit changes the instruction to be handles_sp so the sp is incremented inside of the instruction right after the object is written on the stack.	2023-12-01 17:13:56 -05:00
Alan Wu	cd4207869f	Fix cache incoherency for ME resolved through VM_METHOD_TYPE_REFINED Previously, we didn't invalidate the method entry wrapped by VM_METHOD_TYPE_REFINED method entries which could cause calls to land in the wrong method like it did in the included test. Do the invalidation, and adjust rb_method_entry_clone() to accommodate this new invalidation vector. Fix: `cfd7729ce7` See-also: `e201b81f79`	2023-11-28 13:03:04 -05:00
Jean Boussier	f1c32c0ee0	vm_setivar_slowpath: only optimize T_OBJECT We've seen occasional CI failures on i686 in this codepath: ``` [BUG] vm_setivar_slowpath: didn't find ivar @verify_depth in shape ``` Generic ivars are very complex to get right, but also quite rare. I don't see a good reason to take the risk to give them an optimized path here, when the much more common T_CLASS/T_MODULE don't have one. Having an optimization here means duplicating the fairly brittle logic, which is a recipe for bugs, and I don't think it's worth it in such case.	2023-11-23 11:19:12 +01:00
Jean Boussier	960a031a06	vm_setivar_slowpath: improve bug error message We're occasionally hitting this bug on CI, it would be useful to see if the id is consistent.	2023-11-15 11:19:03 +01:00
Peter Zhu	68869e9bd9	Revert "Revert "Remove SHAPE_CAPACITY_CHANGE shapes"" This reverts commit `5f3fb4f4e3`.	2023-11-13 18:26:36 -05:00
Peter Zhu	5f3fb4f4e3	Revert "Remove SHAPE_CAPACITY_CHANGE shapes" This reverts commit `f6910a6112`. We're seeing crashes in the test suite of Shopify's core monolith after this change.	2023-11-10 11:27:49 -05:00
Peter Zhu	f6910a6112	Remove SHAPE_CAPACITY_CHANGE shapes We don't need to create a shape to transition capacity as we can transition the capacity when the capacity of the SHAPE_IVAR changes.	2023-11-09 09:25:02 -05:00
Peter Zhu	1321df773b	Use shape capacity transitions for generic ivars This commit changes generic ivars to respect the capacity transition in shapes rather than growing the capacity independently.	2023-11-03 10:15:32 -04:00
Jean Boussier	b92b9e1e9e	vm_getivar: assume the cached shape_id like have a common ancestor When an inline cache misses, it is very likely that the stale shape_id and the current instance shape_id have a close common ancestor. For example if the instance variable is sometimes frozen sometimes not, one of the two shape will be the direct parent of the other. Another pattern that commonly cause IC misses is "memoization", in such case the object will have a "base common shape" and then a number of close descendants. In addition, when we find a common ancestor, we store it in the inline cache instead of the current shape. This help prevent the cache from flip-flopping, ensuring the next lookup will be marginally faster and more generally avoid writing in memory too much. However, now that shapes have an ancestors index, we only check for a few ancestors before falling back to use the index. So overall this change speeds up what is assumed to be the more common case, but makes what is assumed to be the less common case a bit slower. ``` compare-ruby: ruby 3.3.0dev (2023-10-26T05:30:17Z master `701ca070b4`) [arm64-darwin22] built-ruby: ruby 3.3.0dev (2023-10-26T09:25:09Z shapes_double_sear.. a723a85235) [arm64-darwin22] warming up...... \| \|compare-ruby\|built-ruby\| \|:------------------------------------\|-----------:\|---------:\| \|vm_ivar_stable_shape \| 11.672M\| 11.679M\| \| \| -\| 1.00x\| \|vm_ivar_memoize_unstable_shape \| 7.551M\| 10.506M\| \| \| -\| 1.39x\| \|vm_ivar_memoize_unstable_shape_miss \| 11.591M\| 11.624M\| \| \| -\| 1.00x\| \|vm_ivar_unstable_undef \| 9.037M\| 7.981M\| \| \| 1.13x\| -\| \|vm_ivar_divergent_shape \| 8.034M\| 6.657M\| \| \| 1.21x\| -\| \|vm_ivar_divergent_shape_imbalanced \| 10.471M\| 9.231M\| \| \| 1.13x\| -\| ``` Co-Authored-By: John Hawthorn <john@hawthorn.email>	2023-11-03 12:47:43 +01:00
Jean Boussier	0cb1fc3850	Fix vm_getivar to handle module with TOO_COMPLEX shape	2023-11-02 19:57:14 +01:00
Peter Zhu	e2d950733e	Add ST table to gen_ivtbl for complex shapes On 32-bit systems, we must store the shape ID in the gen_ivtbl to not lose the shape. If we directly store the ST table into the generic ivar table, then we lose the shape. This makes it impossible to determine the shape of the object and whether it is too complex or not.	2023-10-31 12:07:54 -04:00
Jean Boussier	4aacc559d9	Handle running out of shapes in `Object#dup` There is a handful of call sites where we may transition to OBJ_TOO_COMPLEX_SHAPE if we just ran out of shapes, but that weren't handling it properly.	2023-10-31 12:07:54 -04:00
Aaron Patterson	33bebee13a	Use available constants We don't need to intern "initialize" all the time because we already have `idInitialize` available	2023-10-24 14:23:17 -07:00
Takashi Kokubun	5808999d30	YJIT: Fallback opt_getconstant_path for const_missing (#8623 ) * YJIT: Fallback opt_getconstant_path for const_missing * Fix a comment [ci skip] * Remove a wrapper function	2023-10-13 08:52:23 -07:00
Adam Hess	8b236e0c66	[Bug #19896 ] fix memory leak in vm_method This introduces a unified reference_count to clarify who is referencing a method. This also allows us to treat the refinement method as the def owner since it counts itself as a reference Co-authored-by: Peter Zhu <peter@peterzhu.ca>	2023-09-22 09:44:58 -04:00
Alan Wu	1961c5bb76	YJIT: Plug native stack overflow Previously, TestStack#test_machine_stack_size failed pretty consistently on ARM64 macOS, with Rust code and part of the interpreter used for per-instruction fallback (rb_vm_invokeblock() and friends) touching the stack guard page and crashing with SEGV. I've also seen the same test fail on x64 Linux, though with a different symptom.	2023-09-14 17:18:45 -04:00
Maxime Chevalier-Boisvert	30a5b94517	YJIT: implement side chain fallback for setlocal to avoid exiting (#8227 ) * YJIT: implement side chain fallback for setlocal to avoid exiting * Update yjit/src/codegen.rs Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com> --------- Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>	2023-08-17 10:11:17 -04:00
Takashi Kokubun	654b648223	YJIT: Implement checkmatch instruction (#8203 )	2023-08-10 23:08:29 -04:00
Takashi Kokubun	cd8d20cd1f	YJIT: Compile exception handlers (#8171 ) Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>	2023-08-08 16:06:22 -07:00
Koichi Sasada	d68c01fd31	support `rescue` event for TracePoint fix [Feature #19572]	2023-08-01 22:46:17 +09:00
Koichi Sasada	cfd7729ce7	use inline cache for refinements From Ruby 3.0, refined method invocations are slow because resolved methods are not cached by inline cache because of conservertive strategy. However, `using` clears all caches so that it seems safe to cache resolved method entries. This patch caches resolved method entries in inline cache and clear all of inline method caches when `using` is called. fix [Bug #18572] ```ruby # without refinements class C def foo = :C end N = 1_000_000 obj = C.new require 'benchmark' Benchmark.bm{\|x\| x.report{N.times{ obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; }} } _END__ user system total real master 0.362859 0.002544 0.365403 ( 0.365424) modified 0.357251 0.000000 0.357251 ( 0.357258) ``` ```ruby # with refinment but without using class C def foo = :C end module R refine C do def foo = :R end end N = 1_000_000 obj = C.new require 'benchmark' Benchmark.bm{\|x\| x.report{N.times{ obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; }} } __END__ user system total real master 0.957182 0.000000 0.957182 ( 0.957212) modified 0.359228 0.000000 0.359228 ( 0.359238) ``` ```ruby # with using class C def foo = :C end module R refine C do def foo = :R end end N = 1_000_000 using R obj = C.new require 'benchmark' Benchmark.bm{\|x\| x.report{N.times{ obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; obj.foo; }} }	2023-07-31 17:13:43 +09:00
Koichi Sasada	280419d0e0	`calling->cd` instead of `calling->ci` `struct rb_calling_info::cd` is introduced and `rb_calling_info::ci` is replaced with it to manipulate the inline cache of iseq while method invocation process. So that `ci` can be acessed with `calling->cd->ci`. It adds one indirection but it can be justified by the following points: 1) `vm_search_method_fastpath()` doesn't need `ci` and also `vm_call_iseq_setup_normal()` doesn't need `ci`. It means reducing `cd->ci` access in `vm_sendish()` can make it faster. 2) most of method types need to access `ci` once in theory so that 1 additional indirection doesn't matter.	2023-07-31 17:13:43 +09:00
Koichi Sasada	36023d5cb7	mark `cc->cme_` if it is for `super` `vm_search_super_method()` makes orphan CCs (they are not connected from ccs) and `cc->cme_` can be collected before without marking.	2023-07-31 14:04:31 +09:00
ywenc	8ca399d640	Implement `opt_aref_with` instruction (#8118 ) Implement gen_opt_aref_with Vm opt_aref_with is available Test opt_aref_with Stats for opt_aref_with Co-authored-by: jhawthorn <jhawthorn@github.com>	2023-07-26 10:38:59 -04:00
Takashi Kokubun	cef60e93e6	YJIT: Fallback send instructions to vm_sendish (#8106 )	2023-07-24 13:51:46 -07:00
Alan Wu	f302e725e1	Remove __bp__ and speed-up bmethod calls (#8060 ) Remove rb_control_frame_t::__bp__ and optimize bmethod calls This commit removes the __bp__ field from rb_control_frame_t. It was introduced to help MJIT, but since MJIT was replaced by RJIT, we can use vm_base_ptr() to compute it from the SP of the previous control frame instead. Removing the field avoids needing to set it up when pushing new frames. Simply removing __bp__ would cause crashes since RJIT and YJIT used a slightly different stack layout for bmethod calls than the interpreter. At the moment of the call, the two layouts looked as follows: ┌────────────┐ ┌────────────┐ │ frame_base │ │ frame_base │ ├────────────┤ ├────────────┤ │ ... │ │ ... │ ├────────────┤ ├────────────┤ │ args │ │ args │ ├────────────┤ └────────────┘<─prev_frame_sp │ receiver │ prev_frame_sp─>└────────────┘ RJIT & YJIT interpreter Essentially, vm_base_ptr() needs to compute the address to frame_base given prev_frame_sp in the diagrams. The presence of the receiver created an off-by-one situation. Make the interpreter use the layout the JITs use for iseq-to-iseq bmethod calls. Doing so removes unnecessary argument shifting and vm_exec_core() re-entry from the interpreter, yielding a speed improvement visible through `benchmark/vm_defined_method.yml`: patched: 7578743.1 i/s master: 4796596.3 i/s - 1.58x slower C-to-iseq bmethod calls now store one more VALUE than before, but that should have negligible impact on overall performance. Note that re-entering vm_exec_core() used to be necessary for firing TracePoint events, but that's no longer the case since `9121e57a5f`. Closes ruby/ruby#6428	2023-07-17 13:57:58 -04:00
Peter Zhu	3223181284	Remove RARRAY_CONST_PTR_TRANSIENT RARRAY_CONST_PTR now does the same things as RARRAY_CONST_PTR_TRANSIENT.	2023-07-13 14:48:14 -04:00
Nobuyoshi Nakada	913e01e80e	Stop allocating unused backref strings at `defined?`	2023-06-27 23:14:10 +09:00
eileencodes	135a5eb716	Add missing write barrier We were missing the write barrier for class_value to cref. This should fix the segv we were seeing in http://ci.rvm.jp/logfiles/brlog.trunk-gc-asserts.20230601-165052 Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>	2023-06-05 11:11:12 -07:00
eileencodes	40f090f433	Revert "Revert "Fix cvar caching when class is cloned"" This reverts commit `10621f7cb9`. This was reverted because the gc integrity build started failing. We have figured out a fix so I'm reopening the PR. Original commit message: Fix cvar caching when class is cloned The class variable cache that was added in ruby#4544 changed the behavior of class variables on cloned classes. As reported when a class is cloned AND a class variable was set, and the class variable was read from the original class, reading a class variable from the cloned class would return the value from the original class. This was happening because the IC (inline cache) is stored on the ISEQ which is shared between the original and cloned class, therefore they share the cache too. To fix this we are now storing the `cref` in the cache so that we can check if it's equal to the current `cref`. If it's different we don't want to read from the cache. If it's the same we do. Cloned classes don't share the same cref with their original class. This will need to be backported to 3.1 in addition to 3.2 since the bug exists in both versions. We also added a marking function which was missing. Fixes [Bug #19379] Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>	2023-06-05 11:11:12 -07:00
Aaron Patterson	10621f7cb9	Revert "Fix cvar caching when class is cloned" This reverts commit `77d1b08247`.	2023-06-01 14:55:36 -07:00
eileencodes	77d1b08247	Fix cvar caching when class is cloned The class variable cache that was added in https://github.com/ruby/ruby/pull/4544 changed the behavior of class variables on cloned classes. As reported when a class is cloned AND a class variable was set, and the class variable was read from the original class, reading a class variable from the cloned class would return the value from the original class. This was happening because the IC (inline cache) is stored on the ISEQ which is shared between the original and cloned class, therefore they share the cache too. To fix this we are now storing the `cref` in the cache so that we can check if it's equal to the current `cref`. If it's different we don't want to read from the cache. If it's the same we do. Cloned classes don't share the same cref with their original class. This will need to be backported to 3.1 in addition to 3.2 since the bug exists in both versions. We also added a marking function which was missing. Fixes [Bug #19379] Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>	2023-06-01 08:52:48 -07:00
Nobuyoshi Nakada	8d242a33af	`rb_bug` prints a newline after the message	2023-05-20 21:43:30 +09:00
Samuel Williams	2df5a697e2	Add Fiber#kill, similar to Thread#kill. (#7823 )	2023-05-18 23:33:42 +09:00
Gary Tou	96a2e27185	`klass == (VALUE)NULL` --> `!klass` Co-authored-by: Rafael Mendonça França <rafael@franca.dev>	2023-04-26 10:21:20 -04:00
Gary Tou	1883dc5bde	defined zsuper: Handle NULL superclass for `BasicObject` Prior to this commit, a segmentation fault occurred in `vm_defined`'s `zsuper` implementation after NULL is returned as `BasicObject`'s superclass. This fix returns false from `vm_defined` if the superclass is NULL. For example, the following code resulted in a segfault. ```ruby class BasicObject def seg_fault defined?(super) end end seg_fault ```	2023-04-26 10:21:20 -04:00
Jeremy Evans	a82a24ed57	Optimize method_missing calls CALLER_ARG_SPLAT is not necessary for method_missing. We just need to unshift the method name into the arguments. This optimizes all method_missing calls: * mm(recv) ~9% * mm(recv, args) ~215% for args.length == 200 mm(recv, args, kw) ~55% for args.length == 200 mm(recv, *kw) ~22% mm(recv, kw: 1) ~100% Note that empty argument splats do get slower with this approach, by about 30-40%. Other than non-empty argument splats, other argument splats are faster, with the speedup depending on the number of arguments.	2023-04-25 08:06:16 -07:00
Jeremy Evans	583e9d24d4	Optimize symproc calls Similar to the bmethod/send optimization, this avoids using CALLER_ARG_SPLAT if not necessary. As long as the receiver argument can be shifted off, other arguments are passed through as-is. This optimizes the following types of calls: * symproc.(recv) ~5% * symproc.(recv, args) ~65% for args.length == 200 symproc.(recv, args, kw) ~45% for args.length == 200 symproc.(recv, *kw) ~30% symproc.(recv, kw: 1) ~100% Note that empty argument splats do get slower with this approach, by about 2-3%. This is probably because iseq argument setup is slower for empty argument splats than CALLER_SETUP_ARG is. Other than non-empty argument splats, other argument splats are faster, with the speedup depending on the number of arguments. The following types of calls are not optimized: * symproc.(args) symproc.(args, *kw) This is because the you cannot shift the receiver argument off without first splatting the arg.	2023-04-25 08:06:16 -07:00
Jeremy Evans	9b4bf02aa8	Optimize send calls Similar to the bmethod optimization, this avoids using CALLER_ARG_SPLAT if not necessary. As long as the method argument can be shifted off, other arguments are passed through as-is. This optimizes the following types of calls: * send(meth, arg) ~5% * send(meth, args) ~75% for args.length == 200 send(meth, args, kw) ~50% for args.length == 200 send(meth, *kw) ~25% send(meth, kw: 1) ~115% Note that empty argument splats do get slower with this approach, by about 20%. This is probably because iseq argument setup is slower for empty argument splats than CALLER_SETUP_ARG is. Other than non-empty argument splats, other argument splats are faster, with the speedup depending on the number of arguments. The following types of calls are not optimized: * send(args) send(args, *kw) This is because the you cannot shift the method argument off without first splatting the arg.	2023-04-25 08:06:16 -07:00
Jeremy Evans	af2da6419a	Optimize cfunc calls for f(a) and f(a, *kw) if kw is empty This optimizes the following calls: ~10-15% for f(a) when a does not end with a flagged keywords hash ~10-15% for f(a) when a ends with an empty flagged keywords hash ~35-40% for f(a, *kw) if kw is empty This still copies the array contents to the VM stack, but avoids some overhead. It would be faster to use the array pointer directly, but that could cause problems if the array was modified during the call to the function. You could do that optimization for frozen arrays, but as splatting frozen arrays is uncommon, and the speedup is minimal (<5%), it doesn't seem worth it. The vm_send_cfunc benchmark has been updated to test additional cfunc call types, and the numbers above were taken from the benchmark results.	2023-04-25 08:06:16 -07:00
Jeremy Evans	f6254f77f7	Speed up calling iseq bmethods Currently, bmethod arguments are copied from the VM stack to the C stack in vm_call_bmethod, then copied from the C stack to the VM stack later in invoke_iseq_block_from_c. This is inefficient. This adds vm_call_iseq_bmethod and vm_call_noniseq_bmethod. vm_call_iseq_bmethod is an optimized method that skips stack copies (though there is one copy to remove the receiver from the stack), and avoids calling vm_call_bmethod_body, rb_vm_invoke_bmethod, invoke_block_from_c_proc, invoke_iseq_block_from_c, and vm_yield_setup_args. Th vm_call_iseq_bmethod argument handling is similar to the way normal iseq methods are called, and allows for similar performance optimizations when using splats or keywords. However, even in the no argument case it's still significantly faster. A benchmark is added for bmethod calling. In my environment, it improves bmethod calling performance by 38-59% for simple bmethod calls, and up to 180% for bmethod calls passing literal keywords on both sides. ``` ./miniruby-iseq-bmethod: 18159792.6 i/s ./miniruby-m: 13174419.1 i/s - 1.38x slower bmethod_simple_1 ./miniruby-iseq-bmethod: 15890745.4 i/s ./miniruby-m: 10008972.7 i/s - 1.59x slower bmethod_simple_0_splat ./miniruby-iseq-bmethod: 13142804.3 i/s ./miniruby-m: 11168595.2 i/s - 1.18x slower bmethod_simple_1_splat ./miniruby-iseq-bmethod: 12375791.0 i/s ./miniruby-m: 8491140.1 i/s - 1.46x slower bmethod_no_splat ./miniruby-iseq-bmethod: 10151258.8 i/s ./miniruby-m: 8716664.1 i/s - 1.16x slower bmethod_0_splat ./miniruby-iseq-bmethod: 8138802.5 i/s ./miniruby-m: 7515600.2 i/s - 1.08x slower bmethod_1_splat ./miniruby-iseq-bmethod: 8028372.7 i/s ./miniruby-m: 5947658.6 i/s - 1.35x slower bmethod_10_splat ./miniruby-iseq-bmethod: 6953514.1 i/s ./miniruby-m: 4840132.9 i/s - 1.44x slower bmethod_100_splat ./miniruby-iseq-bmethod: 5287288.4 i/s ./miniruby-m: 2243218.4 i/s - 2.36x slower bmethod_kw ./miniruby-iseq-bmethod: 8931358.2 i/s ./miniruby-m: 3185818.6 i/s - 2.80x slower bmethod_no_kw ./miniruby-iseq-bmethod: 12281287.4 i/s ./miniruby-m: 10041727.9 i/s - 1.22x slower bmethod_kw_splat ./miniruby-iseq-bmethod: 5618956.8 i/s ./miniruby-m: 3657549.5 i/s - 1.54x slower ```	2023-04-25 08:06:16 -07:00
Jeremy Evans	99c6d19e50	Generalize cfunc large array splat fix to fix many additional cases raising SystemStackError Originally, when `2e7bceb34e` fixed cfuncs to no longer use the VM stack for large array splats, it was thought to have fully fixed Bug #4040, since the issue was fixed for methods defined in Ruby (iseqs) back in Ruby 2.2. After additional research, I determined that same issue affects almost all types of method calls, not just iseq and cfunc calls. There were two main types of remaining issues, important cases (where large array splat should work) and pedantic cases (where large array splat raised SystemStackError instead of ArgumentError). Important cases: ```ruby define_method(:a){\|a\|} a(1380888.times) def b(a); end send(:b, 1380888.times) :b.to_proc.call(self, 1380888.times) def d; yield(1380888.times) end d(&method(:b)) def self.method_missing(a); end not_a_method(1380888.times) ``` Pedantic cases: ```ruby def a; end a(1380888.times) def b(_); end b(1380888.times) def c(_=nil); end c(1380888.times) c = Class.new do attr_accessor :a alias b a= end.new c.a(1380888.times) c.b(1380888.times) c = Struct.new(:a) do alias b a= end.new c.a(1380888.times) c.b(*1380888.times) ``` This patch fixes all usage of CALLER_SETUP_ARG with splatting a large number of arguments, and required similar fixes to use a temporary hidden array in three other cases where the VM would use the VM stack for handling a large number of arguments. However, it is possible there may be additional cases where splatting a large number of arguments still causes a SystemStackError. This has a measurable performance impact, as it requires additional checks for a large number of arguments in many additional cases. This change is fairly invasive, as there were many different VM functions that needed to be modified to support this. To avoid too much API change, I modified struct rb_calling_info to add a heap_argv member for storing the array, so I would not have to thread it through many functions. This struct is always stack allocated, which helps ensure sure GC doesn't collect it early. Because of how invasive the changes are, and how rarely large arrays are actually splatted in Ruby code, the existing test/spec suites are not great at testing for correct behavior. To try to find and fix all issues, I tested this in CI with VM_ARGC_STACK_MAX to -1, ensuring that a temporary array is used for all array splat method calls. This was very helpful in finding breaking cases, especially ones involving flagged keyword hashes. Fixes [Bug #4040] Co-authored-by: Jimmy Miller <jimmy.miller@shopify.com>	2023-04-25 08:06:16 -07:00
Aaron Patterson	b816ea8772	Implement opt_newarray_send in YJIT This commit implements opt_newarray_send along with min / max / hash for stack allocated arrays	2023-04-18 17:16:22 -07:00
Aaron Patterson	c5fc1ce975	Emit special instruction for array literal + .(hash\|min\|max) This commit introduces a new instruction `opt_newarray_send` which is used when there is an array literal followed by either the `hash`, `min`, or `max` method. ``` [a, b, c].hash ``` Will emit an `opt_newarray_send` instruction. This instruction falls back to a method call if the "interested" method has been monkey patched. Here are some examples of the instructions generated: ``` $ ./miniruby --dump=insns -e '[@a, @b].max' == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE) 0000 getinstancevariable :@a, <is:0> ( 1)[Li] 0003 getinstancevariable :@b, <is:1> 0006 opt_newarray_send 2, :max 0009 leave $ ./miniruby --dump=insns -e '[@a, @b].min' == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE) 0000 getinstancevariable :@a, <is:0> ( 1)[Li] 0003 getinstancevariable :@b, <is:1> 0006 opt_newarray_send 2, :min 0009 leave $ ./miniruby --dump=insns -e '[@a, @b].hash' == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE) 0000 getinstancevariable :@a, <is:0> ( 1)[Li] 0003 getinstancevariable :@b, <is:1> 0006 opt_newarray_send 2, :hash 0009 leave ``` [Feature #18897] [ruby-core:109147] Co-authored-by: John Hawthorn <jhawthorn@github.com>	2023-04-18 17:16:22 -07:00
Peter Zhu	ad3d4e87d7	Move RCLASS_CLONED to rb_classext_struct This commit moves RCLASS_CLONED from the flags to the rb_classext_struct. This frees the FL_USER1 bit.	2023-04-16 11:06:31 -04:00
Koichi Sasada	c9fd81b860	`vm_call_single_noarg_inline_builtin` If the iseq only contains `opt_invokebuiltin_delegate_leave` insn and the builtin-function (bf) is inline-able, the caller doesn't need to build a method frame. `vm_call_single_noarg_inline_builtin` is fast path for such cases.	2023-03-23 14:03:12 +09:00

1 2 3 4 5 ...

1158 Коммитов