Commit graph

873 commits

Author SHA1 Message Date
Koichi Sasada e9d7478ded relax unused block warning for duck typing
If a method `foo` uses a block, another (unrelated) method `foo`
can also receive a block. So try to relax the unused block warning
condition.

```ruby
      class C0
        def f = yield
      end

      class C1 < C0
        def f = nil
      end

      [C0, C1].each { |klass| klass.new.f { :block } } # do not warn
```
2024-04-17 20:26:49 +09:00
Matt Valentine-House 065710c0f5 Initialize external GC Library
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
2024-04-15 19:50:47 +01:00
HASUMI Hitoshi 9b1e97b211 [Universal parser] DeVALUE of p->debug_lines and ast->body.script_lines
This patch is part of universal parser work.

## Summary
- Decouple VALUE from members below:
  - `(struct parser_params *)->debug_lines`
  - `(rb_ast_t *)->body.script_lines`
- Instead, they are now `rb_parser_ary_t *`
  - They can also be a `(VALUE)FIXNUM` as before to hold line count
- `ISEQ_BODY(iseq)->variable.script_lines` remains VALUE
  - In order to do this,
  - Add `VALUE script_lines` param to `rb_iseq_new_with_opt()`
  - Introduce `rb_parser_build_script_lines_from()` to convert `rb_parser_ary_t *` into `VALUE`

## Other details
- Extend `rb_parser_ary_t *`. It could previously store only `rb_parser_ast_token *`; now it can store script_lines, too
- Change tactics of building the top-level `SCRIPT_LINES__` in `yycompile0()`
  - Before: While parsing, each line of the script is added to `SCRIPT_LINES__[path]`
  - After: After `yyparse(p)`, `SCRIPT_LINES__[path]` will be built from `p->debug_lines` (see the sketch after this list)
- Remove the second parameter of `rb_parser_set_script_lines()` to make it simple
- Introduce `script_lines_free()` to be called from `rb_ast_free()` because the GC no longer takes care of the script_lines
- Introduce `rb_parser_string_deep_copy()` in parse.y to maintain script_lines when `rb_ruby_parser_free()` is called
  - With regard to this, please see *Future tasks* below
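
For context, the `SCRIPT_LINES__` behavior described above is observable from plain Ruby. A minimal sketch, assuming a file `example.rb` exists on the load path:

```ruby
SCRIPT_LINES__ = {}                     # opt in: the parser records source lines
require "example"                       # hypothetical file on $LOAD_PATH
SCRIPT_LINES__.each do |path, lines|
  puts "#{path}: #{lines.size} lines"
end
```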

## Future tasks
- Decouple IMEMO from `rb_ast_t *`
  - This lifts the five-member restriction on Ruby objects,
  - so we will be able to move ownership of the `lex.string_buffer` from the parser to the AST,
  - and then remove `rb_parser_string_deep_copy()` to make the whole thing simple
2024-04-15 20:51:54 +09:00
Koichi Sasada 9180e33ca3 show warning for unused block
With verbose mode (-w), the interpreter shows a warning if
a block is passed to a method that does not use the given block.
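
A minimal sketch of the behavior (run with `ruby -w`; the warning text here is approximate):

```ruby
def foo = 42        # never yields, so a passed block is unused

foo { :dead }       # -w prints something like:
                    #   warning: the block passed to 'foo' defined at t.rb:1 may be unused
```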

Warning on:

* the invoked method is written in C
* the invoked method is not `initialize`
* not invoked with `super`
* the first time on the call-site with the invoked method
  (`obj.foo{}` will be warned about once if `foo` is the same method)

[Feature #15554]

`Primitive.attr! :use_block` is introduced to declare that primitive
functions (written in C) will use the passed block.
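
In a builtin script, that declaration might look roughly like this (a sketch; the method name and the C primitive are hypothetical):

```ruby
def each_entry(&)
  Primitive.attr! :use_block          # declare: the C function below consumes the block
  Primitive.each_entry_body           # hypothetical C primitive that yields
end
```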

For minitest, the test needs some tweaking, so use
ea9caafc07
for `test-bundled-gems`.
2024-04-15 12:08:07 +09:00
KJ Tsanaktsidis 2535a09e85 Check ASAN fake stacks when marking non-current threads
Currently, we check the values on the machine stack & register state to
see if they're actually pointers to an ASAN fake stack, and mark the
values on the fake stack too if required. However, we are only doing
that for the _current_ thread (the one actually running the GC), not for
any other thread in the program.

Make rb_gc_mark_machine_context (which is called for marking non-current
threads) perform the same ASAN fake stack handling that
mark_current_machine_context performs.

[Bug #20310]
2024-03-25 14:57:04 +11:00
KJ Tsanaktsidis 48d3bdddba Move asan_fake_stack_handle to EC, not thread
It's really a property of the EC; each fiber (which has its own EC) also
has its own asan_fake_stack_handle.

[Bug #20310]
2024-03-25 14:57:04 +11:00
Peter Zhu c2170e5c2b Fix typo from gloabl_object_list to global_object_list 2024-03-14 13:52:20 -04:00
Peter Zhu 4559a161af Move gloabl_object_list from objspace to VM
This is to be consistent with the mark_object_ary that is in the VM.
2024-03-14 13:29:59 -04:00
Jean Boussier d4f3dcf4df Refactor VM root modules
This `st_table` is used to both mark and pin classes
defined from the C API. But `vm->mark_object_ary` already
does both much more efficiently.

Currently a Ruby process starts with 252 rooted classes,
which use `7224B` in an `st_table` versus `2016B` in an `RArray`
(252 slots × 8 bytes each).

So that's a baseline of about 5kB saved, but since `mark_object_ary` is
preallocated with `1024` slots yet only uses `405` of them,
it's a net save of about `7kB`.

`vm->mark_object_ary` is also being refactored.

Prior to this change, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.

This has the detrimental effect of marking these references on every
minor GC even though it's a mostly append-only list.

But using a custom TypedData, we can avoid having to mark
all the references on minor GC runs.

Additionally, immediate values are now ignored and not appended
to `vm->mark_object_ary`, as that's just wasted space.
2024-03-06 15:33:43 -05:00
Kevin Newton 0f1ca9492c [PRISM] Provide runtime flag for prism in iseq 2024-02-21 11:44:40 -05:00
Peter Zhu 330830dd1a Add IMEMO_NEW
Rather than exposing that an imemo has a flag and four fields, this
changes the implementation to only expose one field (the klass) and
fills the rest with 0. The type will have to fill in the values themselves.
2024-02-21 11:33:05 -05:00
John Hawthorn 1c97abaaba De-dup identical callinfo objects
Previously, every call to vm_ci_new (when the CI was not packable) would
result in a different callinfo being returned. This meant that every
kwarg callsite had its own CI.

When calling, different CIs result in different CCs. These CIs and CCs
both end up persisted on the T_CLASS inside cc_tbl. So in an eval loop
this resulted in a memory leak of both types of object. This also likely
resulted in extra memory used, and extra time searching, in non-eval
cases.

For simplicity, in this commit I always allocate a CI object inside
rb_vm_ci_lookup, but ideally we would lazily allocate it only when
needed. I hope to do that as a follow-up in the future.
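
As an illustration, a hypothetical repro of the eval-loop leak described above:

```ruby
def foo(a:, b:) = a + b

loop do
  eval("foo(a: 1, b: 2)")   # before: each eval compiled a fresh kwarg CI (and CC)
end                         # after: identical callinfos are de-duplicated
```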
2024-02-20 18:55:00 -08:00
Nobuyoshi Nakada f3cc1f9a70 Show actual imemo type when unexpected type 2024-02-08 18:08:42 +09:00
Jeremy Evans 0f90a24a81 Introduce Allocationless Anonymous Splat Forwarding
Ruby makes it easy to delegate all arguments from one method to another:

```ruby
def f(*args, **kw)
  g(*args, **kw)
end
```

Unfortunately, this indirection decreases performance.  One reason it
decreases performance is that this allocates an array and a hash per
call to `f`, even if `args` and `kw` are not modified.

Due to Ruby's ability to modify almost anything at runtime, it's
difficult to avoid the array allocation in the general case. For
example, it's not safe to avoid the allocation in a case like this:

```ruby
def f(*args, **kw)
  foo(bar)
  g(*args, **kw)
end
```

Because `foo` may be `eval` and `bar` may be a string referencing `args`
or `kw`.

To fix this correctly, you need to perform something similar to escape
analysis on the variables.  However, there is a case where you can
avoid the allocation without doing escape analysis, and that is when
the splat variables are anonymous:

```ruby
def f(*, **)
  g(*, **)
end
```

When splat variables are anonymous, it is not possible to reference
them directly; it is only possible to use them as splats to other
methods.  Since that is the case, if `f` is called with a regular
splat and a keyword splat, it can pass the arguments directly to
`g` without copying them, avoiding allocation.  For example:

```ruby
def g(a, b:)
  a + b
end

def f(*, **)
  g(*, **)
end

a = [1]
kw = {b: 2}

f(*a, **kw)
```

I call this technique Allocationless Anonymous Splat Forwarding.

This is implemented using a couple of additional iseq param flags,
anon_rest and anon_kwrest.  If anon_rest is set, an array splat is
passed when calling the method, and the array splat can be used
without modification, `setup_parameters_complex` does not duplicate
it.  Similarly, if anon_kwrest is set and a keyword splat is passed
when calling the method, `setup_parameters_complex` does not
duplicate it.
2024-01-24 18:25:55 -08:00
Takashi Kokubun 27c1dd8634 YJIT: Allow inlining ISEQ calls with a block (#9622)
* YJIT: Allow inlining ISEQ calls with a block

* Leave a TODO comment about u16 inline_block
2024-01-23 19:36:23 +00:00
Nobuyoshi Nakada 127b19ab56 Use line numbers as builtin-index
The order of iseq may differ from the order of tokens, typically
`while`/`until` conditions are put after the body.

These orders can be made to match by using line numbers as builtin
indexes, but at the same time it introduces the restriction that
multiple `cexpr!` and `cstmt!` cannot appear on the same line.
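
For illustration, the restriction means a builtin script now has to split such lines (a sketch; the embedded C expressions are hypothetical):

```ruby
# NG: two cexpr! on one line would map to the same builtin index
def f = cexpr!("INT2FIX(1)") + cexpr!("INT2FIX(2)")

# OK: one cexpr! per line
def f
  a = cexpr!("INT2FIX(1)")
  b = cexpr!("INT2FIX(2)")
  a + b
end
```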

Another possible idea is to use `RubyVM::AbstractSyntaxTree` and
`node_id` instead of ripper, after making BASERUBY 3.1 or later.
2024-01-22 19:39:34 +09:00
KJ Tsanaktsidis 61da90c1b8 Mark asan fake stacks during machine stack marking
ASAN leaves a pointer to the fake frame on the stack; we can use the
__asan_addr_is_in_fake_stack API to work out the extent of the fake
stack and thus mark any VALUEs contained therein.

[Bug #20001]
2024-01-19 09:55:12 +11:00
KJ Tsanaktsidis 807714447e Pass down "stack start" variables from closer to the top of the stack
This commit changes how stack extents are calculated for both the main
thread and other threads. Ruby uses the address of a local variable as
part of the calculation for machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable

However, the local being used for this is actually too low (too close to
the leaf function call) in both the main thread case and the new thread
case.

In the main thread case, we have the `INIT_STACK` macro, which is used
for pthreads to set the `native_main_thread->stack_start` value. This
value is correctly captured at the very top level of the program (in
main.c). However, this is _not_ what's used to set the execution context
machine stack (`th->ec->machine_stack.stack_start`); that gets set as
part of a call to `ruby_thread_init_stack` in `Init_BareVM`, using the
address of a local variable allocated _inside_ `Init_BareVM`. This is
too low; we need to use a local allocated closer to the top of the
program.

In the new thread case, the local is allocated inside
`native_thread_init_stack`, which is, again, too low.

In both cases, this means that we might have VALUEs lying outside the
bounds of `th->ec->machine.stack_{start,end}`, which won't be marked
correctly by the GC machinery.

To fix this,

* In the main thread case: We already have `INIT_STACK` at the right
  level, so just pass that local var to `ruby_thread_init_stack`.
* In the new thread case: Allocate the local one level above the call to
  `native_thread_init_stack` in `call_thread_start_func2`.

[Bug #20001]

2024-01-19 09:55:12 +11:00
Takashi Kokubun 8642a573e6 Rename BUILTIN_ATTR_SINGLE_NOARG_INLINE
to BUILTIN_ATTR_SINGLE_NOARG_LEAF

The attribute was created when the other attribute was called BUILTIN_ATTR_INLINE.
Now that the original attribute is renamed to BUILTIN_ATTR_LEAF, it's
only confusing that we call it "_INLINE".
2024-01-16 17:31:27 -08:00
Takashi Kokubun e37a37e696 Drop obsoleted BUILTIN_ATTR_NO_GC attribute
The thing that has used this in the past was very buggy, and we've never
revisited it. Let's remove it until we need it again.
2024-01-16 17:27:53 -08:00
KJ Tsanaktsidis 396e94666b Revert "Pass down "stack start" variables from closer to the top of the stack"
This reverts commit 4ba8f0dc99.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis 688a6ff510 Revert "Mark asan fake stacks during machine stack marking"
This reverts commit d10bc3a2b8.
2024-01-12 17:58:54 +11:00
KJ Tsanaktsidis d10bc3a2b8 Mark asan fake stacks during machine stack marking
ASAN leaves a pointer to the fake frame on the stack; we can use the
__asan_addr_is_in_fake_stack API to work out the extent of the fake
stack and thus mark any VALUEs contained therein.

[Bug #20001]
2024-01-12 17:29:48 +11:00
KJ Tsanaktsidis 4ba8f0dc99 Pass down "stack start" variables from closer to the top of the stack
The implementation of `native_thread_init_stack` for the various
threading models can use the address of a local variable as part of the
calculation of the machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable

However, the local being used for this is actually allocated _inside_
the `native_thread_init_stack` frame; that means the caller might
allocate a VALUE on the stack that actually lies outside the bounds
stored in machine.stack_{start,end}.

A local variable from one level above the topmost frame that stores
VALUEs on the stack must be passed down into the call to
`native_thread_init_stack` to be used in the calculation. This probably
doesn't _really_ matter for the win32 case (they'll be in the same
memory mapping so VirtualQuery should return the same thing), but
definitely could matter for the pthreads case.

[Bug #20001]
2024-01-12 17:29:48 +11:00
Koichi Sasada d65d2fb6b5 Do not `poll` first
Before this patch, the MN scheduler waits for the IO with the
following steps:

1. `poll(fd, timeout=0)` to check whether fd is ready.
2. if fd is not ready, wait with the MN thread scheduler
3. call `func` to issue the blocking I/O call

The advantage of the advance `poll()` is that we can wait for
IO readiness on any fd. However, `poll()` becomes overhead
for fds that are already ready.

This patch changes the steps as follows:

1. call `func` to issue the blocking I/O call
2. if `func` returns `EWOULDBLOCK`, the fd is `O_NONBLOCK` and
   we need to wait until the fd is ready, so wait with the MN
   thread scheduler.

In this case, we wait only for `O_NONBLOCK` fds. Otherwise
the thread waits in the blocking operation itself, such as the
`read()` system call. Either way, we no longer need to call
`poll()` in advance to check whether the fd is ready.
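
In Ruby-flavored pseudocode, the new order looks roughly like this (all helper names are hypothetical; the real logic lives in C):

```ruby
def blocking_io(fd)
  loop do
    result = issue_io_call(fd)          # hypothetical: the actual read/write
    return result unless result == :EWOULDBLOCK
    mn_wait_until_ready(fd)             # hypothetical: only O_NONBLOCK fds reach here
  end
end
```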

With this patch we can observe a performance improvement on a
microbenchmark that repeats blocking I/O (on a non-`O_NONBLOCK` fd)
with and without the MN thread scheduler.

```ruby
require 'benchmark'

f = open('/dev/null', 'w')
f.sync = true

TN = 1
N = 1_000_000 / TN

Benchmark.bm{|x|
  x.report{
    TN.times.map{
      Thread.new{
        N.times{f.print '.'}
      }
    }.each(&:join)
  }
}
__END__
TN = 1
                 user     system      total        real
ruby32       0.393966   0.101122   0.495088 (  0.495235)
ruby33       0.493963   0.089521   0.583484 (  0.584091)
ruby33+MN    0.639333   0.200843   0.840176 (  0.840291) <- Slow
this+MN      0.512231   0.099091   0.611322 (  0.611074) <- Good
```
2024-01-05 05:51:25 +09:00
KJ Tsanaktsidis f8effa209a Change the semantics of rb_postponed_job_register
Our current implementation of rb_postponed_job_register suffers from
some safety issues that can lead to interpreter crashes (see bug #1991).
Essentially, the issue is that jobs can be called with the wrong
arguments.

We made two attempts to fix this whilst keeping the promised semantics,
but:
  * The first one involved masking/unmasking when flushing jobs, which
    was believed to be too expensive
  * The second one involved a lock-free, multi-producer, single-consumer
    ringbuffer, which was too complex

The critical insight behind this third solution is that essentially the
only users of these APIs are a) internal, or b) profiling gems.

For a), none of the usages actually require variable data; they will
work just fine with the preregistration interface.

For b), generally profiling gems only call a single callback with a
single piece of data (which is actually usually just zero) for the life
of the program. The ringbuffer is complex because it needs to support
multi-word inserts of job & data (which can't be atomic); but nobody
actually even needs that functionality, really.

So, this commit:
  * Introduces a pre-registration API for jobs, with a GVL-requiring
    rb_postponed_job_preregister, which returns a handle which can be
    used with an async-signal-safe rb_postponed_job_trigger.
  * Deprecates rb_postponed_job_register (and re-implements it on top of
    the preregister function for compatibility)
  * Moves all the internal usages of postponed job registration over to
    pre-registration
2023-12-10 15:00:37 +09:00
Koichi Sasada 352a885a0f Thread specific storage APIs
This patch introduces thread specific storage APIs
for tools which use `rb_internal_thread_event_hook` APIs.

* `rb_internal_thread_specific_key_create()` to create a tool specific
  thread local storage key and allocate the storage if not available.
* `rb_internal_thread_specific_set()` sets data in thread- and
  tool-specific storage.
* `rb_internal_thread_specific_get()` gets data from thread- and
  tool-specific storage.

Note that `rb_internal_thread_specific_get|set(thread_val, key)`
can be called without the GVL, is async-signal-safe, and is safe for
multi-threading (native threads). So you can call it in any internal
thread event hooks. Furthermore, you can call it from other native
threads. Of course, `thread_val` must remain alive while the data is
being accessed through these functions.

Note that you should not forget to clean up the set data.
2023-12-08 13:16:19 +09:00
Yuta Saito 295d648f76 [wasm] Use xmalloc/xfree for jmpbuf allocation to trigger GC properly
`rb_vm_tag_jmpbuf_{init,deinit}` are safe to raise an exception from, since
the given tag is not yet pushed to `ec->tag`, or has already been popped
from it, at that time; `ec->tag` is therefore always valid, and it's safe
to raise an exception when xmalloc fails.
2023-11-23 02:18:53 +09:00
Nobuyoshi Nakada 8f1ec6e171 Adjust spaces [ci skip] 2023-11-15 19:05:10 +09:00
Yuta Saito 50a5b76dec [wasm] allocate Asyncify setjmp buffer in heap
The `rb_jmpbuf_t` type is considerably large due to the inline-allocated
Asyncify buffer, and it leads to stack overflow even with a small number
of C-method call frames. This commit allocates the Asyncify buffer used
by `rb_wasm_setjmp` on the heap to mitigate the issue.

This patch introduces a new type `rb_vm_tag_jmpbuf_t` to abstract the
representation of a jump buffer, and init/deinit hook points to manage
the lifetime of the buffer. These changes are effectively NFC for non-wasm
platforms.
2023-11-13 19:17:16 +09:00
Maxime Chevalier-Boisvert b2e1ddffa5 YJIT: port call threshold logic from Rust to C for performance (#8628)
* Port call threshold logic from Rust to C for performance

* Prefix global/field names with yjit_

* Fix linker error

* Fix preprocessor condition for rb_yjit_threshold_hit

* Fix third linker issue

* Exclude yjit_calls_at_interv from RJIT bindgen

---------

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2023-10-12 10:05:34 -04:00
Koichi Sasada be1bbd5b7d M:N thread scheduler for Ractors
This patch introduces an M:N thread scheduler for the Ractor system.

In general, an M:N thread scheduler employs N native threads (OS threads)
to manage M user-level threads (Ruby threads in this case).
In the Ruby interpreter, 1 native thread is provided per Ractor
and all of its Ruby threads are managed by that native thread.

Since Ruby 1.9, the interpreter has used a 1:1 thread scheduler, meaning
1 Ruby thread has 1 native thread. The M:N scheduler changes this strategy.

Because of compatibility issues (and stability issues in the implementation),
the main Ractor doesn't use the M:N scheduler by default. In other words,
threads on the main Ractor will be managed with the 1:1 thread scheduler.

There are additional settings available via environment variables:

`RUBY_MN_THREADS=1` enables the M:N thread scheduler on the main Ractor.
Note that non-main Ractors use the M:N scheduler without this
configuration. With this configuration, single-Ractor applications
run threads on an M:1 thread scheduler (green threads, user-level threads).

`RUBY_MAX_CPU=n` specifies the maximum number of native threads for the
M:N scheduler (default: 8).
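
For example, a script like the following runs its Ruby threads as green threads when started with the variables set (a demo sketch; `demo.rb` is a hypothetical file name):

```ruby
# Run as: RUBY_MN_THREADS=1 RUBY_MAX_CPU=4 ruby demo.rb
threads = 8.times.map do |i|
  Thread.new { sleep 0.1; i * i }   # multiplexed over up to 4 native threads
end
p threads.map(&:value)   #=> [0, 1, 4, 9, 16, 25, 36, 49]
```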

This patch will be reverted soon if non-trivial issues are found.

[Bug #19842]
2023-10-12 14:47:01 +09:00
Nobuyoshi Nakada 85984a53e8 Abort dumping when output failed 2023-09-25 22:57:28 +09:00
Nobuyoshi Nakada ac244938e8 Dump backtraces to an arbitrary stream 2023-09-25 22:57:28 +09:00
Alan Wu 39ee3e22bd Make Kernel#lambda raise when given non-literal block
Previously, Kernel#lambda returned a non-lambda proc when given a
non-literal block and issued a warning under the `:deprecated` category.
With this change, Kernel#lambda will always return a lambda proc, if it
returns without raising.
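
A sketch of the visible change (assuming the raised error is an `ArgumentError`; the exact class and message follow the final implementation):

```ruby
l = lambda { |x| x }          # literal block: returns a lambda, as before
l.lambda?                     #=> true

pr = proc { |x| x }
lambda(&pr)                   # non-literal block: previously warned and returned
                              # `pr` unchanged; now raises
```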

Due to interactions with block passing optimizations, we previously had
two separate code paths for detecting whether Kernel#lambda got a
literal block. This change allows us to remove one path, the hack done
with rb_control_frame_t::block_code introduced in 85a337f for supporting
situations where Kernel#lambda returned a non-lambda proc.

[Feature #19777]

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2023-09-12 11:25:07 -04:00
Maxime Chevalier-Boisvert 30a5b94517 YJIT: implement side chain fallback for setlocal to avoid exiting (#8227)
* YJIT: implement side chain fallback for setlocal to avoid exiting

* Update yjit/src/codegen.rs

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>

---------

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2023-08-17 10:11:17 -04:00
Takashi Kokubun 7740526b1c Reorder bp_check and jit_return in cfp
It's the actual cfp[6] in the default build, so it's confusing to say
otherwise in the comment.
2023-08-11 17:57:04 -07:00
Takashi Kokubun cd8d20cd1f YJIT: Compile exception handlers (#8171)
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2023-08-08 16:06:22 -07:00
Koichi Sasada 280419d0e0 `calling->cd` instead of `calling->ci`
`struct rb_calling_info::cd` is introduced, and `rb_calling_info::ci`
is replaced with it to manipulate the inline cache of the iseq during
the method invocation process, so that `ci` can be accessed with
`calling->cd->ci`. It adds one indirection, but this can be justified
by the following points:

1) `vm_search_method_fastpath()` doesn't need `ci`, and neither does
`vm_call_iseq_setup_normal()`. This means reducing `cd->ci` accesses
in `vm_sendish()` can make it faster.

2) most method types need to access `ci` only once in theory,
so 1 additional indirection doesn't matter.
2023-07-31 17:13:43 +09:00
Takashi Kokubun 38be9a9b72 Clean up OPT_STACK_CACHING (#8132) 2023-07-27 17:27:05 -07:00
Alan Wu 3211b70545 Fix off-by-one in comment [ci skip] 2023-07-18 17:52:04 -04:00
Alan Wu f302e725e1 Remove __bp__ and speed-up bmethod calls (#8060)
Remove rb_control_frame_t::__bp__ and optimize bmethod calls

This commit removes the __bp__ field from rb_control_frame_t. It was
introduced to help MJIT, but since MJIT was replaced by RJIT, we can use
vm_base_ptr() to compute it from the SP of the previous control frame
instead. Removing the field avoids needing to set it up when pushing new
frames.

Simply removing __bp__ would cause crashes since RJIT and YJIT used a
slightly different stack layout for bmethod calls than the interpreter.
At the moment of the call, the two layouts looked as follows:

                   ┌────────────┐    ┌────────────┐
                   │ frame_base │    │ frame_base │
                   ├────────────┤    ├────────────┤
                   │    ...     │    │    ...     │
                   ├────────────┤    ├────────────┤
                   │    args    │    │    args    │
                   ├────────────┤    └────────────┘<─prev_frame_sp
                   │  receiver  │
    prev_frame_sp─>└────────────┘
                     RJIT & YJIT      interpreter

Essentially, vm_base_ptr() needs to compute the address to frame_base
given prev_frame_sp in the diagrams. The presence of the receiver
created an off-by-one situation.

Make the interpreter use the layout the JITs use for iseq-to-iseq
bmethod calls. Doing so removes unnecessary argument shifting and
vm_exec_core() re-entry from the interpreter, yielding a speed
improvement visible through `benchmark/vm_defined_method.yml`:

     patched:   7578743.1 i/s
      master:   4796596.3 i/s - 1.58x  slower

C-to-iseq bmethod calls now store one more VALUE than before, but that
should have negligible impact on overall performance.

Note that re-entering vm_exec_core() used to be necessary for firing
TracePoint events, but that's no longer the case since
9121e57a5f.

Closes ruby/ruby#6428
2023-07-17 13:57:58 -04:00
Nobuyoshi Nakada 1a6f3becbb Fallback `rb_iseq_complete`
For compilers that do not eliminate references to functions that are
never called, such as SunC.
2023-07-01 15:14:27 +09:00
Nobuyoshi Nakada 0d0841ad4c Compile code for lazy ISeq loading always 2023-06-30 23:59:05 +09:00
Samuel Williams ab7bb38aca Remove explicit SIGCHLD handling. (#7816)
* Remove unused SIGCHLD handling.

* Remove unused `init_sigchld`.

* Remove unnecessary `#define RUBY_SIGCHLD (0)`.

* Remove unused `SIGCHLD_LOSSY`.
2023-05-15 23:14:51 +09:00
Alan Wu adaff1fc49 [Bug #19592] Fix ext/Setup support
After [1], using ext/Setup to link some, but not all, extensions failed
during linking. I did not know about this option, and had assumed that
only `--with-static-linked-ext` builds can include statically linked
extensions.

Include the support code for statically linked extensions in all
configurations like before [1]. Initialize the table lazily to minimize
footprint on builds that have no statically linked extensions.

[1]: 790cf4b6d0 "Fix autoload status of
         statically linked extensions"
2023-04-26 15:02:23 -04:00
Jeremy Evans 99c6d19e50 Generalize cfunc large array splat fix to fix many additional cases raising SystemStackError
Originally, when 2e7bceb34e fixed cfuncs to no
longer use the VM stack for large array splats, it was thought to have fully
fixed Bug #4040, since the issue was fixed for methods defined in Ruby (iseqs)
back in Ruby 2.2.

After additional research, I determined that the same issue affects almost
all types of method calls, not just iseq and cfunc calls.  There were two
main types of remaining issues: important cases (where a large array splat
should work) and pedantic cases (where a large array splat raised
SystemStackError instead of ArgumentError).

Important cases:

```ruby
define_method(:a){|*a|}
a(*1380888.times)

def b(*a); end
send(:b, *1380888.times)

:b.to_proc.call(self, *1380888.times)

def d; yield(*1380888.times) end
d(&method(:b))

def self.method_missing(*a); end
not_a_method(*1380888.times)

```

Pedantic cases:

```ruby
def a; end
a(*1380888.times)
def b(_); end
b(*1380888.times)
def c(_=nil); end
c(*1380888.times)

c = Class.new do
  attr_accessor :a
  alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)

c = Struct.new(:a) do
  alias b a=
end.new
c.a(*1380888.times)
c.b(*1380888.times)
```

This patch fixes all usage of CALLER_SETUP_ARG with splatting a large
number of arguments, and required similar fixes to use a temporary
hidden array in three other cases where the VM would use the VM stack
for handling a large number of arguments.  However, it is possible
there may be additional cases where splatting a large number
of arguments still causes a SystemStackError.

This has a measurable performance impact, as it requires additional
checks for a large number of arguments in many additional cases.

This change is fairly invasive, as there were many different VM
functions that needed to be modified to support this. To avoid
too much API change, I modified struct rb_calling_info to add a
heap_argv member for storing the array, so I would not have to
thread it through many functions.  This struct is always stack
allocated, which helps ensure the GC doesn't collect it early.

Because of how invasive the changes are, and how rarely large
arrays are actually splatted in Ruby code, the existing test/spec
suites are not great at testing for correct behavior.  To try to
find and fix all issues, I tested this in CI with
VM_ARGC_STACK_MAX set to -1, ensuring that a temporary array is used
for all array splat method calls.  This was very helpful in
finding breaking cases, especially ones involving flagged keyword
hashes.

Fixes [Bug #4040]

Co-authored-by: Jimmy Miller <jimmy.miller@shopify.com>
2023-04-25 08:06:16 -07:00
Jeremy Evans 1f115f141d Speed up rebuilding the loaded feature index
Rebuilding the loaded feature index slowed down with the bug fix
for #17885 in 79a4484a07.  The
slowdown was extreme if realpath emulation was used, but even when
not emulated, it could be about 10x slower.

This adds loaded_features_realpath_map to rb_vm_struct. This is a
hidden hash mapping loaded feature paths to realpaths. When
rebuilding the loaded feature index, look at this hash to get
cached realpath values, and skip calling rb_check_realpath if a
cached value is found.
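
Conceptually, the cache behaves like this Ruby sketch (the real map is a hidden C-level hash in rb_vm_struct):

```ruby
realpath_map = {}   # loaded feature path => realpath

realpath_for = lambda do |path|
  realpath_map[path] ||= File.realpath(path)   # a cache hit skips the expensive call
end

realpath_for.call(__FILE__)
```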

Fixes [Bug #19246]
2023-04-13 20:22:36 -07:00
eileencodes ce99e50ede Move `catch_except_p` to `compile_data`
The `catch_except_p` flag is used for communicating between parent and
child iseqs that a throw instruction was emitted. So, for example, if a
child iseq has a throw in it and the parent wants to catch the throw, we
use this flag to communicate to the parent iseq that a throw instruction
was emitted.
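
For instance, `break` out of a block compiles to a throw instruction in the child (block) iseq that the enclosing method's iseq must catch, which is the parent/child communication described above (illustrative Ruby):

```ruby
def first_even(ary)
  ary.each { |x| break x if x.even? }   # the block iseq emits a throw;
end                                     # the enclosing method's iseq catches it

first_even([1, 3, 4])   #=> 4
```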

This flag is only useful at compile time; it only impacts the
compilation process, so it seems fine to move it from the iseq body
to the compile_data struct.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2023-04-11 10:47:58 -07:00
Matt Valentine-House 879cda98a4 Remove dependency of vm_core.h on shape.h
so that shape.h can now happily include gc.h
2023-04-06 11:07:16 +01:00