[Bug #20650]
The capture group allocates memory that is leaked when it times out.
For example:
re = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
str = "a" * 1000000 + "x"
10.times do
100.times do
re =~ str
rescue Regexp::TimeoutError
end
puts `ps -o rss= -p #{$$}`
end
Before:
34688
56416
78288
100368
120784
140704
161904
183568
204320
224800
After:
16288
16288
16880
16896
16912
16928
16944
17184
17184
17200
Since 730e3b2ce0
("Stop exposing `rb_str_chilled_p`"), we noticed a speed loss on a few
benchmarks that are string operations heavy. This is partially due to
routines no longer having the options to inline rb_check_frozen_inline()
in non-LTO builds. Make it an inlining candidate again to recover speed.
Testing this patch on my machine, the fannkuchredux benchmark gets a
1.15 speed-up with YJIT and 1.03 without YJIT.
This feature is no longer possible under current design; now that our GC
is pluggable, we cannot assume what was achieved by this compiler flag
is always possble by the dynamically-loaded GC implementation.
This function accepts flags:
RB_NO_KEYWORDS, RB_PASS_KEYWORDS, RB_PASS_CALLED_KEYWORDS:
Works as the same as rb_block_call_kw.
RB_BLOCK_NO_USE_PACKED_ARGS:
The given block ("bl_proc") does not use "yielded_arg" of rb_block_call_func_t.
Instead, the block accesses the yielded arguments via "argc" and "argv".
This flag allows the called method to yield arguments without allocating an Array.
Previously, RBIMPL_ASSERT_TYPE() used Check_Type() only in RUBY_DEBUG
builds. It raised TypeError, but only in debug builds. For people testing
type mismatch using debug builds looking for a Ruby exception, this can
be misleading -- the code could be missing a type check in non-debug builds
if it is relying on for example RSTRING_LEN() to raise.
Also, Check_Type() can obscure the true cause of error in debug mode.
When type check fails because the object is corrupt, instead of crashing
with a clear type assertion message, it can crash while trying to
construct an exception object to raise. You can see this for example in
<https://github.com/ruby/ruby/actions/runs/9489999591/job/26152506434?pr=10985>,
where RB_ENCODING_GET() is used on a corrupt object, but the crash
happens later and says "Assertion Failed:
../src/vm_method.c:1477:callable_method_entry_or_negative".
RBIMPL_ASSERT_TYPE() should assert right away.
RBIMPL_ASSERT_OR_ASSUME() asserts when RUBY_DEBUG and assumes in release
builds, as desired.
This should help investigate flaky CI failures that show up as TypeError
from `Kernel#require`, e.g.
"'Kernel#require': wrong argument type false (expected String) (TypeError)".
Same CI failure examples:
- https://github.com/ruby/ruby/actions/runs/9034787861/job/24828147431
- https://github.com/ruby/ruby/actions/runs/9418303667/job/25945492440
- https://github.com/ruby/ruby/actions/runs/9505650952/job/26201031314
The failure occurs with and without use of YJIT.
[Feature #20205]
Now that chilled strings no longer appear as frozen, there is no
need to offer an API to check for chilled strings.
We however need to change `rb_check_frozen_internal` to no
longer be a macro, as it needs to check for chilled strings.
[Feature #20507]
This was missed from the initial commit.
```
../../.././include/ruby/internal/value_type.h:446:27: error: implicit conversion changes signedness: 'enum ruby_value_type' to 'int' [-Werror,-Wsign-conversion]
rb_unexpected_type(v, t);
~~~~~~~~~~~~~~~~~~ ^
```
They were initially made frozen to avoid false positives for cases such
as:
str = str.dup if str.frozen?
But this may cause bugs and is generally confusing for users.
[Feature #20205]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
and declare it will be removed soon.
ddtrace is still referes the API and build was failed.
See https://github.com/DataDog/dd-trace-rb/pull/3578
Maybe threre are only few users of this C-API now so we can remove
it soon.
Some extensions (like stringio) may need to differentiate between
chilled strings and frozen strings.
They can now use rb_str_chilled_p but must check for its presence since
the function will be removed when chilled strings are removed.
[Bug #20389]
[Feature #20205]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
[Feature #20205]
As a path toward enabling frozen string literals by default in the future,
this commit introduce "chilled strings". From a user perspective chilled
strings pretend to be frozen, but on the first attempt to mutate them,
they lose their frozen status and emit a warning rather than to raise a
`FrozenError`.
Implementation wise, `rb_compile_option_struct.frozen_string_literal` is
no longer a boolean but a tri-state of `enabled/disabled/unset`.
When code is compiled with frozen string literals neither explictly enabled
or disabled, string literals are compiled with a new `putchilledstring`
instruction. This instruction is identical to `putstring` except it marks
the String with the `STR_CHILLED (FL_USER3)` and `FL_FREEZE` flags.
Chilled strings have the `FL_FREEZE` flag as to minimize the need to check
for chilled strings across the codebase, and to improve compatibility with
C extensions.
Notes:
- `String#freeze`: clears the chilled flag.
- `String#-@`: acts as if the string was mutable.
- `String#+@`: acts as if the string was mutable.
- `String#clone`: copies the chilled flag.
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
This frees FL_USER0 on both T_MODULE and T_CLASS.
Note: prior to this, FL_SINGLETON was never set on T_MODULE,
so checking for `FL_SINGLETON` without first checking that
`FL_TYPE` was `T_CLASS` was valid. That's no longer the case.
* Introduction of Happy Eyeballs Version 2 (RFC8305) in Socket.tcp
This is an implementation of Happy Eyeballs version 2 (RFC 8305) in Socket.tcp.
[Background]
Currently, `Socket.tcp` synchronously resolves names and makes connection attempts with `Addrinfo::foreach.`
This implementation has the following two problems.
1. In name resolution, the program stops until the DNS server responds to all DNS queries.
2. In a connection attempt, while an IP address is trying to connect to the destination host and is taking time, the program stops, and other resolved IP addresses cannot try to connect.
[Proposal]
"Happy Eyeballs" ([RFC 8305](https://datatracker.ietf.org/doc/html/rfc8305)) is an algorithm to solve this kind of problem. It avoids delays to the user whenever possible and also uses IPv6 preferentially.
I implemented it into `Socket.tcp` by using `Addrinfo.getaddrinfo` in each thread spawned per address family to resolve the hostname asynchronously, and using `Socket::connect_nonblock` to try to connect with multiple addrinfo in parallel.
[Outcome]
This change eliminates a fatal defect in the following cases.
Case 1. One of the A or AAAA DNS queries does not return
---
require 'socket'
class Addrinfo
class << self
# Current Socket.tcp depends on foreach
def foreach(nodename, service, family=nil, socktype=nil, protocol=nil, flags=nil, timeout: nil, &block)
getaddrinfo(nodename, service, Socket::AF_INET6, socktype, protocol, flags, timeout: timeout)
.concat(getaddrinfo(nodename, service, Socket::AF_INET, socktype, protocol, flags, timeout: timeout))
.each(&block)
end
def getaddrinfo(_, _, family, *_)
case family
when Socket::AF_INET6 then sleep
when Socket::AF_INET then [Addrinfo.tcp("127.0.0.1", 4567)]
end
end
end
end
Socket.tcp("localhost", 4567)
---
Because the current `Socket.tcp` cannot resolve IPv6 names, the program stops in this case. It cannot start to connect with IPv4 address.
Though `Socket.tcp` with HEv2 can promptly start a connection attempt with IPv4 address in this case.
Case 2. Server does not promptly return ack for syn of either IPv4 / IPv6 address family
---
require 'socket'
fork do
socket = Socket.new(Socket::AF_INET6, :STREAM)
socket.setsockopt(:SOCKET, :REUSEADDR, true)
socket.bind(Socket.pack_sockaddr_in(4567, '::1'))
sleep
socket.listen(1)
connection, _ = socket.accept
connection.close
socket.close
end
fork do
socket = Socket.new(Socket::AF_INET, :STREAM)
socket.setsockopt(:SOCKET, :REUSEADDR, true)
socket.bind(Socket.pack_sockaddr_in(4567, '127.0.0.1'))
socket.listen(1)
connection, _ = socket.accept
connection.close
socket.close
end
Socket.tcp("localhost", 4567)
---
The current `Socket.tcp` tries to connect serially, so when its first name resolves an IPv6 address and initiates a connection to an IPv6 server, this server does not return an ACK, and the program stops.
Though `Socket.tcp` with HEv2 starts to connect sequentially and in parallel so a connection can be established promptly at the socket that attempted to connect to the IPv4 server.
In exchange, the performance of `Socket.tcp` with HEv2 will be degraded.
---
100.times { Socket.tcp("www.ruby-lang.org", 80) }
---
This is due to the addition of the creation of IO objects, Thread objects, etc., and calls to `IO::select` in the implementation.
* Avoid NameError of Socket::EAI_ADDRFAMILY in MinGW
* Support Windows with SO_CONNECT_TIME
* Improve performance
I have additionally implemented the following patterns:
- If the host is single-stack, name resolution is performed in the main thread. This reduces the cost of creating threads.
- If an IP address is specified, name resolution is performed in the main thread. This also reduces the cost of creating threads.
- If only one IP address is resolved, connect is executed in blocking mode. This reduces the cost of calling IO::select.
Also, I have added a fast_fallback option for users who wish not to use HE.
Here are the results of each performance test.
```ruby
require 'socket'
require 'benchmark'
HOSTNAME = "www.ruby-lang.org"
PORT = 80
ai = Addrinfo.tcp(HOSTNAME, PORT)
Benchmark.bmbm do |x|
x.report("Domain name") do
30.times { Socket.tcp(HOSTNAME, PORT).close }
end
x.report("IP Address") do
30.times { Socket.tcp(ai.ip_address, PORT).close }
end
x.report("fast_fallback: false") do
30.times { Socket.tcp(HOSTNAME, PORT, fast_fallback: false).close }
end
end
```
```
user system total real
Domain name 0.015567 0.032511 0.048078 ( 0.325284)
IP Address 0.004458 0.014219 0.018677 ( 0.284361)
fast_fallback: false 0.005869 0.021511 0.027380 ( 0.321891)
````
And this is the measurement result when executed in a single stack environment.
```
user system total real
Domain name 0.007062 0.019276 0.026338 ( 1.905775)
IP Address 0.004527 0.012176 0.016703 ( 3.051192)
fast_fallback: false 0.005546 0.019426 0.024972 ( 1.775798)
```
The following is the result of the run on Ruby 3.3.0.
(on Dual stack environment)
```
user system total real
Ruby 3.3.0 0.007271 0.027410 0.034681 ( 0.472510)
```
(on Single stack environment)
```
user system total real
Ruby 3.3.0 0.005353 0.018898 0.024251 ( 1.774535)
```
* Do not cache `Socket.ip_address_list`
As mentioned in the comment at https://github.com/ruby/ruby/pull/9374#discussion_r1482269186, caching Socket.ip_address_list does not follow changes in network configuration.
But if we stop caching, it becomes necessary to check every time `Socket.tcp` is called whether it's a single stack or not, which could further degrade performance in the case of a dual stack.
From this, I've changed the approach so that when a domain name is passed, it doesn't check whether it's a single stack or not and resolves names in parallel each time.
The performance measurement results are as follows.
require 'socket'
require 'benchmark'
HOSTNAME = "www.ruby-lang.org"
PORT = 80
ai = Addrinfo.tcp(HOSTNAME, PORT)
Benchmark.bmbm do |x|
x.report("Domain name") do
30.times { Socket.tcp(HOSTNAME, PORT).close }
end
x.report("IP Address") do
30.times { Socket.tcp(ai.ip_address, PORT).close }
end
x.report("fast_fallback: false") do
30.times { Socket.tcp(HOSTNAME, PORT, fast_fallback: false).close }
end
end
user system total real
Domain name 0.004085 0.011873 0.015958 ( 0.330097)
IP Address 0.000993 0.004400 0.005393 ( 0.257286)
fast_fallback: false 0.001348 0.008266 0.009614 ( 0.298626)
* Wait forever if fallback addresses are unresolved, unless resolv_timeout
Changed from waiting only 3 seconds for name resolution when there is no fallback address available, to waiting as long as there is no resolv_timeout.
This is in accordance with the current `Socket.tcp` specification.
* Use exact pattern to match IPv6 address format for specify address family
This commit changes how stack extents are calculated for both the main
thread and other threads. Ruby uses the address of a local variable as
part of the calculation for machine stack extents:
* pthreads uses it as a lower-bound on the start of the stack, because
glibc (and maybe other libcs) can store its own data on the stack
before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
the memory mapping which contains the variable
However, the local being used for this is actually too low (too close to
the leaf function call) in both the main thread case and the new thread
case.
In the main thread case, we have the `INIT_STACK` macro, which is used
for pthreads to set the `native_main_thread->stack_start` value. This
value is correctly captured at the very top level of the program (in
main.c). However, this is _not_ what's used to set the execution context
machine stack (`th->ec->machine_stack.stack_start`); that gets set as
part of a call to `ruby_thread_init_stack` in `Init_BareVM`, using the
address of a local variable allocated _inside_ `Init_BareVM`. This is
too low; we need to use a local allocated closer to the top of the
program.
In the new thread case, the lolcal is allocated inside
`native_thread_init_stack`, which is, again, too low.
In both cases, this means that we might have VALUEs lying outside the
bounds of `th->ec->machine.stack_{start,end}`, which won't be marked
correctly by the GC machinery.
To fix this,
* In the main thread case: We already have `INIT_STACK` at the right
level, so just pass that local var to `ruby_thread_init_stack`.
* In the new thread case: Allocate the local one level above the call to
`native_thread_init_stack` in `call_thread_start_func2`.
[Bug #20001]
fix
The implementation of `native_thread_init_stack` for the various
threading models can use the address of a local variable as part of the
calculation of the machine stack extents:
* pthreads uses it as a lower-bound on the start of the stack, because
glibc (and maybe other libcs) can store its own data on the stack
before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
the memory mapping which contains the variable
However, the local being used for this is actually allocated _inside_
the `native_thread_init_stack` frame; that means the caller might
allocate a VALUE on the stack that actually lies outside the bounds
stored in machine.stack_{start,end}.
A local variable from one level above the topmost frame that stores
VALUEs on the stack must be drilled down into the call to
`native_thread_init_stack` to be used in the calculation. This probably
doesn't _really_ matter for the win32 case (they'll be in the same
memory mapping so VirtualQuery should return the same thing), but
definitely could matter for the pthreads case.
[Bug #20001]