The documentation for `rb_enc_interned_str_cstr` notes that `enc` can be
a null pointer, but this currently causes a segmentation fault when
trying to autoload the encoding. This commit fixes the issue by checking
for NULL before calling `rb_enc_autoload`.
ASAN greatly increases the memory footprint of Ruby, so these static
thresholds are not appropriate. There's no real need to run these tests
under ASAN.
[Bug #20274]
Found when compiling ruby for windows-arm64 using msys2
Missing return type for function Init_lock_native_thread
lock_native_thread.c:45:1: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
45 | Init_lock_native_thread(void)
| ^
| int
integers
(https://github.com/ruby/strscan/pull/89)
This commit adds `scan_byte` and `peek_byte`. `scan_byte` will scan the
current byte, return it as an integer, and advance the cursor.
`peek_byte` will return the current byte as an integer without advancing
the cursor.
Currently `StringScanner#get_byte` returns a string, but I want to get
the current byte without allocating a string. I think this will help
with writing high performance lexers.
---------
https://github.com/ruby/strscan/commit/873aba2e5d
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
* Introduction of Happy Eyeballs Version 2 (RFC8305) in Socket.tcp
This is an implementation of Happy Eyeballs version 2 (RFC 8305) in Socket.tcp.
[Background]
Currently, `Socket.tcp` synchronously resolves names and makes connection attempts with `Addrinfo::foreach.`
This implementation has the following two problems.
1. In name resolution, the program stops until the DNS server responds to all DNS queries.
2. In a connection attempt, while an IP address is trying to connect to the destination host and is taking time, the program stops, and other resolved IP addresses cannot try to connect.
[Proposal]
"Happy Eyeballs" ([RFC 8305](https://datatracker.ietf.org/doc/html/rfc8305)) is an algorithm to solve this kind of problem. It avoids delays to the user whenever possible and also uses IPv6 preferentially.
I implemented it into `Socket.tcp` by using `Addrinfo.getaddrinfo` in each thread spawned per address family to resolve the hostname asynchronously, and using `Socket::connect_nonblock` to try to connect with multiple addrinfo in parallel.
[Outcome]
This change eliminates a fatal defect in the following cases.
Case 1. One of the A or AAAA DNS queries does not return
---
require 'socket'
class Addrinfo
class << self
# Current Socket.tcp depends on foreach
def foreach(nodename, service, family=nil, socktype=nil, protocol=nil, flags=nil, timeout: nil, &block)
getaddrinfo(nodename, service, Socket::AF_INET6, socktype, protocol, flags, timeout: timeout)
.concat(getaddrinfo(nodename, service, Socket::AF_INET, socktype, protocol, flags, timeout: timeout))
.each(&block)
end
def getaddrinfo(_, _, family, *_)
case family
when Socket::AF_INET6 then sleep
when Socket::AF_INET then [Addrinfo.tcp("127.0.0.1", 4567)]
end
end
end
end
Socket.tcp("localhost", 4567)
---
Because the current `Socket.tcp` cannot resolve IPv6 names, the program stops in this case. It cannot start to connect with IPv4 address.
Though `Socket.tcp` with HEv2 can promptly start a connection attempt with IPv4 address in this case.
Case 2. Server does not promptly return ack for syn of either IPv4 / IPv6 address family
---
require 'socket'
fork do
socket = Socket.new(Socket::AF_INET6, :STREAM)
socket.setsockopt(:SOCKET, :REUSEADDR, true)
socket.bind(Socket.pack_sockaddr_in(4567, '::1'))
sleep
socket.listen(1)
connection, _ = socket.accept
connection.close
socket.close
end
fork do
socket = Socket.new(Socket::AF_INET, :STREAM)
socket.setsockopt(:SOCKET, :REUSEADDR, true)
socket.bind(Socket.pack_sockaddr_in(4567, '127.0.0.1'))
socket.listen(1)
connection, _ = socket.accept
connection.close
socket.close
end
Socket.tcp("localhost", 4567)
---
The current `Socket.tcp` tries to connect serially, so when its first name resolves an IPv6 address and initiates a connection to an IPv6 server, this server does not return an ACK, and the program stops.
Though `Socket.tcp` with HEv2 starts to connect sequentially and in parallel so a connection can be established promptly at the socket that attempted to connect to the IPv4 server.
In exchange, the performance of `Socket.tcp` with HEv2 will be degraded.
---
100.times { Socket.tcp("www.ruby-lang.org", 80) }
---
This is due to the addition of the creation of IO objects, Thread objects, etc., and calls to `IO::select` in the implementation.
* Avoid NameError of Socket::EAI_ADDRFAMILY in MinGW
* Support Windows with SO_CONNECT_TIME
* Improve performance
I have additionally implemented the following patterns:
- If the host is single-stack, name resolution is performed in the main thread. This reduces the cost of creating threads.
- If an IP address is specified, name resolution is performed in the main thread. This also reduces the cost of creating threads.
- If only one IP address is resolved, connect is executed in blocking mode. This reduces the cost of calling IO::select.
Also, I have added a fast_fallback option for users who wish not to use HE.
Here are the results of each performance test.
```ruby
require 'socket'
require 'benchmark'
HOSTNAME = "www.ruby-lang.org"
PORT = 80
ai = Addrinfo.tcp(HOSTNAME, PORT)
Benchmark.bmbm do |x|
x.report("Domain name") do
30.times { Socket.tcp(HOSTNAME, PORT).close }
end
x.report("IP Address") do
30.times { Socket.tcp(ai.ip_address, PORT).close }
end
x.report("fast_fallback: false") do
30.times { Socket.tcp(HOSTNAME, PORT, fast_fallback: false).close }
end
end
```
```
user system total real
Domain name 0.015567 0.032511 0.048078 ( 0.325284)
IP Address 0.004458 0.014219 0.018677 ( 0.284361)
fast_fallback: false 0.005869 0.021511 0.027380 ( 0.321891)
````
And this is the measurement result when executed in a single stack environment.
```
user system total real
Domain name 0.007062 0.019276 0.026338 ( 1.905775)
IP Address 0.004527 0.012176 0.016703 ( 3.051192)
fast_fallback: false 0.005546 0.019426 0.024972 ( 1.775798)
```
The following is the result of the run on Ruby 3.3.0.
(on Dual stack environment)
```
user system total real
Ruby 3.3.0 0.007271 0.027410 0.034681 ( 0.472510)
```
(on Single stack environment)
```
user system total real
Ruby 3.3.0 0.005353 0.018898 0.024251 ( 1.774535)
```
* Do not cache `Socket.ip_address_list`
As mentioned in the comment at https://github.com/ruby/ruby/pull/9374#discussion_r1482269186, caching Socket.ip_address_list does not follow changes in network configuration.
But if we stop caching, it becomes necessary to check every time `Socket.tcp` is called whether it's a single stack or not, which could further degrade performance in the case of a dual stack.
From this, I've changed the approach so that when a domain name is passed, it doesn't check whether it's a single stack or not and resolves names in parallel each time.
The performance measurement results are as follows.
require 'socket'
require 'benchmark'
HOSTNAME = "www.ruby-lang.org"
PORT = 80
ai = Addrinfo.tcp(HOSTNAME, PORT)
Benchmark.bmbm do |x|
x.report("Domain name") do
30.times { Socket.tcp(HOSTNAME, PORT).close }
end
x.report("IP Address") do
30.times { Socket.tcp(ai.ip_address, PORT).close }
end
x.report("fast_fallback: false") do
30.times { Socket.tcp(HOSTNAME, PORT, fast_fallback: false).close }
end
end
user system total real
Domain name 0.004085 0.011873 0.015958 ( 0.330097)
IP Address 0.000993 0.004400 0.005393 ( 0.257286)
fast_fallback: false 0.001348 0.008266 0.009614 ( 0.298626)
* Wait forever if fallback addresses are unresolved, unless resolv_timeout
Changed from waiting only 3 seconds for name resolution when there is no fallback address available, to waiting as long as there is no resolv_timeout.
This is in accordance with the current `Socket.tcp` specification.
* Use exact pattern to match IPv6 address format for specify address family
The extension library has each initialization function named "Init_" +
basename. If multiple extensions have the same base name (such as
cgi/escape and erb/escape), the same function will be registered for
both names.
To fix this conflict, rename the initialization functions under sub
directories using using parent names, when statically linking.
This commit changes `struct parser_params` lastline and nextline
from `VALUE` (String object) to `rb_parser_string_t *` so that
dependency on Ruby Object is reduced.
`parser_string_buffer_t string_buffer` is added to `struct parser_params`
to manage `rb_parser_string_t` pointers of each line. All allocated line
strings are freed in `rb_ruby_parser_free`.
In case of EAI_SYSTEM, getaddrinfo is supposed to set more detail in
errno; however, because we call getaddrinfo on a thread now, and errno
is threadlocal, that information is being lost. Instead, we just raise
whatever errno happens to be on the calling thread (which can be
something very confusing, like `ECHILD`).
Fix it by explicitly propagating errno back to the calling thread
through the getaddrinfo_arg structure.
[Bug #20198]
ASAN leaves a pointer to the fake frame on the stack; we can use the
__asan_addr_is_in_fake_stack API to work out the extent of the fake
stack and thus mark any VALUEs contained therein.
[Bug #20001]
(https://github.com/ruby/stringio/pull/77)
Followup of #79
`rb_str_resize()` was changed by b0b9f7201a .
```c
rb_str_resize(string, shorter) // clear ENC_CODERANGE in some case
rb_str_resize(string, longer) // does not clear ENC_CODERANGE anymore
```
```c
// rb_str_resize in string.c
if (slen > len && ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) {
ENC_CODERANGE_CLEAR(str);
}
```
I think this change is based on an assumption that appending null bytes
will not change flag `ascii_only?`.
`strio_extend()` will make the string longer if needed, and update the
flags correctly for appending null bytes.
Before `memmove()`, we need to `rb_str_modify()` because updated flags are not
updated for `memmove()`.
https://github.com/ruby/stringio/commit/b31a538576
Both Red Hat and Debian-like systems configure the minimum TLS version
to be 1.2 by default, but allow users to change this via configs.
On Red Hat and derivatives this happens via crypto-policies[1], which in
writes settings in /etc/crypto-policies/back-ends/opensslcnf.config.
Most notably, it sets TLS.MinProtocol there. For Debian there's
MinProtocol in /etc/ssl/openssl.cnf. Both default to TLSv1.2, which is
considered a secure default.
In constrast, the SSLContext has a hard coded OpenSSL::SSL::TLS1_VERSION
for min_version. TLS 1.0 and 1.1 are considered insecure. By always
setting this in the default parameters, the system wide default can't be
respected, even if a developer wants to.
This takes the approach that's also done for ciphers: it's only set for
OpenSSL < 1.1.0.
[1]: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/security_hardening/using-the-system-wide-cryptographic-policies_security-hardeninghttps://github.com/ruby/openssl/commit/ae215a47ae
ASAN leaves a pointer to the fake frame on the stack; we can use the
__asan_addr_is_in_fake_stack API to work out the extent of the fake
stack and thus mark any VALUEs contained therein.
[Bug #20001]
It looks like `sched_getcpu(3)` returns a strange number on some
(virtual?) environments.
I decided to remove the setaffinity mechanism because the performance
does not appear to degrade on a quick benchmark even if removed.
[Bug #20172]
When opening a file with `File.open`, and then setting the encoding with
`IO#set_encoding`, it still correctly performs CRLF -> LF conversion on
Windows when reading files with a CRLF line ending in them (in text
mode).
However, the file is opened instead with either the `rb_io_fdopen` or
`rb_file_open` APIs from C, the CRLF conversion is _NOT_ set up
correctly; it works if the encoding is not specified, but if
`IO#set_encoding` is called, the conversion stops happening. This seems
to be because the encflags never get ECONV_DEFAULT_NEWLINE_DECORATOR
set in these codepaths.
Concretely, this means that the conversion doesn't happen in the
following circumstances:
* When loading ruby files with require (that calls rb_io_fdopen)
* When parsing ruuby files with RubyVM::AbstractSyntaxTree (that calls
rb_file_open).
This then causes the ErrorHighlight tests to fail on windows if git has
checked them out with CRLF line endings - the error messages it's
testing wind up with literal \r\n sequences in them because the iseq
text from the parser contains un-newline-converted strings.
This commit fixes the problem by copy-pasting the relevant snippet which
sets this up in `rb_io_extract_modeenc` (for the File.open path) into
the relevant codepaths for `rb_io_fdopen` and `rb_file_open`.
[Bug #20101]
The terminator is not actually getting filled in; we're simply passing
(two) bytes of empty memory as the NUL terminator. This can lead to
garbage characters getting written to registry values.
Fix this by explicitly putting a WCHAR_NUL character into the string to
be sent to the registry API, like we do in the MULTI_SZ case.
[Bug #20096]
This avoids pinning an id to the symbol used if a dynamic symbol is
passed in as a hash key.
rb_sym2str is available in Ruby 2.2+ and json depends on >= 2.3.
https://github.com/flori/json/commit/5cbafb8dbe
Object IDs became more expensive in Ruby 2.7. Using `Hash#compare_by_identity` let's us get the same effect, without needing to force all these objects to have object_ids assigned to them.
https://github.com/ruby/psych/commit/df69e4a12e
`:nodoc:` directive does not work at method definition in C, and must
be at the implementation function. That is, there is no way to make
one method visible and another method sharing the implementation
invisible at the same time.